At 09:41 28.02.2007 +1030, Geoff Russell wrote:
>There is a warning in the documentation for ?factor  (R version 2.3.0)
>as follows:
>
>" The interpretation of a factor depends on both the codes and the
>  '"levels"' attribute.  Be careful only to compare factors with the
>  same set of levels (in the same order).  In particular,
>  'as.numeric' applied to a factor is meaningless, and may happen by
>  implicit coercion.  To "revert" a factor 'f' to its original
>  numeric values, 'as.numeric(levels(f))[f]' is recommended and
>  slightly more efficient than 'as.numeric(as.character(f))'.
>
>
>But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
>always do anything useful.
>
>For example:
>
>> f<-factor(1:3,labels=c("A","B","C"))
>> f
>[1] A B C
>Levels: A B C
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] NA NA NA
>Warning message:
>NAs introduced by coercion
>
>And also,
>
>> f<-factor(1:3,labels=c(1,5,6))
>> f
>[1] 1 5 6
>Levels: 1 5 6
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] 1 5 6
>
>Is the documentation wrong, or is the code wrong, or have I missed
>something?
>
>Cheers,
>Geoff Russell
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
From the "R Language Definition":

"2.3.1 Factors

Factors are used to describe items that can have a finite number of values
(gender, social class, etc.). 
...
Factors are currently implemented using an integer array to specify the
actual levels and a second array of names that are mapped to the integers.
Rather unfortunately users often make use of the implementation in order to
make some calculations easier. This, however, is an implementation issue
and is not guaranteed to hold in all implementations of R."
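A minimal sketch of the codes/levels distinction. The factor f below is a made-up example, not taken from Geoff's post. as.numeric() on a factor returns the integer codes. The documented recipe instead converts the levels and indexes them by those codes. The recipe only makes sense when the levels are numbers stored as strings. With levels "A", "B", "C", as in Geoff's first example, there is no numeric value to recover, so NA with a coercion warning is the expected result.

```r
# A factor built from numeric values: the distinct values become the
# (character) levels, and the stored integer codes index into them.
f <- factor(c(10, 20, 10, 30))

levels(f)           # "10" "20" "30"  (levels are character strings)
as.integer(f)       # 1 2 1 3         (internal codes, not the values)
as.numeric(f)       # 1 2 1 3         (the same codes, as doubles)

# Recovering the original numbers. Indexing with a factor uses its
# integer codes, so this converts only one string per level ...
v1 <- as.numeric(levels(f))[f]
# ... whereas this converts one string per element of f.
v2 <- as.numeric(as.character(f))

v1                  # 10 20 10 30
identical(v1, v2)   # TRUE
```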

In my view, factors are used (and misused) in several different, not
necessarily connected, ways.
A factor may represent a statistical concept, i.e. a categorical variable.
Further, it may serve as an (internal) means of data reduction, or as a
method for labelling values.
I think these concepts should not be mixed up, and I would recommend
avoiding factors for data reduction and labelling.
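One possible way to follow that advice, sketched under assumptions: keep the data as plain numbers and attach labels only for display. The lookup uses a named character vector, and the name labels_for is hypothetical. A factor is created only where a genuine categorical variable is needed. This is an illustration, not a prescribed method.

```r
# Keep the data as plain numbers so arithmetic uses the real values.
x <- c(1, 5, 6, 5)

# Hypothetical lookup table: names are the values, elements the labels.
labels_for <- c("1" = "low", "5" = "medium", "6" = "high")

# Labels are looked up only for display; x itself is left untouched.
shown <- unname(labels_for[as.character(x)])
shown        # "low" "medium" "high" "medium"
mean(x)      # 4.25, computed on the values, not on factor codes

# Where a categorical variable is genuinely wanted (e.g. a model term),
# create the factor explicitly from the values.
g <- factor(x)
table(g)
```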

Heinz
