At 09:41 28.02.2007 +1030, Geoff Russell wrote:
>There is a warning in the documentation for ?factor (R version 2.3.0)
>as follows:
>
>" The interpretation of a factor depends on both the codes and the
> '"levels"' attribute. Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'.
>
>
>But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
>always do anything useful.
>
>For example:
>
>> f<-factor(1:3,labels=c("A","B","C"))
>> f
>[1] A B C
>Levels: A B C
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] NA NA NA
>Warning message:
>NAs introduced by coercion
>
>And also,
>
>> f<-factor(1:3,labels=c(1,5,6))
>> f
>[1] 1 5 6
>Levels: 1 5 6
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] 1 5 6
>
>Is the documentation wrong, or is the code wrong, or have I missed
>something?
>
>Cheers,
>Geoff Russell
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>

From "R Language Definition"
"2.3.1 Factors

Factors are used to describe items that can have a finite number of values (gender, social class, etc.). ... Factors are currently implemented using an integer array to specify the actual levels and a second array of names that are mapped to the integers. Rather unfortunately users often make use of the implementation in order to make some calculations easier. This, however, is an implementation issue and is not guaranteed to hold in all implementations of R."

In my view factors are (mis)used in different, not necessarily connected ways. A factor may represent a statistical concept, i.e. a categorical variable. It may also serve as an (internal) means of data reduction, or as a method for labelling values. In my view these concepts should not be mixed up, and I would recommend avoiding factors for data reduction and labelling.

Heinz
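
The difference between the integer codes and the levels can be made concrete with a small sketch in R. The variable names below are illustrative and not taken from the thread. as.integer() (or as.numeric()) applied to a factor returns the internal codes, while as.numeric(levels(f))[f] recovers the original values only when the levels are themselves numeric strings. Both of the results quoted above are consistent with that.

```r
# A factor built from numeric data: its levels are the strings "1", "5", "6".
f <- factor(c(5, 1, 6, 5))

# Internal integer codes (positions in levels(f)), not the original values.
codes <- as.integer(f)

# Recover the original values: convert the levels once, then index by the factor.
values <- as.numeric(levels(f))[f]

# Equivalent result, but converts every element instead of every level.
values_slow <- as.numeric(as.character(f))

# A factor with non-numeric labels: only the codes and the labels exist,
# so there are no "original numeric values" to revert to.
g <- factor(1:3, labels = c("A", "B", "C"))
g_codes  <- as.integer(g)
g_labels <- as.character(g)

print(codes)     # internal codes
print(values)    # original values
print(g_labels)  # labels of g
```

For g, as.numeric(levels(g)) gives NA with a coercion warning, because "A", "B" and "C" are not numbers. That is the expected behaviour rather than an error in the documented idiom.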