I'd like to make the distinction between the purpose of factors, i.e., what they are intended for, and how that purpose is accomplished.
Their purpose is for use in statistical models. The simplest example is analysis of variance, where predictors are commonly referred to as factors. Factors in R are intended to be used as factors in statistical models. Similarly, in the anova literature, the different values of the predictor are often referred to as levels. So R creates factors by grouping the array categories into levels, as you described.

Underlying the levels are numeric codes that the modeling functions use. Try

  as.numeric(statef)

and compare with

  as.numeric(state)

Because of this, I personally don't make anything into a factor unless I intend to use it in a model. Or, occasionally, because of a useful "side effect." For example:

(the following needs to be viewed using a monospaced font)

> set.seed(21)
> mns <- sample(month.abb,100,replace=TRUE)
> table(mns)
mns
Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
  3  12  18   8   8  14   2   9   4   6   8   8

## same:
> mnsf1 <- factor(mns)
> table(mnsf1)
mnsf1
Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
  3  12  18   8   8  14   2   9   4   6   8   8

## now the months are in the "correct" order
> mnsf2 <- factor(mns, levels=month.abb)
> table(mnsf2)
mnsf2
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  8   8   9   3   4   2  14  12   8   8   6  18

Compare

> sort(mnsf1)
> sort(mnsf2)

and compare how the underlying numeric codes are assigned to the categories.

So, I know this wasn't about your main question, but I hope you find it useful anyway.

-Don

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062


On 3/30/12 9:50 AM, "Julio Sergio" <julioser...@gmail.com> wrote:

>
>I'm trying to figure out about factors, however the on-line documentation is
>rather sparse. I guess, factors are intended for grouping arrays members into
>categories, which R names "Levels".
>And so we have:
>
>   * state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
>                "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
>                "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
>                "sa", "act", "nsw", "vic", "vic", "act")
>   * statef <- factor(state)
>   * statef
>    [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa
>   [20] vic qld nsw nsw wa sa act nsw vic vic act
>   Levels: act nsw nt qld sa tas vic wa
>
>With this, just visually, I know what the categories or Levels are.
>Nonetheless, two questions arise here: How can I have, computationally as
>opposed to visually, access to the names of these categories, and how do I
>get the indexes of the original array elements that belong to a particular
>category, say, "act"? This is, for instance, to select from another
>"parallel" array, the corresponding elements, say
>
>   * incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
>                  61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
>                  59, 46, 58, 43)
>
>So to select, the corresponding elements to "act":
>
>   46 43
>
>
>Do you have any comments on this?
>
>Thanks,
>
>--Sergio.
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
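
For what it's worth, here is a minimal sketch in base R of one way to approach the two questions in the quoted message, using the state and incomes vectors from that message. It also shows how the level names and the underlying integer codes relate. The variable names lev, codes, act_idx and act_incomes are made up for illustration and are not from the thread; this is one possible approach, not the only one.

```r
# Data copied from the quoted message
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
           "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
           "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)
statef <- factor(state)

# Names of the categories, as a character vector
lev <- levels(statef)

# Underlying integer codes: each code is a position in lev,
# so lev[codes] reproduces the original character vector
codes <- as.integer(statef)

# Positions of the elements that belong to the level "act"
act_idx <- which(statef == "act")

# Corresponding elements of the parallel vector
act_incomes <- incomes[act_idx]

print(lev)
print(act_idx)
print(act_incomes)
```

If the incomes are wanted for every level at once, split(incomes, statef) returns a list with one element per level, so split(incomes, statef)$act should give the same two values as act_incomes.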