It comes down to 2 simple rules:

1. If you don't care about the order of the factor levels, then it doesn't matter how R codes the relationship.
2. If you do care about the order, then tell R what order you want.

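As a rough sketch of rule 2 applied to group means: the data frame `dat` and its values below are made up for illustration and are not taken from the original post. Under that assumption, setting the levels explicitly changes only the order in which tapply() reports the means, not the means themselves.

```r
# Hypothetical data standing in for the poster's "importances" data frame
dat <- data.frame(
  A = c(9, 3, 15, 9, 15, 9, 3),
  B = c(1, 2, 3, 4, 5, 6, 7)
)

# Default coding: levels are the sorted unique values of A
means_default <- tapply(dat$B, factor(dat$A), mean)

# Explicit coding: levels in order of first appearance in the data
means_ordered <- tapply(dat$B, factor(dat$A, levels = unique(dat$A)), mean)

names(means_default)  # "3" "9" "15"
names(means_ordered)  # "9" "3" "15"
```

Anything that walks the result in level order (a plot of means against levels, for instance) will follow whichever ordering was requested.
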
Consider the following:

> x <- c(9,3,15,9,15,9,3)
> factor(x)
[1] 9 3 15 9 15 9 3
Levels: 3 9 15
> factor(as.character(x))
[1] 9 3 15 9 15 9 3
Levels: 15 3 9
> factor(x, levels=unique(x))
[1] 9 3 15 9 15 9 3
Levels: 9 3 15

The last looks most like what you want, but for many uses, all 3 will give equivalent results.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Liakhovitski
> Sent: Friday, February 13, 2009 10:54 AM
> To: marc_schwa...@comcast.net
> Cc: R-Help List
> Subject: Re: [R] tapply bug? - levels of a factor in a data frame after tapply are intermixed
>
> Sorry - one clarification:
> When I run:
> > test$xx
> what I am currently seeing is:
> [1] 9 3 15
> Levels: 3 9 15
> But what I am expecting to be seeing is:
> [1] 9 3 15
> Levels: 9 3 15
> Or maybe: Levels: 2 1 3
>
>
> On Fri, Feb 13, 2009 at 12:38 PM, Dimitri Liakhovitski <ld7...@gmail.com> wrote:
> > On Fri, Feb 13, 2009 at 12:24 PM, Marc Schwartz <marc_schwa...@comcast.net> wrote:
> >> on 02/13/2009 11:09 AM Dimitri Liakhovitski wrote:
> >>> Hello! I have encountered a really weird problem. Maybe you've encountered it before?
> >>> I have a large data frame "importances". It has one factor ($A) with 3 levels: 3, 9, and 15. $B is a regular numeric variable.
> >>> Below I am picking a really small sub-frame (just 3 rows) based on "indices".
> >>> "indices" were chosen so that all 3 levels of A are present:
> >>>
> >>> indices=c(14329,14209,14353)
> >>> test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])
> >>> Here is what the new data frame "test" looks like:
> >>>
> >>>             yy xx
> >>> 1 -0.009984006  9
> >>> 2 -2.339904131  3
> >>> 3 -0.008427385 15
> >>>
> >>> Here is the structure of "test":
> >>>> str(test)
> >>> 'data.frame': 3 obs. of 2 variables:
> >>> $ yy: num -0.00998 -2.3399 -0.00843
> >>> $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
> >>>
> >>> Notice - the order of factor levels for xx is not 1 2 3 as it should be but 2 1 3. How come?
> >>>
> >>> Or also look at this:
> >>>> test$xx
> >>> [1] 9 3 15
> >>> Levels: 3 9 15
> >>>
> >>> Same thing.
> >>> Do you know what might be the reason?
> >>>
> >>> Thank you very much!
> >>
> >> The output of str() is showing you the factor levels of test$xx, followed by the internal integer codes for the three actual values of test$xx, 9, 3, and 15:
> >>
> >>> str(test$xx)
> >> Factor w/ 3 levels "3","9","15": 2 1 3
> >>
> >>> levels(test$xx)
> >> [1] "3" "9" "15"
> >>
> >>> as.integer(test$xx)
> >> [1] 2 1 3
> >>
> >> 9 is the second level, hence the 2
> >> 3 is the first level, hence the 1
> >> 15 is the third level, hence the 3.
> >>
> >> No problems, just clarification needed on what you are seeing.
> >>
> >> Note that you do not reference anything above regarding tapply() as per your subject line, though I suspect that I have an idea as to why you did...
> >>
> >> HTH,
> >>
> >> Marc Schwartz
> >>
> >>
> >
> > Marc (and everyone), I expected it to show:
> > $ xx: Factor w/ 3 levels "3","9","15": 1 2 3
> > rather than what I am seeing:
> > $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
> > Because 3 is level 1, 9 is level 2 and 15 is level 3.
> > I have several other factors in my original data frame.
> > And I've done that tapply for all of them (for the same dependent variable) - and in all of them the first level was 1, the second 2, etc.
> > Why am I concerned about the problem? Because I am plotting the means of the numeric variable against the levels of the factor and it's important to me that the factor levels are correct (in the right order)...
> > Dimitri
> >
> >
> > --
> > Dimitri Liakhovitski
> > MarketTools, Inc.
> > dimitri.liakhovit...@markettools.com
> >
>
>
> --
> Dimitri Liakhovitski
> MarketTools, Inc.
> dimitri.liakhovit...@markettools.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.