?with

Bert Gunter
Genentech Nonclinical Biostatistics
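For readers who do not have R open, a minimal sketch of what the ?with help page points to: with() evaluates an expression using the columns of a data frame as variables. The data frame name `dat` and the four rows below are hypothetical, chosen only to resemble the data in the thread.

```r
# Hypothetical four-row data frame shaped like the one in the thread.
dat <- data.frame(X  = c(2, 3, 1, 5),
                  Y  = c(1, 1, 2, 4),
                  V3 = c(8.062258, 2.236068, 8.062258, 5.385165))

# with() looks up X and Y inside dat first, so objects named X or Y
# elsewhere in the workspace do not affect the comparison.
keep <- with(dat, X > Y)

# Use the resulting logical vector to select rows.
res_with <- dat[keep, ]
print(res_with)
```

The same row selection can also be written as dat[dat$X > dat$Y, ], which spells out where X and Y come from.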
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steve Lianoglou
Sent: Tuesday, August 11, 2009 12:27 PM
To: Jim Bouldin
Cc: r-help@r-project.org
Subject: Re: [R] problem selecting rows meeting a criterion

Hi,

See comments in line:

On Aug 11, 2009, at 2:45 PM, Jim Bouldin wrote:

>
> No problem John, thanks for your help, and also thanks to Dan and Patrick.
> Wasn't able to read or try anybody's suggestions yesterday. Here's what
> I've discovered in the meantime:
>
> What I did not include yesterday is that my original data frame, called
> "data", was this:
>
>    X Y       V3
> 1  1 1 0.000000
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 7  2 2 0.000000
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 13 3 3 0.000000
> 14 4 3 8.062258
> 15 5 3 5.099020
> 16 1 4 6.324555
> 17 2 4 2.236068
> 18 3 4 8.062258
> 19 4 4 0.000000
> 20 5 4 5.385165
> 21 1 5 5.000000
> 22 2 5 5.656854
> 23 3 5 5.099020
> 24 4 5 5.385165
> 25 5 5 0.000000
>
> To this data frame I applied the following command:
>
> data <- data[data$V3 >0,];data #to remove all rows where V3 = 0
>
> giving me this (the point from which I started yesterday):
>
>    X Y       V3
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 14 4 3 8.062258
> 15 5 3 5.099020
> 16 1 4 6.324555
> 17 2 4 2.236068
> 18 3 4 8.062258
> 20 5 4 5.385165
> 21 1 5 5.000000
> 22 2 5 5.656854
> 23 3 5 5.099020
> 24 4 5 5.385165
>
> So far so good. But when I then submit the command
>
>> data = data[X>Y,] #to select all rows where X > Y

This won't work in general, and is probably only working in this particular case because you already have defined somewhere in your workspace vars named X and Y.
What you wrote above isn't taking the values of X and Y from data$X and data$Y, respectively, but rather from the variables X and Y defined elsewhere.

Instead of doing data[X > Y,], do:

data[data$X > data$Y,]

This should get you what you're expecting.

> I get the problem result already mentioned, namely:
>
>    X Y       V3
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 17 2 4 2.236068
> 18 3 4 8.062258
> 24 4 5 5.385165
>
> which is clearly wrong! It doesn't matter if I give a new name to the data
> frame at each step or not, or whether I use the name "data" or not. It
> always gives the same wrong answer.
>
> However, if I instead use the command:
> subset(data, X>Y), I get the right answer, namely:
>
>    X Y       V3
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 14 4 3 8.062258
> 15 5 3 5.099020
> 20 5 4 5.385165

That's because when you use X and Y in your subset(...) call, subset takes X and Y to mean data$X and data$Y.

> OK so the lesson so far is "use the subset function".

Hopefully you're learning a slightly different lesson now :-)

Does that clear things up at all?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
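The scoping issue discussed in this message can be reproduced with a short sketch. The thread does not say how X and Y came to exist in the workspace; the code below ASSUMES they were leftover copies of the columns of the original 25-row data frame (for example from an earlier attach() or manual assignment). The name `dat` is used in place of "data" purely for illustration. Under that assumption, the row names selected by the bare X > Y index come out the same as the rows reported as wrong in the message.

```r
# Rebuild the original 25-row data frame from the values quoted in the thread.
full <- data.frame(
  X  = rep(1:5, times = 5),
  Y  = rep(1:5, each = 5),
  V3 = c(0.000000, 8.062258, 2.236068, 6.324555, 5.000000,
         8.062258, 0.000000, 9.486833, 2.236068, 5.656854,
         2.236068, 9.486833, 0.000000, 8.062258, 5.099020,
         6.324555, 2.236068, 8.062258, 0.000000, 5.385165,
         5.000000, 5.656854, 5.099020, 5.385165, 0.000000)
)

# ASSUMPTION (not stated in the thread): stale workspace copies of the
# original 25-element columns are still lying around.
X <- full$X
Y <- full$Y

# First step from the thread: drop rows where V3 is 0 (20 rows remain).
dat <- full[full$V3 > 0, ]

# Bare X and Y are looked up in the workspace, so this applies a 25-element
# logical mask by position to a 20-row data frame.
wrong <- dat[X > Y, ]

# Naming the columns explicitly compares the data frame's own values.
right <- dat[dat$X > dat$Y, ]

# subset() evaluates its condition inside dat, so it agrees with `right`.
same <- subset(dat, X > Y)

rownames(wrong)  # "3" "4" "5" "6" "10" "11" "12" "17" "18" "24"
rownames(right)  # "2" "3" "4" "5" "8" "9" "10" "14" "15" "20"
```

Removing the stale objects with rm(X, Y) would turn the bare-name version into an "object not found" error, which is arguably the safer failure.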