Re: [R] subsetting a data.frame to the 'unique' of a column

Spencer Graves Thu, 23 Dec 2004 10:08:36 -0800

Thanks, Bert, for the correction. Moreover, I see now that mine didn't even give an acceptable answer: it converted levels "a" and "c" of the factor DF$c to 1 and 3. I confess I didn't read the documentation before replying. Here is "duplicated" with my example case:

> DF[!duplicated(DF$a), ]
  a b c
1 1 1 a
3 2 3 c

     Thanks again for the correction.  spencer graves

Berton Gunter wrote:

Spencer's solution is considerably less efficient than using duplicated()
and subscripting: in a small example with 3 columns and 10000 rows, it took
5 times as long on my Windows setup.
The reason is that aggregate() is basically a wrapper for tapply and tapply
basically loops in R. duplicated() loops in C (and uses hashing, I believe).
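
A rough way to try this comparison yourself is sketched below. The data (10000 rows, 3 columns, and the number of distinct keys) are invented for illustration, and timings differ by machine and R version, so no particular ratio should be expected.

```r
set.seed(1)
n <- 10000
# Made-up data: 'a' is the key column with many repeated values.
DF <- data.frame(a = sample(1000, n, replace = TRUE),
                 b = seq_len(n),
                 c = rnorm(n))

# First row per key via duplicated() and subscripting.
t_dup <- system.time(r1 <- DF[!duplicated(DF$a), ])

# First row per key via aggregate(), taking the first element of each group.
t_agg <- system.time(r2 <- aggregate(DF[2:3], DF[1], function(x) x[1]))

# Elapsed seconds for each approach.
print(c(duplicated = t_dup[["elapsed"]], aggregate = t_agg[["elapsed"]]))

# Both approaches pick the same rows; aggregate() returns them sorted by key,
# so sort the duplicated() result the same way before comparing.
r1 <- r1[order(r1$a), ]
```

On a data set this small both calls may finish too quickly to time reliably; increasing n or repeating the calls gives a steadier picture.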
Cheers,
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Spencer Graves
Sent: Thursday, December 23, 2004 9:06 AM
To: Göran Broström
Cc: Rudi Alberts; r-help@stat.math.ethz.ch
Subject: Re: [R] subsetting a data.frame to the 'unique' of a column

What about "aggregate"?
DF <- data.frame(a=c(1,1,2), b=1:3, c=letters[1:3])
aggregate(DF[2:3], DF[1], function(x)x[1])
  a b c
1 1 1 1
2 2 3 3
     hope this helps.  spencer graves
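
The 1 and 3 in column c of the aggregate() output above are the integer codes behind the factor levels rather than the labels themselves. A minimal sketch of that distinction follows; it illustrates factor codes in general and is not a claim about how any particular version of aggregate() is implemented.

```r
# A factor stores integer codes plus a table of level labels.
f <- factor(letters[1:3])

codes  <- as.integer(f)    # the underlying codes
labels <- as.character(f)  # the level labels

# Picking elements 1 and 3 keeps the labels as long as the result is still a factor.
picked <- f[c(1, 3)]

# Stripping the factor class leaves only the codes.
stripped <- as.integer(picked)

print(picked)
print(stripped)
```

Subscripting the data frame by rows, as with duplicated(), never strips the factor class, which is why the labels survive in that approach.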
Göran Broström wrote:
On Thu, Dec 23, 2004 at 11:28:31AM -0800, Rudi Alberts wrote:
Hi,
I often run into this problem: I have a data.frame with one column containing entries that are not unique. What I then want is a subset of the data.frame in which the entries in that column have become the 'unique' of the original column. Normally I program around it by taking the unique of the column and making a new data.frame with it and filling the rest of the data.

(By the way, when moving to the smaller data.frame, for example 5 rows with the same value in that column will be replaced by one row for that value. I don't mind which of the rows now..)
Something like this; however, this gives me the complete df:
df[df$colname %in% unique(df$colname),]
or this, which doesn't work:
df[df$colname == unique(df$colname),]
Use 'duplicated':
df[!duplicated(df$colname), ]
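
To see why the two attempts above behave as described, and what duplicated() does instead, here is a minimal sketch. The data frame `df` and its contents are made-up stand-ins for the poster's data, not taken from the thread.

```r
# Hypothetical example data; 'colname' holds repeated values.
df <- data.frame(colname = c("x", "x", "y", "z", "z"), value = 1:5)

# Every value of a column is, by definition, among its unique values,
# so this test is TRUE for all rows and the whole data frame comes back.
all_rows <- df[df$colname %in% unique(df$colname), ]

# '==' compares element by element and recycles the shorter vector
# (with a warning when the lengths do not divide evenly), so it does
# not test membership at all.
eq_test <- suppressWarnings(df$colname == unique(df$colname))

# duplicated() flags the second and later occurrences of each value;
# negating it keeps the first row for each distinct value.
first_rows <- df[!duplicated(df$colname), ]

# fromLast = TRUE keeps the last row for each value instead.
last_rows <- df[!duplicated(df$colname, fromLast = TRUE), ]

print(first_rows)
print(last_rows)
```

Since the original question did not care which row is kept, either the default or fromLast = TRUE would do.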
--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


