Thanks, Bert, for the correction. Moreover, I see now that my solution didn't even give an acceptable answer: it converted levels "a" and "c" of the factor DF$c to 1 and 3. I confess I didn't read the documentation before replying. Here is "duplicated" with my example case:

> DF[!duplicated(DF$a), ]
 a b c
1 1 1 a
3 2 3 c

     Thanks again for the correction.  spencer graves

Berton Gunter wrote:

Spencer's solution is considerably less efficient than using duplicated()
and subscripting: in a small example with 3 columns and 10000 rows, it took
5 times as long on my Windows setup.

The reason is that aggregate() is basically a wrapper for tapply and tapply
basically loops in R. duplicated() loops in C (and uses hashing, I believe).
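
A comparison of this kind can be sketched as follows. This is only an illustration: the data frame `big`, its column contents, and the number of distinct values are assumptions, not the example that was actually timed, and the ratio will vary with machine and R version.

```r
# Hypothetical test data: 3 columns, 10000 rows, repeated values in 'a'.
set.seed(1)
n <- 10000
big <- data.frame(a = sample(1:1000, n, replace = TRUE),
                  b = seq_len(n),
                  c = rnorm(n))

# First row per value of 'a' via aggregate()
t_agg <- system.time(
  res_agg <- aggregate(big[2:3], big[1], function(x) x[1])
)

# First row per value of 'a' via duplicated() and subscripting
t_dup <- system.time(
  res_dup <- big[!duplicated(big$a), ]
)

# Elapsed seconds for each approach
print(c(aggregate = t_agg[["elapsed"]], duplicated = t_dup[["elapsed"]]))
```

Note that aggregate() returns the groups sorted by `a`, while the duplicated() version keeps the rows in their original order, so the two results match only after reordering.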

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box





-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
Sent: Thursday, December 23, 2004 9:06 AM
To: Göran Broström
Cc: Rudi Alberts; r-help@stat.math.ethz.ch
Subject: Re: [R] subsetting a data.frame to the 'unique' of a column


What about "aggregate"?

DF <- data.frame(a=c(1,1,2), b=1:3, c=letters[1:3])
aggregate(DF[2:3], DF[1], function(x)x[1])
 a b c
1 1 1 1
2 2 3 3

     hope this helps.  spencer graves

Göran Broström wrote:



On Thu, Dec 23, 2004 at 11:28:31AM -0800, Rudi Alberts wrote:




Hi,

I often run into this problem:
I have a data.frame with one column containing entries that are not
unique. What I then want is a subset of the data.frame in which
the entries in that column have become the 'unique' of the original
column. Normally I program around it by taking the unique of the column and
making a new data.frame with it and filling the rest of the data.


(By the way, when moving to the smaller data.frame, for example 5 rows
with the same value in that column will be replaced by one row for that
value. I don't mind which of the rows now.)


Something like this; however, this gives me the complete df:

df[df$colname %in% unique(df$colname),]

or this, which doesn't work:

df[df$colname == unique(df$colname),]





Use 'duplicated':





df[!duplicated(df$colname), ]
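
A minimal sketch of why the two attempts in the question behave as described, and of what duplicated() does instead. The toy data frame and its `value` column are made up for illustration; `colname` is the placeholder name from the question.

```r
# Hypothetical toy data; "colname" mirrors the placeholder used in the question.
df <- data.frame(colname = c("x", "x", "y", "z", "z"), value = 1:5)

# Every value is a member of its own set of unique values,
# so this mask is TRUE everywhere and the full data frame comes back.
keep_in <- df$colname %in% unique(df$colname)

# '==' compares element by element and recycles the shorter vector,
# so it does not test set membership (R also warns about the lengths here).
keep_eq <- suppressWarnings(df$colname == unique(df$colname))

# duplicated() flags the second and later occurrences of each value;
# negating it keeps the first row per value.
first_rows <- df[!duplicated(df$colname), ]

# fromLast = TRUE scans from the end, keeping the last row per value instead.
last_rows <- df[!duplicated(df$colname, fromLast = TRUE), ]

print(first_rows)
print(last_rows)
```

Since it does not matter here which of the rows survives, either the default or `fromLast = TRUE` would do.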








--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html








