Thanks, Bert, for the correction. Moreover, I see now that mine
didn't even give an acceptable answer, converting levels "a" and "c" of
the factor DF$c to 1 and 3. I confess I didn't read the documentation
before replying. Here is "duplicated" with my example case:
> DF[!duplicated(DF$a), ]
  a b c
1 1 1 a
3 2 3 c
Thanks again for the correction. spencer graves
Berton Gunter wrote:
Spencer's solution is considerably less efficient than using duplicated()
and subscripting: in a small example with 3 columns and 10000 rows, it took
5 times as long on my Windows setup.
The reason is that aggregate() is basically a wrapper for tapply and tapply
basically loops in R. duplicated() loops in C (and uses hashing, I believe).
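A rough way to check the timing claim is sketched below. The row and column counts follow the message above; the seed, the number of distinct keys, and the column contents are assumptions made up for illustration, and the elapsed times will differ by machine and R version.

```r
set.seed(1)
n <- 10000

# Hypothetical test data: 3 columns, 10000 rows, roughly 1000 distinct keys in a
DF <- data.frame(a = sample(1000, n, replace = TRUE),
                 b = seq_len(n),
                 c = sample(letters, n, replace = TRUE))

# Time both approaches; each keeps the first row seen for every value of a
t_dup <- system.time(by_dup <- DF[!duplicated(DF$a), ])
t_agg <- system.time(by_agg <- aggregate(DF[2:3], DF[1], function(x) x[1]))

# aggregate() returns groups sorted by a, so sort the duplicated() result to compare
by_dup_sorted <- by_dup[order(by_dup$a), ]

# Elapsed seconds for each approach
print(c(duplicated = t_dup[["elapsed"]], aggregate = t_agg[["elapsed"]]))
```

For a single small run like this the difference may be hard to see; repeating the timing several times gives a steadier comparison.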
Cheers,
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process." - George E. P. Box
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Spencer Graves
Sent: Thursday, December 23, 2004 9:06 AM
To: Göran Broström
Cc: Rudi Alberts; r-help@stat.math.ethz.ch
Subject: Re: [R] subsetting a data.frame to the 'unique' of a column
What about "aggregate"?
DF <- data.frame(a=c(1,1,2), b=1:3, c=letters[1:3])
aggregate(DF[2:3], DF[1], function(x)x[1])
  a b c
1 1 1 1
2 2 3 3
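The 1 and 3 in column c are consistent with a factor being reduced to its integer codes, which is how the reply at the top of the thread describes it. The sketch below only illustrates what those codes are, assuming DF$c was stored as a factor; it does not reproduce the internals of aggregate(), whose handling of factors may differ between R versions.

```r
# A factor with the same levels as DF$c in the example
f <- factor(letters[1:3])

# Integer codes behind the first and third elements ("a" and "c")
codes <- as.integer(f)[c(1, 3)]

# The labels that were expected in the result instead
labels <- as.character(f)[c(1, 3)]

print(codes)
print(labels)
```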
hope this helps. spencer graves
Göran Broström wrote:
On Thu, Dec 23, 2004 at 11:28:31AM -0800, Rudi Alberts wrote:
Hi,
I often run into this problem:
I have a data.frame with one column containing entries that are not
unique. What I then want is a subset of the data.frame in which
the entries in that column have become the 'unique' of the original
column.
Normally I program around it by taking the unique of the column and
making a new data.frame with it and filling the rest of the data.
(By the way, when moving to the smaller data.frame for example 5 rows
with the same value in that column will be replaced by one row for that
value. I don't mind which of the rows now..)
I tried something like this; however, it gives me the complete df:
df[df$colname %in% unique(df$colname),]
or this, which doesn't work:
df[df$colname == unique(df$colname),]
Use 'duplicated':
df[!duplicated(df$colname), ]
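A small sketch of why the two attempts above behave as described and what duplicated() does instead. The data frame here is made up for illustration; only the column name colname comes from the question.

```r
# Hypothetical data: colname has repeated entries
df <- data.frame(colname = c("x", "x", "y", "z", "z"), value = 1:5)

# Every entry is, by definition, one of the unique values,
# so the %in% test is TRUE everywhere and all rows are kept
all_rows <- df[df$colname %in% unique(df$colname), ]

# == compares element by element and recycles the shorter vector
# (with a length warning here), so it is not a membership test
eq <- suppressWarnings(df$colname == unique(df$colname))

# duplicated() flags each repeat after the first occurrence,
# so negating it keeps the first row for every value
first_rows <- df[!duplicated(df$colname), ]

print(first_rows)
```

This keeps the first row for each value; passing fromLast = TRUE to duplicated() would keep the last one instead.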
--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html