Re: [R] how to efficiently compute set unique?

Steve Taylor Mon, 21 Jun 2010 18:57:45 -0700

The original question was about a matrix, not a vector, and unique() is much slower on a matrix:
 
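x <- sample(100000, size=13584763, replace=TRUE)
dim(x) <- c(13584763, 1)   # make x a one-column matrix, as in the question
system.time(unique(x))     # dispatches to unique.matrix(), which compares rows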
So the solution is to convert the matrix to a plain vector first:

unique(as.vector(x))
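To see the difference on a smaller scale, here is a minimal sketch that times unique() on a one-column matrix against unique(as.vector()) on the same data. The size n and the seed are arbitrary choices so it runs quickly. Calling unique() on a matrix dispatches to unique.matrix(), which looks for duplicated rows rather than duplicated values; the checks confirm both routes return the same values in order of first occurrence. Absolute timings will depend on the machine and the R version.

```r
# Smaller-scale sketch of the matrix-vs-vector difference.
# n is an arbitrary size chosen so the example runs quickly.
set.seed(42)
n <- 1e6
x <- sample(100000, size = n, replace = TRUE)
m <- matrix(x, ncol = 1)   # one-column matrix, like cntxtn in the question

# unique() on a matrix dispatches to unique.matrix(), which looks for
# duplicated rows; on a plain vector it looks for duplicated values.
t_mat <- system.time(u_mat <- unique(m))
t_vec <- system.time(u_vec <- unique(as.vector(m)))

# Both routes keep first occurrences in the same order; the matrix
# version just keeps the one-column shape.
stopifnot(is.matrix(u_mat), ncol(u_mat) == 1L)
stopifnot(identical(as.vector(u_mat), u_vec))

# user / system / elapsed seconds for each route
print(rbind(matrix = t_mat[1:3], vector = t_vec[1:3]))
```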


From: Duncan Murdoch <murdoch.dun...@gmail.com>
To: G FANG <fanggan...@gmail.com>
CC: <r-help@r-project.org>
Date: 22/Jun/2010 1:20p
Subject: Re: [R] how to efficiently compute set unique?
On 21/06/2010 9:06 PM, G FANG wrote:
> Hi,
>
> I want to get the unique set from a large numeric k-by-1 vector, where k is
> in the tens of millions
>
> When I used the MATLAB function unique, it took less than 10 seconds
>
> but when I tried unique in R with similar CPU and memory,
> it had not finished after several minutes
>
> I am wondering, am I using the function in the right way?
>
> dim(cntxtn)
> [1] 13584763        1
> uniqueCntxt <- unique(cntxtn)    # this is taking a really long time

What type is cntxtn?  If I do that sort of thing on a numeric vector, 
it's quite fast:

> x <- sample(100000, size=13584763, replace=TRUE)
> system.time(unique(x))
   user  system elapsed
   3.61    0.14    3.75
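The type of cntxtn matters because unique() dispatches on class: a matrix goes to unique.matrix() and a data frame to unique.data.frame(), and both compare whole rows rather than individual values. A minimal sketch of checking this before deduplicating, using a hypothetical helper unique_values() that is not part of base R and assumes a one-column input should be treated as a plain vector:

```r
# Hypothetical helper, not part of base R: report what kind of object
# we have, then strip a trivial one-column shape before calling unique().
unique_values <- function(v) {
  message("class: ", paste(class(v), collapse = "/"),
          "; typeof: ", typeof(v),
          "; dim: ", paste(dim(v), collapse = " x "))
  if (is.data.frame(v) && ncol(v) == 1L) {
    v <- v[[1L]]          # take the single column; unique.data.frame() works by row
  } else if (is.matrix(v) && ncol(v) == 1L) {
    v <- as.vector(v)     # drop dim so unique() uses the vector method
  }
  unique(v)
}

m <- matrix(c(3L, 1L, 3L, 2L, 1L), ncol = 1)
unique_values(m)   # [1] 3 1 2
```

Applied to cntxtn from the question, this reduces to unique(as.vector(cntxtn)).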

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.