On Jun 21, 2010, at 9:18 PM, Duncan Murdoch wrote:
On 21/06/2010 9:06 PM, G FANG wrote:
Hi,
I want to get the set of unique values from a large numeric k-by-1 vector, where k is
in the tens of millions.
When I use the MATLAB function unique, it takes less than 10 seconds,
but when I use unique() in R on a machine with similar CPU and memory,
it has not finished after several minutes.
Am I using the function the right way?
> dim(cntxtn)
[1] 13584763        1
> uniqueCntxt <- unique(cntxtn)  # this is taking a very long time
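Because dim() reports 13584763 x 1, cntxtn is not a plain vector but a one-column matrix (or possibly a data frame). One thing worth ruling out, offered as an assumption rather than a diagnosis established in this thread: unique() on a matrix dispatches to the matrix method, which works row by row, and in some R versions that method pastes each row into a character string before comparing. A minimal sketch, assuming cntxtn is a numeric matrix; the small stand-in object below is hypothetical test data, not the poster's.

```r
# Hypothetical stand-in for the poster's object: a one-column numeric matrix
cntxtn <- matrix(sample(1000, size = 1e5, replace = TRUE), ncol = 1)

class(cntxtn)  # "matrix" (plus "array" in R >= 4.0.0)
dim(cntxtn)    # two dimensions: 100000 rows, 1 column

# Dropping the dim attribute gives a plain vector, so unique() uses the
# default (hash-based) vector method instead of the row-wise matrix method.
v <- as.vector(cntxtn)  # equivalently cntxtn[, 1]
uniqueCntxt <- unique(v)

# If cntxtn were a data frame instead, extract the column first:
# uniqueCntxt <- unique(cntxtn[[1]])
```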
What type is cntxtn? If I do that sort of thing on a numeric
vector, it's quite fast:
> x <- sample(100000, size = 13584763, replace = TRUE)
> system.time(unique(x))
   user  system elapsed
   3.61    0.14    3.75
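To see whether the object's shape is what makes the difference, one could time unique() on the same values stored once as a plain vector and once as a one-column matrix. This is a sketch with the size reduced to one million (an arbitrary choice so it runs quickly); the variable names are illustrative. In recent R versions the two timings may be similar, since the matrix method may skip the row-pasting step for single-column input, while older versions may behave differently.

```r
# Same values stored two ways: a plain vector and a one-column matrix
x <- sample(100000, size = 1e6, replace = TRUE)
m <- matrix(x, ncol = 1)

t_vec <- system.time(u_vec <- unique(x))
t_mat <- system.time(u_mat <- unique(m))

# unique() on a matrix keeps the one-column matrix shape, so compare its
# first column; both keep values in order of first occurrence.
stopifnot(identical(u_vec, u_mat[, 1]))

# Elapsed seconds for each version
print(c(vector = t_vec[["elapsed"]], matrix = t_mat[["elapsed"]]))
```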
If it's a factor, it could be as simple as:
levels(cntxtn)  # the work of "unique-ification" has already been done
> x <- factor(sample(100000, size = 13584763, replace = TRUE))
> system.time(levels(x))
   user  system elapsed
      0       0       0
> system.time(y <- levels(x))
   user  system elapsed
      0       0       0
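Two details about levels() are worth keeping in mind if the factor route applies: the levels are stored as character strings, so numeric values need converting back (and for non-integer doubles the round trip through character can lose precision), and levels() reports every level the factor carries, including ones no longer present after subsetting. A small sketch with made-up values:

```r
x <- c(5, 3, 5, 9, 3)
f <- factor(x)

levels(f)                  # distinct values as sorted character strings
u <- as.numeric(levels(f)) # convert back to numbers if needed

# After subsetting, unused levels are still reported
g <- f[x != 9]
levels(g)                  # still includes "9"
levels(droplevels(g))      # only the values actually present
```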
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.