Dear R community,

I have a 2 million by 2 matrix that looks like this:

x <- sample(1:15, 2000000, replace = TRUE)
y <- sample((1:10) * 1000, 2000000, replace = TRUE)
m <- cbind(x, y)  # the two columns together form the matrix shown below
      x     y
[1,] 10  4000
[2,]  3  1000
[3,]  3  4000
[4,]  8  6000
[5,]  2  9000
[6,]  3  8000
[7,]  2 10000
(...)


The first column is a population expansion factor for the number in the
second column (household income). I want to expand the second column by
the first, so that I end up with a vector beginning with 10 observations
of 4000, then 3 observations of 1000, and so on. To my mind the natural
approach would be to create a NULL vector and append the expansions one
row at a time:

myvar <- NULL
myvar <- append(myvar, replicate(x[1], y[1]), 1)

for (i in 2:length(x)) {
  ## insert x[i] copies of y[i] at the end of the vector built so far
  myvar <- append(myvar, replicate(x[i], y[i]), sum(x[1:i]) + 1)
}

to end up with a vector of length sum(x), which in my real database
corresponds to 22 million observations.
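
On a toy input (invented counts and values, just for illustration) the
loop produces exactly what I am after:

## toy illustration with made-up numbers: expand c(100, 200) by c(2, 3)
x_small <- c(2, 3)
y_small <- c(100, 200)
out <- NULL
out <- append(out, replicate(x_small[1], y_small[1]), 1)
for (i in 2:length(x_small)) {
  out <- append(out, replicate(x_small[i], y_small[i]), sum(x_small[1:i]) + 1)
}
out   # 100 100 200 200 200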

This works fine -- if I only run it on the first, say, 1000
observations. If I try to perform it on all 2 million observations it
takes far too long to be useful (I left it running for 11 hours
yesterday, to no avail).


I know R performs well with operations on relatively large vectors, so
why is this approach so inefficient? And what would be the smart way to
do it?
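
My own guess (and I may be wrong) is that every call to append() copies
the whole vector built so far, so the total work grows roughly
quadratically with the number of rows. If that is right, a single
vectorized call should avoid the copying. From my reading of ?rep, its
times argument can be a vector of the same length as the input, which
would reduce the whole job to one line; I have not timed this on the
full data, though:

## possible vectorized alternative, based on my reading of ?rep:
## repeat each y[i] exactly x[i] times, in a single call
myvar <- rep(y, times = x)
length(myvar) == sum(x)   # should be TRUE if I understand rep() correctly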

Thanks in advance.
Alex
