Hi,

I have six rather big data sets (between 400'000 and 800'000 rows) of transport 
data (times, distances and travelers between nodes). They all share a common 
index (start and end nodes).
I want to aggregate these data, but first I have to merge them.
I tried merge(), with the result that R (3.0.1) crashed (Windows 8 
machine, 16 GB RAM).
Then I tried the join from the data.table package; there I got an error saying 
that 2^34 rows is too big (no idea why it is 2^34, as it is a left join).
Then I decided to loop over the tables and assign the matching values row by 
row, which takes a very, very long time (still running at the moment).
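
Roughly, the join attempt looked like this (a sketch, assuming both tables are 
keyed on the common index; X[Y] is the keyed left-join idiom):

library(data.table)
# key both tables on the common (Start, End) index
setkey(dataP, Start, End)
setkey(ttoevP, Start, End)
# for each (Start, End) row of dataP, look up the matching row in ttoevP
res <- ttoevP[dataP]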

Here is the loop:
library(data.table)  # both tables are assumed keyed on (Start, End) for J()

for (i in seq_len(nrow(dataP))) {
    s <- dataP$Start[i]  # renamed from c/d, which shadow base::c()
    e <- dataP$End[i]
    # subset-then-assign copies dataP on each iteration, hence the slowness
    dataP[J(s, e)]$OEV.T <- ttoevP[J(s, e)]$OEV.T
}

dataP has 800'000 rows and ttoevP has about 500'000 rows.
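
For clarity, the one-step operation I am after should be something like the 
following update join (an untested sketch; I assume i.OEV.T refers to 
ttoevP's OEV.T column inside the join):

setkey(dataP, Start, End)
setkey(ttoevP, Start, End)
# for each matching (Start, End), copy ttoevP's OEV.T into dataP;
# := assigns by reference, so the 800'000-row table is not copied
dataP[ttoevP, OEV.T := i.OEV.T]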

Any hints to speed up this process are welcome.

Renger
_________________________________________
Centre of Economic Research (CER-ETH)
Zürichbergstrasse 18 (ZUE)
CH - 8032 Zürich
+41 44 632 02 63
mailto: reng...@ethz.ch
blog.modelworks.ch

