Hi,

I have 6 rather big data sets (between 400,000 and 800,000 lines each) of transport data (times, distances, and travelers between nodes). They all share a common index (start-end nodes). I want to aggregate this data, but for that I first have to merge the sets. I tried merge, with the result that R (3.0.1) crashed (Windows 8 machine, 16 GB RAM). Then I tried the join from the data.table package; here I got the message that 2^34 rows is too big (no idea why it is 2^34, as it is a left join). Then I decided to do a loop over the tables, assigning values one row at a time, which takes a very, very long time (still running at the moment).
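[Editor's note: data.table's "too big" message reports the projected size of the join result, and a left join can blow up well past the size of either table when the key columns contain duplicated combinations. A minimal sketch of how one might check for that, using toy stand-in tables and the column names Start/End/OEV.T taken from the post (the real data may differ):

    library(data.table)

    # Toy stand-ins for the real tables; (1, 10) is deliberately duplicated in dataP
    dataP  <- data.table(Start = c(1, 1, 2), End = c(10, 10, 20), Dist = c(5, 5, 7))
    ttoevP <- data.table(Start = c(1, 2),    End = c(10, 20),     OEV.T = c(30, 45))

    setkey(dataP,  Start, End)
    setkey(ttoevP, Start, End)

    # Any duplicated key combinations? A value > 0 means duplicates exist,
    # and a join on these keys will multiply the matching rows.
    anyDuplicated(dataP,  by = key(dataP))
    anyDuplicated(ttoevP, by = key(ttoevP))

If both tables have unique keys, a plain keyed left join should stay bounded by the size of the larger table.]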
Here is the code:

    for (i in 1:length(dataP$Start)) {
      c <- dataP$Start[i]
      d <- dataP$End[i]
      dataP[J(c, d)]$OEV.T <- ttoevP[J(c, d)]$OEV.T
    }

dataP has 800'000 lines and ttoevP has about 500'000 lines. Any hints to speed up this process are welcome.

Renger
_________________________________________
Centre of Economic Research (CER-ETH)
Zürichbergstrasse 18 (ZUE)
CH - 8032 Zürich
+41 44 632 02 63
reng...@etzh.ch
blog.modelworks.ch
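[Editor's note: the row-by-row loop can be replaced by a single keyed "update join", which writes the matching OEV.T values into dataP by reference in one pass. A sketch under the same assumed column names (Start, End, OEV.T) with toy data:

    library(data.table)

    dataP  <- data.table(Start = c(1, 1, 2), End = c(10, 20, 20), OEV.T = NA_real_)
    ttoevP <- data.table(Start = c(1, 2),    End = c(10, 20),     OEV.T = c(30, 45))

    setkey(dataP,  Start, End)
    setkey(ttoevP, Start, End)

    # For each (Start, End) pair of ttoevP, copy its OEV.T into the matching
    # rows of dataP; the i. prefix refers to columns of the joined table ttoevP.
    dataP[ttoevP, OEV.T := i.OEV.T]

    # Rows of dataP with a match in ttoevP now carry its OEV.T;
    # unmatched rows keep their NA.
    dataP

This does one binary-search join instead of 800'000 separate subset-and-assign operations, and `:=` avoids copying dataP.]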
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.