On May 9, 2006, at 1:32 PM, Jason Barnhart wrote:

> 1) So the original problem remains unsolved?
The question was answered, but the problem remains unsolved. The
question was: why am I getting a "cannot allocate vector" error when
reading in a 100 MM integer list? The answer appears to be:

  1) R loads the entire data set into RAM.
  2) On a 32-bit system, R maxes out at 3 GB.
  3) Loading 100 MM integer entries into a data.frame requires more
     than 3 GB of RAM (5-10 GB, based on projections from 10 MM
     entries; a back-of-the-envelope check is in the P.S.).

So the new question is: how does one work around such limits?

> You can load data but lack memory to do more (or so it appears). It
> seems to me that your options are:
> a) ensure that the --max-mem-size option is allowing R to
>    utilize all available RAM

--max-mem-size doesn't exist in my version:

    $ R --max-mem-size
    WARNING: unknown option '--max-mem-size'

Do different versions of R on different OSes and different platforms
have different options? (See the P.S. for checking the limits actually
in force.) FWIW, here's the usage statement from ?mem.limits:

    R --min-vsize=vl --max-vsize=vu --min-nsize=nl --max-nsize=nu --max-ppsize=N

> b) sample if possible, i.e. are 20MM necessary

Yes, or within a factor of 4 of that. (A chunked-sampling sketch is in
the P.S.)

> c) load in matrices or vectors, then "process" or analyze

Yes, I just need to learn more of the R language to do what I want.
(See the scan() sketch in the P.S.)

> d) load data in database that R connects to, use that engine for
> processing

I have a gut feeling something like this is the way to go. (An RSQLite
sketch is in the P.S.)

> e) drop unnecessary columns from data.frame

Yes. Currently, one of the fields is an identifier field, a long text
field (30+ chars). It should probably be converted to an integer to
conserve both time and space. (See the colClasses sketch in the P.S.)

> f) analyze subsets of the data (variable-wise--review fewer vars
>    at a time)

Possibly.

> g) buy more RAM (32 vs 64 bit architecture should not be the
>    issue, since you use LINUX)

32-bit seems to be the limit. We've got 6 GB of RAM and 8 GB of swap.
Despite that, R chokes well before those limits are reached.

> h) ???

Yes, possibly some other solution we haven't considered.

> 2) Not finding memory.limit() is very odd. You should consider
> reviewing the bug reporting process to determine if this should be
> reported. Here's an example of my output.
>
> > memory.limit()
> [1] 1782579200

Do different versions of R on different OSes and different platforms
have different functions?

> 3) This may not be the correct way to look at the timing
> differences you experienced. However, it seems R is holding up well.
>
>                     10MM    100MM   ratio-100MM/10MM
> cat                 0.04     7.60   190.00
> scan                9.93    92.27     9.29
> ratio scan/cat    248.25    12.14

I re-ran the timing test for the 100 MM file taking caching into
account. Linux with 6 GB has no problem caching the 100 MM file
(600 MB):

                    10MM    100MM   ratio-100MM/10MM
  cat               0.04     0.38     9.50
  scan              9.93    92.27     9.29
  ratio scan/cat  248.25   242.82

> Please let me know how you resolve. I'm curious about your solution
> HTH,

Indeed, very helpful. I'm learning more about R every day. Thanks for
your feedback.

Regards,

- Robert

http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent
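
P.S. A few sketches of the points above, in case they help anyone
hitting the same wall. First, a back-of-the-envelope check on the RAM
projection: a bare atomic integer vector is the floor (4 bytes per
entry), and read.table() into a data.frame costs several times more
because of the intermediate copies it makes along the way:

    ## this allocates ~400 MB itself, so only run it with room to spare
    print(object.size(integer(1e8)))   # about 400,000,000 bytes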
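
On the missing --max-mem-size: as far as I can tell, that flag and
memory.limit() exist only in the Windows build of R, which would
explain the difference between our systems. On a Unix-alike, the limits
actually in force and the current usage can be checked with:

    mem.limits()   # nsize/vsize limits in force (NA means unlimited)
    gc()           # current Ncells/Vcells usage, with Mb columns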
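
For (b), sampling: a sketch that keeps roughly 20% of the entries
without ever holding all 100 MM in RAM, reading through a connection in
1 MM-entry chunks. The file name is made up, and the kept fraction is
approximate (a Bernoulli sample, not an exact-size one):

    con <- file("integers.txt", open = "r")
    keep <- integer(0)
    repeat {
        chunk <- scan(con, what = integer(0), n = 1e6, quiet = TRUE)
        if (length(chunk) == 0) break
        keep <- c(keep, chunk[runif(length(chunk)) < 0.20])
    }
    close(con)
    length(keep)   # ~20 MM entries, ~80 MB as an integer vector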
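
For (c): if the file really is just an integer list, scan() straight
into an atomic vector instead of read.table() into a data.frame; that
costs 4 bytes per entry plus a little overhead:

    x <- scan("integers.txt", what = integer(0))   # ~400 MB for 100 MM
    mean(x)                                        # analyze in place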
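
For (d), the database route: a hedged sketch assuming the RSQLite
package, with a made-up table name, and with the data bulk-loaded
beforehand (e.g. via the sqlite3 shell's .import). The idea is to let
the database engine do the aggregation and pull only the small result
into R:

    library(RSQLite)
    con <- dbConnect(SQLite(), dbname = "big.db")
    res <- dbGetQuery(con,
                      "SELECT id, COUNT(*) AS n FROM entries GROUP BY id")
    dbDisconnect(con)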
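
For (e): read.table()'s colClasses argument drops columns at read time
("NULL" skips a column entirely), and the 30+-character identifier can
be recoded as an integer once it is in. The column layout here is
hypothetical:

    ## keep columns 1-2, skip the other two at read time
    df <- read.table("big_file.txt",
                     colClasses = c("character", "integer", "NULL", "NULL"))
    ## recode the long text identifier as a compact integer code
    df$V1 <- as.integer(factor(df$V1))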