I routinely work with a dataset of 2,500,000 rows and 16 columns,
which takes 410MB of storage; my Windows box has 4GB of RAM, which is
enough to avoid thrashing.  As long as I'm careful not to compute and save multiple
copies of the entire data frame (because 32-bit Windows R is limited
to about 1.5GB address space total, including any intermediate
results), R works impressively well and fast with this dataset for
selections, calculations, cross-tabs, plotting, etc.  For example,
simple single-column statistics and cross-tabs take well under 1 sec., and a
summary of the whole thing takes 16 sec. A linear regression between two
numeric columns takes < 20 sec. Plotting of all 2.5M points takes a
while, but that is no surprise (and is usually pointless [sic]
anyway). I have not tried to do any compute-intensive statistical
calculations on the whole data set.
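
For anyone who wants to check this sort of thing on their own machine, here is a rough sketch on a synthetic data frame. The column names x, y, g and the row count are made up for illustration (scaled down from 2.5M so it runs quickly), and the timings will depend on hardware and R version:

```r
# Synthetic stand-in for a large data frame; all names are hypothetical.
set.seed(1)
n <- 250000
d <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  g = factor(sample(letters[1:5], n, replace = TRUE))
)

# In-memory size of the data frame, in megabytes
size_mb <- as.numeric(object.size(d)) / 2^20

# Elapsed seconds for a single-column statistic, a cross-tab,
# a whole-frame summary, and a two-column linear regression
t_mean    <- system.time(m   <- mean(d$x))[["elapsed"]]
t_tab     <- system.time(tab <- table(d$g))[["elapsed"]]
t_summary <- system.time(s   <- summary(d))[["elapsed"]]
t_lm      <- system.time(fit <- lm(y ~ x, data = d))[["elapsed"]]

print(round(c(size_mb = size_mb, mean = t_mean, table = t_tab,
              summary = t_summary, lm = t_lm), 2))
```

Raising n toward the real row count gives a feel for how the size and timings scale before committing to loading the full data.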

The main (but minor) annoyance with it is that it takes about 90 secs
to load into memory using R's native binary "save" format, so I tend
to keep the process lying around rather than re-starting and
re-loading for each analysis. Fortunately, garbage collection is very
effective in reclaiming unused storage as long as I'm careful to
remove unnecessary objects.
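
A small sketch of that save / load / clean-up cycle, assuming a toy data frame called big and a temporary file (both names are hypothetical, and the real load time depends on the size of the data):

```r
# Hypothetical object and file names, for illustration only.
big <- data.frame(a = runif(1e5), b = runif(1e5))
f <- tempfile(fileext = ".RData")

save(big, file = f)   # write in R's native binary "save" format
rm(big)               # drop the object from the workspace
invisible(gc())       # run the garbage collector so the space can be reclaimed

# load() restores objects under their saved names and returns those names
t_load <- system.time(loaded <- load(f))[["elapsed"]]
stopifnot(identical(loaded, "big"))

# Work on a subset, then remove the intermediate result when done
tmp <- big[big$a > 0.5, ]
n_sel <- nrow(tmp)
rm(tmp)
invisible(gc())

unlink(f)             # delete the temporary file
```

Calling gc() by hand is not strictly required, since R collects garbage automatically, but its printed summary is a convenient way to see how much memory is in use after removing objects.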

            -s


On Wed, Nov 26, 2008 at 7:42 AM, iwalters <[EMAIL PROTECTED]> wrote:
>
> I'm currently working with very large datasets that consist of 1,000,000+
> rows.  Is it at all possible to use R for datasets this size, or should I
> rather consider C++/Java?
>
>
> --
> View this message in context: 
> http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

