Robert,

You are hitting a known problem with data.frame row names.  The row
names alone are the reason the data.frame takes 10 times more memory than
the vector.  Your 100MM integer vector takes 381MB when you scan() it, right?
[4 bytes * 10^8 / 1024^2]  But when you try to create a data.frame instead,
something like 3.8GB would be required.  This is beyond the practical
32-bit limit, and nothing you do with memory options will solve that.
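
For what it's worth, the arithmetic can be checked at the R prompt.  This is
only a back-of-envelope sketch: the factor of 10 is the rough estimate given
above, not something measured, and the variable names are just for
illustration.

```r
n <- 1e8                               # 100MM integers
bytes_per_int <- 4                     # an R integer occupies 4 bytes

vector_mb <- n * bytes_per_int / 1024^2
vector_mb                              # about 381.5 (MB)

overhead_factor <- 10                  # assumed row-name overhead, see above
df_mb <- overhead_factor * vector_mb
df_mb                                  # about 3815 MB, the "3.8GB" figure
```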

The good news is that Prof Ripley has fixed the problem with data.frame row
names in the latest development version of R.  You could try that; it
should be much more efficient, i.e. a data.frame with a single integer
column of length 100MM should have object.size 381MB, just like a vector.
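
If you do try the development version, one way to see whether the fix helps
is to compare object.size() for a vector and a one-column data.frame at a
length that fits comfortably in memory.  This is a sketch only: n is an
arbitrary small length I picked, and I give no expected numbers because they
depend on your R version.

```r
n <- 1e5                                # small, arbitrary test length
v <- integer(n)                         # integer vector of zeros
d <- data.frame(x = v)                  # same data as a one-column data.frame

size_vec <- as.numeric(object.size(v))  # bytes used by the vector
size_df  <- as.numeric(object.size(d))  # bytes used by the data.frame
size_df / size_vec                      # near 1 if row names are stored compactly
```

A ratio well above 1 would suggest the row names are still being stored in
full.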

However, how many columns do you have to deal with?  3GB allows
100,000,000 x 7 columns of integers in memory.  That doesn't leave any room
for copies, or for types larger than integer, so you are still
limited.  Beyond that, as others have suggested, you need to connect to an
RDBMS, or go 64-bit, which is much easier if that is possible for you.
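
The column budget can be worked out the same way.  A rough sketch, assuming
"3GB" means 3 * 10^9 bytes (with binary gigabytes the integer count comes out
at 8 rather than 7) and ignoring all per-object overhead and copies:

```r
n_rows <- 1e8                           # 100MM rows
budget_bytes <- 3e9                     # assumption: 3GB taken as 3 * 10^9 bytes

bytes_per_int <- 4                      # integer column
max_int_cols <- floor(budget_bytes / (n_rows * bytes_per_int))
max_int_cols                            # 7

bytes_per_dbl <- 8                      # numeric (double) column
max_dbl_cols <- floor(budget_bytes / (n_rows * bytes_per_dbl))
max_dbl_cols                            # 3
```

So double columns cut the budget to less than half, before any copies are
made.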

I'd be interested to hear how you get on.

Regards,
Mark

> Message: 41
> Date: Tue, 9 May 2006 15:27:58 -0500
> From: Robert Citek <[EMAIL PROTECTED]>
> Subject: Re: [R] large data set, error: cannot allocate vector
> To: r-help@stat.math.ethz.ch
>
>
> On May 9, 2006, at 1:32 PM, Jason Barnhart wrote:
>
> > 1) So the original problem remains unsolved?
>
> The question was answered but the problem remains unsolved.  The
> question was, why am I getting an error "cannot allocate vector" when
> reading in a 100 MM integer list.  The answer appears to be:
>
> 1) R loads the entire data set into RAM
> 2) on a 32-bit system R max'es out at 3 GB
> 3) loading 100 MM integer entries into a data.frame requires more
> than 3 GB of RAM (5-10 GB based on projections from 10 MM entries)
>
> So, the new question is, how does one work around such limits?
>


