Re: [R] limits of a data frame size for reading into R

2010-08-05 Thread Matthew Keller
I sometimes have to work with vectors/matrices with > 2^31 - 1
elements. I have found the bigmemory package to be of great help. My
lab is also going to learn the sqldf package for getting bits of big
data into and out of R. Learning both of those packages should help
you work with large datasets in R.
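[For illustration, a minimal sketch of the two approaches mentioned above; the file names, dimensions, and the filter column are made up, and both packages are on CRAN:]

```r
## File-backed matrix via bigmemory: the data live on disk, so the
## object can be larger than available RAM.
library(bigmemory)
x <- big.matrix(nrow = 1e6, ncol = 100, type = "double",
                backingfile = "big.bin", descriptorfile = "big.desc")
x[1, 1] <- 3.14   # indexed like an ordinary matrix

## Pull only the rows you need from a large text file via sqldf
## (the table is referred to as "file" in the SQL by default):
library(sqldf)
subset <- read.csv.sql("big.txt", sep = "\t",
                       sql = "select * from file where V1 > 100")
```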

That said, I still hold out hope that someday, the powers that be - or
some hotshot operation like R+ or Revolutions - will see that
increasing numbers of users will routinely need to access > 2^31-1
elements, and that the packages above are a band-aid on a deeper
issue: using such large datasets with ease in R. As of now, it remains
quite awkward.

Matt



On Tue, Aug 3, 2010 at 12:32 PM, Duncan Murdoch
 wrote:
> On 03/08/2010 2:28 PM, Dimitri Liakhovitski wrote:
>>
>> And once one is above the limit that Jim indicated, is there anything
>> one can do?
>>
>
> Yes, there are several packages for handling datasets that are too big to
> fit in memory:  biglm, ff, etc.  You need to change your code to work with
> them, so it's a lot of work to do something unusual, but there are
> possibilities.
>
> Duncan Murdoch



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] limits of a data frame size for reading into R

2010-08-03 Thread Duncan Murdoch

On 03/08/2010 2:28 PM, Dimitri Liakhovitski wrote:
> And once one is above the limit that Jim indicated, is there anything
> one can do?


Yes, there are several packages for handling datasets that are too big 
to fit in memory:  biglm, ff, etc.  You need to change your code to work 
with them, so it's a lot of work to do something unusual, but there are 
possibilities.
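[As a sketch of the pattern Duncan describes: chunked fitting with biglm, so the full dataset never has to be in memory at once. The file name, chunk size, and column names y, x1, x2 below are hypothetical.]

```r
library(biglm)
con <- file("big.txt", open = "r")
## First chunk establishes the column names and the initial fit.
hdr <- read.table(con, header = TRUE, sep = "\t", nrows = 10000)
fit <- biglm(y ~ x1 + x2, data = hdr)
repeat {
  ## read.table on an open connection continues where it left off;
  ## it errors when the file is exhausted, which we catch.
  chunk <- tryCatch(read.table(con, header = FALSE, sep = "\t",
                               nrows = 10000, col.names = names(hdr)),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  fit <- update(fit, chunk)   # fold the new rows into the fit
}
close(con)
summary(fit)
```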


Duncan Murdoch




Re: [R] limits of a data frame size for reading into R

2010-08-03 Thread Dimitri Liakhovitski
And once one is above the limit that Jim indicated, is there anything one can do?
Thank you!
Dimitri


On Tue, Aug 3, 2010 at 2:12 PM, Dimitri Liakhovitski
 wrote:
> Thanks a lot, it's very helpful!
> Dimitri



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



Re: [R] limits of a data frame size for reading into R

2010-08-03 Thread Dimitri Liakhovitski
Thanks a lot, it's very helpful!
Dimitri

On Tue, Aug 3, 2010 at 1:53 PM, Duncan Murdoch  wrote:
> On 03/08/2010 1:10 PM, Dimitri Liakhovitski wrote:
> Besides what Jim said, there is a 2^31-1 limit on the number of elements in
> a vector.  Dataframes are vectors of vectors, so you can have at most 2^31-1
> rows and 2^31-1 columns.  Matrices are vectors, so they're limited to 2^31-1
> elements in total.
> This is only likely to be a limitation on a 64 bit machine; in 32 bits
> you'll run out of memory first.
>
> Duncan Murdoch
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



Re: [R] limits of a data frame size for reading into R

2010-08-03 Thread Duncan Murdoch

On 03/08/2010 1:10 PM, Dimitri Liakhovitski wrote:
> I understand the question I am about to ask is rather vague and
> depends on the task and my PC memory. However, I'll give it a try:
>
> Let's assume the goal is just to read in the data frame into R and
> then do some simple analyses with it (e.g., multiple regression of
> some variables onto some - just a few - variables).
>
> Is there a limit to the number of columns of a data frame that R can
> handle? I am asking because where I work many use SAS and they are
> running into the limit of > ~13,700 columns there.
>
> Since I am asking - is there a limit to the number of rows?
>
> Or is the correct way of asking the question: my PC's memory is X. The
> .txt tab-delimited file I am trying to read in has the size of YYY Mb,
> can I read it in?


Besides what Jim said, there is a 2^31-1 limit on the number of elements 
in a vector.  Dataframes are vectors of vectors, so you can have at most 
2^31-1 rows and 2^31-1 columns.  Matrices are vectors, so they're 
limited to 2^31-1 elements in total. 

This is only likely to be a limitation on a 64 bit machine; in 32 bits 
you'll run out of memory first.
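[The cap Duncan refers to is R's integer index limit, which can be inspected directly; the figures below reflect R as of this thread (2010), before long-vector support:]

```r
.Machine$integer.max                # 2147483647, i.e. 2^31 - 1
## A matrix is a single vector, so its total element count is capped;
## a square numeric matrix therefore tops out near 46341 x 46341:
floor(sqrt(.Machine$integer.max))   # 46340
## At 8 bytes per double, a maximal numeric vector would need ~16GB,
## which is why only 64-bit machines ever hit the index limit:
8 * (2^31 - 1) / 2^30               # just under 16 GB
```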


Duncan Murdoch



Re: [R] limits of a data frame size for reading into R

2010-08-03 Thread jim holtman
You probably don't want an object that is larger than about 25% of
physical memory, so that copies can be made during processing.  If
you are running on a 32-bit system, which limits you to at most 3GB
of memory, then your largest object should not be greater than about
800MB.  If you want 13,700 columns of numeric data (8 bytes per
element), then each row would require about 110KB, and that would mean
you could probably have an object with roughly 7,000-8,000 rows.

64-bit is probably limited by how much you want to spend for memory.
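[Jim's back-of-the-envelope sizing can be reproduced directly; the 25% rule of thumb and the 3GB 32-bit ceiling are his assumptions:]

```r
cols          <- 13700
bytes_per_row <- cols * 8          # numeric (double) = 8 bytes/element
bytes_per_row                      # 109600, about 110KB per row
budget <- 0.25 * 3 * 2^30          # ~25% of a 3GB 32-bit address space
floor(budget / bytes_per_row)      # 7347 rows, in the ballpark of 8000
```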

On Tue, Aug 3, 2010 at 1:10 PM, Dimitri Liakhovitski
 wrote:
> I understand the question I am about to ask is rather vague and
> depends on the task and my PC memory. However, I'll give it a try:
>
> Let's assume the goal is just to read in the data frame into R and
> then do some simple analyses with it (e.g., multiple regression of
> some variables onto some - just a few - variables).
>
> Is there a limit to the number of columns of a data frame that R can
> handle? I am asking because where I work many use SAS and they are
> running into the limit of > ~13,700 columns there.
>
> Since I am asking - is there a limit to the number of rows?
>
> Or is the correct way of asking the question: my PC's memory is X. The
> .txt tab-delimited file I am trying to read in has the size of YYY Mb,
> can I read it in?
>
> Thanks a lot!
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] limits of a data frame size for reading into R

2010-08-03 Thread Dimitri Liakhovitski
I understand the question I am about to ask is rather vague and
depends on the task and my PC memory. However, I'll give it a try:

Let's assume the goal is just to read in the data frame into R and
then do some simple analyses with it (e.g., multiple regression of
some variables onto some - just a few - variables).

Is there a limit to the number of columns of a data frame that R can
handle? I am asking because where I work many use SAS and they are
running into the limit of > ~13,700 columns there.

Since I am asking - is there a limit to the number of rows?

Or is the correct way of asking the question: my PC's memory is X. The
.txt tab-delimited file I am trying to read in has the size of YYY Mb,
can I read it in?

Thanks a lot!

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
