> On Sep 17, 2017, at 9:24 PM, Ajay Arvind Rao <ajayarvind....@gmrgroup.in> 
> wrote:
> 
> Hi,
> 
> We are using open source license of R to analyze data at our organization. 
> The system configuration are as follows:
> 
> System configuration:
>   - Operating System: Windows 7 Enterprise SP1, 64 bit (Desktop)
>   - RAM: 8 GB
>   - Processor: i5-6500 @ 3.2 GHz
> 
> R version:
>   - RStudio 1.0.136
>   - R 3.4.0
> 
> While trying to merge two datasets we received the following resource error 
> message on running the code
> Code: merg_data <- 
> merge(x=Data_1Junto30Jun,y=flight_code,by.x="EB_FLNO1",by.y="EB_FLNO1",all.x 
> = TRUE)
> Error Message: Error: cannot allocate vector of size 5.8 Gb
> 
> Later we tried running the code differently but error still remained
> Code: merg_data <- sqldf("Select * from Data_1Junto30Jun as a inner join 
> flight_code as b on a.EB_FLNO1=b.EB_FLNO1")
> Error Message: Error: cannot allocate vector of size 200.0 Mb
> 
> We have upgraded the RAM to 8 GB couple of months back. Can you let us know 
> options to resolve the above issue without having to increase the RAM? The 
> size of the datasets are as follows:
> 
>   - Data_1Junto30Jun (513476 obs of 32 variables). Data size: 172033368 
>     bytes / 172 MB
>   - flight_code (478105 obs of 2 variables). Data size: 3836304 bytes / 
>     4 MB
> 
> 
> Help with determining system requirement:
> Is there a way to determine minimum system requirement (hardware and software)

There are some packages for working with data "out of memory". See bigmemory 
and other "big*" packages. See also the data.table package, which has many 
satisfied users. There are also several packages for handling data through 
database connections. That would probably be the preferred method for your use 
case.

R objects are often copied when they are modified, and many functions make 
intermediate copies. This means that you need at a minimum about twice as much 
free memory as the object occupies, and in _contiguous_ chunks. You will often 
be breaking up the memory with other code and other out-of-R processes. Windows 
was in the past notorious for having poor memory management. I don't know 
whether Windows 7 continued that tradition or whether a later version would 
help avoid the problem.
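
As a rough illustration of the sizes involved (a sketch on made-up vectors, not 
a diagnosis of your session), object.size() reports what an object occupies, and 
modifying a second reference to it makes R allocate another full-size copy:

```r
# One million doubles: a toy vector, not the poster's data.
x <- numeric(1e6)
print(object.size(x), units = "MB")

# y initially refers to the same vector as x.
y <- x
# Modifying y forces R to allocate a second full-size vector.
y[1] <- 1

# Drop what is no longer needed and ask R to release the memory.
rm(y)
invisible(gc())
```

Running object.size() on each of your data frames before a large operation 
gives a first idea of how much headroom you have.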

A dataframe will consume about 8 bytes per row for each numeric (double) column 
and 4 bytes per row for each integer column. Factors are stored as integer 
codes plus a set of levels, and character strings are held once each in a 
shared cache, so the memory consumed will depend on the degree of duplication 
of entries. Duplication also affects merge operations. A merge returns every 
combination of rows that share a key value (a Cartesian product within each 
key), so if you merge two dataframes with lots of duplicated keys you will 
often get a message such as: "Error: cannot allocate vector of size 5.8 Gb"
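
A minimal sketch of how one might estimate the size of a merge result before 
running it. The data frames x and y are toy stand-ins with made-up values; only 
the column name EB_FLNO1 comes from your post, and missing keys are ignored:

```r
# Toy stand-ins for the two data frames (values are invented).
x <- data.frame(EB_FLNO1 = c("A1", "A1", "B2", "C3"), val = 1:4)
y <- data.frame(EB_FLNO1 = c("A1", "A1", "A1", "B2"),
                code = c("p", "q", "r", "s"))

# Count how often each key occurs on each side.
nx <- table(x$EB_FLNO1)
ny <- table(y$EB_FLNO1)

# Each key present on both sides contributes (count in x) * (count in y) rows.
common <- intersect(names(nx), names(ny))
matched_rows <- sum(as.numeric(nx[common]) * as.numeric(ny[common]))

# With all.x = TRUE, each row of x without a match adds one more row.
unmatched_x <- sum(!(x$EB_FLNO1 %in% y$EB_FLNO1))
expected_rows <- matched_rows + unmatched_x

expected_rows                                       # estimate
nrow(merge(x, y, by = "EB_FLNO1", all.x = TRUE))    # actual, same number
```

If the estimate for your real data runs to hundreds of millions of rows, the 
keys are heavily duplicated and no realistic amount of RAM will help.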

The second error you cite suggests that much of your 8 GB of RAM has been 
fragmented or is already in use.

Most of this information should be available via searching in Rhelp or RSeek.


> depending on size of the data, the way the data is loaded into R (directly 
> from server or in a flat file) and the type of analysis to be run?

No difference for the source of data but cannot comment on the type of analysis 
because that part of the question is too vague. (Aside from mentioning the 
issue of Cartesian multiplication of merge results which often trips up new 
users of database technology.)
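
If the lookup table is meant to hold one row per key (an assumption on my part, 
since I cannot see your data), a sketch of checking for and removing duplicate 
keys before merging might look like this. The toy flight_code below has invented 
values:

```r
# Toy lookup table with a repeated key (values are invented).
flight_code <- data.frame(EB_FLNO1 = c("A1", "A1", "B2"),
                          code = c("p", "p", "s"))

# A non-zero result means at least one key appears more than once.
anyDuplicated(flight_code$EB_FLNO1)

# Drop fully identical rows, then keep the first row for each key.
lookup <- unique(flight_code)
lookup <- lookup[!duplicated(lookup$EB_FLNO1), ]

# Every key now appears once, so a merge cannot multiply rows of the left table.
anyDuplicated(lookup$EB_FLNO1)
```

Keeping the first row per key is only one possible rule; which row is the right 
one depends on what the duplicates mean in your data.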

> We have not been able to get any specific information related to this and are 
> estimating the requirements through a trial and error method. Any information 
> on this front will be helpful.

This suggests an impoverished ability for searching:

https://stackoverflow.com/search?q=%5Br%5D+memory+limitations

https://stackoverflow.com/search?q=%5Br%5D+memory+limitations+windows

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+memory+limitations+windows

-- 
David.

> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law
