You might want to look into the packages bigmemory and biganalytics.

Corey
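As a rough check of the memory arithmetic in the quoted message below, the size of the distance object can be estimated directly in R. This is only a sketch under stated assumptions: as I understand it, a "dist" object stores the lower triangle, i.e. n * (n - 1) / 2 values, as 8-byte doubles; the variable names (n_pairs, bytes, gib) are just illustrative.

```r
# Rough memory estimate for dist() on n rows (a sketch, not a benchmark).
# Assumption: the "dist" object holds n * (n - 1) / 2 pairwise distances,
# each stored as an 8-byte double.
n <- 90523                      # rows in the data set from the question
n_pairs <- n * (n - 1) / 2      # number of pairwise distances stored
bytes <- n_pairs * 8            # 8 bytes per double
gib <- bytes / 2^30             # convert bytes to GiB

cat("pairwise distances:", format(n_pairs, big.mark = ","), "\n")
cat("approx. memory (GiB):", round(gib, 1), "\n")

# Compare against the vector length limit of 2^31 - 1 quoted from the R help
cat("exceeds 2^31 - 1 elements:", n_pairs > .Machine$integer.max, "\n")
```

Under these assumptions the estimate comes out well above both 4GB of RAM and the 2^31 - 1 element limit mentioned in the question, which is why clustering a sample of the rows, or a method that does not need the full distance matrix, may be worth considering alongside the packages above.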
On Tue, Aug 9, 2011 at 8:38 PM, Chris Howden <ch...@trickysolutions.com.au> wrote:

> Hi,
>
> I'm trying to do a hierarchical cluster analysis in R with a Big Data set.
> I'm running into problems using the dist() function.
>
> I've been looking at a few threads about R's memory and have read the
> memory limits section in R help. However, I'm no computer expert, so I'm
> hoping I've misunderstood something and R can handle my Big Data set
> somehow. At the moment I think my dataset is simply too big and
> there is no way around it, but I'd like to be proved wrong!
>
> My data set has 90523 rows of data and 24 columns.
>
> My understanding is that this means the distance matrix has a min of
> 90523^2 elements, which is 8194413529. That roughly translates to 8GB of
> memory being required (if I assume each entry requires 1 byte). I only have
> 4GB on a 32-bit build of Windows and R, so there is no way that's going to
> work.
>
> So then I thought of getting access to a more powerful computer, and maybe
> using cloud computing.
>
> However, the R memory limit help mentions "On all builds of R, the maximum
> length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9." Now, as the
> distance matrix I require has more elements than this, does this mean it's
> too big for R no matter what I do?
>
> Any ideas would be welcome.
>
> Thanks.
>
> Chris Howden
> Founding Partner
> Tricky Solutions
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.