You might want to look into the packages bigmemory and biganalytics.
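
For a rough sense of scale, here is a minimal sketch in base R of the arithmetic behind the size of the distance object for the data described in the quoted message. It assumes that dist() keeps only the lower triangle, i.e. n(n-1)/2 values, each stored as an 8-byte double; the variable names are made up for illustration.

```r
# Back-of-the-envelope size of a dist object for n observations.
# Assumption: dist() stores the lower triangle only, as 8-byte doubles.
n <- 90523                      # rows in the data set from the question

n_pairs <- n * (n - 1) / 2      # number of pairwise distances
bytes   <- n_pairs * 8          # 8 bytes per double
gib     <- bytes / 2^30         # convert bytes to GiB

max_len <- 2^31 - 1             # vector length limit quoted from the R help
exceeds <- n_pairs > max_len    # TRUE if the dist vector would be too long

cat("pairwise distances:", format(n_pairs, big.mark = ","), "\n")
cat("approx. memory    :", round(gib, 1), "GiB\n")
cat("exceeds 2^31 - 1  :", exceeds, "\n")
```

Under those assumptions the distance vector alone is roughly half of n^2 elements, which is still above the 2^31 - 1 limit mentioned in the question, so more RAM by itself would not be enough for a full dist() call on this data.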

Corey

On Tue, Aug 9, 2011 at 8:38 PM, Chris Howden
<ch...@trickysolutions.com.au> wrote:

> Hi,
>
> I’m trying to do a hierarchical cluster analysis in R with a Big Data set.
> I’m running into problems using the dist() function.
>
> I’ve been looking at a few threads about R’s memory and have read the
> memory limits section in R help. However, I’m no computer expert, so I’m
> hoping I’ve misunderstood something and R can handle my Big Data set
> somehow. At the moment I think my dataset is simply too big and
> there is no way around it, but I’d like to be proved wrong!
>
> My data set has 90523 rows of data and 24 columns.
>
> My understanding is that this means the distance matrix has a minimum of
> 90523^2 elements, which is 8194413529. That roughly translates to 8GB of
> memory being required (if I assume each entry requires 1 byte). I only have
> 4GB on a 32-bit build of Windows and R, so there is no way that’s going to
> work.
>
> So then I thought of getting access to a more powerful computer, and maybe
> using cloud computing.
>
> However, the R memory limit help mentions “On all builds of R, the maximum
> length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9”. As the
> distance matrix I require has more elements than this, does this mean it’s
> too big for R no matter what I do?
>
> Any ideas would be welcome.
>
> Thanks.
>
>
> Chris Howden
> Founding Partner
> Tricky Solutions
> Tricky Solutions 4 Tricky Problems
> Evidence Based Strategic Development, IP Commercialisation and Innovation,
> Data Analysis, Modelling and Training
> (mobile) 0410 689 945
> (fax / office)
> ch...@trickysolutions.com.au
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
*The mark of a successful man is one that has spent an entire day on the
bank of a river without feeling guilty about it.*


