The essential issue is that the matrix you need to manipulate is very large. This is not a new problem. About a year ago I exchanged ideas with the Rff package developers, though things have been on the back burner since then because of recession woes and illness. Those ideas were based on some very small codes from my 1979 book "Compact Numerical Methods for Computers". The book contains a code that reads a matrix row by row from a file, builds a triangular decomposition together with a list of the orthogonal transformations, and then does an svd of the resulting triangle. Your problem would work on the transpose: with far more variables than observations, taking the variables as the rows keeps the triangle at roughly 6000 by 6000.

This is quite different from how R users generally work, so there are plenty of interfacing and similar issues. There are also likely more efficient computational methods than the one I used, but I developed it in 1974 on an HP9830 desk calculator with the matrix on punched cards. It does have a short code that can be written in a fairly vectorized way in R alone, which may make the human/computer trade-off favourable, depending on how many times you need to run such problems.
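To make the row-wise idea concrete, here is a minimal sketch of folding one row at a time into an upper triangle with Givens rotations. It is an illustration only, not the code from the book: it is written in plain Python rather than R, the names givens_update, triangle_from_rows and gram are made up for this example, and it neither keeps the list of orthogonal transformations nor performs the final svd. The point it shows is that only the p by p triangle and the current row are held in memory, and that the triangle R satisfies R'R = X'X, so its singular values are those of the full matrix X.

```python
import math

def givens_update(R, row):
    """Rotate one new data row into the upper-triangular matrix R, in place."""
    p = len(R)
    row = list(row)                      # work on a copy of the incoming row
    for j in range(p):
        if row[j] == 0.0:
            continue                     # nothing to eliminate in this column
        r = math.hypot(R[j][j], row[j])
        c, s = R[j][j] / r, row[j] / r   # rotation that zeroes row[j]
        for k in range(j, p):
            t = R[j][k]
            R[j][k] = c * t + s * row[k]
            row[k] = -s * t + c * row[k]

def triangle_from_rows(rows, p):
    """Consume an iterable of length-p rows, e.g. read lazily from a file."""
    R = [[0.0] * p for _ in range(p)]
    for row in rows:
        givens_update(R, row)
    return R

def gram(A):
    """Return A'A for a list-of-lists matrix A (used only as a check)."""
    p = len(A[0])
    return [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]

if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    R = triangle_from_rows(iter(X), 2)
    print(R)
    print(gram(R), gram(X))              # the two cross-product matrices agree
```

Because triangle_from_rows accepts any iterable, the rows could come from a generator that parses one line of a file at a time, which is what keeps the working storage at p by p however many rows there are.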

However, the main point is that you need to use some sort of "out of core" (how dated that sounds!) method, which is and will remain an issue for systems like R that work on objects in memory.

I'm willing to kibitz on such work, but it would go best with 3-4 people involved to bring different skills to the table.

John Nash




Message: 128
Date: Thu, 20 Aug 2009 17:45:00 -0700 (PDT)
From: misha680 <mk144...@bcm.edu>
Subject: [R]  Principle components analysis on a large dataset
To: r-help@r-project.org
Message-ID: <25072510.p...@talk.nabble.com>
Content-Type: text/plain; charset=us-ascii


Dear Sirs:

Please pardon me; I am very new to R. I have been using MATLAB.

I was wondering if R would allow me to do principal components analysis on a
very large dataset.

Specifically, our dataset has 68800 variables and around 6000 observations.
Matlab gives "out of memory" errors. I have tried also doing princomp in
pieces, but this does not seem to quite work for our approach.

Anything that might help would be much appreciated, as would hearing from
anyone who has had experience doing this in R.

Thank you
Misha
