Re: [R] PCA for Binary data
On Tue, 12 Jun 2007, Spencer Graves wrote:

> The problem with applying prcomp to binary data is that it's not
> clear what problem you are solving. The standard principal components
> and factor analysis models assume that the observations are linear
> combinations of unobserved common factors (shared variability),
> normally distributed, plus normal noise, independent between
> observations and variables. Those assumptions are clearly violated
> for binary data.
>
> RSiteSearch("PCA for binary data") produced references to 'ade4' and
> 'FactoMineR'. Have you considered these? I have not used them, but
> FactoMineR included functions for 'Multiple Factor Analysis for Mixed
> [quantitative and qualitative] Data'

AFAIK, that is not using 'factor analysis' in the same sense as e.g. factanal().

Continuous underlying variables with binary manifest variables is part of latent variable analysis. Package 'ltm' covers a variety of such models.

But to begin to give advice we would need to know the scientific problem for which Ranga Chandra Gudivada is looking for a tool. Simon Blomberg mentioned ordination, but that is only one of several classes of uses of PCA (which finds a linear subspace that both has maximal variance within and is least-squares fitting to the data).

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
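The parenthetical remark about PCA (maximal variance within the subspace, least-squares fit to the data) can be checked numerically. What follows is only a minimal sketch in base R on simulated binary data; the data and all object names are hypothetical and not from the thread. It confirms that the rank-k fit from prcomp() leaves a residual sum of squares equal to (n - 1) times the variance of the discarded components, and that a randomly chosen k-dimensional subspace fits no better.

```r
set.seed(1)

# Hypothetical data: 50 observations of 6 binary variables
x <- matrix(rbinom(50 * 6, size = 1, prob = 0.4), nrow = 50)

pc <- prcomp(x, center = TRUE, scale. = FALSE)
k  <- 2

# Centred data, and its projection onto the first k principal axes
xc    <- scale(x, center = TRUE, scale = FALSE)
fit_k <- pc$x[, 1:k] %*% t(pc$rotation[, 1:k])

# Residual sum of squares of the rank-k PCA fit
rss <- sum((xc - fit_k)^2)

# The same quantity from the variances of the discarded components
rss_from_var <- (nrow(x) - 1) * sum(pc$sdev[-(1:k)]^2)
all.equal(rss, rss_from_var)

# Any other k-dimensional subspace (here a random orthonormal basis)
# gives a residual sum of squares at least as large
q         <- qr.Q(qr(matrix(rnorm(6 * k), nrow = 6)))
rss_other <- sum((xc - xc %*% q %*% t(q))^2)
rss_other >= rss
```

Nothing here depends on the data being binary, which is rather the point: the least-squares property always holds, whereas the model-based interpretation does not.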
Re: [R] PCA for Binary data
Dear Prof Brian Ripley,

Would you also recommend some packages for non-binary data to do variable and feature selection?

Thanks a lot!
Alex

On 6/12/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:

> AFAIK, that is not using 'factor analysis' in the same sense as e.g.
> factanal().
>
> Continuous underlying variables with binary manifest variables is
> part of latent variable analysis. Package 'ltm' covers a variety of
> such models.
>
> But to begin to give advice we would need to know the scientific
> problem for which Ranga Chandra Gudivada is looking for a tool. Simon
> Blomberg mentioned ordination, but that is only one of several
> classes of uses of PCA (which finds a linear subspace that both has
> maximal variance within and is least-squares fitting to the data).
Re: [R] PCA for Binary data
I don't understand: what's wrong with using prcomp in this situation?

On Sunday 10 June 2007 12:50 pm, Ranga Chandra Gudivada wrote:

> Hi,
>
> I was wondering whether there is any package implementing Principal
> Component Analysis for binary data.
>
> Thanks,
> chandra
Re: [R] PCA for Binary data
The problem with applying prcomp to binary data is that it's not clear what problem you are solving. The standard principal components and factor analysis models assume that the observations are linear combinations of unobserved common factors (shared variability), normally distributed, plus normal noise, independent between observations and variables. Those assumptions are clearly violated for binary data.

RSiteSearch("PCA for binary data") produced references to 'ade4' and 'FactoMineR'. Have you considered these? I have not used them, but FactoMineR included functions for 'Multiple Factor Analysis for Mixed [quantitative and qualitative] Data'.

Hope this helps.
Spencer Graves

Josh Gilbert wrote:

> I don't understand, what's wrong with using prcomp in this situation?
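To make the point about violated assumptions concrete, here is a small sketch, assuming binary items generated by thresholding a simulated latent trait (an assumption for illustration only, not the original poster's data; all names are hypothetical). prcomp() runs without complaint, but with p binary variables the scores can take at most 2^p distinct values, one per response pattern, so they cannot be normally distributed.

```r
set.seed(42)

n <- 200
z <- rnorm(n)  # hypothetical continuous latent trait

# Four binary items: latent trait plus independent noise, cut at a threshold
cuts <- c(-0.5, 0, 0.5, 1)
x <- sapply(cuts, function(cut) as.integer(z + rnorm(n) > cut))
colnames(x) <- paste0("item", 1:4)

# prcomp() accepts the 0/1 matrix like any other numeric matrix
pc <- prcomp(x)
summary(pc)

# Number of distinct response patterns actually observed (at most 2^4 = 16)
n_patterns <- nrow(unique(x))

# Number of distinct values of the first principal component score;
# rounding guards against floating-point noise
n_scores <- length(unique(round(pc$x[, 1], 10)))

c(patterns = n_patterns, distinct_pc1_scores = n_scores)
```

A table or barplot of pc$x[, 1] shows the lumpiness directly. Whether that matters depends on what the components are to be used for, which is the question raised above.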
Re: [R] PCA for Binary data
You might try (detrended) correspondence analysis, which is designed for count data, if it makes sense to treat your binary data that way. I've used ade4 and also vegan, and they are both good packages for these types of ordinations. You could also look at non-metric multidimensional scaling.

There seem to be two schools of ordination. The Europeans like eigenanalysis methods (PCA, correspondence analysis, multiple correspondence analysis, coinertia analysis etc.). The Americans seem to prefer MDS.

Cheers,

Simon.

On Tue, 2007-06-12 at 20:17 -0700, Spencer Graves wrote:

> The problem with applying prcomp to binary data is that it's not
> clear what problem you are solving. The standard principal components
> and factor analysis models assume that the observations are linear
> combinations of unobserved common factors (shared variability),
> normally distributed, plus normal noise, independent between
> observations and variables. Those assumptions are clearly violated
> for binary data.
>
> RSiteSearch("PCA for binary data") produced references to 'ade4' and
> 'FactoMineR'. Have you considered these? I have not used them, but
> FactoMineR included functions for 'Multiple Factor Analysis for Mixed
> [quantitative and qualitative] Data'
>
> Hope this helps.
> Spencer Graves

-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072 Australia
Room 320, Goddard Building (8)
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey.
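A rough sketch of the two "schools" in base R, on a simulated presence/absence matrix (hypothetical data and object names). The first part is plain, not detrended, correspondence analysis computed by hand from the SVD. The second uses classical (metric) MDS from cmdscale() on the Jaccard-type distance given by dist(method = "binary"), as a stand-in for non-metric MDS. In practice one would more likely reach for the package functions, e.g. dudi.coa() in ade4, decorana() or metaMDS() in vegan, or isoMDS() in MASS.

```r
set.seed(7)

# Hypothetical presence/absence data: 30 sites (rows) by 8 species (columns)
x <- matrix(rbinom(30 * 8, size = 1, prob = 0.3), nrow = 30)

# Drop empty rows and columns: CA needs non-zero margins, and the
# binary distance between two all-zero rows is undefined
x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]

## Correspondence analysis via the SVD of the standardised residuals
P     <- x / sum(x)          # correspondence matrix
rmass <- rowSums(P)          # row masses
cmass <- colSums(P)          # column masses
S     <- (P - rmass %o% cmass) / sqrt(rmass %o% cmass)
sv    <- svd(S)

# Principal coordinates of the rows on the first two axes
ca_rows <- (sv$u[, 1:2] / sqrt(rmass)) %*% diag(sv$d[1:2])

# Total inertia; equals the Pearson chi-squared statistic divided by sum(x)
total_inertia <- sum(sv$d^2)

## Classical MDS on the Jaccard-type distance between rows
d   <- dist(x, method = "binary")
mds <- cmdscale(d, k = 2)

# Compare the two two-dimensional configurations side by side
op <- par(mfrow = c(1, 2))
plot(ca_rows, xlab = "CA axis 1", ylab = "CA axis 2", main = "CA")
plot(mds, xlab = "MDS axis 1", ylab = "MDS axis 2", main = "Classical MDS")
par(op)
```

Both return a two-dimensional configuration of the rows; which is more useful depends on whether chi-squared distances or a chosen dissimilarity such as Jaccard better reflects the scientific question.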