Re: [R] PCA for Binary data

2007-06-13 Thread Prof Brian Ripley
On Tue, 12 Jun 2007, Spencer Graves wrote:

  The problem with applying prcomp to binary data is that it's not
 clear what problem you are solving.

  The standard principal components and factor analysis models
 assume that the observations are linear combinations of unobserved
 common factors (shared variability), normally distributed, plus normal
 noise, independent between observations and variables.  Those
 assumptions are clearly violated for binary data.

  RSiteSearch("PCA for binary data") produced references to 'ade4'
 and 'FactoMineR'.  Have you considered these?  I have not used them, but
 FactoMineR included functions for 'Multiple Factor Analysis for Mixed
 [quantitative and qualitative] Data'

AFAIK, that is not using 'factor analysis' in the same sense as e.g. 
factanal().

Models with continuous underlying variables and binary manifest variables 
are part of latent variable analysis.  Package 'ltm' covers a variety of 
such models.
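
For readers who want to see what such a model looks like, here is a minimal sketch in R. The data are simulated and everything in it (sample size, item difficulties, the names `z`, `difficulty`, `items`) is an assumption made for illustration, not something taken from this thread. The fit uses `rasch()` from package 'ltm' only if that package happens to be installed.

```r
# Sketch only: simulated data, hypothetical names.
set.seed(7)
n <- 500
z <- rnorm(n)                          # continuous latent trait, one value per subject
difficulty <- c(-1, -0.5, 0, 0.5, 1)   # assumed item difficulties

# Rasch-type response probabilities: P(x_ij = 1) = plogis(z_i - b_j)
prob <- plogis(outer(z, difficulty, "-"))

# Binary manifest variables drawn from those probabilities
items <- (matrix(runif(n * 5), n, 5) < prob) * 1
colnames(items) <- paste0("item", 1:5)

if (requireNamespace("ltm", quietly = TRUE)) {
  fit <- ltm::rasch(items)             # Rasch model for dichotomous items
  print(coef(fit))                     # estimated difficulty and discrimination
} else {
  message("Package 'ltm' is not installed; only the simulated data are available.")
}
```

Whether a Rasch model, a two-parameter model or something else is appropriate depends on the scientific problem, which is not stated here.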

But to begin to give advice we would need to know the scientific problem 
for which Ranga Chandra Gudivada is looking for a tool. Simon Blomberg 
mentioned ordination, but that is only one of several classes of uses of 
PCA (which finds a linear subspace that both has maximal variance within 
it and is the least-squares fit to the data).
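
The two properties named in the parenthesis can be checked numerically. The following is a small sketch on simulated continuous data (all names and sizes are assumptions for illustration): the residual sum of squares after projecting onto the first k principal components equals (n - 1) times the sum of the discarded component variances, and an arbitrary k-dimensional subspace does no better.

```r
# Sketch only: simulated data, hypothetical names.
set.seed(42)
x  <- matrix(rnorm(200 * 5), 200, 5) %*% matrix(runif(25), 5, 5)  # correlated columns
xc <- scale(x, center = TRUE, scale = FALSE)                      # column-centred data

p <- prcomp(x)                 # centres by default, does not rescale
k <- 2
V <- p$rotation[, 1:k]         # orthonormal basis of the k-dimensional PCA subspace

# Least-squares view: orthogonal projection of the centred data onto the subspace
fit <- xc %*% V %*% t(V)
rss <- sum((xc - fit)^2)

# Variance view: what is left over is the variance of the discarded components
rss_expected <- (nrow(x) - 1) * sum(p$sdev[-(1:k)]^2)
all.equal(rss, rss_expected)

# A random k-dimensional subspace for comparison
Q <- qr.Q(qr(matrix(rnorm(5 * k), 5, k)))
rss_random <- sum((xc - xc %*% Q %*% t(Q))^2)
rss_random >= rss
```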


  Hope this helps.
  Spencer Graves

 Josh Gilbert wrote:
 I don't understand, what's wrong with using prcomp in this situation?

 On Sunday 10 June 2007 12:50 pm, Ranga Chandra Gudivada wrote:

 Hi,

 I was wondering whether there is any package implementing Principal
 Component Analysis for Binary data

   Thanks chandra



 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented, minimal,
 self-contained, reproducible code.




-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] PCA for Binary data

2007-06-13 Thread ssls sddd
Dear Prof Brian Ripley,

Could you also recommend some packages for variable and feature
selection on non-binary data?

Thanks a lot!

Alex




Re: [R] PCA for Binary data

2007-06-12 Thread Josh Gilbert
I don't understand, what's wrong with using prcomp in this situation?

On Sunday 10 June 2007 12:50 pm, Ranga Chandra Gudivada wrote:
 Hi,

 I was wondering whether there is any package implementing Principal
 Component Analysis for Binary data

   Thanks chandra




Re: [R] PCA for Binary data

2007-06-12 Thread Spencer Graves
  The problem with applying prcomp to binary data is that it's not 
clear what problem you are solving. 

  The standard principal components and factor analysis models 
assume that the observations are linear combinations of unobserved 
common factors (shared variability), normally distributed, plus normal 
noise, independent between observations and variables.  Those 
assumptions are clearly violated for binary data. 
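
A minimal sketch of what goes wrong, using simulated data (the loadings, the threshold 0.5 and all variable names are assumptions for illustration): continuous variables are generated from a one-factor model and then dichotomised. The correlations among the binary versions are attenuated, so prcomp() applied to them no longer reflects the factor structure that generated the data.

```r
# Sketch only: simulated data, hypothetical names.
set.seed(1)
n <- 2000
f <- rnorm(n)                                  # one common factor
loadings <- c(0.9, 0.8, 0.7, 0.6)

# Continuous variables following the linear factor model, unit variance each
y <- sapply(loadings, function(l) l * f + sqrt(1 - l^2) * rnorm(n))

# Binary manifest variables: only record whether y exceeds a threshold
x <- (y > 0.5) * 1

r_cont <- cor(y)
r_bin  <- cor(x)
round(r_cont[1, 2], 2)   # population value is 0.9 * 0.8 = 0.72
round(r_bin[1, 2], 2)    # attenuated by the dichotomisation

# Share of total variance carried by the first principal component
var_share <- function(p) p$sdev[1]^2 / sum(p$sdev^2)
p_cont <- prcomp(y, scale. = TRUE)
p_bin  <- prcomp(x, scale. = TRUE)
c(continuous = var_share(p_cont), binary = var_share(p_bin))
```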

  RSiteSearch("PCA for binary data") produced references to 'ade4' 
and 'FactoMineR'.  Have you considered these?  I have not used them, but 
FactoMineR included functions for 'Multiple Factor Analysis for Mixed 
[quantitative and qualitative] Data'
  
  Hope this helps. 
  Spencer Graves

Josh Gilbert wrote:
 I don't understand, what's wrong with using prcomp in this situation?





Re: [R] PCA for Binary data

2007-06-12 Thread Simon Blomberg
You might try (detrended) correspondence analysis, which is designed for
count data, if it makes sense to treat your binary data that way.
I've used ade4 and also vegan, and they are both good packages for these
types of ordinations. You could also look at non-metric multidimensional
scaling (MDS). There seem to be two schools of ordination. The Europeans
like eigenanalysis methods (PCA, correspondence analysis, multiple
correspondence analysis, coinertia analysis, etc.). The Americans seem to
prefer MDS.
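
To make the two approaches concrete, here is a rough sketch in base R on a made-up presence/absence table (the 30 x 8 layout and all names are assumptions). It computes plain correspondence analysis from the SVD of the standardised residuals, not the detrended variant, and it uses classical (metric) MDS via cmdscale() on Jaccard distances as a stand-in for non-metric MDS. ade4 and vegan provide full implementations of these ordinations.

```r
# Sketch only: simulated data, hypothetical names.
set.seed(3)
x <- matrix(rbinom(30 * 8, 1, 0.4), 30, 8)             # sites by species, 0/1
x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]   # CA needs non-empty rows/columns

# Correspondence analysis: SVD of the standardised residuals
P  <- x / sum(x)                       # correspondence matrix
r  <- rowSums(P)                       # row masses
cc <- colSums(P)                       # column masses
S  <- diag(1 / sqrt(r)) %*% (P - r %o% cc) %*% diag(1 / sqrt(cc))
sv <- svd(S)

# Row principal coordinates on the first two axes
row_scores <- diag(1 / sqrt(r)) %*% sv$u[, 1:2] %*% diag(sv$d[1:2])

# Total inertia; equals the chi-squared statistic divided by the grand total
inertia <- sum(sv$d^2)

# Distance-based alternative: classical MDS on Jaccard distances
d   <- dist(x, method = "binary")
mds <- cmdscale(d, k = 2)
```

Plotting `row_scores` and `mds` side by side gives a quick impression of how far the two ordinations agree for a given data set.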

Cheers,

Simon.

 On Tue, 2007-06-12 at 20:17 -0700, Spencer Graves wrote:
 The problem with applying prcomp to binary data is that it's not 
 clear what problem you are solving. 
 
-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Australia

Room 320, Goddard Building (8)
T: +61 7 3365 2506 
email: S.Blomberg1_at_uq.edu.au 

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.
