Re: difference in PCA of MLlib vs H2O in R

2015-03-24 Thread Sean Owen
Those implementations are computing an SVD of the input matrix
directly, and while you generally need the columns to have mean 0, you
can turn that off with the options you cite.

I don't think this is possible in the MLlib implementation, since it
computes the principal components as the eigenvectors of the
covariance matrix. The covariance matrix is unchanged by shifting the
column means, so the means inherently don't matter in this computation.
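To make the point about the means concrete, here is a minimal pure-Python sketch (not Spark code). The toy data and the helper names `column_means` and `covariance` are made up for illustration. It only checks that the sample covariance matrix does not change when a constant is added to each column.

```python
def column_means(rows):
    """Mean of each column of a list of row vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (divides by n - 1) of a list of row vectors."""
    n = len(rows)
    means = column_means(rows)
    # Subtracting the column means is built into the definition of covariance.
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


# Hypothetical toy data, not taken from the thread.
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0]]
# Same data with a different constant added to each column.
shifted = [[x + 100.0, y - 50.0] for x, y in data]

cov_a = covariance(data)
cov_b = covariance(shifted)

# True when every entry of the two covariance matrices agrees.
same = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(cov_a, cov_b)
    for a, b in zip(row_a, row_b)
)
print(same)
```

Since the eigenvectors depend only on this matrix, any approach that starts from the covariance matrix behaves like a centered PCA, whatever the original column means were.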

On Tue, Mar 24, 2015 at 6:13 AM, roni roni.epi...@gmail.com wrote:
 I am trying to compute PCA using computePrincipalComponents.
 I also computed PCA using H2O in R and with R's prcomp. The answers I get from
 H2O and R's prcomp (non-H2O) are the same when I set standardized=FALSE for H2O
 and center = FALSE for R's prcomp.

 How do I make sure that the settings for MLlib PCA are the same as the ones I am
 using for H2O or prcomp?

 Thanks
 Roni

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: difference in PCA of MLlib vs H2O in R

2015-03-24 Thread roni
Reza,
That SVD.v matches the H2O and R prcomp (non-centered) output.
Thanks
-R

On Tue, Mar 24, 2015 at 11:38 AM, Sean Owen so...@cloudera.com wrote:

 (Oh sorry, I've only been thinking of TallSkinnySVD)

 On Tue, Mar 24, 2015 at 6:36 PM, Reza Zadeh r...@databricks.com wrote:
  If you want to do a nonstandard (or uncentered) PCA, you can call
  computeSVD on RowMatrix, and look at the resulting 'V' Matrix.
 
  That should match the output of the other two systems.
 
  Reza
 



Re: difference in PCA of MLlib vs H2O in R

2015-03-24 Thread Reza Zadeh
Great!




Re: difference in PCA of MLlib vs H2O in R

2015-03-24 Thread Reza Zadeh
If you want to do a nonstandard (or uncentered) PCA, you can call
computeSVD on RowMatrix, and look at the resulting 'V' Matrix.

That should match the output of the other two systems.
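As a rough illustration of why the 'V' from an SVD of the raw rows behaves like an uncentered PCA: the right singular vectors of a matrix X are the eigenvectors of X^T X, with no mean subtraction involved. The sketch below is plain Python, not Spark code, and does not call computeSVD. The toy data and the helpers `gram`, `center` and `top_eigenvector` are hypothetical, and power iteration is used only to keep the example dependency-free.

```python
import math


def gram(rows):
    """X^T X for a list of row vectors, with no centering."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def center(rows):
    """Subtract each column's mean from that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_eigenvector(sym, iters=200):
    """Leading eigenvector of a symmetric PSD matrix, by power iteration."""
    v = [1.0] * len(sym)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in sym]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are only defined up to sign; pick one convention.
    if v[0] < 0:
        v = [-x for x in v]
    return v


# Hypothetical toy data: all of the spread is in the second column,
# but the first column has a large nonzero mean.
data = [[10.0, 0.0], [10.0, 1.0], [10.0, 2.0], [10.0, 3.0]]

# Uncentered: leading right singular vector of the raw data.
uncentered_dir = top_eigenvector(gram(data))
# Centered: leading principal component in the usual sense.
centered_dir = top_eigenvector(gram(center(data)))

print(uncentered_dir)
print(centered_dir)
```

On this data the uncentered direction points mostly toward the column means (the first axis), while the centered direction follows the actual spread (the second axis). That gap is the kind of difference to expect between an SVD of the raw rows and a covariance-based PCA.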

Reza





Re: difference in PCA of MLlib vs H2O in R

2015-03-24 Thread Sean Owen
(Oh sorry, I've only been thinking of TallSkinnySVD)
