Re: difference in PCA of MLlib vs H2O in R
Those implementations are computing an SVD of the input matrix directly, and while you generally need the columns to have mean 0, you can turn that off with the options you cite. I don't think this is possible in the MLlib implementation, since it is computing the principal components by computing eigenvectors of the covariance matrix. The means inherently don't matter either way in this computation.

On Tue, Mar 24, 2015 at 6:13 AM, roni roni.epi...@gmail.com wrote:
> I am trying to compute PCA using computePrincipalComponents. I also computed PCA using H2O in R and with R's prcomp. The answers I get from H2O and from R's prcomp (non-H2O) are the same when I set the H2O option standardized=FALSE and the prcomp option center = FALSE. How do I make sure that the settings for MLlib PCA are the same as the ones I am using for H2O or prcomp?
> Thanks
> Roni

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
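In matrix terms, the two computations Sean contrasts can be written as follows. The notation is introduced here for illustration and is not from the thread: X is the n × d data matrix, \mathbf{1} is a length-n vector of ones, and \mu = \frac{1}{n} X^{\top}\mathbf{1} is the vector of column means.

```latex
% SVD of the raw input: the columns of V are what an uncentered PCA reports
X = U \Sigma V^{\top}, \qquad X^{\top} X = V \Sigma^{2} V^{\top}

% Covariance-based PCA: the principal components are the eigenvectors of C
C = \frac{1}{n-1} \left(X - \mathbf{1}\mu^{\top}\right)^{\top} \left(X - \mathbf{1}\mu^{\top}\right)
  = \frac{1}{n-1} \left(X^{\top} X - n\,\mu\mu^{\top}\right)
```

Replacing X with X + \mathbf{1}c^{\top} for any constant vector c changes \mu to \mu + c but leaves X - \mathbf{1}\mu^{\top}, and hence C, unchanged; that is why the means don't matter for the covariance route. In general the eigenvectors of X^{\top}X differ from those of C unless \mu = 0, because the rank-one term n\,\mu\mu^{\top} tilts them toward the mean vector.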
Re: difference in PCA of MLlib vs H2O in R
Reza,
That SVD's V matrix matches the H2O and R prcomp (non-centered) results.
Thanks
-R

On Tue, Mar 24, 2015 at 11:38 AM, Sean Owen so...@cloudera.com wrote:
> (Oh sorry, I've only been thinking of TallSkinnySVD)
Re: difference in PCA of MLlib vs H2O in R
Great!

On Tue, Mar 24, 2015 at 2:53 PM, roni roni.epi...@gmail.com wrote:
> Reza,
> That SVD's V matrix matches the H2O and R prcomp (non-centered) results.
> Thanks
> -R
Re: difference in PCA of MLlib vs H2O in R
If you want to do a nonstandard (i.e. uncentered) PCA, you can call computeSVD on a RowMatrix and look at the resulting V matrix. That should match the output of the other two systems.
Reza

On Tue, Mar 24, 2015 at 3:53 AM, Sean Owen so...@cloudera.com wrote:
> Those implementations are computing an SVD of the input matrix directly, and while you generally need the columns to have mean 0, you can turn that off with the options you cite. I don't think this is possible in the MLlib implementation, since it is computing the principal components by computing eigenvectors of the covariance matrix. The means inherently don't matter either way in this computation.
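To see why looking at V works for the uncentered case, and why a covariance-based computation cannot reproduce a center = FALSE result, here is a small pure-Python sketch. It is not Spark code and does not call RowMatrix; the helper names (gram, center, top_eigenvector, uncentered_pc1, covariance_pc1, same_direction) are hypothetical. It assumes power iteration on X^T X as a stand-in for the leading column of V that an uncentered SVD (computeSVD, or prcomp with center = FALSE) would report, and compares it with the leading eigenvector of the covariance matrix, which is what Sean describes computePrincipalComponents as computing.

```python
import math
from typing import List

Matrix = List[List[float]]


def gram(rows: Matrix) -> Matrix:
    """Return the d x d matrix X^T X for an n x d matrix X given as a list of rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def center(rows: Matrix) -> Matrix:
    """Subtract each column's mean, as prcomp(center = TRUE) does before its SVD."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_eigenvector(m: Matrix, iters: int = 200) -> List[float]:
    """Unit-length dominant eigenvector of a symmetric PSD matrix via power iteration.

    Assumes a clear gap between the two largest eigenvalues and a start vector
    that is not orthogonal to the answer (true for the toy data below).
    """
    d = len(m)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def uncentered_pc1(rows: Matrix) -> List[float]:
    # Leading right singular vector of the raw X (first column of V in X = U S V^T),
    # i.e. the direction an SVD without centering reports first.
    return top_eigenvector(gram(rows))


def covariance_pc1(rows: Matrix) -> List[float]:
    # Leading eigenvector of the covariance matrix; the 1/(n - 1) factor only
    # rescales eigenvalues and leaves eigenvectors unchanged, so it is omitted.
    return top_eigenvector(gram(center(rows)))


def same_direction(a: List[float], b: List[float], tol: float = 1e-9) -> bool:
    # Eigenvectors and singular vectors are only defined up to sign,
    # so compare |a . b| with 1 (both inputs are unit length).
    return abs(abs(sum(x * y for x, y in zip(a, b))) - 1.0) < tol


if __name__ == "__main__":
    X = [[2.0, 1.0], [3.0, 4.0], [5.0, 4.0], [6.0, 7.0], [9.0, 8.0]]
    # The same points shifted by a constant vector: only the column means change.
    X_shifted = [[x + 100.0, y - 50.0] for x, y in X]

    print("covariance PC1 (raw):    ", covariance_pc1(X))
    print("covariance PC1 (shifted):", covariance_pc1(X_shifted))
    print("uncentered PC1 (shifted):", uncentered_pc1(X_shifted))
```

Shifting every row by a constant leaves the covariance direction unchanged but drags the uncentered direction toward the mean vector; centering the data first makes the two agree. Power iteration is used only to keep the sketch dependency-free; a real comparison would use a proper SVD routine.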
Re: difference in PCA of MLlib vs H2O in R
(Oh sorry, I've only been thinking of TallSkinnySVD)

On Tue, Mar 24, 2015 at 6:36 PM, Reza Zadeh r...@databricks.com wrote:
> If you want to do a nonstandard (i.e. uncentered) PCA, you can call computeSVD on a RowMatrix and look at the resulting V matrix. That should match the output of the other two systems.
> Reza