[ https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Leshem updated MAHOUT-369: -------------------------------- Attachment: MAHOUT-369.patch Attached is a simple patch that fixed the two issues raised. The right way to do this is to introduce a new unit-test that fails with the current version (e.g. decompose a fixed matrix and verify all its known eigenvalues are found). The attached patch has no such code. All relevant unit-tests pass (I'm getting errors for a few org.apache.mahout.clustering tests, nothing related to this change though). > Issues with DistributedLanczosSolver output > ------------------------------------------- > > Key: MAHOUT-369 > URL: https://issues.apache.org/jira/browse/MAHOUT-369 > Project: Mahout > Issue Type: Bug > Components: Math > Affects Versions: 0.3, 0.4 > Reporter: Danny Leshem > Fix For: 0.4 > > Attachments: MAHOUT-369.patch > > > DistributedLanczosSolver (line 99) claims to persist eigenVectors.numRows() > vectors. > {code} > log.info("Persisting " + eigenVectors.numRows() + " eigenVectors and > eigenValues to: " + outputPath); > {code} > However, a few lines later (line 106) we have > {code} > for(int i=0; i<eigenVectors.numRows() - 1; i++) { > ... > } > {code} > which only persists eigenVectors.numRows()-1 vectors. > Seems like the most significant eigenvector (i.e. the one with the largest > eigenvalue) is omitted... off by one bug? > Also, I think it would be better if the eigenvectors are persisted in > *reverse* order, meaning the most significant vector is marked "0", the 2nd > most significant is marked "1", etc. > This, for two reasons: > 1) When performing another PCA on the same corpus (say, with more principal > componenets), corresponding eigenvalues can be easily matched and compared. > 2) Makes it easier to discard the least significant principal components, > which for Lanczos decomposition are usually garbage. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira