[ 
https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Leshem updated MAHOUT-369:
--------------------------------

    Attachment: MAHOUT-369.patch

Attached is a simple patch that fixes the two issues raised.

The right way to do this is to introduce a new unit-test that fails with the 
current version (e.g. decompose a fixed matrix and verify all its known 
eigenvalues are found). The attached patch has no such code.
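
For illustration only, a minimal sketch of the kind of check such a test 
could make. It is not part of the patch and does not call Mahout: the class 
KnownEigenvaluesCheck and both of its methods are hypothetical, and the 
closed-form 2x2 routine is a stand-in for DistributedLanczosSolver.

```java
import java.util.Arrays;

// Hypothetical stand-in for the solver under test; a real test would call
// DistributedLanczosSolver here instead of the closed-form routine below.
class KnownEigenvaluesCheck {

    // Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]], ascending.
    static double[] eigenvaluesOfSymmetric2x2(double a, double b, double d) {
        double mean = (a + d) / 2.0;
        double radius = Math.sqrt((a - d) * (a - d) / 4.0 + b * b);
        return new double[] {mean - radius, mean + radius};
    }

    // True only if every expected eigenvalue is present: same count, and
    // equal within the tolerance once both arrays are sorted.
    static boolean allFound(double[] expected, double[] found, double tolerance) {
        if (expected.length != found.length) {
            return false; // catches a dropped eigenvalue (the off-by-one case)
        }
        double[] e = expected.clone();
        double[] f = found.clone();
        Arrays.sort(e);
        Arrays.sort(f);
        for (int i = 0; i < e.length; i++) {
            if (Math.abs(e[i] - f[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The fixed matrix [[2, 1], [1, 2]] has eigenvalues 1 and 3.
        double[] found = eigenvaluesOfSymmetric2x2(2.0, 1.0, 2.0);
        if (!allFound(new double[] {1.0, 3.0}, found, 1e-9)) {
            throw new AssertionError("not all known eigenvalues were found");
        }
        System.out.println("all known eigenvalues found");
    }
}
```

Comparing the count first is what would make such a test fail against the 
current code, since one eigenvector/eigenvalue pair is never written.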

All relevant unit-tests pass. (I'm getting errors for a few 
org.apache.mahout.clustering tests, but nothing related to this change.)

> Issues with DistributedLanczosSolver output
> -------------------------------------------
>
>                 Key: MAHOUT-369
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-369
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3, 0.4
>            Reporter: Danny Leshem
>             Fix For: 0.4
>
>         Attachments: MAHOUT-369.patch
>
>
> DistributedLanczosSolver (line 99) claims to persist eigenVectors.numRows() 
> vectors.
> {code}
>     log.info("Persisting " + eigenVectors.numRows() + " eigenVectors and eigenValues to: " + outputPath);
> {code}
> However, a few lines later (line 106) we have
> {code}
>     for(int i=0; i<eigenVectors.numRows() - 1; i++) {
>         ...
>     }
> {code}
> which only persists eigenVectors.numRows()-1 vectors.
> Seems like the most significant eigenvector (i.e. the one with the largest 
> eigenvalue) is omitted... off by one bug?
> Also, I think it would be better if the eigenvectors are persisted in 
> *reverse* order, meaning the most significant vector is marked "0", the 2nd 
> most significant is marked "1", etc.
> This, for two reasons:
> 1) When performing another PCA on the same corpus (say, with more principal 
> components), corresponding eigenvalues can be easily matched and compared.
> 2) Makes it easier to discard the least significant principal components, 
> which for Lanczos decomposition are usually garbage.
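
A rough sketch of the two changes described above, in plain Java rather than 
against the Mahout classes. The class ReverseOrderPersist and the method 
inOutputOrder are hypothetical, and the code assumes, as the description 
implies, that the most significant eigenvector is the last row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain arrays stand in for the Mahout matrix rows.
class ReverseOrderPersist {

    // Returns all rows in output order, so that index 0 is the last input
    // row (assumed here to be the most significant eigenvector).
    static List<double[]> inOutputOrder(double[][] eigenVectors) {
        int n = eigenVectors.length;
        List<double[]> out = new ArrayList<>(n);
        // Loop bound is n, not n - 1, so no row is skipped.
        for (int i = 0; i < n; i++) {
            out.add(eigenVectors[n - 1 - i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] rows = {{0.1}, {0.2}, {0.3}}; // last row = most significant
        List<double[]> ordered = inOutputOrder(rows);
        for (int i = 0; i < ordered.size(); i++) {
            System.out.println(i + " -> " + ordered.get(i)[0]);
        }
    }
}
```

With this ordering, dropping the least significant components amounts to 
keeping a prefix of the output.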

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
