Jake Mannix created MAHOUT-1009:
-----------------------------------

             Summary: Remove old LDA implementation from codebase
                 Key: MAHOUT-1009
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1009
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.7
            Reporter: Jake Mannix
            Priority: Minor
             Fix For: 0.7


The old LDA is unmaintained and unsupported.  We already (since 0.6) have a 
newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm 
actively working on and using in production at Twitter.  We should delete the 
old o.a.m.clustering.lda codebase.

Normally, I'd say that we should also promote o.a.m.clustering.lda.cvb up one 
package level at the same time, but that would cause serious merge conflicts on 
my GitHub branch (which carries updates, improvements, and new features targeted 
for 0.8). Instead, we can get users onto the new code simply by changing 
driver.classes.props so that "lda" points to CVB0Driver as its main().
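The alias redirect described above can be sketched in Java. This is a minimal sketch, not MahoutDriver's actual code. It assumes that driver.classes.props uses a "fully.qualified.ClassName = alias : description" line layout. The class name DriverAlias, its resolve() method, and the description text in the props line are hypothetical. The sketch shows that once the "lda" alias is attached to CVB0Driver, looking up "lda" yields the CVB0 class instead of the old LDA driver.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DriverAlias {
    // Hypothetical excerpt of driver.classes.props, assuming the
    // "fully.qualified.ClassName = alias : description" layout.
    static final String PROPS =
        "org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via CVB0\n";

    // Returns the class name registered under the given alias, or null if none.
    static String resolve(Properties props, String alias) {
        for (String className : props.stringPropertyNames()) {
            String value = props.getProperty(className);
            // The alias is the text before the first ':' in the value.
            int colon = value.indexOf(':');
            String name = (colon >= 0 ? value.substring(0, colon) : value).trim();
            if (name.equals(alias)) {
                return className;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(PROPS));
        // Prints the CVB0 driver class that the "lda" alias now points to.
        System.out.println(resolve(props, "lda"));
    }
}
```

Keying the properties by class name and putting the alias in the value means that redirecting "lda" is a one-line edit to the props file, with no Java code changes.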

The one thing that goes away entirely is the LDAPrintTopics class, but it is 
replaced by simply running VectorDumper with the -sort option on the model 
files, which is more standard anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
