[ 
https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120675#comment-13120675
 ] 

Michael Lazarus commented on MAHOUT-399:
----------------------------------------

I suggest we mark this as Won't Fix.  If you like, I can add my unit tests to 
Jake's implementation once it comes in and I have time.  His approach looks 
good: he distributes the collapsed Gibbs sampling and then optimizes the 
sampling of the Markov chain, which easily provides a 10x improvement in 
scale.  That works well.
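
For readers unfamiliar with the technique named above, here is a minimal 
single-machine sketch of collapsed Gibbs sampling for LDA.  It is not Jake's 
implementation and not Mahout code: it is plain Python, it is not distributed, 
and the function names (collapsed_gibbs_lda, top_words) and default 
hyperparameters are assumptions made for illustration only.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, single machine).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (topic_word_counts, doc_topic_counts, assignments).
    """
    rng = random.Random(seed)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Give every token a random initial topic and record it in the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            topic_word[k][w] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                topic_word[k][w] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1

                # p(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                topic_word[k][w] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1

    return topic_word, doc_topic, assignments


def top_words(topic_word, vocab, n):
    """Return the n highest-count words of each topic, most frequent first."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]
```

A unit test for a toy problem like the one quoted below could run the sampler 
on the toy corpus, call top_words, and compare each column with the expected 
topic after matching topics up to permutation, since topic labels are 
arbitrary.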




                
> LDA on Mahout 0.3 does not converge to correct solution for overlapping 
> pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-399
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-399
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3, 0.4, 0.5
>         Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
>            Reporter: Michael Lazarus
>            Assignee: Grant Ingersoll
>              Labels: lda, mahout
>             Fix For: 0.6
>
>         Attachments: Overlapping Pyramids Toy Dataset.pdf, olt.tar
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I have run a toy problem for LDA on Mahout 0.3 (locally), the same one I used 
> to test Blei's C version of LDA that he posts on his site. It has an exact 
> solution that LDA should converge to.  Please see the attached PDF, which 
> describes the intended output.
> Is LDA working?  The following output indicates some sort of collapsing 
> behavior to me.
> T0    T1      T2      T3      T4
> x     w       x       u       x
> u     u       g       j       n
> l     r       i       m       l
> j     q       h       h       p
> v     p       e       i       q
> e     t       f       g       v
> d     s       d       f       o
> b     c       b       n       k
> y     f       c       l       m
> w     v       u       v       u
> c     d       p       y       t
> k     o       l       r       r
> i     b       j       k       j
> f     e       k       e       f
> g     x       y       s       y
> t     y       w       b       w
> h     i       s       p       s
> o     l       v       x       d
> q     j       t       d       i
> n     k       o       t       b
> The intended output is (again, please see attached):
> D     I       N       S       X
> d     i       n       s       x
> c     h       m       t       y
> e     j       o       r       w
> b     k       l       u       v
> f     g       p       q       a
> a     f       k       p       b
> g     l       q       v       u
> h     m       j       w       t
> y     u       r       o       c
> n     s       d       d       i
> s     e       x       f       f
> r     q       i       i       n
> m     v       w       c       o
> o     w       u       a       h
> q     n       s       h       g
> p     t       c       x       d
> t     x       f       e       l
> x     d       e       j       s
> w     y       g       b       j
> i     r       y       n       r
> u     o       h       y       m
> k     b       t       l       e
> v     c       a       m       k
> j     a       b       g       p
> l     p       v       k       q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

