[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518378#comment-14518378 ]

Pedro Rodriguez commented on SPARK-5556:
----------------------------------------

I will start working on it again, then. It would be great for that research
project to result in the Gibbs sampler being added. The refactoring ended up
blocking that quite a bit.

I think [~gq] was working on something called LightLDA. I don't know the
specifics of the algorithm, but its sampler theoretically scales as O(1) in the
number of topics. My implementation has a sampler that, in the testing I did,
appears to be O(1) or very close to it in practice.
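
For context on how a per-token draw can cost O(1) in the number of topics K: one well-known building block is Walker's alias method (shown here in Vose's variant), which spends O(K) once to build a table and then draws each sample in O(1). LightLDA is generally described as pairing alias-table proposals with Metropolis-Hastings corrections; this sketch shows only the alias step. It is an illustrative Python sketch, not code from LightLDA or from the Gibbs implementation discussed here, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Vose's alias method: O(K) setup for K unnormalized topic weights."""
    k = len(weights)
    total = float(sum(weights))
    if k == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Rescale so that the average bucket holds exactly 1.0 of probability mass.
    scaled = [w * k / total for w in weights]
    prob = [0.0] * k
    alias = [0] * k
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and borrows the rest from topic l.
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    # Whatever remains is a full bucket (up to floating-point error).
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """O(1) draw: pick a bucket uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

For example, `build_alias_table([1.0, 2.0, 7.0])` yields a table whose draws land on topics 0, 1 and 2 with probabilities 0.1, 0.2 and 0.7. The catch is that the table must be rebuilt (O(K)) whenever the weights change, which is why samplers in this family tend to reuse slightly stale tables and correct for the staleness.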

To get Gibbs merged in (or accepted as a candidate implementation), how does this plan look?
1. Refactor the code to fit the PR you just merged.
2. Use the testing harness you used for EM LDA to test under the same
conditions. This should be fairly easy, since you already did all the work to
get the pipeline running correctly.
3. If it scales well, merge it or consider other applications.
4. Code review somewhere along the way.
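
One way to make "scales well" in step 3 concrete is to time an iteration at several topic counts K and fit the slope of log(time) against log(K): a slope near 0 suggests per-iteration cost roughly constant in K, while a slope near 1 suggests linear growth. This is a hypothetical post-processing helper, assuming you have already collected per-iteration wall-clock times from the existing harness (which is not shown here); it is not part of Spark.

```python
import math
from typing import Sequence


def scaling_exponent(num_topics: Sequence[int], seconds: Sequence[float]) -> float:
    """Least-squares slope of log(seconds) vs log(num_topics).

    ~0 means time is roughly independent of K; ~1 means roughly linear in K.
    """
    n = len(num_topics)
    if n != len(seconds) or n < 2:
        raise ValueError("need at least two matching (K, time) measurements")
    xs = [math.log(k) for k in num_topics]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct topic counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With only a few noisy timings the slope is a rough indicator, so it is worth repeating each measurement and using the same corpus, cluster, and iteration count across samplers, as step 2 suggests.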

> Latent Dirichlet Allocation (LDA) using Gibbs sampler 
> ------------------------------------------------------
>
>                 Key: SPARK-5556
>                 URL: https://issues.apache.org/jira/browse/SPARK-5556
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Guoqiang Li
>            Assignee: Pedro Rodriguez
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
