[ 
https://issues.apache.org/jira/browse/SPARK-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576371#comment-14576371
 ] 

Joseph K. Bradley commented on SPARK-5567:
------------------------------------------

Keeping the shared code in an object method sounds reasonable to me.  Inference 
should be significantly easier with the topics fixed, but it is unclear how 
we "should" do inference and what type of prediction we should return.  Does 
the following sound reasonable?
* With the topics fixed and the docConcentration parameter(s) fixed, it should 
be straightforward to compute the MAP prediction for topicDistributions (since 
we can sum out the token-topic assignments as is done in collapsed Gibbs 
sampling).
* People probably just want the MAP predictions, right?  I'm assuming they 
would not want more details about the distribution beyond the mode.
* With this setup, we could share the prediction code between all LDA models.
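
As a rough illustration of the first bullet, here is a minimal sketch of MAP 
inference for one new document with the topics held fixed.  It is not Spark 
code and not necessarily the method intended above: it uses a plain EM 
fixed-point iteration, in which the token-topic assignments are summed out via 
expected counts.  The function name, the argument layout, and the assumption 
that every docConcentration entry is greater than 1 (so the mode lies inside 
the simplex) are all hypothetical choices made for this example.

```python
def map_topic_distribution(word_counts, topics, doc_concentration, iterations=100):
    """Sketch: MAP estimate of one document's topic distribution, topics fixed.

    word_counts:       dict mapping word id -> count in the new document
    topics:            list of K lists; topics[k][w] = P(word w | topic k)
    doc_concentration: list of K Dirichlet parameters, each assumed > 1
    """
    k = len(topics)
    theta = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iterations):
        # E-step: expected number of tokens assigned to each topic,
        # i.e. the token-topic assignments are summed out.
        expected = [0.0] * k
        for word, count in word_counts.items():
            weights = [theta[j] * topics[j][word] for j in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for j in range(k):
                expected[j] += count * weights[j] / total
        # M-step: mode of the Dirichlet posterior given the expected counts.
        unnormalized = [expected[j] + doc_concentration[j] - 1.0 for j in range(k)]
        norm = sum(unnormalized)
        theta = [u / norm for u in unnormalized]
    return theta


# Toy example: two topics over a two-word vocabulary.
topics = [[0.9, 0.1], [0.1, 0.9]]
theta = map_topic_distribution({0: 10}, topics, [1.1, 1.1])
```

Because the update only needs the topic matrix and docConcentration, the same 
routine could in principle be called from any model type that exposes those 
two pieces, which is the sharing suggested in the last bullet.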

> Add prediction methods to LDA
> -----------------------------
>
>                 Key: SPARK-5567
>                 URL: https://issues.apache.org/jira/browse/SPARK-5567
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> LDA currently supports prediction on the training set.  E.g., you can call 
> logLikelihood and topicDistributions to get that info for the training data.  
> However, it should support the same functionality for new (test) documents.
> This will require inference but should be able to use the same code, with a 
> few modifications to keep the inferred topics fixed.
> Note: The API for these methods is already in the code but is commented out.


