[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945892#comment-15945892
 ] 

Mathieu D edited comment on SPARK-20082 at 3/28/17 8:39 PM:
------------------------------------------------------------

[~yuhaoyan] would you mind having a look at this PR? For now, I have added an 
initialModel, supported only by the Online optimizer.

The implementation is inspired by the KMeans one. The initialModel is used in 
place of the initial randomized matrix.
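
To make the idea concrete, here is a minimal plain-Python sketch of the seeding 
step. It is not the code from the PR and not Spark's API: the function name, 
its parameters, and the list-of-lists matrix layout are all hypothetical, and 
the Gamma(100, 1/100) draw for the random case is an assumption borrowed from 
common online LDA implementations.

```python
import random


def init_topic_term_matrix(k, vocab_size, initial_model=None, seed=0):
    """Return a k x vocab_size matrix of topic-term weights.

    initial_model is a hypothetical k x vocab_size list of lists taken
    from a previously trained model; when given, it replaces the random
    initialization.
    """
    if initial_model is not None:
        # The previous model must match the requested shape.
        if len(initial_model) != k or any(
            len(row) != vocab_size for row in initial_model
        ):
            raise ValueError("initial model must have shape k x vocab_size")
        # Copy the rows so training does not mutate the caller's model.
        return [list(row) for row in initial_model]

    # Assumed default: strictly positive random weights, Gamma(100, 1/100).
    rng = random.Random(seed)
    return [
        [rng.gammavariate(100.0, 1.0 / 100.0) for _ in range(vocab_size)]
        for _ in range(k)
    ]
```

The optimizer would then run its usual mini-batch updates on the returned 
matrix, whichever branch produced it.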

Regarding the EM optimizer, we could likewise start from an existing model 
instead of a randomly weighted graph, by adding new doc vertices and new 
doc->term edges to the existing graph. But it is unclear to me how the new doc 
vertices should be weighted when added. Right now, for a new model, doc and 
term vertices are weighted randomly, with the same total weight on docs and 
terms. If I add new docs to an existing graph, how should the weights on the 
doc side be initialized?
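
For reference, the property described for a fresh model (random weights, equal 
total weight on docs and terms) can be obtained as in the plain-Python sketch 
below. This is an illustration under assumptions, not Spark's code: the 
function name and the (doc, term, count) edge tuples are hypothetical. Each 
edge draws a random topic distribution, scales it by the token count, and adds 
the same vector to both endpoints.

```python
import random


def init_em_graph(edges, k, seed=0):
    """Randomly weight doc and term vertices from doc->term edges.

    edges is a list of (doc_id, term_id, token_count) tuples.
    Returns (doc_weights, term_weights), each mapping an id to a
    length-k list of topic weights.
    """
    rng = random.Random(seed)
    doc_weights, term_weights = {}, {}
    for doc, term, count in edges:
        # Random topic distribution for this edge; 1 - random() lies in
        # (0, 1], so the normalizing sum is never zero.
        raw = [1.0 - rng.random() for _ in range(k)]
        total = sum(raw)
        contrib = [count * x / total for x in raw]
        # The same vector goes to both endpoints, so the total weight
        # on the doc side always equals the total on the term side.
        for table, key in ((doc_weights, doc), (term_weights, term)):
            acc = table.setdefault(key, [0.0] * k)
            for t in range(k):
                acc[t] += contrib[t]
    return doc_weights, term_weights
```

Adding new docs to an already trained graph breaks this symmetry unless the 
term vertices are updated as well, which is the open question above.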



was (Author: mathieude):
[~yuhaoyan] would you mind having a look at this PR? For now, I have added an 
initialModel, supported only by the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it is unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on the doc side 
be initialized?

> Incremental update of LDA model, by adding initialModel as start point
> ----------------------------------------------------------------------
>
>                 Key: SPARK-20082
>                 URL: https://issues.apache.org/jira/browse/SPARK-20082
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Mathieu D
>
> Some mllib models accept an initialModel to start from and then update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.



