[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2021-06-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359164#comment-17359164
 ] 

Cristina Flores Fernández commented on SPARK-20082:
---

Hi, is this feature finally implemented?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>Priority: Major
>  Labels: bulk-closed
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2019-03-13 Thread yuhao yang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791938#comment-16791938
 ] 

yuhao yang commented on SPARK-20082:


Yuhao is taking family bonding leave from March 7th to Apr 19th . Please expect 
delayed email response. Conctact +86 13738085700 for anything urgent.

Thanks,
Yuhao


> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>Priority: Major
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2019-03-13 Thread Marcellus de Castro Tavares (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791919#comment-16791919
 ] 

Marcellus de Castro Tavares commented on SPARK-20082:
-

Hi, is this feature still on the roadmap? It's been in progress for a while.

Thanks

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>Priority: Major
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-06-30 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070883#comment-16070883
 ] 

yuhao yang commented on SPARK-20082:


I'm OK with only supporting initialModel for Online LDA now. For EM LDA, an 
initial model is also possible, but we may need some extra check depending on 
if EM can fit on new documents.

I'll make a pass on the current implementation. But we still need the opinion 
and final check from [~josephkb] or other committers.

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-06-28 Thread Mathieu DESPRIEE (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066816#comment-16066816
 ] 

Mathieu DESPRIEE commented on SPARK-20082:
--

I updated the PR.

Basically, here is the approach :
- only Online optimizer is supported, any use with EM optimizer is rejected. If 
incremental is also desirable for EM, I suggest we open another JIRA for it, to 
take the time discussing the initialization with an existing graph and new 
documents.
- I added an {{initialModel}} parameter that is used to initialize doc 
concentration and topic matrix from it.

 [~yuhaoyan], could you check it please ?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-05-24 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022379#comment-16022379
 ] 

yuhao yang commented on SPARK-20082:


refer to https://issues.apache.org/jira/browse/SPARK-20767 for some insights 
shared by [~cezden]
{quote}
Technical aspects:
1. The implementation of LDA fitting does not currently allow the coefficients 
pre-setting (private setter), as noted by a comment in the source code of 
OnlineLDAOptimizer.setLambda: "This is only used for testing now. In the 
future, it can help support training stop/resume".
2. The lambda matrix is always randomly initialized by the optimizer, which 
needs fixing for preset lambda matrix.
{quote}

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-04-06 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959368#comment-15959368
 ] 

yuhao yang commented on SPARK-20082:


Sorry I'm occupied by some internal project this week. I'll find some time to 
look into it this weekend or early next week.

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-04-06 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958754#comment-15958754
 ] 

Mathieu D commented on SPARK-20082:
---

[~yuhaoyan] or [~josephkb] any feedback on this approach and PR ?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892
 ] 

Mathieu D commented on SPARK-20082:
---

[~yuhaoyan] would you mind having a look to this PR. Right now, I added an 
initialModel only for the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear for me how the new doc vertices 
should be weighted when added. Right now for a new model, docs and terms 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how to initialize the weights on this 
side ?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945866#comment-15945866
 ] 

Apache Spark commented on SPARK-20082:
--

User 'mdespriee' has created a pull request for this issue:
https://github.com/apache/spark/pull/17461

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-24 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940960#comment-15940960
 ] 

yuhao yang commented on SPARK-20082:


Yes, that's one of the things that we should improve for LDA.
If you're interested in working on the issue, could you please first share some 
rough design, given the complexity from both EM and Online optimizers and 
models.

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest to add an initialModel as a start point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org