[jira] [Comment Edited] (SPARK-14864) [MLLIB] Implement Doc2Vec

2016-05-01 Thread Peter Mountanos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266024#comment-15266024
 ] 

Peter Mountanos edited comment on SPARK-14864 at 5/2/16 12:45 AM:
--

I will try to work out this feature if no one else has made any progress.


was (Author: peter.mounta...@nyu.edu):
I will try to work out this issue if no one else has made any progress.

> [MLLIB] Implement Doc2Vec
> -
>
> Key: SPARK-14864
> URL: https://issues.apache.org/jira/browse/SPARK-14864
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Peter Mountanos
> Priority: Minor
>
> It would be useful to implement Doc2Vec, as described in the paper 
> [Distributed Representations of Sentences and 
> Documents|https://cs.stanford.edu/~quocle/paragraph_vector.pdf]. Gensim has 
> an implementation [Deep learning with 
> paragraph2vec|https://radimrehurek.com/gensim/models/doc2vec.html]. 
> Le & Mikolov show that aggregating Word2Vec word vectors into a single
> paragraph/document representation does not perform well on prediction
> tasks. Instead, they propose the Paragraph Vector model, which they report
> achieves state-of-the-art results on several text classification and
> sentiment analysis tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14864) [MLLIB] Implement Doc2Vec

2016-05-01 Thread Peter Mountanos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266024#comment-15266024
 ] 

Peter Mountanos commented on SPARK-14864:
-

I will try to work out this issue if no one else has made any progress.




[jira] [Commented] (SPARK-14864) [MLLIB] Implement Doc2Vec

2016-04-23 Thread Peter Mountanos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255275#comment-15255275
 ] 

Peter Mountanos commented on SPARK-14864:
-

[~prudenko] [~cqnguyen] I noticed previous discussion of possibly implementing 
Doc2Vec in issue SPARK-4101. Has there been any headway on this?




[jira] [Created] (SPARK-14864) [MLLIB] Implement Doc2Vec

2016-04-22 Thread Peter Mountanos (JIRA)
Peter Mountanos created SPARK-14864:
---

Summary: [MLLIB] Implement Doc2Vec
Key: SPARK-14864
URL: https://issues.apache.org/jira/browse/SPARK-14864
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Peter Mountanos
Priority: Minor


It would be useful to implement Doc2Vec, as described in the paper [Distributed 
Representations of Sentences and 
Documents|https://cs.stanford.edu/~quocle/paragraph_vector.pdf]. Gensim has an 
implementation [Deep learning with 
paragraph2vec|https://radimrehurek.com/gensim/models/doc2vec.html]. 

Le & Mikolov show that aggregating Word2Vec word vectors into a single
paragraph/document representation does not perform well on prediction tasks.
Instead, they propose the Paragraph Vector model, which they report achieves
state-of-the-art results on several text classification and sentiment analysis
tasks.
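
For illustration only, here is a minimal sketch of the aggregation baseline
mentioned above: a document vector built by averaging its word vectors. It is
not Spark or Gensim code. The function name average_word_vectors and the toy
2-dimensional vectors are hypothetical, made up for this example. It shows one
commonly cited weakness of plain averaging: word order is lost, so two
documents containing the same words in a different order get the same vector.

```python
from typing import Dict, List, Sequence


def average_word_vectors(
    tokens: Sequence[str], word_vectors: Dict[str, List[float]]
) -> List[float]:
    """Average the vectors of all known tokens (hypothetical helper).

    Tokens missing from word_vectors are skipped. If no token is known,
    a zero vector of the right dimension is returned.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary token
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Toy, hand-made vectors (an assumption for this sketch, not trained values).
toy_vectors = {
    "good": [1.0, 0.0],
    "not": [0.0, 1.0],
    "movie": [0.5, 0.5],
}

# Same words, different order: averaging cannot tell these apart.
doc_a = average_word_vectors(["not", "good", "movie"], toy_vectors)
doc_b = average_word_vectors(["good", "movie", "not"], toy_vectors)
print(doc_a == doc_b)
```

Paragraph Vector instead learns a separate vector per document jointly with
the word vectors, so an MLlib implementation would need a training step rather
than a post-hoc aggregation like the one sketched here.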


