[jira] [Comment Edited] (SPARK-14864) [MLLIB] Implement Doc2Vec
[ https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266024#comment-15266024 ]

Peter Mountanos edited comment on SPARK-14864 at 5/2/16 12:45 AM:
--

I will try to work out this feature if no one else has made any progress.

was (Author: peter.mounta...@nyu.edu):
I will try to work out this issue if no one else has made any progress.

> [MLLIB] Implement Doc2Vec
> -
>
> Key: SPARK-14864
> URL: https://issues.apache.org/jira/browse/SPARK-14864
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Peter Mountanos
> Priority: Minor
>
> It would be useful to implement Doc2Vec, as described in the paper
> [Distributed Representations of Sentences and
> Documents|https://cs.stanford.edu/~quocle/paragraph_vector.pdf]. Gensim has
> an implementation: [Deep learning with
> paragraph2vec|https://radimrehurek.com/gensim/models/doc2vec.html].
> Le & Mikolov show that aggregating Word2Vec vector representations for a
> paragraph/document does not perform well for prediction tasks. Instead,
> they propose Paragraph Vector, which they report achieves
> state-of-the-art results on several text classification and sentiment
> analysis tasks.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14864) [MLLIB] Implement Doc2Vec
[ https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266024#comment-15266024 ]

Peter Mountanos commented on SPARK-14864:
-

I will try to work out this issue if no one else has made any progress.
[jira] [Commented] (SPARK-14864) [MLLIB] Implement Doc2Vec
[ https://issues.apache.org/jira/browse/SPARK-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255275#comment-15255275 ]

Peter Mountanos commented on SPARK-14864:
-

[~prudenko] [~cqnguyen] I noticed previous discussion of possibly implementing Doc2Vec in issue SPARK-4101. Has there been any headway on this?
[jira] [Created] (SPARK-14864) [MLLIB] Implement Doc2Vec
Peter Mountanos created SPARK-14864:
---

Summary: [MLLIB] Implement Doc2Vec
Key: SPARK-14864
URL: https://issues.apache.org/jira/browse/SPARK-14864
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Peter Mountanos
Priority: Minor

It would be useful to implement Doc2Vec, as described in the paper [Distributed Representations of Sentences and Documents|https://cs.stanford.edu/~quocle/paragraph_vector.pdf]. Gensim has an implementation: [Deep learning with paragraph2vec|https://radimrehurek.com/gensim/models/doc2vec.html].

Le & Mikolov show that aggregating Word2Vec vector representations for a paragraph/document does not perform well for prediction tasks. Instead, they propose Paragraph Vector, which they report achieves state-of-the-art results on several text classification and sentiment analysis tasks.