Peter Mountanos created SPARK-14864: ---------------------------------------
Summary: [MLLIB] Implement Doc2Vec
Key: SPARK-14864
URL: https://issues.apache.org/jira/browse/SPARK-14864
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Peter Mountanos
Priority: Minor

It would be useful to implement Doc2Vec, as described in the paper [Distributed Representations of Sentences and Documents|https://cs.stanford.edu/~quocle/paragraph_vector.pdf]. Gensim already provides an implementation: [Deep learning with paragraph2vec|https://radimrehurek.com/gensim/models/doc2vec.html].

Le & Mikolov show that aggregating the Word2Vec vectors of a paragraph or document into a single representation does not perform well on prediction tasks. Instead, they propose Paragraph Vector, which achieves state-of-the-art results on several text classification and sentiment analysis tasks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)