[ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035580#comment-15035580 ]
Apache Spark commented on SPARK-12016:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/10100

> word2vec load model can't use findSynonyms to get words
> --------------------------------------------------------
>
>                 Key: SPARK-12016
>                 URL: https://issues.apache.org/jira/browse/SPARK-12016
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: ubuntu 14.04
>            Reporter: yuangang.liu
>
> I use word2vec.fit to train a Word2VecModel and then save the model to the
> file system. When I load the model back from the file system, I can still use
> transform('a') to get a vector, but findSynonyms('a', 2) no longer returns
> words.
>
> I use the following code to test word2vec:
>
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import tempfile
> from shutil import rmtree
>
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>     syms = model.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>     print model.transform("a") == sameModel.transform("a")
>     syms = sameModel.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     try:
>         rmtree(path)
>     except OSError:
>         pass
>
> The first print gives "[u'b', u'c']", the second gives "True", and the third
> gives "[u'__class__']".
> I don't know how to get 'b' or 'c' with sameModel.findSynonyms("a", 2).
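Until the linked pull request is merged and released, one possible workaround is to rank candidate words by cosine similarity computed from transform(), which the report shows still works on the reloaded model. The sketch below is illustrative and not part of the PR; find_synonyms_workaround and its vocabulary parameter are hypothetical names, and the caller must supply the vocabulary, since the broken reloaded model cannot enumerate its own words.

import numpy as np

def find_synonyms_workaround(model, word, num, vocabulary):
    # Hypothetical helper, not Spark API: ranks the supplied vocabulary
    # by cosine similarity to `word`, using only model.transform(),
    # which the bug report shows still works after Word2VecModel.load.
    target = model.transform(word).toArray()
    target_norm = np.linalg.norm(target)
    scored = []
    for w in vocabulary:
        if w == word:
            continue
        vec = model.transform(w).toArray()
        denom = target_norm * np.linalg.norm(vec)
        if denom == 0:
            continue
        scored.append((w, float(np.dot(target, vec) / denom)))
    # Highest similarity first, then truncate to the requested count.
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return scored[:num]

With the reproduction script above, the vocabulary is known to be ["a", "b", "c"], so find_synonyms_workaround(sameModel, "a", 2, ["a", "b", "c"]) should return 'b' and 'c' with their similarities, matching what the pre-save model's findSynonyms("a", 2) produced.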