[ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035580#comment-15035580 ]
Apache Spark commented on SPARK-12016:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/10100

> word2vec load model can't use findSynonyms to get words
> --------------------------------------------------------
>
>                 Key: SPARK-12016
>                 URL: https://issues.apache.org/jira/browse/SPARK-12016
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: ubuntu 14.04
>            Reporter: yuangang.liu
>
> I use word2vec.fit to train a Word2VecModel and then save the model to the
> file system. When I load the model back from the file system, I can still use
> transform('a') to get a vector, but findSynonyms('a', 2) no longer returns
> words.
>
> I use the following code to test word2vec:
>
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import tempfile
> from shutil import rmtree
>
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>     syms = model.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>     print model.transform("a") == sameModel.transform("a")
>     syms = sameModel.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     try:
>         rmtree(path)
>     except OSError:
>         pass
>
> The first print gives "[u'b', u'c']", the second gives "True", and the third
> gives "[u'__class__']".
> I don't know how to get 'b' or 'c' with sameModel.findSynonyms("a", 2).
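Until the linked pull request is merged and released, one possible workaround is to rank candidate words by cosine similarity computed from transform(), which the report shows still works on the reloaded model. The sketch below is illustrative and not part of the PR; find_synonyms_workaround and its vocabulary parameter are hypothetical names, and the caller must supply the vocabulary, since the broken reloaded model cannot enumerate its own words.

import numpy as np

def find_synonyms_workaround(model, word, num, vocabulary):
    # Hypothetical helper, not Spark API: ranks the supplied vocabulary
    # by cosine similarity to `word`, using only model.transform(),
    # which the bug report shows still works after Word2VecModel.load.
    target = model.transform(word).toArray()
    target_norm = np.linalg.norm(target)
    scored = []
    for w in vocabulary:
        if w == word:
            continue
        vec = model.transform(w).toArray()
        denom = target_norm * np.linalg.norm(vec)
        if denom == 0:
            continue
        scored.append((w, float(np.dot(target, vec) / denom)))
    # Highest similarity first, then truncate to the requested count.
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return scored[:num]

With the reproduction script above, the vocabulary is known to be ["a", "b", "c"], so find_synonyms_workaround(sameModel, "a", 2, ["a", "b", "c"]) should return 'b' and 'c' with their similarities, matching what the pre-save model's findSynonyms("a", 2) produced.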