[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168619#comment-15168619 ]
Nick Pentreath commented on SPARK-13289:
----------------------------------------

Master branch should be building now. Can you try again?

> Word2Vec generate infinite distances when numIterations>5
> ---------------------------------------------------------
>
>                 Key: SPARK-13289
>                 URL: https://issues.apache.org/jira/browse/SPARK-13289
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.0
>         Environment: Linux, Scala
>            Reporter: Qi Dai
>              Labels: features
>
> I recently ran some word2vec experiments on a cluster with 50 executors on
> a large text dataset and found that when the number of iterations is larger
> than 5, the distances between words are all infinite. My code looks like
> this:
>
> val text = sc.textFile("/project/NLP/1_biliion_words/train").map(_.split(" ").toSeq)
> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
> val word2vec = new Word2Vec().setMinCount(25).setVectorSize(96).setNumPartitions(99).setNumIterations(10).setWindowSize(5)
> val model = word2vec.fit(text)
> val synonyms = model.findSynonyms("who", 40)
> for((synonym, cosineSimilarity) <- synonyms) {
>   println(s"$synonym $cosineSimilarity")
> }
>
> The results are:
>
> to Infinity
> and Infinity
> that Infinity
> with Infinity
> said Infinity
> it Infinity
> by Infinity
> be Infinity
> have Infinity
> he Infinity
> has Infinity
> his Infinity
> an Infinity
> ) Infinity
> not Infinity
> who Infinity
> I Infinity
> had Infinity
> their Infinity
> were Infinity
> they Infinity
> but Infinity
> been Infinity
>
> I tried many different datasets and different query words for finding
> synonyms, with the same result.