GitHub user nzw0301 opened a pull request: https://github.com/apache/spark/pull/19372
[MLLIB] Fix update equation of learning rate in Word2Vec.scala

## What changes were proposed in this pull request?

The current learning-rate update equation is incorrect when `numIterations` > `1`. This PR changes it to follow the [original C code](https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L393).

cc: @mengxr

## How was this patch tested?

Manual tests: I modified [this example code](https://spark.apache.org/docs/2.1.1/mllib-feature-extraction.html#example).

### `numIterations=1`

#### Code

```scala
// Run in spark-shell, where the SparkContext `sc` is predefined
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)

// Default settings (numIterations = 1)
val word2vec = new Word2Vec()

val model = word2vec.fit(input)

val synonyms = model.findSynonyms("1", 5)

for ((synonym, cosineSimilarity) <- synonyms) {
  println(s"$synonym $cosineSimilarity")
}
```

#### Result

```
0 0.3267880082130432
2 0.21420614421367645
3 0.19923636317253113
9 0.1063166931271553
4 0.0397246889770031
```

### `numIterations=5`

#### Code

```scala
// Run in spark-shell, where the SparkContext `sc` is predefined
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)

val word2vec = new Word2Vec()
// Five passes over the corpus instead of the default one
word2vec.setNumIterations(5)

val model = word2vec.fit(input)

val synonyms = model.findSynonyms("1", 5)

for ((synonym, cosineSimilarity) <- synonyms) {
  println(s"$synonym $cosineSimilarity")
}
```

#### Result

```
2 0.9803512096405029
0 0.9774332642555237
3 0.9450059533119202
4 0.9394038319587708
9 -0.7876168489456177
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nzw0301/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19372

----
commit e2a7d393e141405f658a68f99bc4a1f53816db95
Author: Kento NOZAWA <k_...@klis.tsukuba.ac.jp>
Date:   2017-09-27T17:04:03Z

    Update equation of lr

----
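For reference, the linear learning-rate decay in the referenced word2vec C code can be sketched as below. This is a minimal standalone sketch, not Spark's implementation: the object `Word2VecLearningRate`, the function `alpha` and its parameter names are hypothetical, and the values 0.025 and 1000 are illustrative. The assumption taken from the C code is that the word counter spans all iterations and the denominator is `numIterations * trainWords + 1`, with the rate floored at `0.0001` times the starting rate.

```scala
object Word2VecLearningRate {

  // Floor on the learning rate, as a fraction of the starting rate (0.0001 in the C code)
  val MinAlphaFraction: Double = 0.0001

  /**
   * Hypothetical helper: linearly decayed learning rate, sketched after the C code:
   *   alpha = startingAlpha * (1 - wordsProcessed / (numIterations * trainWords + 1))
   * `wordsProcessed` counts words seen across ALL iterations so far, so the rate
   * keeps decreasing over every pass instead of restarting at each iteration.
   */
  def alpha(startingAlpha: Double, wordsProcessed: Long, trainWords: Long, numIterations: Int): Double = {
    val totalWords = numIterations.toDouble * trainWords + 1
    val decayed = startingAlpha * (1 - wordsProcessed / totalWords)
    math.max(decayed, startingAlpha * MinAlphaFraction)
  }

  def main(args: Array[String]): Unit = {
    val startingAlpha = 0.025 // illustrative starting rate
    val trainWords = 1000L    // illustrative number of words per pass
    val numIterations = 5
    // Print the rate at the end of each pass over the corpus
    for (iter <- 1 to numIterations) {
      val processed = iter * trainWords
      val rate = alpha(startingAlpha, processed, trainWords, numIterations)
      println(f"after iteration $iter: alpha = $rate%.6f")
    }
  }
}
```

Because the denominator scales with `numIterations`, the rate at the end of the first of five passes is still about 80% of the starting rate, and it only approaches the floor near the end of the last pass.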