[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396112#comment-14396112 ]
Sean Owen commented on SPARK-5261:
----------------------------------

I think they both come down to a minCount that is too low. If you're going to reopen, can you please follow up on the request to try that? or provide your data set? I don't think it's actionable if there's no follow-up.

> In some cases ,The value of word's vector representation is too big
> -------------------------------------------------------------------
>
>                 Key: SPARK-5261
>                 URL: https://issues.apache.org/jira/browse/SPARK-5261
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Guoqiang Li
>
> {code}
> val word2Vec = new Word2Vec()
> word2Vec.
>   setVectorSize(100).
>   setSeed(42L).
>   setNumIterations(5).
>   setNumPartitions(36)
> {code}
> The average absolute value of the word's vector representation is 60731.8
> {code}
> val word2Vec = new Word2Vec()
> word2Vec.
>   setVectorSize(100).
>   setSeed(42L).
>   setNumIterations(5).
>   setNumPartitions(1)
> {code}
> The average absolute value of the word's vector representation is 0.13889

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)