[ https://issues.apache.org/jira/browse/SPARK-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley resolved SPARK-17629.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16811
[https://github.com/apache/spark/pull/16811]

> Add local version of Word2Vec findSynonyms for spark.ml
> -------------------------------------------------------
>
>                 Key: SPARK-17629
>                 URL: https://issues.apache.org/jira/browse/SPARK-17629
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Asher Krim
>            Assignee: Asher Krim
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> ml Word2Vec's findSynonyms methods depart from mllib in that they return
> distributed results, rather than the results directly:
> {code}
>   def findSynonyms(word: String, num: Int): DataFrame = {
>     val spark = SparkSession.builder().getOrCreate()
>     spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
>   }
> {code}
> What was the reason for this decision? I would think that most users would
> request a reasonably small number of results back, and want to use them
> directly on the driver, similar to the _take_ method on dataframes. Returning
> parallelized results creates a costly round trip for the data that doesn't
> seem necessary.
> The original PR: https://github.com/apache/spark/pull/7263
> [~MechCoder] - do you perhaps recall the reason?
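
For readers comparing the two styles discussed in the issue, here is a minimal Scala sketch, not taken from the pull request itself. It assumes Spark 2.2.0 or later, where spark.ml's Word2VecModel exposes findSynonymsArray alongside the DataFrame-returning findSynonyms. The toy corpus, the object name LocalSynonymsSketch, and the parameter values are made up for illustration.

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name and the corpus are illustrative only.
object LocalSynonymsSketch {

  /** Returns (synonyms via DataFrame + collect, synonyms via the local method). */
  def run(): (Array[(String, Double)], Array[(String, Double)]) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("LocalSynonymsSketch")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up corpus: one tokenized sentence per row.
    val docs = Seq(
      "spark makes distributed data processing simple",
      "word2vec maps each word to a dense vector",
      "spark ml wraps word2vec as an estimator"
    ).map(_.split(" ").toSeq).map(Tuple1.apply).toDF("text")

    val model = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("features")
      .setVectorSize(8)
      .setMinCount(1) // keep every token, since the corpus is tiny
      .fit(docs)

    // DataFrame-returning variant: the result has to be collected back
    // to the driver before it can be used as a local collection.
    val viaDataFrame: Array[(String, Double)] =
      model.findSynonyms("spark", 3)
        .collect()
        .map(row => (row.getString(0), row.getDouble(1)))

    // Local variant: the result is already an array on the driver.
    val local: Array[(String, Double)] = model.findSynonymsArray("spark", 3)

    spark.stop()
    (viaDataFrame, local)
  }

  def main(args: Array[String]): Unit = {
    val (_, local) = run()
    local.foreach { case (word, similarity) => println(s"$word\t$similarity") }
  }
}
```

Both calls ask for the same three neighbours of "spark". The difference is only in where the result lives: the first goes through a DataFrame and a collect, the second hands back a plain array, which is the behaviour the issue asks for when only a handful of results are needed on the driver.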