Github user keypointt commented on the issue:

https://github.com/apache/spark/pull/17451

Hi @MLnick, I'm stuck trying to add test cases for Python.

I tried the code chunk below in the PySpark shell (`./bin/pyspark`, where `spark` is predefined):

```
from pyspark.ml.feature import Word2Vec

sent = ("a b " * 100 + "a c " * 10).split(" ")
doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model")
model = word2Vec.fit(doc)
model.findSynonyms("a", 2)
model.findSynonymsArray("a", 2)
```

For `findSynonyms()`, I got the expected result:

```
>>> model.findSynonyms("a", 2)
hahaha: Dataset JavaObject id=o143
DataFrame[word: string, similarity: double]
```

But `findSynonymsArray()` returned the following, which contains no data:

```
>>> model.findSynonymsArray("a", 2)
[{u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'}]
```

While debugging, I found that `r` falls into the `elif isinstance(r, (JavaArray, JavaList)):` branch and is dumped directly. It seems that Py4J is not handling the returned object properly? https://github.com/apache/spark/blob/master/python/pyspark/ml/common.py#L90

Could you please give me a hint here? I'm digging into Py4J now, but it could take me some time. Thank you very much!
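The `{u'__class__': u'scala.Tuple2'}` entries suggest that the array of `scala.Tuple2` goes through the pickling path in `_java2py`, where the JVM-side pickler apparently does not know how to serialize `scala.Tuple2` and keeps only its class name. One possible workaround, sketched here under that assumption, is to skip the generic conversion, call the Scala method through the model's Py4J handle, and unpack each tuple on the Python side with Scala's `_1()` and `_2()` accessors. The helper `tuples_to_python` and the `FakeTuple2` class are hypothetical names used only for illustration; `FakeTuple2` mimics a Py4J proxy of `scala.Tuple2` so the sketch runs without a JVM.

```python
from typing import Any, Iterable, List, Tuple


def tuples_to_python(java_tuples: Iterable[Any]) -> List[Tuple[Any, Any]]:
    # Hypothetical helper: assumes each element exposes Scala's Tuple2
    # accessors _1() and _2(), which Py4J forwards to the JVM object.
    return [(t._1(), t._2()) for t in java_tuples]


class FakeTuple2:
    # Stand-in for a Py4J proxy of scala.Tuple2 so the sketch runs without Spark.
    def __init__(self, first: Any, second: Any) -> None:
        self._first = first
        self._second = second

    def _1(self) -> Any:
        return self._first

    def _2(self) -> Any:
        return self._second


print(tuples_to_python([FakeTuple2("b", 0.9), FakeTuple2("c", 0.4)]))
# [('b', 0.9), ('c', 0.4)]
```

Against a real model, the call might look like `tuples_to_python(model._java_obj.findSynonymsArray("a", 2))`, assuming the returned Py4J array can be iterated element by element; the actual similarity values depend on the trained model. Unpacking on the Python side avoids relying on the pickler to understand Scala tuple types.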