Github user keypointt commented on the issue:

https://github.com/apache/spark/pull/17451

```
>>> from pyspark.ml.feature import Word2Vec
>>> sent = ("a b " * 100 + "a c " * 10).split(" ")
>>> doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
>>> word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model")
>>> model = word2Vec.fit(doc)
```

Above is the setup, and I created the `vec` below. It works fine with `model.findSynonyms`:

```
>>> from pyspark.ml.linalg import Vectors
>>> vec = Vectors.dense([0.267, -0.2691, 0.058, -0.0801, 0.1821, 0.4162, 0.0259, -0.2163, 0.1787, 0.0764])
>>> model.findSynonyms(vec, 2)
DataFrame[word: string, similarity: double]
```

But `vec` does not work with `model.findSynonymsArray`, even though its type is `<class 'pyspark.ml.linalg.DenseVector'>`:

```
>>> model.findSynonymsArray(vec, 2)
word: [0.267,-0.2691,0.058,-0.0801,0.1821,0.4162,0.0259,-0.2163,0.1787,0.0764]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/renxin/Documents/workspace/spark/python/pyspark/ml/feature.py", line 2951, in findSynonymsArray
    tuples = self._java_obj.findSynonymsArray(word, num)
  File "/Users/renxin/Documents/workspace/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/Users/renxin/Documents/workspace/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/renxin/Documents/workspace/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 324, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o65.findSynonymsArray. Trace:
py4j.Py4JException: Method findSynonymsArray([class java.util.ArrayList, class java.lang.Integer]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

>>> type(vec)
<class 'pyspark.ml.linalg.DenseVector'>
```

Here `vec` is received on the Java side as a `java.util.ArrayList`. Does `self._java_obj.findSynonymsArray(word, num)` behave differently from `self._call_java("findSynonyms", word, num)` for the Vector type?

Thank you, Holden.
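
To make the question concrete, here is a toy model in plain Python of the difference I suspect. It is only a sketch under assumptions: none of these classes or functions are py4j or PySpark APIs, every name (`DenseVec`, `JavaVector`, `FakeJavaModel`, `call_direct`, `call_wrapped`) is hypothetical, and the returned synonyms are made up. The assumption being illustrated is that a wrapper like `_call_java` converts its arguments to JVM types before the call, while a raw call on `_java_obj` falls back to a generic conversion that turns any iterable into a list, which would match the `java.util.ArrayList` in the error.

```python
# Toy model only: none of these classes are py4j or PySpark APIs.


class DenseVec:
    """Hypothetical stand-in for a Python-side dense vector (iterable of floats)."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __iter__(self):
        return iter(self.values)


class JavaVector:
    """Hypothetical stand-in for the JVM-side Vector type."""

    def __init__(self, values):
        self.values = list(values)


class FakeJavaModel:
    """Hypothetical JVM object whose method only accepts (JavaVector, int)."""

    def findSynonymsArray(self, word, num):
        if not isinstance(word, JavaVector):
            raise TypeError(
                "Method findSynonymsArray([%s, int]) does not exist"
                % type(word).__name__
            )
        return [("b", 0.9), ("c", 0.5)][:num]  # made-up results


def generic_convert(arg):
    """Assumed gateway default: a non-string iterable becomes a plain list."""
    if hasattr(arg, "__iter__") and not isinstance(arg, str):
        return list(arg)
    return arg


def typed_convert(arg):
    """Assumed wrapper behaviour: vectors are mapped to the JVM vector type."""
    return JavaVector(arg) if isinstance(arg, DenseVec) else arg


def call_direct(java_obj, name, *args):
    # Models self._java_obj.findSynonymsArray(word, num): generic conversion only.
    return getattr(java_obj, name)(*[generic_convert(a) for a in args])


def call_wrapped(java_obj, name, *args):
    # Models self._call_java(name, word, num): convert arguments, then call.
    return getattr(java_obj, name)(*[typed_convert(a) for a in args])


if __name__ == "__main__":
    vec = DenseVec([0.267, -0.2691, 0.058])
    model = FakeJavaModel()
    print(call_wrapped(model, "findSynonymsArray", vec, 2))
    try:
        call_direct(model, "findSynonymsArray", vec, 2)
    except TypeError as err:
        print(err)
```

If this is roughly what happens, routing the call through the same argument conversion that `findSynonyms` uses would presumably let the vector case work, but I have not confirmed that against the actual py4j conversion rules.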