[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

keypointt Sat, 01 Apr 2017 15:24:06 -0700

Github user keypointt commented on the issue:

    https://github.com/apache/spark/pull/17451
  
    Hi @MLnick, I'm stuck trying to add test cases for Python.
    
    I ran the code below in the PySpark shell, started via `./bin/pyspark`:
    
    ```
    from pyspark.ml.feature import Word2Vec
    
    sent = ("a b " * 100 + "a c " * 10).split(" ")
    doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
    word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model")
    model = word2Vec.fit(doc)
    
    model.findSynonyms("a", 2)
    model.findSynonymsArray("a", 2)
    ```
    For `findSynonyms()`, I got the expected result:
    ```
    >>> model.findSynonyms("a", 2)
    hahaha:  Dataset
    JavaObject id=o143
    DataFrame[word: string, similarity: double]
    ```
    But for `findSynonymsArray()` I got the output below, which contains no data:
    ```
    >>> model.findSynonymsArray("a", 2)
    [{u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'}]
    ```
    
    I debugged this and found that `r` matches the `elif isinstance(r, (JavaArray, JavaList)):` branch and is dumped directly. It seems `Py4J` is not handling the returned object properly? See https://github.com/apache/spark/blob/master/python/pyspark/ml/common.py#L90
    
    Could you please give me a hint here? I'm digging further into Py4J now, but that could take me some time. Thank you very much.
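
The output above suggests that the array elements reach Python as bare `scala.Tuple2` placeholders, with their two fields lost in serialization. One possible workaround, sketched here only as an assumption and not as the fix adopted in the PR, is to skip the generic array conversion and read each tuple through its `_1()` and `_2()` accessors, which `scala.Tuple2` exposes on the JVM side. The `FakeTuple2` class and the `tuples_to_pairs` helper are hypothetical names used so the sketch runs without a Spark session, and the similarity values are placeholders, not real model output.

```python
class FakeTuple2:
    """Hypothetical stand-in for a Py4J reference to a scala.Tuple2."""

    def __init__(self, first, second):
        self._first = first
        self._second = second

    def _1(self):
        # Mirrors Tuple2's accessor for the first element.
        return self._first

    def _2(self):
        # Mirrors Tuple2's accessor for the second element.
        return self._second


def tuples_to_pairs(java_tuples):
    """Turn an iterable of Tuple2-like objects into Python (word, similarity) tuples."""
    return [(t._1(), t._2()) for t in java_tuples]


# Placeholder data standing in for what findSynonymsArray("a", 2) might return.
fake_result = [FakeTuple2("b", 0.9), FakeTuple2("c", 0.4)]
pairs = tuples_to_pairs(fake_result)
print(pairs)
```

In a real session the input would be the raw Java array returned by the model's underlying Java object, assuming each element arrives as a Py4J object reference and not as an already pickled value.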



