Bryan Cutler created SPARK-17161:
------------------------------------

             Summary: Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
                 Key: SPARK-17161
                 URL: https://issues.apache.org/jira/browse/SPARK-17161
             Project: Spark
          Issue Type: Improvement
          Components: ML, PySpark
            Reporter: Bryan Cutler
            Priority: Minor


Many Spark ML classes are constructed from a Scala `Array`. To add the same API
to Python, a Java-friendly alternate constructor currently has to exist so that
py4j can convert from a Python list. This is because the current conversion in
PySpark's _py2java creates a java.util.ArrayList, as shown in this error
message:

{noformat}
Py4JError: An error occurred while calling None.org.apache.spark.ml.feature.CountVectorizerModel. Trace:
py4j.Py4JException: Constructor org.apache.spark.ml.feature.CountVectorizerModel([class java.util.ArrayList]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:235)
{noformat}

The alternate constructor can be avoided by instead creating a py4j JavaArray
with {{new_array}}. This type is compatible with the Scala `Array` currently
used in classes like {{CountVectorizerModel}} and {{StringIndexerModel}}.

Most of the boilerplate Python code for this can be put in a convenience
function in ml.JavaWrapper, giving a clean way to construct ML objects without
adding special constructors.
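A minimal sketch of such a helper, assuming it simply allocates an array with py4j's {{JavaGateway.new_array(java_class, *dimensions)}} and copies the list into it element by element. The class {{JavaWrapperSketch}}, the method name {{_new_java_array}} and its optional {{gateway}} parameter are hypothetical and chosen for illustration only; the gateway is injectable so the logic can be exercised without a JVM, and otherwise defaults to the py4j gateway of the active SparkContext.

```python
class JavaWrapperSketch(object):
    """Hypothetical stand-in for ml.JavaWrapper, for illustration only."""

    @staticmethod
    def _new_java_array(pylist, java_class, gateway=None):
        """Copy a Python list into a py4j JavaArray whose element type is java_class.

        pylist:     Python sequence of values convertible to java_class.
        java_class: py4j JavaClass for the element type,
                    e.g. gateway.jvm.java.lang.String.
        gateway:    py4j JavaGateway (hypothetical parameter); when omitted,
                    the gateway of the active SparkContext is used.
        """
        if gateway is None:
            # Imported lazily so the helper can be tested without Spark installed.
            from pyspark import SparkContext
            gateway = SparkContext._active_spark_context._gateway
        # Allocate a fixed-size Java array; on the JVM side this matches a Scala Array.
        java_array = gateway.new_array(java_class, len(pylist))
        # Copy each element; py4j JavaArray supports index assignment.
        for i, value in enumerate(pylist):
            java_array[i] = value
        return java_array


# Possible usage with a running SparkContext `sc` (not executed here):
#   gw = sc._gateway
#   vocab = JavaWrapperSketch._new_java_array(["a", "b", "c"], gw.jvm.java.lang.String)
#   java_model = sc._jvm.org.apache.spark.ml.feature.CountVectorizerModel(vocab)
```

Passing the resulting JavaArray instead of a Python list means py4j resolves the existing {{Array[String]}} constructor, so no {{java.util.ArrayList}} overload is needed on the Scala side.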



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
