Bryan Cutler created SPARK-17161:
------------------------------------

Summary: Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
Key: SPARK-17161
URL: https://issues.apache.org/jira/browse/SPARK-17161
Project: Spark
Issue Type: Improvement
Components: ML, PySpark
Reporter: Bryan Cutler
Priority: Minor

Several Spark ML classes take a Scala `Array` as a constructor argument. To expose the same API in Python, a Java-friendly alternate constructor currently has to exist so that py4j can call it with a converted Python list. This is because the current conversion in PySpark's {{_py2java}} turns a Python list into a {{java.util.ArrayList}}, which does not match a constructor that expects an array, as this error message shows:

{noformat}
Py4JError: An error occurred while calling None.org.apache.spark.ml.feature.CountVectorizerModel. Trace:
py4j.Py4JException: Constructor org.apache.spark.ml.feature.CountVectorizerModel([class java.util.ArrayList]) does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
	at py4j.Gateway.invoke(Gateway.java:235)
{noformat}

Adding an alternate constructor can be avoided by creating a py4j JavaArray with {{new_array}}. This type is compatible with the Scala `Array` currently used in classes like {{CountVectorizerModel}} and {{StringIndexerModel}}.

Most of the boilerplate Python code needed to do this can go in a convenience function inside ml.JavaWrapper, giving a clean way to construct ML objects without adding special constructors.
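
A minimal sketch of what such a convenience function might look like follows. The function name {{new_java_array}} and its parameters are hypothetical and are not taken from the Spark code base. It assumes only that {{gateway}} behaves like a py4j {{JavaGateway}}, whose {{new_array(java_class, *dimensions)}} method allocates a Java array that supports item assignment.

```python
def new_java_array(gateway, java_class, pylist):
    """Copy a Python list into a one-dimensional py4j JavaArray.

    Hypothetical helper, shown for illustration only.

    gateway    -- an object exposing new_array(java_class, *dimensions),
                  as py4j.java_gateway.JavaGateway does
    java_class -- the Java element type, e.g. gateway.jvm.java.lang.String
    pylist     -- the Python values to copy into the array
    """
    # Allocate a Java array of the requested element type and length.
    java_array = gateway.new_array(java_class, len(pylist))
    # Fill it element by element; py4j converts each assigned value.
    for i, item in enumerate(pylist):
        java_array[i] = item
    return java_array
```

With a running SparkContext {{sc}}, a call might look like {{new_java_array(sc._gateway, sc._jvm.java.lang.String, ["a", "b", "c"])}}. The result could then be passed to a constructor that expects an {{Array[String]}}, instead of the {{java.util.ArrayList}} that {{_py2java}} would produce.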