[ https://issues.apache.org/jira/browse/SPARK-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-12834:
------------------------------------

    Assignee:     (was: Apache Spark)

> Use type conversion instead of Ser/De of Pickle to transform JavaArray and
> JavaList
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12834
>                 URL: https://issues.apache.org/jira/browse/SPARK-12834
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Xusen Yin
>
> According to the Ser/De code on the Python side:
> {code:title=_java2py|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> def _java2py(sc, r, encoding="bytes"):
>     if isinstance(r, JavaObject):
>         clsName = r.getClass().getSimpleName()
>         # convert RDD into JavaRDD
>         if clsName != 'JavaRDD' and clsName.endswith("RDD"):
>             r = r.toJavaRDD()
>             clsName = 'JavaRDD'
>
>         if clsName == 'JavaRDD':
>             jrdd = sc._jvm.SerDe.javaToPython(r)
>             return RDD(jrdd, sc)
>
>         if clsName == 'DataFrame':
>             return DataFrame(r, SQLContext.getOrCreate(sc))
>
>         if clsName in _picklable_classes:
>             r = sc._jvm.SerDe.dumps(r)
>         elif isinstance(r, (JavaArray, JavaList)):
>             try:
>                 r = sc._jvm.SerDe.dumps(r)
>             except Py4JJavaError:
>                 pass  # not picklable
>
>     if isinstance(r, (bytearray, bytes)):
>         r = PickleSerializer().loads(bytes(r), encoding=encoding)
>     return r
> {code}
> We use SerDe.dumps to serialize JavaArray and JavaList in PythonMLLibAPI,
> then deserialize them with PickleSerializer on the Python side. However,
> there is no need to transform them in such an inefficient way: we can
> convert them directly with type conversion, e.g. list(JavaArray) or
> list(JavaList).
> Moreover, there is an issue with Ser/De of a Scala Array, as described in
> https://issues.apache.org/jira/browse/SPARK-12780



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
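[Editor's note] A minimal sketch of the proposed fast path: instead of round-tripping a py4j JavaArray/JavaList through SerDe.dumps on the JVM and PickleSerializer.loads in Python, the branch could simply call list() on the proxy, since py4j's collection proxies are iterable from Python. The JavaArray and JavaList classes below are hypothetical stand-ins that mimic py4j's iterable proxies so the sketch runs without a JVM; java2py_collection is an illustrative name, not a function in Spark.

```python
# Hypothetical stubs standing in for py4j.java_collections.JavaArray
# and JavaList; real py4j proxies are likewise iterable from Python.
class JavaArray:
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


class JavaList(JavaArray):
    pass


def java2py_collection(r):
    """Proposed fast path: plain type conversion, no pickle round-trip."""
    if isinstance(r, (JavaArray, JavaList)):
        # A single O(n) copy replaces serialize-then-deserialize.
        return list(r)
    return r


print(java2py_collection(JavaArray([1, 2, 3])))  # -> [1, 2, 3]
```

The conversion copies element references once, whereas the pickle path serializes every element on the JVM side and deserializes it again in Python, which is the inefficiency the issue describes.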