[jira] [Assigned] (SPARK-12834) Use type conversion instead of Ser/De of Pickle to transform JavaArray and JavaList

2016-01-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12834:


Assignee: (was: Apache Spark)

> Use type conversion instead of Ser/De of Pickle to transform JavaArray and 
> JavaList
> ---
>
> Key: SPARK-12834
> URL: https://issues.apache.org/jira/browse/SPARK-12834
> Project: Spark
>  Issue Type: Improvement
>Reporter: Xusen Yin
>
> According to the Ser/De code on the Python side:
> {code:title=StringIndexerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> def _java2py(sc, r, encoding="bytes"):
>     if isinstance(r, JavaObject):
>         clsName = r.getClass().getSimpleName()
>         # convert RDD into JavaRDD
>         if clsName != 'JavaRDD' and clsName.endswith("RDD"):
>             r = r.toJavaRDD()
>             clsName = 'JavaRDD'
>         if clsName == 'JavaRDD':
>             jrdd = sc._jvm.SerDe.javaToPython(r)
>             return RDD(jrdd, sc)
>         if clsName == 'DataFrame':
>             return DataFrame(r, SQLContext.getOrCreate(sc))
>         if clsName in _picklable_classes:
>             r = sc._jvm.SerDe.dumps(r)
>         elif isinstance(r, (JavaArray, JavaList)):
>             try:
>                 r = sc._jvm.SerDe.dumps(r)
>             except Py4JJavaError:
>                 pass  # not pickable
>     if isinstance(r, (bytearray, bytes)):
>         r = PickleSerializer().loads(bytes(r), encoding=encoding)
>     return r
> {code}
> We use SerDe.dumps to serialize JavaArray and JavaList in PythonMLLibAPI, 
> then deserialize them with PickleSerializer on the Python side. However, there 
> is no need to transform them in such an inefficient way. Instead, we can use a 
> plain type conversion, e.g. list(JavaArray) or list(JavaList). What's more, 
> there is an issue with Ser/De of Scala Array, as I noted in 
> https://issues.apache.org/jira/browse/SPARK-12780
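A minimal sketch of the proposed conversion. Py4J's JavaList and JavaArray wrappers support the Python sequence protocol, so list() copies the elements directly without a pickle round trip. FakeJavaList below is a hypothetical stand-in for py4j.java_collections.JavaList, since the real types require a live JVM gateway:

```python
class FakeJavaList:
    """Hypothetical stand-in for py4j.java_collections.JavaList.

    Like the real Py4J wrapper, it supports iteration and len(),
    which is all that list() needs.
    """

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def java2py_collection(r):
    # Proposed replacement for the SerDe.dumps + PickleSerializer.loads
    # round trip: a plain type conversion that copies the elements.
    return list(r)


print(java2py_collection(FakeJavaList([1.0, 2.0, 3.0])))  # [1.0, 2.0, 3.0]
```

This avoids serializing the collection on the JVM side and deserializing it in Python just to obtain an ordinary list.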



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12834) Use type conversion instead of Ser/De of Pickle to transform JavaArray and JavaList

2016-01-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12834:


Assignee: Apache Spark
