[jira] [Comment Edited] (SPARK-5361) python tuple not supported while converting PythonRDD back to JavaRDD

Winston Chen (JIRA) Thu, 22 Jan 2015 14:25:39 -0800

    [ https://issues.apache.org/jira/browse/SPARK-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288331#comment-14288331 ]


Winston Chen edited comment on SPARK-5361 at 1/22/15 10:23 PM:
---------------------------------------------------------------

Found a good way to reproduce it:

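{noformat}
# Run in the pyspark shell, where `sc` is the active SparkContext.
from pyspark.rdd import RDD

dl = [
    (u'2', {u'director': u'David Lean'}),
    (u'7', {u'director': u'Andrew Dominik'})
]

# First round trip: Python RDD -> JVM objects -> Python RDD.
dl_rdd = sc.parallelize(dl)
tmp = dl_rdd._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count()  # succeeds

# Second round trip, starting from the RDD produced by the first one.
tmp = t._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count()  # blows up here, during the second round-trip conversion
{noformat}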

I am going to make a test case from this example.


> python tuple not supported while converting PythonRDD back to JavaRDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-5361
>                 URL: https://issues.apache.org/jira/browse/SPARK-5361
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Winston Chen
>
> The existing `SerDeUtil.pythonToJava` implementation does not account for
> tuples: Pyrolite unpickles a Python tuple into a Java `Object[]`, not a
> `java.util.ArrayList`.
> So, with the following data:
> {noformat}
> [
>     (u'2', {u'director': u'David Lean',
>             u'genres': (u'Adventure', u'Biography', u'Drama'),
>             u'title': u'Lawrence of Arabia',
>             u'year': 1962}),
>     (u'7', {u'director': u'Andrew Dominik',
>             u'genres': (u'Biography', u'Crime', u'Drama'),
>             u'title': u'The Assassination of Jesse James by the Coward Robert Ford',
>             u'year': 2007})
> ]
> {noformat}
> An exception is raised at the `genres` field:
> {noformat}
> 15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
>       at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
>       at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
> {noformat}
> There is already a pull request for this bug:
> https://github.com/apache/spark/pull/4146
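
The issue depends on how Pyrolite maps pickled Python containers to Java types. The plain-Python sketch below needs no Spark installation and covers two parts of that problem. First, `opcode_names` shows that under pickle protocol 2, a tuple is written with a `TUPLE3` opcode while a list is built with `EMPTY_LIST`/`APPENDS`. That difference lets Pyrolite produce an `Object[]` for one and an `ArrayList` for the other. Protocol 2 is an assumption here, and PySpark may use a different protocol. Second, `to_seq` is a hypothetical helper that models the tolerant dispatch `SerDeUtil.pythonToJava` would need. It is not the code from the linked pull request.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=2):
    """Return the names of the pickle opcodes used to serialize obj."""
    data = pickle.dumps(obj, protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


def to_seq(obj):
    """Hypothetical model of a tolerant unbatching step.

    Pyrolite gives the JVM an ArrayList for a Python list and an Object[]
    for a Python tuple. Here a Python list and tuple stand in for those two
    Java types. Accepting both, instead of assuming the list type, avoids
    the ClassCastException quoted above.
    """
    if isinstance(obj, (list, tuple)):
        return list(obj)
    raise TypeError("expected list or tuple, got %s" % type(obj).__name__)


# The `genres` value from the issue's sample data.
genres = (u'Adventure', u'Biography', u'Drama')

if __name__ == "__main__":
    print(opcode_names(genres))        # tuple: contains a TUPLE3 opcode
    print(opcode_names(list(genres)))  # list: EMPTY_LIST, MARK, ..., APPENDS
    print(to_seq(genres))
```

The sketch normalizes both container shapes to one sequence type at the boundary rather than casting to a single expected type.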



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
