[ https://issues.apache.org/jira/browse/SPARK-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288331#comment-14288331 ]
Winston Chen commented on SPARK-5361:
-------------------------------------

Found a good way to reproduce it:

{noformat}
from pyspark.rdd import RDD

dl = [
    (u'2', {u'director': u'David Lean'}),
    (u'7', {u'director': u'Andrew Dominik'})
]
dl_rdd = sc.parallelize(dl)

tmp = dl_rdd._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count()

tmp = t._to_java_object_rdd()
tmp2 = sc._jvm.SerDe.javaToPython(tmp)
t = RDD(tmp2, sc)
t.count()  # it blows up here
{noformat}

I am going to make a test case from this example.

> python tuple not supported while converting PythonRDD back to JavaRDD
> ---------------------------------------------------------------------
>
>          Key: SPARK-5361
>          URL: https://issues.apache.org/jira/browse/SPARK-5361
>      Project: Spark
>   Issue Type: Bug
>   Components: PySpark
>     Reporter: Winston Chen
>
> The existing `SerDeUtil.pythonToJava` implementation does not account for the tuple case: Pyrolite unpickles a `python tuple` into a `java Object[]`.
> So with the following data:
> {noformat}
> [
>     (u'2', {u'director': u'David Lean', u'genres': (u'Adventure', u'Biography', u'Drama'), u'title': u'Lawrence of Arabia', u'year': 1962}),
>     (u'7', {u'director': u'Andrew Dominik', u'genres': (u'Biography', u'Crime', u'Drama'), u'title': u'The Assassination of Jesse James by the Coward Robert Ford', u'year': 2007})
> ]
> {noformat}
> an exception is thrown at the `genres` part:
> {noformat}
> 15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
>     at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
>     at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
> {noformat}
> There is already a pull request for this bug:
> https://github.com/apache/spark/pull/4146
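
The stack trace shows the mismatch directly: Pyrolite hands back an `Object[]` for a Python tuple, while the conversion path expects a `java.util.ArrayList`. Below is a minimal Scala sketch of handling both shapes; it is an illustration only, not the change in the pull request above, and `toJavaList` is a hypothetical helper name:

{noformat}
object TupleSerDeSketch {
  import java.util.{ArrayList => JArrayList, Arrays}

  // Pyrolite unpickles a Python list to java.util.ArrayList and a Python
  // tuple to Object[]; normalize both into a java.util.List before any
  // ArrayList-specific handling.
  def toJavaList(obj: Any): java.util.List[_] = obj match {
    case list: JArrayList[_] => list
    case arr: Array[Any]     => Arrays.asList(arr: _*)
    case other               => throw new IllegalArgumentException(
      s"unexpected unpickled type: ${other.getClass}")
  }
}
{noformat}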