Winston Chen created SPARK-5361:
-----------------------------------

             Summary: add in tuple handling for converting python RDD back to JavaRDD
                 Key: SPARK-5361
                 URL: https://issues.apache.org/jira/browse/SPARK-5361
             Project: Spark
          Issue Type: Bug
          Components: PySpark
            Reporter: Winston Chen

The existing `SerDeUtil.pythonToJava` implementation does not account for tuples: Pyrolite unpickles a Python `tuple` as a Java `Object[]`.

So with the following data:

```
[
(u'2', {u'director': u'David Lean',
        u'genres': (u'Adventure', u'Biography', u'Drama'),
        u'title': u'Lawrence of Arabia',
        u'year': 1962}),
(u'7', {u'director': u'Andrew Dominik',
        u'genres': (u'Biography', u'Crime', u'Drama'),
        u'title': u'The Assassination of Jesse James by the Coward Robert Ford',
        u'year': 2007})
]
```

an exception is thrown on the `genres` part:

```
15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
```

This pull request adds tuple handling to both `SerDeUtil.pythonToJava` and `JavaToWritableConverter.convertToWritable`.
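
The type mismatch behind the stack trace can be illustrated without Spark or Pyrolite. The following is only a minimal sketch in plain Java, not the code from the pull request: it assumes, as stated above, that an unpickled tuple arrives as an `Object[]`, and it assumes dicts and lists arrive as `java.util.Map` and `java.util.List`. The class `TupleNormalizer` and its `normalize` method are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TupleNormalizer {

    // Hypothetical helper: recursively replaces every Object[] (an unpickled
    // Python tuple, per the report) with an ArrayList, descending into lists
    // and maps so that nested tuples such as `genres` are converted as well.
    public static Object normalize(Object obj) {
        if (obj instanceof Object[]) {
            Object[] array = (Object[]) obj;
            List<Object> out = new ArrayList<>(array.length);
            for (Object item : array) {
                out.add(normalize(item));
            }
            return out;
        }
        if (obj instanceof List<?>) {
            List<?> list = (List<?>) obj;
            List<Object> out = new ArrayList<>(list.size());
            for (Object item : list) {
                out.add(normalize(item));
            }
            return out;
        }
        if (obj instanceof Map<?, ?>) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) obj).entrySet()) {
                out.put(entry.getKey(), normalize(entry.getValue()));
            }
            return out;
        }
        // Strings, numbers, null, etc. pass through unchanged.
        return obj;
    }

    public static void main(String[] args) {
        // Stand-in for one unpickled record from the report: a tuple of
        // (id, dict), where the dict holds a tuple under "genres".
        Map<String, Object> movie = new LinkedHashMap<>();
        movie.put("title", "Lawrence of Arabia");
        movie.put("genres", new Object[] {"Adventure", "Biography", "Drama"});
        Object record = new Object[] {"2", movie};

        // An unconditional cast, as in the failing code path, throws.
        try {
            ArrayList<?> unused = (ArrayList<?>) record;
            System.out.println("cast succeeded: " + unused);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }

        // Checking the runtime type first avoids the exception.
        System.out.println(normalize(record));
    }
}
```

Whether arrays should be converted eagerly like this or matched at each call site is a design choice; the sketch only shows that the runtime type has to be checked before casting.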