Winston Chen created SPARK-5361:
-----------------------------------

             Summary: add in tuple handling for converting python RDD back to JavaRDD
                 Key: SPARK-5361
                 URL: https://issues.apache.org/jira/browse/SPARK-5361
             Project: Spark
          Issue Type: Bug
          Components: PySpark
            Reporter: Winston Chen


The existing `SerDeUtil.pythonToJava` implementation does not account for tuples: 
Pyrolite unpickles a Python `tuple` into a Java `Object[]`, not a `java.util.ArrayList`.

So with the following data:

```
[
    (u'2', {u'director': u'David Lean',
            u'genres': (u'Adventure', u'Biography', u'Drama'),
            u'title': u'Lawrence of Arabia',
            u'year': 1962}),
    (u'7', {u'director': u'Andrew Dominik',
            u'genres': (u'Biography', u'Crime', u'Drama'),
            u'title': u'The Assassination of Jesse James by the Coward Robert Ford',
            u'year': 2007})
]
```

The `genres` tuples then trigger an exception:

```
15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
        at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
        at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
```
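
The cast failure itself can be reproduced outside Spark. The following is a minimal plain-Java sketch, not Spark code; `TupleCastDemo` and `castFails` are hypothetical names, and the `Object[]` stands in for what Pyrolite produces for a Python tuple:

```java
import java.util.ArrayList;

// Hypothetical stand-alone demo; it only mimics the cast seen in the trace.
final class TupleCastDemo {
    /** Returns true when the value cannot be cast to java.util.ArrayList. */
    static boolean castFails(Object unpickled) {
        try {
            ArrayList.class.cast(unpickled);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An unpickled tuple arrives as Object[], which is not an ArrayList.
        Object tuple = new Object[] {"Adventure", "Biography", "Drama"};
        System.out.println(castFails(tuple));
        // An unpickled list arrives as ArrayList, so the cast is fine.
        System.out.println(castFails(new ArrayList<String>()));
    }
}
```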

This pull request adds tuple handling to both `SerDeUtil.pythonToJava` and 
`JavaToWritableConverter.convertToWritable`.
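
As a rough illustration of what tuple handling could look like, here is a minimal sketch in plain Java. It is not the code from the pull request (the real change is in Spark's Scala sources and may differ); `TupleNormalizer` and `normalize` are hypothetical names, and the sketch assumes tuples arrive as `Object[]` nested inside lists and maps:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark.
final class TupleNormalizer {
    /** Recursively replaces Object[] (unpickled tuples) with ArrayList. */
    static Object normalize(Object value) {
        if (value instanceof Object[]) {
            Object[] tuple = (Object[]) value;
            List<Object> list = new ArrayList<>(tuple.length);
            for (Object item : tuple) {
                list.add(normalize(item));
            }
            return list;
        }
        if (value instanceof List) {
            List<Object> list = new ArrayList<>();
            for (Object item : (List<?>) value) {
                list.add(normalize(item));
            }
            return list;
        }
        if (value instanceof Map) {
            // Keys are kept as-is; only the values are normalized.
            Map<Object, Object> map = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                map.put(e.getKey(), normalize(e.getValue()));
            }
            return map;
        }
        // Strings, numbers, byte[] and other values pass through unchanged.
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> movie = new HashMap<>();
        movie.put("title", "Lawrence of Arabia");
        movie.put("genres", new Object[] {"Adventure", "Biography", "Drama"});
        // The record itself is a (key, value) tuple, so it is an Object[] too.
        Object record = normalize(new Object[] {"2", movie});
        System.out.println(record);
    }
}
```

After normalization, code that casts to `java.util.ArrayList` sees a list for both the outer record and the nested `genres` value.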


