Charles Hayden created SPARK-5558:
-------------------------------------

             Summary: pySpark zip function unexpected errors
                 Key: SPARK-5558
                 URL: https://issues.apache.org/jira/browse/SPARK-5558
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
            Reporter: Charles Hayden


Example:
x = sc.parallelize(range(0, 5))
y = x.map(lambda x: x + 1000, preservesPartitioning=True)
y.take(10)
# Also fails with the following:
# y.toDebugString()
x.zip(y).collect()

Fails in the JVM with a Py4J exception:
org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
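
For reference (a sketch added here, not part of the original report), zip() pairs the two RDDs partition by partition, so it expects both RDDs to have the same number of partitions and the same number of elements in each partition. The repro above satisfies that on paper, which is why the failure after take() is unexpected. The names a, b, c below are illustrative only:

a = sc.parallelize(range(0, 5), 2)
b = a.map(lambda n: n * 10)   # same partitioning and per-partition counts -> zip works
a.zip(b).collect()            # [(0, 0), (1, 10), (2, 20), (3, 30), (4, 40)]

c = sc.parallelize(range(0, 5), 3)
# a.zip(c).collect()          # fails: the RDDs do not have the same number of partitions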

If the range is changed to range(0, 1000), it instead fails in the PySpark code with:
ValueError: Can not deserialize RDD with different number of items in pair:
(100, 1)
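
A possible workaround (a sketch, not from the report and not verified against this exact failure) is to pair the elements by an explicit index using zipWithIndex() and join(), which sidesteps zip()'s per-partition element-count check:

# Hypothetical workaround: key both RDDs by element index and join on the index.
x = sc.parallelize(range(0, 5))
y = x.map(lambda x: x + 1000, preservesPartitioning=True)
y.take(10)
xi = x.zipWithIndex().map(lambda p: (p[1], p[0]))   # (index, element)
yi = y.zipWithIndex().map(lambda p: (p[1], p[0]))
xi.join(yi).sortByKey().values().collect()
# expected: [(0, 1000), (1, 1001), (2, 1002), (3, 1003), (4, 1004)]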



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
