Charles Hayden created SPARK-5558:
-------------------------------------

             Summary: pySpark zip function unexpected errors
                 Key: SPARK-5558
                 URL: https://issues.apache.org/jira/browse/SPARK-5558
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
            Reporter: Charles Hayden
Example:

x = sc.parallelize(range(0,5))
y = x.map(lambda x: x+1000, preservesPartitioning=True)
y.take(10)
# Also fails with the following
#y.toDebugString()
x.zip(y).collect()

Fails in the JVM with a Py4J error:

org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition

If the range is changed to range(0,1000), it fails in PySpark code instead:

ValueError: Can not deserialize RDD with different number of items in pair: (100, 1)
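A possible workaround until this is fixed (not part of the original report; variable names are illustrative) is to pair the two RDDs by an explicit index with zipWithIndex and join, which does not depend on the per-partition element counts or serializer batch sizes lining up:

from pyspark import SparkContext

sc = SparkContext(appName="zip-workaround-sketch")

x = sc.parallelize(range(0, 5))
y = x.map(lambda v: v + 1000, preservesPartitioning=True)

# Index each element, flip to (index, value), and join on the index.
# join() shuffles by key, so x and y no longer need the same number
# of elements in each partition for the pairing to succeed.
x_indexed = x.zipWithIndex().map(lambda kv: (kv[1], kv[0]))
y_indexed = y.zipWithIndex().map(lambda kv: (kv[1], kv[0]))

pairs = x_indexed.join(y_indexed).sortByKey().values().collect()
print(pairs)  # [(0, 1000), (1, 1001), (2, 1002), (3, 1003), (4, 1004)]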