Hi! I'm on Spark 1.6.1 in local mode on Windows.
I have an issue zipping two RDDs of __equal__ size and with an __equal__ number of partitions (I also tried repartitioning both RDDs to one partition). I get the following exception when I call rdd1.zip(rdd2).count():

      File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
      File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
      File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "c:\spark\python\pyspark\rddsampler.py", line 95, in func
        for obj in iterator:
      File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 322, in load_stream
        " in pair: (%d, %d)" % (len(keys), len(vals)))
    ValueError: Can not deserialize RDD with different number of items in pair: (256, 512)