Hi!

I'm on Spark 1.6.1 in local mode on Windows.

I have an issue zipping two RDDs of __equal__ size and with an __equal__
number of partitions (I also tried repartitioning both RDDs to one
partition).
I get the following exception when I run rdd1.zip(rdd2).count():

  File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
  File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
  File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "c:\spark\python\pyspark\rddsampler.py", line 95, in func
    for obj in iterator:
  File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 322, in load_stream
    " in pair: (%d, %d)" % (len(keys), len(vals)))
ValueError: Can not deserialize RDD with different number of items in pair: (256, 512)
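
The (256, 512) in the message looks like a pair of per-batch item counts
rather than the RDD sizes, since the check in load_stream compares
len(keys) with len(vals). Below is a rough pure-Python sketch of that kind
of check. It is a simplified stand-in, not the actual PySpark code: the
helpers `batched` and `pair_batches` are made-up names, and the batch sizes
256 and 512 are only assumed from the error message.

```python
from itertools import islice


def batched(items, batch_size):
    """Split an iterable into lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def pair_batches(key_batches, val_batches):
    """Pair items batch by batch (hypothetical helper, not a PySpark API).

    Fails as soon as two corresponding batches differ in length, even if
    the total number of items on both sides is the same.
    """
    for keys, vals in zip(key_batches, val_batches):
        if len(keys) != len(vals):
            raise ValueError("Can not deserialize RDD with different number of items"
                             " in pair: (%d, %d)" % (len(keys), len(vals)))
        for pair in zip(keys, vals):
            yield pair


# Two sequences of equal total size, standing in for rdd1 and rdd2.
left = range(1024)
right = range(1024)

# Same batch size on both sides: pairing succeeds.
same = list(pair_batches(batched(left, 256), batched(right, 256)))

# Different batch sizes (assumed 256 vs 512): pairing fails on the first batch.
try:
    list(pair_batches(batched(left, 256), batched(right, 512)))
    error = None
except ValueError as exc:
    error = str(exc)
```

If this reading is right, equal sizes and equal partition counts would not
be enough on their own: the two sides would also need to be serialized in
batches of the same length.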
