From API docs: Zips this RDD with another one, returning key-value
pairs with the first element in each RDD, second element in each RDD,
etc. Assumes that the two RDDs have the *same number of partitions*
and the *same number of elements in each partition* (e.g. one was made
through a map on the other).
Basically, one RDD should be a mapped RDD of the other, or both RDDs
are mapped RDDs of the same RDD.
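A minimal sketch of the distinction (assuming a live SparkContext `sc`; the `zipWithIndex` workaround assumes a Spark version that provides it, 1.0+):

```scala
// Safe: b is made through a map on a, so it has the same number of
// partitions and the same element count in each partition.
val a = sc.parallelize(1 to 10000000, 8)
val b = a.map(_ * 2.0)
val pairs = a.zip(b) // OK

// Risky: two independently parallelized RDDs may split their elements
// across partitions differently, so an action on the zipped RDD can throw
// "Can't zip RDDs with unequal numbers of partitions" (or a same-number-
// of-elements-per-partition error) at runtime.
val c = sc.parallelize(1 to 10000000, 8)
// a.zip(c).count() // may fail

// One alignment-safe (but slower) alternative: key both RDDs by index
// and join, which does not depend on partition layout.
val joined = a.zipWithIndex.map(_.swap)
  .join(c.zipWithIndex.map(_.swap))
  .values
```

The join variant shuffles data, so it costs more than `zip`, but it works regardless of how the two RDDs were partitioned.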
Btw, your message says Dell - Internal Use - Confidential...
Best,
Xiangrui
On Tue, Apr 1, 2014 at 7:27 PM, patrick_nico...@dell.com wrote:
Dell - Internal Use - Confidential
I got an exception ("Can't zip RDDs with unequal numbers of partitions") when
I apply any action (reduce, collect) to a dataset created by zipping two
datasets of 10 million entries each. The problem occurs regardless of the
number of partitions I specify, and also when I let Spark create the
partitions itself. Interestingly enough, I have no problem zipping datasets
of 1 and 2.5 million entries.
A similar problem was reported on this board with 0.8, but I don't remember
whether it was fixed.
Any idea? Any workaround?
I'd appreciate any help.