Re: Issue with zip and partitions

2014-04-02 Thread Xiangrui Meng
From API docs: Zips this RDD with another one, returning key-value
pairs with the first element in each RDD, second element in each RDD,
etc. Assumes that the two RDDs have the *same number of partitions*
and the *same number of elements in each partition* (e.g. one was made
through a map on the other).
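
To make the two requirements concrete, here is a minimal plain-Python model of a partition-wise zip. It is only a sketch, not Spark's implementation: partitions are modeled as nested lists, and `zip_partitions` and `split` are hypothetical helpers made up for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions(left: Sequence[Sequence[A]],
                   right: Sequence[Sequence[B]]) -> List[List[Tuple[A, B]]]:
    """Pair elements partition by partition (a model, not Spark code)."""
    # Requirement 1: same number of partitions.
    if len(left) != len(right):
        raise ValueError("can't zip: unequal numbers of partitions")
    zipped = []
    for i, (lp, rp) in enumerate(zip(left, right)):
        # Requirement 2: same number of elements in each partition.
        if len(lp) != len(rp):
            raise ValueError(
                f"can't zip: partition {i} has {len(lp)} vs {len(rp)} elements")
        zipped.append(list(zip(lp, rp)))
    return zipped


def split(data: list, num_partitions: int) -> List[list]:
    """Hypothetical helper: cut a list into contiguous, near-equal chunks."""
    n = len(data)
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


base = split(list(range(10)), 3)                    # partition sizes 3, 3, 4
mapped = [[x * x for x in part] for part in base]   # a map keeps the layout
pairs = zip_partitions(base, mapped)                # succeeds

# Same partition count, but sizes 3, 3, 3: the per-partition check fails.
uneven = split(list(range(9)), 3)
try:
    zip_partitions(base, uneven)
    uneven_error = None
except ValueError as exc:
    uneven_error = str(exc)
```

In this model, two independently built datasets can have the same total size and the same partition count and still fail the second check, because it is applied per partition rather than to the totals.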

Basically, one RDD should be derived from the other through a map, or
both RDDs should be derived through maps from the same parent RDD.
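
A small plain-Python illustration of why this matters, again modeling partitions as nested lists rather than using Spark itself. The `layout` helper and the sample data are assumptions made up for the example. An element-wise map keeps the number of elements in each partition, while an operation that drops elements, such as a filter, generally does not.

```python
def layout(partitions):
    """Return the number of elements in each partition."""
    return [len(part) for part in partitions]


# Hypothetical parent dataset with three partitions.
parent = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]

# Two element-wise maps of the same parent: both keep the parent's layout.
doubled = [[2 * x for x in part] for part in parent]
labels = [[str(x) for x in part] for part in parent]

# A filter drops elements, so per-partition counts can change.
evens = [[x for x in part if x % 2 == 0] for part in parent]

same_layout = layout(doubled) == layout(labels)      # zippable in this model
filter_breaks = layout(evens) != layout(parent)      # not zippable with parent
```

Under these assumptions, `doubled` and `labels` could be zipped with each other or with `parent`, whereas `evens` could not be zipped with any of them.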

Btw, your message says Dell - Internal Use - Confidential...

Best,
Xiangrui

On Tue, Apr 1, 2014 at 7:27 PM,  patrick_nico...@dell.com wrote:
 Dell - Internal Use - Confidential

 I get the exception "can't zip RDDs with unequal numbers of partitions" when
 I apply any action (reduce, collect) to a dataset created by zipping two
 datasets of 10 million entries each. The problem occurs regardless of the
 number of partitions, including when I let Spark create the partitions.

 Interestingly enough, I have no problem zipping datasets of 1 and 2.5
 million entries.

 A similar problem was reported on this board with 0.8, but I do not remember
 whether it was fixed.

 Any ideas? Any workaround?

 I appreciate any help.


Issue with zip and partitions

2014-04-01 Thread Patrick_Nicolas
Dell - Internal Use - Confidential
I get the exception "can't zip RDDs with unequal numbers of partitions" when I
apply any action (reduce, collect) to a dataset created by zipping two datasets
of 10 million entries each. The problem occurs regardless of the number of
partitions, including when I let Spark create the partitions.

Interestingly enough, I have no problem zipping datasets of 1 and 2.5
million entries.

A similar problem was reported on this board with 0.8, but I do not remember
whether it was fixed.

Any ideas? Any workaround?

I appreciate any help.