I've found that this was already noticed a year ago. RDD zip() does not work 
correctly when the number of partitions does not evenly divide the total 
number of elements in the RDD:

https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ
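
To illustrate how an uneven split could produce this kind of result, here is a minimal sketch in plain Scala with no Spark dependency. Everything in it is an assumption made for illustration: sliceByPosition, sliceByChunk and zipPartitions are hypothetical helpers, not Spark's actual code. The idea is that if two collections are cut into partitions by different rules, and the partitions are then zipped pairwise with a zip that silently truncates to the shorter side, elements can be dropped or mispaired.

```scala
object ZipSketch {
  // Hypothetical strategy A: partition i covers [floor(i*n/k), floor((i+1)*n/k)).
  def sliceByPosition[T](xs: Seq[T], k: Int): Seq[Seq[T]] =
    (0 until k).map { i =>
      val start = ((i * xs.length.toLong) / k).toInt
      val end = (((i + 1) * xs.length.toLong) / k).toInt
      xs.slice(start, end)
    }

  // Hypothetical strategy B: fixed-size chunks of ceil(n/k) elements,
  // so the leading partitions fill up and the trailing ones may be empty.
  def sliceByChunk[T](xs: Seq[T], k: Int): Seq[Seq[T]] = {
    val size = math.max(1, (xs.length + k - 1) / k)
    (0 until k).map(i => xs.slice(i * size, (i + 1) * size))
  }

  // Zip partition i of the left side with partition i of the right side.
  // Scala's zip truncates to the shorter input without raising an error.
  def zipPartitions[A, B](as: Seq[Seq[A]], bs: Seq[Seq[B]]): Seq[(A, B)] =
    as.zip(bs).flatMap { case (a, b) => a.zip(b) }

  val longs: Seq[Long] = Vector(1L, 2L)
  val ints: Seq[Int] = Vector(11, 12)

  // Both sides sliced by the same rule: partitions line up.
  val sameStrategy: Seq[(Long, Int)] =
    zipPartitions(sliceByPosition(longs, 4), sliceByPosition(ints, 4))

  // Sides sliced by different rules: partitions no longer line up.
  val mixedStrategy: Seq[(Long, Int)] =
    zipPartitions(sliceByChunk(longs, 4), sliceByPosition(ints, 4))

  def main(args: Array[String]): Unit = {
    println(sameStrategy)
    println(mixedStrategy)
  }
}
```

With 2 elements in 4 partitions, sameStrategy contains (1,11) and (2,12), while mixedStrategy contains only (2,11), the same shape as the output quoted below. Whether Spark really slices Int and Long ranges differently is something I have not confirmed here; the sketch only shows that such a mismatch would be enough to explain it.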

I will file a JIRA ticket as soon as the ASF JIRA system lets me reset my 
password.



On Sunday, May 11, 2014 4:40 AM, Michael Malak <michaelma...@yahoo.com> wrote:

Is this a bug?

scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))

scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect
res1: Array[(Long, Int)] = Array((2,11))
