Hi Xiangrui,
I'm sorry, I overlooked your mail.
What I did is a workaround that only works for my special case.
It does not scale and only works for small data sets, but that is fine
for me so far.
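
For anyone hitting the same error with larger data, a more general approach
might be to key both RDDs by element position and join on that key, instead
of relying on identical partition layouts. The following is only a minimal
sketch under stated assumptions: zipByIndex is a hypothetical name (it is not
the function from this thread), both RDDs are assumed to hold the same number
of elements in a deterministic order, and the definition is assumed to live
in spark-shell or inside an object.

```scala
import scala.reflect.ClassTag
// Pair-RDD implicits; needed on Spark 1.2 and earlier, redundant on later versions.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Hypothetical helper, not the author's workaround.
// Pairs the i-th element of rdd1 with the i-th element of rdd2.
def zipByIndex[A: ClassTag, B: ClassTag](rdd1: RDD[A], rdd2: RDD[B]): RDD[(A, B)] = {
  // zipWithIndex yields (element, index); swap so the index becomes the key.
  val indexed1 = rdd1.zipWithIndex().map(_.swap)
  val indexed2 = rdd2.zipWithIndex().map(_.swap)
  // Inner join on the index: an element without a partner is dropped silently,
  // so this does not detect RDDs of different sizes by itself.
  indexed1.join(indexed2).sortByKey().values
}
```

The join and the sort each shuffle the data, so this costs noticeably more
than a plain zip, but it does not depend on how the elements are distributed
across partitions.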
Kind Regards,
Niklas
def securlyZipRdds[A, B: ClassTag](rdd1: RDD[A], rdd2: RDD[B]):
RDD[(A,
sample 2 * n tuples, split them into two parts, balance the sizes of
these parts by filtering some tuples out
How do you guarantee that the two RDDs have the same size?
-Xiangrui
On Fri, Jan 9, 2015 at 3:40 AM, Niklas Wilcke
1wil...@informatik.uni-hamburg.de wrote:
Hi Spark community,
I have a problem with zipping two RDDs of the same size and same number
of partitions.
The error message says that zipping is only allowed on RDDs whose
corresponding partitions contain exactly the same number of elements.
How can I ensure this? My workaround at the moment is to repartition