Assume the following, where updatePairRDD and deletePairRDD are both 
HashPartitioned.  Before the union, each of them has 512 partitions.  The 
newly created updateDeletePairRDD has 1024 partitions.  Is this the 
general/expected behavior for a union (the number of partitions doubles)?

JavaPairRDD<String,String> updateDeletePairRDD = 
updatePairRDD.union(deletePairRDD);
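
To make the observed count concrete, here is a minimal plain-Java sketch of 
the arithmetic.  It is a toy model, not Spark code: it assumes the union does 
not shuffle and simply keeps every input partition, and the class and helper 
names (UnionPartitionSketch, emptyPartitions, union) are made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (assumption): a union that keeps every input partition
// as-is and appends the two partition lists, with no shuffle.
class UnionPartitionSketch {

    // Hypothetical helper: builds n empty "partitions".
    static List<List<String>> emptyPartitions(int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    // The result holds the partitions of the left input followed by
    // the partitions of the right input.
    static List<List<String>> union(List<List<String>> left,
                                    List<List<String>> right) {
        List<List<String>> result = new ArrayList<>(left);
        result.addAll(right);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> update = emptyPartitions(512);
        List<List<String>> delete = emptyPartitions(512);
        System.out.println(union(update, delete).size()); // prints 1024
    }
}
```

Under that assumption the result always has the sum of the two input counts, 
which matches the 512 + 512 = 1024 seen above.
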
I have a similar question about subtractByKey.  In the example below, 
baselinePairRDD is HashPartitioned (with 512 partitions).  We know from above 
that updateDeletePairRDD has 1024 partitions.  The newly created 
workSubtractBaselinePairRDD has 512 partitions.  This makes sense because we 
are only 'subtracting' records from baselinePairRDD, and one wouldn't expect 
the number of partitions to increase.  Is this the general/expected behavior 
for a subtractByKey?

JavaPairRDD<String,String> workSubtractBaselinePairRDD = 
baselinePairRDD.subtractByKey(updateDeletePairRDD);
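
The intuition above can be sketched in plain Java.  Again, this is a toy model 
and not Spark code: it assumes the result is laid out with the left side's 
partition count and a hashCode-modulo placement, and all names 
(SubtractByKeySketch, bucketFor, subtractByKey) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (assumption): the result keeps the LEFT side's partition
// count, no matter how many partitions the other side has.
class SubtractByKeySketch {

    // Non-negative hashCode modulo, in the spirit of a hash partitioner.
    static int bucketFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Drops every left record whose key appears in otherKeys and places the
    // survivors into leftPartitions buckets.
    static List<Map<String, String>> subtractByKey(Map<String, String> left,
                                                   int leftPartitions,
                                                   Set<String> otherKeys) {
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < leftPartitions; i++) {
            result.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : left.entrySet()) {
            if (!otherKeys.contains(e.getKey())) {
                result.get(bucketFor(e.getKey(), leftPartitions))
                      .put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> baseline = new HashMap<>();
        baseline.put("a", "1");
        baseline.put("b", "2");
        baseline.put("c", "3");
        Set<String> updateDeleteKeys = new HashSet<>();
        updateDeleteKeys.add("b");
        System.out.println(
            subtractByKey(baseline, 512, updateDeleteKeys).size()); // prints 512
    }
}
```

In this model the partition count of the other side (1024 here) never enters 
the calculation, which is consistent with the 512 partitions observed.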
