Incorrect number of records after left outer join (I think)

2015-02-19 Thread Darin McBeath
Consider the following left outer join: potentialDailyModificationsRDD = reducedDailyPairRDD.leftOuterJoin(baselinePairRDD).partitionBy(new HashPartitioner(1024)).persist(StorageLevel.MEMORY_AND_DISK_SER()); Below are the record counts for the RDDs involved. Number of records for

Re: Incorrect number of records after left outer join (I think)

2015-02-19 Thread Imran Rashid
If you have duplicate values for a key, join creates all pairs. E.g., if you have 2 values for key X in rdd A and 2 values for key X in rdd B, then A.join(B) will have 4 records for key X. On Thu, Feb 19, 2015 at 3:39 PM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote: Consider the following left outer
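
To make the counting concrete, here is a minimal sketch in plain Java (no Spark involved) of how many records a left outer join emits when keys repeat. The class name LeftOuterJoinCount, the method leftOuterJoinCount, and the sample keys are hypothetical and only for illustration; the sketch assumes the usual left outer join semantics, where each left record is paired with every right record sharing its key, or kept once if there is no match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeftOuterJoinCount {

    // Counts the records a left outer join would emit, given only the keys
    // of the left and right sides. Hypothetical helper, not part of Spark.
    static long leftOuterJoinCount(List<String> leftKeys, List<String> rightKeys) {
        // How many times each key occurs on the right side.
        Map<String, Long> rightCounts = new HashMap<>();
        for (String key : rightKeys) {
            rightCounts.merge(key, 1L, Long::sum);
        }

        long total = 0;
        for (String key : leftKeys) {
            // A left record is paired with every matching right record;
            // with no match it still appears once (with an absent right value).
            total += Math.max(1L, rightCounts.getOrDefault(key, 0L));
        }
        return total;
    }

    public static void main(String[] args) {
        // Key X occurs twice on each side; Y only on the left; Z only on the right.
        List<String> left = Arrays.asList("X", "X", "Y");
        List<String> right = Arrays.asList("X", "X", "Z");

        // X contributes 2 * 2 = 4 records, Y contributes 1, Z is dropped.
        System.out.println(leftOuterJoinCount(left, right)); // prints 5
    }
}
```

So the joined count can exceed the record count of the left side whenever the right side has duplicate keys, and it can never be smaller than the left side's count.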