Hello Imran,
Thanks for your response. I noticed the intersection and subtract
methods on an RDD. Do they work based on a hash of all the fields in an RDD
record?
- Himanish
On Thu, Feb 19, 2015 at 6:11 PM, Imran Rashid iras...@cloudera.com wrote:
the more scalable alternative is to do a
Hi,
I have two RDDs with CSV data as below:
RDD-1
101970_5854301840,fbcf5485-e696-4100-9468-a17ec7c5bb43,19229261643
101970_5854301839,fbaf5485-e696-4100-9468-a17ec7c5bb39,9229261645
101970_5854301839,fbbf5485-e696-4100-9468-a17ec7c5bb39,9229261647
The more scalable alternative is to do a join (or a variant such as cogroup,
leftOuterJoin, or subtractByKey, all found in PairRDDFunctions).
The downside is that this requires a shuffle of both of your RDDs.
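
A minimal sketch of the keyed approach, in case it helps. It assumes the first CSV field is the record key, which may not match your data. The object name, the `keyed` helper, and the second sample RDD are made up for illustration. For contrast it also calls the plain `subtract`, which compares whole elements (for String records, the entire line) rather than a chosen key.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark 1.2 and earlier, also add: import org.apache.spark.SparkContext._
// (it brings the PairRDDFunctions implicits into scope).

object KeyedDiffSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("keyed-diff-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // First RDD: two of the sample lines from the thread.
    val rdd1 = sc.parallelize(Seq(
      "101970_5854301840,fbcf5485-e696-4100-9468-a17ec7c5bb43,19229261643",
      "101970_5854301839,fbaf5485-e696-4100-9468-a17ec7c5bb39,9229261645"))

    // Second RDD: hypothetical data, same key as one line above but a
    // different last field.
    val rdd2 = sc.parallelize(Seq(
      "101970_5854301839,fbaf5485-e696-4100-9468-a17ec7c5bb39,0000000000"))

    // Assumption: the first comma-separated field identifies a record.
    def keyed(line: String): (String, String) = (line.split(",", 2)(0), line)

    val k1 = rdd1.map(keyed)
    val k2 = rdd2.map(keyed)

    // Records of rdd1 whose key does not appear in rdd2.
    val onlyInFirst = k1.subtractByKey(k2).values

    // Records whose key appears in both, paired as (key, (line1, line2)).
    val inBoth = k1.join(k2)

    // All records of rdd1, with the matching rdd2 line as an Option.
    val withOptionalMatch = k1.leftOuterJoin(k2)

    // Whole-record comparison: a line is dropped only if an equal line
    // exists in rdd2, so a change in any field counts as a difference.
    val wholeRecordDiff = rdd1.subtract(rdd2)

    onlyInFirst.collect().foreach(println)
    inBoth.collect().foreach(println)
    withOptionalMatch.collect().foreach(println)
    wholeRecordDiff.collect().foreach(println)

    sc.stop()
  }
}
```

Each of the keyed operations shuffles both RDDs, as noted above, so it is worth keying both sides once and reusing the keyed RDDs if you need more than one of these results.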
On Thu, Feb 19, 2015 at 3:36 PM, Himanish Kushary himan...@gmail.com
wrote:
Hi,
I have two RDD's