Hi,
I need to split an RDD into 3 different RDDs using the filter transformation.
I have cached the original RDD before using filter.
The input is skewed, leaving some executors heavily loaded and others
lightly loaded, so I have repartitioned it.
*DAG-lineage I expected:*
I/P RDD --> MAP RDD -->
Hi,
You should perform an action (e.g. count, take, saveAs*) in order
for your RDDs to be cached, since cache/persist are lazy. You
might also want to use coalesce instead of repartition to avoid a shuffle;
note that coalesce only avoids the shuffle when it reduces the number of
partitions.
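
A minimal sketch of what this could look like, assuming a local Spark
context and a made-up input; the object name, the partition count (8), the
key function (x % 3), and the three filter predicates are all placeholders
for illustration, not taken from the original job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SplitCachedRdd {
  def main(args: Array[String]): Unit = {
    // Assumption: local master and app name are placeholders for this sketch
    val conf = new SparkConf().setAppName("split-cached-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for the real input RDD and map step
    val mapped = sc.parallelize(1 to 1000).map(x => (x % 3, x))

    // repartition shuffles the data to rebalance it;
    // cache() only marks the RDD for storage, nothing is computed yet
    val balanced = mapped.repartition(8).cache()

    // An action forces evaluation, so the partitions are computed and stored
    balanced.count()

    // Each filter is a separate RDD that reads from the cached partitions
    val rdd0 = balanced.filter { case (k, _) => k == 0 }
    val rdd1 = balanced.filter { case (k, _) => k == 1 }
    val rdd2 = balanced.filter { case (k, _) => k == 2 }

    println(s"${rdd0.count()} ${rdd1.count()} ${rdd2.count()}")
    sc.stop()
  }
}
```

The explicit count() is there only to populate the cache up front; without
it, the first action on any of the filtered RDDs would populate the cache
as a side effect.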
Thanks,
Deng
On Mon, Nov 2, 2015 at 5:53 PM, Sushrut Ikhar