Dear Sab,
Thank you very much for your kind reply; it is very helpful.
On Monday, December 21, 2015 8:49 PM, Sabarish Sasidharan
wrote:
collect() brings everything to the driver and is costly. Instead of using
collect() + parallelize, you could use rdd1.checkpoint() along with a cheaper
action such as rdd1.count(). You can do this within the for loop.
Hopefully you are using the Kryo serializer already.
Regards
Sab
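
A minimal sketch of the checkpoint-plus-count pattern suggested above, written
in Python against the PySpark RDD methods checkpoint() and count(). The helper
name iterate_with_checkpoint, the step function, and the checkpoint_every
interval are hypothetical and not from the original thread. It also assumes a
checkpoint directory has already been set with sc.setCheckpointDir(...).

```python
def iterate_with_checkpoint(rdd, step, num_iterations, checkpoint_every=10):
    """Apply `step` repeatedly, checkpointing every few iterations.

    Assumption: sc.setCheckpointDir(...) was called on the SparkContext
    beforehand; otherwise rdd.checkpoint() fails on a real RDD.
    """
    for i in range(1, num_iterations + 1):
        rdd = step(rdd)  # one iteration of the loop body (hypothetical transform)
        if i % checkpoint_every == 0:
            rdd.checkpoint()  # only marks the RDD; nothing is written yet
            rdd.count()       # action that runs the job and triggers the checkpoint
    return rdd


# Hypothetical usage with a real SparkContext `sc` (the path is an assumption):
#   sc.setCheckpointDir("/tmp/spark-checkpoints")
#   rdd1 = sc.parallelize(range(1000))
#   rdd1 = iterate_with_checkpoint(rdd1, lambda r: r.map(lambda x: x + 1), 50)
```

Unlike collect() + parallelize, this keeps the data on the executors: only the
count result travels back to the driver.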
On Mon,