question on setup() and cleanup() methods for map() and reduce()

2014-04-28 Thread Parsian, Mahmoud
In classic MapReduce/Hadoop, you may optionally define setup() and cleanup() 
methods.
They (setup() and cleanup()) are called once per task, so if you have 20 
mappers running, setup() and cleanup() will be called for each one.
What is the equivalent of these in Spark?

Thanks,
best regards,
Mahmoud




Re: question on setup() and cleanup() methods for map() and reduce()

2014-04-28 Thread Ameet Kini
I don't think there is a setup() or cleanup() in Spark, but you can usually
achieve the same effect with mapPartitions: put the setup code at the top
of the mapPartitions function and the cleanup code at the end.

The reason why this usually works is that in Hadoop map/reduce, each map
task runs over an input split. If you call mapPartitions over a HadoopRDD,
each partition is effectively an input split.
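For illustration (this sketch is not from the original thread), here is one way
the pattern might look in Scala against the RDD API; the object name, the
placeholder StringBuilder "resource", and the doubling logic are made up for
the example:

import org.apache.spark.{SparkConf, SparkContext}

object SetupCleanupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("setup-cleanup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // 4 partitions stand in for the tasks whose setup/cleanup we care about
    val rdd = sc.parallelize(1 to 100, 4)

    val result = rdd.mapPartitions { iter =>
      // "setup()": runs once per partition; a real job might open a DB
      // connection or load a lookup table here (placeholder below)
      val resource = new scala.collection.mutable.StringBuilder("connected")

      // Per-record work, as in map(). iter.map is lazy, so force it with
      // toList before the cleanup code below runs.
      val processed = iter.map(x => x * 2).toList

      // "cleanup()": runs once per partition
      resource.clear()

      processed.iterator
    }

    println(result.collect().mkString(", "))
    sc.stop()
  }
}

Note that because the partition iterator is lazy, this sketch forces it with
toList before the cleanup code; for very large partitions you would want to
stream instead of materializing the whole partition in memory.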

Ameet


On Mon, Apr 28, 2014 at 9:22 PM, Parsian, Mahmoud mpars...@illumina.com wrote:

  In classic MapReduce/Hadoop, you may optionally define setup() and
  cleanup() methods.
  They (setup() and cleanup()) are called once per task, so if you have
  20 mappers running, setup() and cleanup() will be called for each one.
  What is the equivalent of these in Spark?

  Thanks,
  best regards,
  Mahmoud





Re: question on setup() and cleanup() methods for map() and reduce()

2014-04-28 Thread Parsian, Mahmoud
Thank you very much Ameet!
Can you please point me to an example?
Best,
Mahmoud

Sent from my iPhone

On Apr 28, 2014, at 6:32 PM, Ameet Kini ameetk...@gmail.com wrote:

I don't think there is a setup() or cleanup() in Spark, but you can usually
achieve the same effect with mapPartitions: put the setup code at the top of
the mapPartitions function and the cleanup code at the end.

The reason why this usually works is that in Hadoop map/reduce, each map task 
runs over an input split. If you call mapPartitions over a HadoopRDD, each 
partition is effectively an input split.

Ameet


On Mon, Apr 28, 2014 at 9:22 PM, Parsian, Mahmoud mpars...@illumina.com wrote:
In classic MapReduce/Hadoop, you may optionally define setup() and cleanup() 
methods.
They (setup() and cleanup()) are called once per task, so if you have 20 
mappers running, setup() and cleanup() will be called for each one.
What is the equivalent of these in Spark?

Thanks,
best regards,
Mahmoud