I don't think there is a setup() or cleanup() in Spark, but you can usually
achieve the same effect with mapPartitions: put the "setup" code at the top
of the function you pass to mapPartitions and the "cleanup" code at the end.

This usually works because in Hadoop map/reduce each map task runs over one
input split. If you call mapPartitions over a HadoopRDD, each partition is
effectively an input split, so code that runs once per partition runs once
per what would have been a map task.
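
Here is a minimal sketch of that pattern in Python, written as the kind of
function you would hand to mapPartitions. The Resource class and the opened
list are hypothetical stand-ins I made up for illustration (think of a
database connection or a parser); they are not part of Spark. One thing to
watch: the partition is handed over as a lazy iterator, so the cleanup has to
run after the records have actually been consumed. Using a generator with
try/finally is one way to get that.

```python
class Resource:
    """Hypothetical per-task resource, e.g. a connection or a parser."""

    def __init__(self):
        self.is_open = True   # stands in for the "setup" work
        self.seen = 0

    def transform(self, record):
        self.seen += 1
        return record * 2

    def close(self):
        self.is_open = False  # stands in for the "cleanup" work


# Hypothetical bookkeeping, only here so the lifecycle can be inspected locally.
opened = []


def process_partition(records):
    # "setup": runs once per partition, when the first element is requested
    resource = Resource()
    opened.append(resource)
    try:
        for record in records:
            yield resource.transform(record)
    finally:
        # "cleanup": runs once the partition is exhausted or the generator is closed
        resource.close()
```

With PySpark this would be passed as rdd.mapPartitions(process_partition).
Outside Spark you can try it on any iterator, for example
list(process_partition(iter([1, 2, 3]))).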


On Mon, Apr 28, 2014 at 9:22 PM, Parsian, Mahmoud <mpars...@illumina.com> wrote:

>  In classic MapReduce/Hadoop, you may optionally define setup() and
> cleanup() methods.
>  They ( setup() and cleanup() ) are called for each task, so if you have
> 20 mappers running, the setup/cleanup will be called for each one.
> What is the equivalent of these in Spark?
>  Thanks,
>   best regards,
> Mahmoud
