Re: question on setup() and cleanup() methods for map() and reduce()
Thank you very much Ameet! Can you please point me to an example?

Best,
Mahmoud

On Apr 28, 2014, at 6:32 PM, "Ameet Kini" <ameetk...@gmail.com> wrote:

> I don't think there is a setup() or cleanup() in Spark, but you can usually
> achieve the same thing using mapPartitions, putting the "setup" code at the
> top of mapPartitions and the "cleanup" code at the end. The reason this
> usually works is that in Hadoop map/reduce, each map task runs over an input
> split, and if you call mapPartitions over a HadoopRDD, each partition is
> effectively an input split.
>
> Ameet
>
> On Mon, Apr 28, 2014 at 9:22 PM, Parsian, Mahmoud <mpars...@illumina.com> wrote:
>> In classic MapReduce/Hadoop, you may optionally define setup() and
>> cleanup() methods. They (setup() and cleanup()) are called for each task,
>> so if you have 20 mappers running, the setup/cleanup will be called for
>> each one. What is the equivalent of these in Spark?
>>
>> Thanks,
>> best regards,
>> Mahmoud
Re: question on setup() and cleanup() methods for map() and reduce()
I don't think there is a setup() or cleanup() in Spark, but you can usually achieve the same thing using mapPartitions, putting the "setup" code at the top of mapPartitions and the "cleanup" code at the end. The reason this usually works is that in Hadoop map/reduce, each map task runs over an input split, and if you call mapPartitions over a HadoopRDD, each partition is effectively an input split.

Ameet

On Mon, Apr 28, 2014 at 9:22 PM, Parsian, Mahmoud wrote:

> In classic MapReduce/Hadoop, you may optionally define setup() and
> cleanup() methods. They (setup() and cleanup()) are called for each task,
> so if you have 20 mappers running, the setup/cleanup will be called for
> each one. What is the equivalent of these in Spark?
>
> Thanks,
> best regards,
> Mahmoud
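A minimal sketch of the mapPartitions pattern described above, in Scala against the RDD API. This is not from the original thread; the JDBC connection, jdbcUrl, and the lookup() helper are hypothetical stand-ins for whatever per-task resource your setup() would have created:

    import java.sql.DriverManager

    val results = rdd.mapPartitions { records =>
      // --- "setup": runs once per partition (one partition ~ one input split) ---
      val conn = DriverManager.getConnection(jdbcUrl)  // hypothetical resource

      // Process every record in this partition. Note that `records` is a lazy
      // iterator, so force it (toList) before closing the connection below;
      // otherwise the close would run before any record is actually processed.
      val out = records.map(r => lookup(conn, r)).toList

      // --- "cleanup": runs once per partition, after all records ---
      conn.close()
      out.iterator
    }

Materializing the partition with toList trades memory for correctness here; for very large partitions you would instead wrap the iterator so the connection is closed only after the last element is consumed.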
question on setup() and cleanup() methods for map() and reduce()
In classic MapReduce/Hadoop, you may optionally define setup() and cleanup() methods. They (setup() and cleanup()) are called for each task, so if you have 20 mappers running, the setup/cleanup will be called for each one. What is the equivalent of these in Spark?

Thanks,
best regards,
Mahmoud