I am not sure about the specific purpose of this function, but Spark needs
to remove elements from the closure that are included by default but are not
really needed, so that the closure can be serialized and sent to executors
to operate on the RDD. For example, a function passed to an RDD's map may
reference objects inside the enclosing class; you want to send across those
objects, but not the whole parent class.
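A minimal Scala sketch of the problem described above, assuming Scala 2.12 or later and using plain Java serialization as a stand-in for whatever Spark uses to ship closures. `Pipeline`, `capturingFunc`, `trimmedFunc`, `isSerializable` and `ClosureCaptureDemo` are hypothetical names made up for illustration. This does not reproduce what ClosureCleaner does internally; it only shows why an unneeded reference to the enclosing object matters. A lambda that reads a field captures the whole enclosing instance, while one that reads a local copy of that field captures only the value.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical driver-side class. It is deliberately NOT Serializable,
// standing in for a class that holds driver-only state.
class Pipeline(val offset: Int) {

  // Reads the field `offset`, so the compiled lambda captures `this`
  // (the whole Pipeline), not just the Int it actually needs.
  def capturingFunc: Int => Int = x => x + offset

  // Copies the field into a local val first; the lambda now captures
  // only that Int, so the enclosing Pipeline is not dragged along.
  def trimmedFunc: Int => Int = {
    val localOffset = offset
    x => x + localOffset
  }
}

object ClosureCaptureDemo {

  // Tries Java serialization, roughly the step that must succeed
  // before a closure can be sent to executors.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val p = new Pipeline(10)
    println(s"capturing closure serializable: ${isSerializable(p.capturingFunc)}")
    println(s"trimmed closure serializable:   ${isSerializable(p.trimmedFunc)}")
  }
}
```

Copying fields into local vals by hand is the manual workaround; the point of cleaning the closure automatically is that users of map and similar operations do not have to do this for every function they pass.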


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Mon, Jul 28, 2014 at 8:28 PM, Wang, Jensen <jensen.w...@sap.com> wrote:

>  Hi, All
>
>               Before sc.runJob invokes dagScheduler.runJob, the func
> performed on the rdd is “cleaned” by ClosureCleaner.clean.
>
>              Why does Spark have to do this? What’s the purpose?
>
