I am not sure about the exact purpose of this function, but Spark has to serialize the closure and send it to the executors so they can operate on the RDD. Before doing that, it needs to remove elements that are included in the closure by default but are not actually needed. For example, a function passed to an RDD's map may reference objects inside a class, so you want to send those objects across but not the whole parent class.
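To make the "objects vs. whole parent class" point concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names `Job`, `factor`, `capturesParent`, `capturesValueOnly` and `ClosureDemo` are made up for illustration, and this is not how ClosureCleaner itself is implemented. It only shows, under the assumption of plain Java serialization, why a closure that drags in its enclosing object is a problem and how copying a field into a local value avoids it.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical driver-side class; it is deliberately NOT Serializable.
class Job {
  val factor = 3

  // Reads the field directly, so the function captures `this` (the whole Job).
  def capturesParent: Int => Int = x => x * factor

  // Copies the field into a local val first, so only an Int is captured.
  def capturesValueOnly: Int => Int = {
    val f = factor
    x => x * f
  }
}

object ClosureDemo {
  // Returns true if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val job = new Job
    println(s"captures parent:     ${serializes(job.capturesParent)}")
    println(s"captures value only: ${serializes(job.capturesValueOnly)}")
  }
}
```

In the first function the field really is used, so the reference to the parent cannot simply be dropped; the local-copy pattern in the second function is the usual way to keep the parent out of the closure in user code.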

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Mon, Jul 28, 2014 at 8:28 PM, Wang, Jensen <jensen.w...@sap.com> wrote:

> Hi, All
>
> Before sc.runJob invokes dagScheduler.runJob, the func
> performed on the rdd is “cleaned” by ClosureCleaner.clean.
>
> Why spark has to do this? What’s the purpose?
>