Aaron,

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> Scala for-loops are implemented as closures using anonymous inner classes
> which are instantiated once and invoked many times. This means, though,
> that the code inside the loop is actually sitting inside a class, which
> confuses Spark's Closure Cleaner, whose job is to remove unused references
> from closures to make otherwise-unserializable objects serializable.
>
> My understanding is, in particular, that the closure cleaner will null out
> unused fields in the closure, but cannot go past the first level of depth
> (i.e., it will not follow field references and null out *their* unused,
> and possibly unserializable, references), because this could end up
> mutating state outside of the closure itself. Thus, the extra level of
> depth of the closure that was introduced by the anonymous class (where
> presumably the "outer this" pointer is considered "used" by the closure
> cleaner) is sufficient to make it unserializable.
>

Now, two weeks later, let me add that this is one of the most helpful
comments I have received on this mailing list! This insight helped me save
90% of the time I used to spend debugging NotSerializableExceptions.

Thank you very much!
Tobias
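The one-level limitation described in the quote can be illustrated with a small, Spark-free Scala sketch. This is a toy model under stated assumptions, not Spark's ClosureCleaner or its API: `Resource`, `LoopBody`, `UserClosure`, `FlatClosure` and `ToyClosureCleaner` are hypothetical names. `LoopBody` stands in for the extra enclosing object (such as a for-loop body) whose reference is "used", and `cleanOneLevel` nulls only the closure's own unused reference fields without following them, as Aaron describes. Plain Java serialization is used to check the result.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.lang.reflect.Modifier

// Hypothetical driver-side object that is NOT Serializable.
class Resource

// Hypothetical stand-in for the extra enclosing level (e.g. a for-loop body):
// serializable itself, but it holds an unserializable field.
class LoopBody(val step: Int, val resource: Resource) extends Serializable

// Closure that really uses `loopBody` and also carries an unused reference `stray`.
class UserClosure(var loopBody: LoopBody, var stray: Resource) extends Serializable {
  def apply(x: Int): Int = x + loopBody.step
}

// Same logic without the extra level: the needed value is held directly.
class FlatClosure(var step: Int, var stray: Resource) extends Serializable {
  def apply(x: Int): Int = x + step
}

object ToyClosureCleaner {
  // Toy one-level cleaner (assumption, not Spark code): null out every non-static,
  // non-primitive field of the closure object itself whose name is not in `used`.
  // It never descends into the objects those fields point to.
  def cleanOneLevel(closure: AnyRef, used: Set[String]): Unit =
    for (f <- closure.getClass.getDeclaredFields
         if !used.contains(f.getName) &&
           !f.getType.isPrimitive &&
           !Modifier.isStatic(f.getModifiers)) {
      f.setAccessible(true)
      f.set(closure, null)
    }

  // Try Java serialization and report whether it succeeded.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    import ToyClosureCleaner._
    val res = new Resource

    // Extra level of depth: the top-level `stray` is nulled, but
    // loopBody.resource is never touched, so serialization still fails.
    val nested = new UserClosure(new LoopBody(1, res), res)
    cleanOneLevel(nested, used = Set("loopBody"))
    println(nested.stray == null)   // true
    println(isSerializable(nested)) // false

    // Without the extra level, one-level cleaning is enough.
    val flat = new FlatClosure(1, res)
    cleanOneLevel(flat, used = Set("step"))
    println(isSerializable(flat))   // true
  }
}
```

In this toy model the practical workaround is the one `FlatClosure` shows: copy the value the closure needs into a local or a standalone serializable object, so no unserializable object sits behind a "used" reference.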