Re: Serializability: for vs. while loops
Aaron,

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson wrote:
> Scala for-loops are implemented as closures using anonymous inner classes
> which are instantiated once and invoked many times. This means, though,
> that the code inside the loop is actually sitting inside a class, which
> confuses Spark's Closure Cleaner, whose job is to remove unused references
> from closures to make otherwise-unserializable objects serializable.
>
> My understanding is, in particular, that the closure cleaner will null out
> unused fields in the closure, but cannot go past the first level of depth
> (i.e., it will not follow field references and null out *their* unused,
> and possibly unserializable, references), because this could end up
> mutating state outside of the closure itself. Thus, the extra level of
> depth of the closure that was introduced by the anonymous class (where
> presumably the "outer this" pointer is considered "used" by the closure
> cleaner) is sufficient to make it unserializable.

Now, two weeks later, let me add that this is one of the most helpful comments I have received on this mailing list! This insight has saved me 90% of the time I used to spend debugging NotSerializableExceptions. Thank you very much!

Tobias
Re: Serializability: for vs. while loops
Aaron,

thanks for your mail!

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson wrote:
> Scala for-loops are implemented as closures using anonymous inner classes
> [...]
> While loops, on the other hand, involve none of this trickery, and
> everyone is happy.

Ah, I was suspecting something like that... thank you very much for the detailed explanation!

Tobias
Re: Serializability: for vs. while loops
Scala for-loops are implemented as closures using anonymous inner classes which are instantiated once and invoked many times. This means, though, that the code inside the loop is actually sitting inside a class, which confuses Spark's Closure Cleaner, whose job is to remove unused references from closures to make otherwise-unserializable objects serializable.

My understanding is, in particular, that the closure cleaner will null out unused fields in the closure, but cannot go past the first level of depth (i.e., it will not follow field references and null out *their* unused, and possibly unserializable, references), because this could end up mutating state outside of the closure itself. Thus, the extra level of depth of the closure that was introduced by the anonymous class (where presumably the "outer this" pointer is considered "used" by the closure cleaner) is sufficient to make it unserializable.

While loops, on the other hand, involve none of this trickery, and everyone is happy.

On Wed, Jan 14, 2015 at 11:37 PM, Tobias Pfeiffer wrote:
> Hi,
>
> sorry, I don't like questions about serializability myself, but still...
>
> Can anyone give me a hint why
>
>   for (i <- 0 to (maxId - 1)) { ... }
>
> throws a NotSerializableException in the loop body while
>
>   var i = 0
>   while (i < maxId) {
>     // same code as in the for loop
>     i += 1
>   }
>
> works fine? I guess there is something fundamentally different in the way
> Scala realizes for loops?
>
> Thanks
> Tobias
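
For readers of the archive, here is a minimal sketch of the desugaring described in the explanation above. It assumes nothing about Spark: it only shows that the body of a for loop over a Range is handed to foreach as a function object, whereas a while loop stays plain control flow. The names ForDesugaring, sumWithFor, sumWithForeach, sumWithWhile and total are made up for illustration and do not come from the thread. How the compiler represents that function object (anonymous inner class or otherwise) depends on the Scala version, so this should not be read as a reproduction of the NotSerializableException itself.

```scala
object ForDesugaring {
  // For loop as written in the original question; the compiler rewrites
  // it into a foreach call on the Range.
  def sumWithFor(maxId: Int): Int = {
    var total = 0
    for (i <- 0 to (maxId - 1)) { total += i }
    total
  }

  // Roughly what the for loop above desugars to: the loop body becomes a
  // function object (a closure) that is created once and invoked per element.
  def sumWithForeach(maxId: Int): Int = {
    var total = 0
    val body: Int => Unit = i => total += i // hypothetical name for the closure
    (0 to (maxId - 1)).foreach(body)
    total
  }

  // While loop: the body stays inline in the enclosing method, so no
  // additional closure is wrapped around it.
  def sumWithWhile(maxId: Int): Int = {
    var total = 0
    var i = 0
    while (i < maxId) {
      total += i
      i += 1
    }
    total
  }
}
```

All three methods compute the same value. The difference that matters for serialization is only that in the first two any closure created inside the loop body is nested inside another closure, which is the extra level of depth mentioned above.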