Re: Serializability: for vs. while loops

2015-01-25 Thread Tobias Pfeiffer
Aaron,

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson ilike...@gmail.com wrote:

 Scala for-loops are implemented as closures using anonymous inner classes
 which are instantiated once and invoked many times. This means, though,
 that the code inside the loop is actually sitting inside a class, which
 confuses Spark's Closure Cleaner, whose job is to remove unused references
 from closures to make otherwise-unserializable objects serializable.

 My understanding is, in particular, that the closure cleaner will null out
 unused fields in the closure, but cannot go past the first level of depth
 (i.e., it will not follow field references and null out *their* unused,
 and possibly unserializable, references), because this could end up
 mutating state outside of the closure itself. Thus, the extra level of
 depth of the closure that was introduced by the anonymous class (where
 presumably the outer this pointer is considered used by the closure
 cleaner) is sufficient to make it unserializable.


Now, two weeks later, let me add that this is one of the most helpful
comments I have received on this mailing list! This insight helped me save
90% of the time I spent debugging NotSerializableExceptions.
Thank you very much!

Tobias


Re: Serializability: for vs. while loops

2015-01-15 Thread Aaron Davidson
Scala for-loops are implemented as closures using anonymous inner classes
which are instantiated once and invoked many times. This means, though,
that the code inside the loop is actually sitting inside a class, which
confuses Spark's Closure Cleaner, whose job is to remove unused references
from closures to make otherwise-unserializable objects serializable.

My understanding is, in particular, that the closure cleaner will null out
unused fields in the closure, but cannot go past the first level of depth
(i.e., it will not follow field references and null out *their* unused, and
possibly unserializable, references), because this could end up mutating
state outside of the closure itself. Thus, the extra level of depth of the
closure that was introduced by the anonymous class (where presumably the
outer this pointer is considered used by the closure cleaner) is
sufficient to make it unserializable.

While loops, on the other hand, involve none of this trickery, and everyone
is happy.
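
The capture mechanism described above can be sketched without Spark, using plain Java serialization. This is only an illustration under stated assumptions: `Pipeline`, `capturesThis`, `capturesLocal` and `canSerialize` are hypothetical names, Spark's Closure Cleaner is not involved at all, and the behaviour described assumes Scala 2.12 or later, where function literals compile to serializable lambdas.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an enclosing class that is NOT Serializable.
class Pipeline(val offset: Int) {
  // The body reads the field `offset`, so the function captures `this`.
  def capturesThis: Int => Int = (x: Int) => x + offset

  // The field is copied to a local val first, so only an Int is captured.
  def capturesLocal: Int => Int = {
    val o = offset
    (x: Int) => x + o
  }
}

object CaptureDemo {
  // Returns true if Java serialization accepts the object, false if it
  // fails with NotSerializableException.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Under those assumptions, `CaptureDemo.canSerialize(new Pipeline(1).capturesThis)` should return false, because serializing the function drags the unserializable `Pipeline` instance along with it, while the same call on `capturesLocal` should return true. A for-loop body that contains such a function adds one more enclosing layer of this kind.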

On Wed, Jan 14, 2015 at 11:37 PM, Tobias Pfeiffer t...@preferred.jp wrote:

 Hi,

 sorry, I don't like questions about serializability myself, but still...

 Can anyone give me a hint why

   for (i <- 0 to (maxId - 1)) {  ...  }

 throws a NotSerializableException in the loop body while

   var i = 0
   while (i < maxId) {
     // same code as in the for loop
     i += 1
   }

 works fine? I guess there is something fundamentally different in the way
 Scala realizes for loops?

 Thanks
 Tobias
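
The difference between the two loop forms in the question can be sketched as follows. The Scala compiler rewrites a `for` loop over a range into a `foreach` call that takes the loop body as a function value, whereas a `while` loop stays a plain loop with no function object. The object `LoopForms` and its method names are hypothetical, and the summing body is only a placeholder for "same code as in the for loop".

```scala
object LoopForms {
  // The for-loop form from the question.
  def sumWithFor(maxId: Int): Int = {
    var total = 0
    for (i <- 0 to (maxId - 1)) { total += i }
    total
  }

  // What the for loop is rewritten into: the body becomes a function
  // value that is passed to foreach and invoked once per element.
  def sumWithForeach(maxId: Int): Int = {
    var total = 0
    (0 to (maxId - 1)).foreach { i => total += i }
    total
  }

  // The while-loop form: the body stays inline in the enclosing method,
  // so no extra closure is created around it.
  def sumWithWhile(maxId: Int): Int = {
    var total = 0
    var i = 0
    while (i < maxId) {
      total += i
      i += 1
    }
    total
  }
}
```

All three methods compute the same result; only the first two wrap the body in a function object, which is the extra layer that matters for serialization.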



Re: Serializability: for vs. while loops

2015-01-15 Thread Tobias Pfeiffer
Aaron,

thanks for your mail!

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson ilike...@gmail.com wrote:

 Scala for-loops are implemented as closures using anonymous inner classes
 [...]
 While loops, on the other hand, involve none of this trickery, and
 everyone is happy.


Ah, I was suspecting something like that... thank you very much for the
detailed explanation!

Tobias