Re: Serializability: for vs. while loops

2015-01-25 Thread Tobias Pfeiffer
Aaron,

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson wrote:

> Scala for-loops are implemented as closures using anonymous inner classes
> which are instantiated once and invoked many times. This means, though,
> that the code inside the loop is actually sitting inside a class, which
> confuses Spark's Closure Cleaner, whose job is to remove unused references
> from closures to make otherwise-unserializable objects serializable.
>
> My understanding is, in particular, that the closure cleaner will null out
> unused fields in the closure, but cannot go past the first level of depth
> (i.e., it will not follow field references and null out *their* unused,
> and possibly unserializable, references), because this could end up
> mutating state outside of the closure itself. Thus, the extra level of
> depth of the closure that was introduced by the anonymous class (where
> presumably the "outer this" pointer is considered "used" by the closure
> cleaner) is sufficient to make it unserializable.
>

Now, two weeks later, let me add that this is one of the most helpful
comments I have received on this mailing list! This insight has saved me
90% of the time I used to spend debugging NotSerializableExceptions.
Thank you very much!

Tobias


Re: Serializability: for vs. while loops

2015-01-15 Thread Tobias Pfeiffer
Aaron,

thanks for your mail!

On Thu, Jan 15, 2015 at 5:05 PM, Aaron Davidson wrote:

> Scala for-loops are implemented as closures using anonymous inner classes
> [...]
> While loops, on the other hand, involve none of this trickery, and
> everyone is happy.
>

Ah, I was suspecting something like that... thank you very much for the
detailed explanation!

Tobias


Re: Serializability: for vs. while loops

2015-01-15 Thread Aaron Davidson
Scala for-loops are implemented as closures using anonymous inner classes
which are instantiated once and invoked many times. This means, though,
that the code inside the loop is actually sitting inside a class, which
confuses Spark's Closure Cleaner, whose job is to remove unused references
from closures to make otherwise-unserializable objects serializable.
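
To make the "implemented as closures" point concrete, here is a minimal
sketch in plain Scala (no Spark involved). The object and method names are
made up for illustration, and maxId stands in for the value from the
original question. The for-loop and the explicit foreach call are the same
thing as far as the compiler is concerned: the loop body becomes a function
object passed to foreach. The while loop has no function object at all.

```scala
object LoopDesugaring {
  // The loop body below is compiled into a function passed to Range.foreach.
  def sumWithFor(maxId: Int): Int = {
    var total = 0
    for (i <- 0 to (maxId - 1)) { total += i }
    total
  }

  // What the compiler effectively rewrites the for-loop into.
  def sumWithForeach(maxId: Int): Int = {
    var total = 0
    (0 to (maxId - 1)).foreach { i => total += i }
    total
  }

  // A while loop is a plain jump; the body is not wrapped in a closure.
  def sumWithWhile(maxId: Int): Int = {
    var total = 0
    var i = 0
    while (i < maxId) {
      total += i
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    // All three variants compute the same sum.
    println(Seq(sumWithFor(5), sumWithForeach(5), sumWithWhile(5)))
  }
}
```

How the function object is represented in bytecode depends on the Scala
version, so treat this only as an illustration of the rewrite, not of what
the closure cleaner sees in any particular setup.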

My understanding is, in particular, that the closure cleaner will null out
unused fields in the closure, but cannot go past the first level of depth
(i.e., it will not follow field references and null out *their* unused, and
possibly unserializable, references), because this could end up mutating
state outside of the closure itself. Thus, the extra level of depth of the
closure that was introduced by the anonymous class (where presumably the
"outer this" pointer is considered "used" by the closure cleaner) is
sufficient to make it unserializable.
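
The effect of an "outer this" pointer can be reproduced without Spark, using
only plain Java serialization. The sketch below is an assumption-laden
illustration, not what Spark or the closure cleaner actually does: Driver,
AddOffset, AddConstant and serializes are hypothetical names, and the
function classes are written out by hand so the result does not depend on
how a given Scala version compiles function literals.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a driver-side class that is not Serializable.
class Driver {
  val offset: Int = 10

  // Hand-written function class. It reads `offset` from the enclosing
  // instance, so it holds a reference to that Driver (the "outer this").
  class AddOffset extends (Int => Int) with Serializable {
    def apply(x: Int): Int = x + offset
  }

  def nested: Int => Int = new AddOffset

  // Copy the field into a local val first, so only an Int is captured.
  def flat: Int => Int = {
    val localOffset = offset
    new OuterPointerDemo.AddConstant(localOffset)
  }
}

object OuterPointerDemo {
  // Defined inside an object, so it has no reference to any Driver instance.
  class AddConstant(c: Int) extends (Int => Int) with Serializable {
    def apply(x: Int): Int = x + c
  }

  // Returns true if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val d = new Driver
    println(s"nested: ${serializes(d.nested)}")
    println(s"flat:   ${serializes(d.flat)}")
  }
}
```

The nested variant fails because serializing it drags the whole Driver
along. Copying the needed value into a local val before building the
function is the usual way to avoid that, independent of which loop
construct is used.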

While loops, on the other hand, involve none of this trickery, and everyone
is happy.

On Wed, Jan 14, 2015 at 11:37 PM, Tobias Pfeiffer wrote:

> Hi,
>
> sorry, I don't like questions about serializability myself, but still...
>
> Can anyone give me a hint why
>
>   for (i <- 0 to (maxId - 1)) {  ...  }
>
> throws a NotSerializableException in the loop body while
>
>   var i = 0
>   while (i < maxId) {
>     // same code as in the for loop
>     i += 1
>   }
>
> works fine? I guess there is something fundamentally different in the way
> Scala realizes for loops?
>
> Thanks
> Tobias
>