Agreed, I'm not pushing for it unless there's other evidence. Note that
the closure check does entail serialization, not just checking
serializability. I don't like flags either, but this one sounded like
something a user might actually want to vary, globally, for runs of the
same code.
On Tue,
Agreed on the pros / cons, especially since the driver could be a data
science notebook. Is it worthwhile making it configurable?
From: Sean Owen
Sent: Monday, January 21, 2019 10:42 AM
To: Reynold Xin
Cc: dev
Subject: Re: Make proactive check for closure serializability
None except the bug / PR I linked to, which is really just a bug in
the RowMatrix implementation; a 2GB closure isn't reasonable.
I doubt it's much overhead in the common case, because closures are
small and this extra check happens once per execution of the closure.
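
For readers following along, here is a minimal sketch of what "check by
actually serializing" looks like with plain Java serialization. It is not
Spark's ClosureCleaner code. The class name ClosureCheckSketch, the
SerializableFunction interface, the clean() helper and the
checkSerializable toggle are all hypothetical names made up for
illustration. The toggle only stands in for the kind of global flag
discussed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureCheckSketch {

    // Hypothetical helper type: a Function whose lambdas are serializable.
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical global toggle; not a real Spark setting.
    public static boolean checkSerializable = true;

    // Serializes the closure for real and returns the byte count.
    // The bytes are thrown away, so the cost grows with closure size.
    public static int serializedSize(Object closure) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(closure);
        }
        return bytes.size();
    }

    // Proactive check: fail when the closure is handed over, not when it runs.
    public static <T> T clean(T closure) {
        if (checkSerializable) {
            try {
                serializedSize(closure);
            } catch (IOException e) {
                // NotSerializableException is a subclass of IOException.
                throw new IllegalArgumentException("Task not serializable", e);
            }
        }
        return closure;
    }

    public static void main(String[] args) throws IOException {
        SerializableFunction<Integer, Integer> ok = x -> x + 1;
        clean(ok);
        System.out.println("small closure, serialized bytes: " + serializedSize(ok));

        // Capturing a non-serializable object makes the whole closure fail.
        Object captured = new Object();
        SerializableFunction<Integer, String> bad = x -> captured.toString() + x;
        try {
            clean(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected early: " + e.getCause());
        }
    }
}
```

The point of the sketch is that the check is a full writeObject() whose
output is discarded, which is cheap for a small closure and expensive for
a very large one. With the toggle off, clean() returns the closure
untouched and any failure would surface later, when the closure is
actually shipped.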
I can also imagine
Did you actually observe a perf issue?
On Mon, Jan 21, 2019 at 10:04 AM Sean Owen wrote:
> The ClosureCleaner proactively checks that closures passed to
> transformations like RDD.map() are serializable, before they're
> executed. It does this by just serializing it with the JavaSerializer.
>
>
The ClosureCleaner proactively checks that closures passed to
transformations like RDD.map() are serializable, before they're
executed. It does this by just serializing it with the JavaSerializer.
That's a nice feature, although there's overhead in always trying to
serialize the closure ahead of