Agreed on the pros/cons, especially that the driver could be a data science notebook.
Is it worthwhile making it configurable?
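
For discussion's sake, wiring that up might look like this (a rough
sketch: the conf key is invented, and it assumes ClosureCleaner.clean
keeps its current checkSerializable parameter):

    // Inside SparkContext: read a (hypothetical) conf key once, default true.
    private lazy val proactiveSerializableCheck: Boolean =
      conf.getBoolean("spark.closureCleaner.checkSerializable", true)

    private[spark] def clean[F <: AnyRef](f: F): F = {
      // Skip the up-front serialization attempt when the check is disabled;
      // a non-serializable closure then fails later, at task serialization.
      ClosureCleaner.clean(f, checkSerializable = proactiveSerializableCheck)
      f
    }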


________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Monday, January 21, 2019 10:42 AM
To: Reynold Xin
Cc: dev
Subject: Re: Make proactive check for closure serializability optional?

None except the bug / PR I linked to, which is really just a bug in
the RowMatrix implementation; a 2GB closure isn't reasonable.
I doubt it's much overhead in the common case, because closures are
small and this extra check runs once per transformation call, not once
per record.

I can also imagine middle-ground cases where people are dragging along
largish ~10MB closures (like a model or some data), and this could add
non-trivial memory pressure on the driver. They should be broadcasting
those things, sure.
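
The standard fix, for reference (score and loadModel here are
placeholders, not Spark APIs):

    // Capturing a large object directly ships it inside every task closure:
    val model: Array[Double] = loadModel()   // say ~10MB
    rdd.map(x => score(model, x))

    // Broadcasting sends it to each executor once and keeps the closure tiny:
    val bcModel = sc.broadcast(model)
    rdd.map(x => score(bcModel.value, x))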

Given just that, I'd leave it alone, but I was wondering if anyone had
had the same thought, or more arguments for making it disable-able. In
'production' one would imagine all the closures serialize correctly,
so this is just a bit of overhead that could be skipped.

On Mon, Jan 21, 2019 at 12:17 PM Reynold Xin <r...@databricks.com> wrote:
>
> Did you actually observe a perf issue?
>
> On Mon, Jan 21, 2019 at 10:04 AM Sean Owen <sro...@gmail.com> wrote:
>>
>> The ClosureCleaner proactively checks that closures passed to
>> transformations like RDD.map() are serializable, before they're
>> executed. It does this by just serializing it with the JavaSerializer.
>>
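>> In essence the check amounts to this (a paraphrase, not Spark's exact
>> code):
>>
>>     // Try to serialize the closure up front and fail fast if it can't be:
>>     try {
>>       SparkEnv.get.closureSerializer.newInstance().serialize(closure)
>>     } catch {
>>       case e: Exception =>
>>         throw new SparkException("Task not serializable", e)
>>     }
>>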
>> That's a nice feature, although there's overhead in always trying to
>> serialize the closure ahead of time, especially if the closure is
>> large. It shouldn't be large, usually. But I noticed it when coming up
>> with this fix: https://github.com/apache/spark/pull/23600
>>
>> It made me wonder, should this be optional, or even not the default?
>> Closures that don't serialize still fail, just later when an action is
>> invoked. I don't feel strongly about it, just checking if anyone had
>> pondered this before.
>>
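>> To make the trade-off concrete (a minimal sketch; NotSerializable is
>> made up):
>>
>>     class NotSerializable { def f(x: Int): Int = x + 1 }
>>     val ns = new NotSerializable
>>     // With the proactive check, map() throws "Task not serializable"
>>     // right away, because the closure captures ns:
>>     val mapped = sc.parallelize(1 to 10).map(x => ns.f(x))
>>     // Without the check, the same failure surfaces only here, when
>>     // tasks are serialized and shipped for the action:
>>     mapped.count()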

