Re: Make proactive check for closure serializability optional?

2019-01-22 Thread Sean Owen
Agree, I'm not pushing for it unless there's other evidence. Note that the closure check does entail serialization, not just checking serializability. I don't like flags either, but this one sounded like it could actually be something a user wanted to vary, globally, for runs of the same code. On Tue,
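To make the flag idea concrete, here is a rough sketch in plain Java of a check that can be switched off globally without touching the code. It is not Spark's implementation: the property name `example.closure.checkSerializable` and the class `OptionalClosureCheck` are made up for illustration, and a system property merely stands in for whatever configuration mechanism would really be used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class OptionalClosureCheck {
    // Hypothetical property name, invented for this sketch; not a real Spark setting.
    static final String FLAG = "example.closure.checkSerializable";

    /** Returns true if the proactive check ran, false if the flag disabled it. */
    public static boolean maybeCheck(Object closure) {
        // The check defaults to enabled; only an explicit "false" turns it off.
        if (!Boolean.parseBoolean(System.getProperty(FLAG, "true"))) {
            return false;
        }
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            // The check is a full serialization pass, not a cheap type test.
            out.writeObject(closure);
        } catch (IOException e) {
            throw new IllegalArgumentException("Closure is not serializable", e);
        }
        return true;
    }

    public static void main(String[] args) {
        Runnable task = (Runnable & Serializable) () -> System.out.println("hi");
        System.out.println("check ran: " + maybeCheck(task));
        System.setProperty(FLAG, "false"); // same code, check disabled for this run
        System.out.println("check ran: " + maybeCheck(task));
    }
}
```

Because the flag is read at call time, the same job could run with or without the check, which is the "vary globally, for runs of the same code" case.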

Re: Make proactive check for closure serializability optional?

2019-01-21 Thread Felix Cheung
Agreed on the pros / cons, esp. since the driver could be a data science notebook. Is it worthwhile making it configurable? From: Sean Owen Sent: Monday, January 21, 2019 10:42 AM To: Reynold Xin Cc: dev Subject: Re: Make proactive check for closure serializability

Re: Make proactive check for closure serializability optional?

2019-01-21 Thread Sean Owen
None except the bug / PR I linked to, which is really just a bug in the RowMatrix implementation; a 2GB closure isn't reasonable. I doubt it's much overhead in the common case, because closures are small and this extra check happens once per execution of the closure. I can also imagine

Re: Make proactive check for closure serializability optional?

2019-01-21 Thread Reynold Xin
Did you actually observe a perf issue?
On Mon, Jan 21, 2019 at 10:04 AM Sean Owen wrote:
> The ClosureCleaner proactively checks that closures passed to
> transformations like RDD.map() are serializable, before they're
> executed. It does this by just serializing it with the JavaSerializer.

Make proactive check for closure serializability optional?

2019-01-21 Thread Sean Owen
The ClosureCleaner proactively checks that closures passed to transformations like RDD.map() are serializable, before they're executed. It does this by just serializing it with the JavaSerializer. That's a nice feature, although there's overhead in always trying to serialize the closure ahead of
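For anyone who wants to see the mechanism in isolation, a minimal sketch of a "check by actually serializing" step in plain Java follows. It only illustrates the idea and is not Spark's ClosureCleaner code: `ClosureCheck`, `SerializableFunction` and `ensureSerializable` are names invented here, and plain `ObjectOutputStream` stands in for the JavaSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureCheck {
    /** Hypothetical helper type so that lambdas assigned to it are serializable. */
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    /**
     * Eagerly serializes the closure with Java serialization and returns the
     * serialized size in bytes. Throws IllegalArgumentException if the closure
     * (or anything it captures) cannot be serialized.
     */
    public static int ensureSerializable(Object closure) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(closure);
        } catch (IOException e) {
            // NotSerializableException is an IOException, so it lands here too.
            throw new IllegalArgumentException("Closure is not serializable: " + e, e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        SerializableFunction<Integer, Integer> ok = x -> x + 1;
        System.out.println("serialized bytes: " + ensureSerializable(ok));

        Object lock = new Object(); // java.lang.Object is not Serializable
        SerializableFunction<Integer, Integer> bad = x -> x + lock.hashCode();
        try {
            ensureSerializable(bad);
        } catch (IllegalArgumentException e) {
            // The failure surfaces at the call site, before the closure ever runs.
            System.out.println("rejected: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

The cost is visible in the sketch: every checked closure is written out in full once, and that work scales with the size of whatever the closure captures.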