The ClosureCleaner proactively checks that closures passed to
transformations like RDD.map() are serializable before they're
executed. It does this simply by serializing them with the JavaSerializer.
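To make the idea concrete, here is a minimal sketch of such an eager check in plain Java, with no Spark on the classpath. It assumes the check amounts to writing the closure through java.io.ObjectOutputStream, which is what Spark's JavaSerializer is built on. The names `ClosureCheckSketch`, `checkSerializable` and `Connection` are hypothetical, not Spark APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureCheckSketch {

    // Hypothetical eager check: serialize the closure up front and return the
    // number of bytes produced, or fail immediately if it cannot be serialized.
    static int checkSerializable(Object closure) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(closure); // full serialization cost is paid here
        } catch (NotSerializableException e) {
            throw new IllegalStateException("closure is not serializable: " + e.getMessage(), e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // the bytes themselves are simply discarded
    }

    // A deliberately non-serializable type that a closure might capture.
    static class Connection {
    }

    public static void main(String[] args) {
        // A Java lambda is only serializable when its target type is Serializable.
        int offset = 10;
        Function<Integer, Integer> ok =
            (Function<Integer, Integer> & Serializable) x -> x + offset;
        System.out.println("ok closure: " + checkSerializable(ok) + " bytes");

        // Capturing a non-serializable object makes the whole closure unserializable.
        Connection conn = new Connection();
        Function<Integer, Integer> bad =
            (Function<Integer, Integer> & Serializable) x -> x + conn.hashCode();
        try {
            checkSerializable(bad);
        } catch (IllegalStateException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

Note that the successful path still serializes the entire closure only to throw the bytes away; for a large closure that work is pure overhead.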

That's a nice feature, although there's overhead in always
serializing the closure ahead of time, especially if the closure is
large. Usually it shouldn't be large, but I noticed the cost while
working on this fix: https://github.com/apache/spark/pull/23600

It made me wonder: should this check be optional, or even off by
default? Closures that don't serialize would still fail, just later,
when an action is invoked. I don't feel strongly about it; I'm just
checking whether anyone has considered this before.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org