The ClosureCleaner proactively checks that closures passed to transformations like RDD.map() are serializable before they're executed. It does this simply by serializing each closure with the JavaSerializer.
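To make the idea concrete, here is a minimal sketch of an eager serializability check in plain Java. It is not Spark's actual ClosureCleaner code: the class `EagerSerializationCheck`, the interface `SerializableFunction`, and the methods `serializedSize` and `isSerializable` are hypothetical names made up for illustration. It only assumes that the check amounts to writing the closure through standard Java serialization and seeing whether that throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical helper, not Spark's ClosureCleaner: it only illustrates the idea
// of an eager "can this closure be Java-serialized?" check.
class EagerSerializationCheck {

    // A function type that is also Serializable, so lambdas assigned to it
    // can be written with plain Java serialization.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Serializes the object into an in-memory buffer and returns the byte count.
    // The whole captured object graph is written, which is where the up-front
    // cost of an eager check comes from.
    static int serializedSize(Object closure) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(closure);
        }
        return buffer.size();
    }

    // Returns true if the closure serializes, false if it (or something it
    // captures) is not serializable.
    static boolean isSerializable(Object closure) {
        try {
            serializedSize(closure);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int offset = 10;
        // Captures only an int, so it serializes.
        SerializableFunction<Integer, Integer> ok = x -> x + offset;

        // Captures a plain Object, which does not implement Serializable.
        Object notSerializable = new Object();
        SerializableFunction<Integer, Integer> bad = x -> x + notSerializable.hashCode();

        System.out.println(isSerializable(ok));
        System.out.println(isSerializable(bad));
    }
}
```

In this sketch the cost of the check grows with the size of whatever the closure captures, since everything is written to the buffer once just to be thrown away.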
That's a nice feature, but always serializing the closure ahead of time adds overhead, especially when the closure is large. Usually it shouldn't be large, but I noticed the cost while working on this fix: https://github.com/apache/spark/pull/23600

It made me wonder: should this check be optional, or even off by default? Closures that don't serialize would still fail, just later, when an action is invoked. I don't feel strongly about it; I'm just checking whether anyone has pondered this before.