GitHub user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66868725

IMO, this patch still needs a lot of work before it will be ready to merge.

I'm not convinced that telling me which RDD referenced the unserializable object will, by itself, be a helpful debugging tool. In many cases, it's obvious which object is non-serializable: for instance, say that I try to serialize a database connection pool instance. If it's an explicitly referenced, user-created object, then it's usually not too hard to find the source of the reference. The hard cases are the ones where implicit references to non-serializable objects, like SparkContext, have been captured in the closure. In these cases, I might have only one RDD in my dependency chain and still run into serialization issues, so I don't think this patch's current approach will help me much with debugging. It would be much more useful to print the chain of references that leads to a non-serializable object of the appropriate type.

Do you mind running some examples in the `spark-shell` and pasting the output generated by this patch? This would help me and other reviewers assess whether the patch's current approach is useful. There are also many code style issues here, but I don't want to spend too much time commenting on them before we make sure that the high-level approach is okay.

Other reviewers: please take a look at the JIRA and chime in here. Do you think that this patch's current functionality is useful, or should we block / wait in favor of a more full-featured solution? I think we have plenty of time before 1.3.0, and since we're not in a huge rush for this feature, I'm in favor of taking the time to implement a more full-featured approach.
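To make the "chain of references" idea concrete, here is a minimal sketch in Scala. It is hypothetical and is not code from this patch or a Spark API; `SerializationDebugSketch`, `findNonSerializablePath`, `DbPool`, `Helper`, and `Job` are illustrative names. It assumes default Java serialization semantics: it reflectively walks non-static, non-transient object fields depth-first, treats JDK classes as opaque leaves, and ignores custom `writeObject` / `writeReplace` hooks and `Externalizable`.

```scala
import java.io.Serializable
import java.lang.reflect.{Field, Modifier}
import java.util.{Collections, IdentityHashMap}

object SerializationDebugSketch {

  // JDK classes are treated as opaque: we only check whether they are
  // Serializable and never reflect into their private fields.
  private def isJdkClass(c: Class[_]): Boolean = {
    val n = c.getName
    n.startsWith("java.") || n.startsWith("javax.") || n.startsWith("jdk.") || n.startsWith("sun.")
  }

  // The class and its non-JDK superclasses, most specific first.
  private def hierarchy(c: Class[_]): List[Class[_]] =
    if (c == null || isJdkClass(c)) Nil else c :: hierarchy(c.getSuperclass)

  // Reference-typed fields that default Java serialization would write
  // (skips static, transient, and primitive fields).
  private def serializedFields(c: Class[_]): List[Field] =
    hierarchy(c).flatMap(_.getDeclaredFields.toList).filter { f =>
      val m = f.getModifiers
      !Modifier.isStatic(m) && !Modifier.isTransient(m) && !f.getType.isPrimitive
    }

  // Returns the field path from `root` to the first non-serializable object
  // found by a depth-first walk, or None if none is reachable.
  def findNonSerializablePath(root: AnyRef): Option[List[String]] = {
    require(root != null, "root must be non-null")
    // Identity-based visited set, so cycles and shared references are walked once.
    val seen = Collections.newSetFromMap(new IdentityHashMap[AnyRef, java.lang.Boolean]())

    // Visit children lazily and stop at the first hit.
    def firstHit(children: Iterator[(AnyRef, String)], path: List[String]): Option[List[String]] =
      children.map { case (child, label) => visit(child, label :: path) }
        .collectFirst { case Some(p) => p }

    // `path` is kept in reverse order (innermost label first).
    def visit(obj: AnyRef, path: List[String]): Option[List[String]] =
      if (obj == null || !seen.add(obj)) None
      else {
        val cls = obj.getClass
        if (cls.isArray) {
          if (cls.getComponentType.isPrimitive) None
          else {
            val arr = obj.asInstanceOf[Array[AnyRef]]
            firstHit(arr.indices.iterator.map(i => (arr(i), s"[$i]")), path)
          }
        } else if (!obj.isInstanceOf[Serializable]) {
          Some((s"<${cls.getName}>" :: path).reverse)
        } else if (isJdkClass(cls)) None
        else
          firstHit(serializedFields(cls).iterator.map { f =>
            f.setAccessible(true)
            (f.get(obj), f.getName)
          }, path)
      }

    visit(root, List(s"<root: ${root.getClass.getName}>"))
  }
}

// Hypothetical stand-ins: DbPool plays the role of a non-serializable resource
// (e.g. a connection pool) that is reachable only indirectly.
class DbPool
class Helper(val pool: DbPool) extends Serializable
class Job(val name: String, val helper: Helper) extends Serializable
```

For example, `SerializationDebugSketch.findNonSerializablePath(new Job("etl", new Helper(new DbPool)))` should return a path that reads root -> `helper` -> `pool` -> the `DbPool` class. Applied to a closure, the same walk would be expected to surface implicit references, such as an `$outer` field leading to an enclosing object that holds a SparkContext, even when only one RDD is involved. The recursion is unbounded, so a real implementation would also need to cope with very deep object graphs.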