Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3518#discussion_r23050776

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -789,6 +792,44 @@ class DAGScheduler(
         }
       }

    +  /**
    +   * Helper function to check whether an RDD and its dependencies are serializable.
    +   *
    +   * This hook is exposed here primarily for testing purposes.
    +   *
    +   * Note: This function is defined separately from SerializationHelper.isSerializable()
    +   * because DAGScheduler.isSerializable() is passed as a parameter to the RDDWalker class's
    +   * graph traversal, which would otherwise require knowledge of the closureSerializer
    +   * (which was undesirable).
    +   *
    +   * @param rdd - RDD to attempt to serialize
    +   * @return Array[RDDTrace] -
    +   *         An array of Either objects indicating whether serialization succeeded.
    +   *         Each element represents the RDD or one of its dependencies:
    +   *         Success: ByteBuffer - the serialized RDD
    +   *         Failure: String - the reason for the failure
    +   */
    +  def tryToSerializeRddDeps(rdd: RDD[_]): Array[RDDTrace] = {
    --- End diff --

    I think initially it might be good to keep this private and just expose it as an internal utility that is triggered when we actually see serialization issues. Once we get some more experience with it in practice, we can open up a debugging API.
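The helper under review walks an RDD and its dependencies, recording for each node either the serialized bytes or the reason serialization failed. A minimal stand-alone sketch of that pattern follows; it has no Spark dependency, and `Node`, `payload`, and `tryToSerializeDeps` are hypothetical stand-ins for `RDD`, its closures, and `tryToSerializeRddDeps` (it also uses plain Java serialization rather than Spark's closureSerializer):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-in for an RDD node; the real DAGScheduler would walk
// RDD.dependencies instead of this explicit deps list.
case class Node(name: String, payload: AnyRef, deps: Seq[Node])

// Breadth-first traversal that attempts to serialize each node's payload,
// returning Right(bytes) on success or Left(reason) on failure, mirroring
// the Success/Failure pairing described in the scaladoc above.
def tryToSerializeDeps(root: Node): Seq[(String, Either[String, Array[Byte]])] = {
  val visited = mutable.Set[Node]()
  val queue = mutable.Queue(root)
  val results = mutable.Buffer[(String, Either[String, Array[Byte]])]()
  while (queue.nonEmpty) {
    val node = queue.dequeue()
    if (visited.add(node)) {
      val result =
        try {
          val bytes = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(bytes)
          out.writeObject(node.payload)
          out.close()
          Right(bytes.toByteArray)
        } catch {
          case e: NotSerializableException =>
            Left(s"not serializable: ${e.getMessage}")
        }
      results += ((node.name, result))
      queue ++= node.deps
    }
  }
  results.toSeq
}
```

Keeping this private, per the comment, would mean invoking such a traversal only from the scheduler's existing serialization-failure path rather than exposing it as a public API.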