[ https://issues.apache.org/jira/browse/SPARK-36919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tianhan Hu updated SPARK-36919:
-------------------------------
    Description:
Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be {{Caused by: java.lang.RuntimeException: Malformed CSV record}}, only the top level exception is kept, and all lower level exceptions and root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} on the exception, we still get itself.
The reason for the difference is that {{RuntimeException}} is wrapped in {{BadRecordException}}, which has unserializable fields. When we try to serialize the exception from tasks and deserialize from scheduler, the exception is lost.
This PR makes unserializable fields of {{BadRecordException}} transient, so the rest of the exception could be serialized and deserialized properly.

  was:
Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be {{Caused by: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record}}, only the top level exception is kept, and all lower level exceptions and root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} on the exception, we still get itself.
The reason for the difference is that {{MalformedCSVException}} is now wrapped in {{BadRecordException}}, which has unserializable fields. When we try to serialize the exception from tasks and deserialize from scheduler, the exception is lost.
This PR makes unserializable fields of {{BadRecordException}} transient, so the rest of the exception could be serialized and deserialized properly.

> Make BadRecordException serializable
> ------------------------------------
>
>                 Key: SPARK-36919
>                 URL: https://issues.apache.org/jira/browse/SPARK-36919
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.3.0, 3.2.1
>            Reporter: Tianhan Hu
>            Priority: Minor
>
> Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in
> the exception chaining behavior. In a case of parsing a malformed CSV, where
> the root cause exception should be {{Caused by: java.lang.RuntimeException:
> Malformed CSV record}}, only the top level exception is kept, and all lower
> level exceptions and root cause are lost. Thus, when we call
> {{ExceptionUtils.getRootCause}} on the exception, we still get itself.
> The reason for the difference is that {{RuntimeException}} is wrapped in
> {{BadRecordException}}, which has unserializable fields. When we try to
> serialize the exception from tasks and deserialize from scheduler, the
> exception is lost.
> This PR makes unserializable fields of {{BadRecordException}} transient, so
> the rest of the exception could be serialized and deserialized properly.
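
The mechanism described in the issue can be sketched in plain Java serialization. This is only an illustration, not Spark's code: the real {{BadRecordException}} is a Scala class, and every name below (TransientFieldDemo, RecordSupplier, BrokenBadRecord, FixedBadRecord, roundTrip, rootCause) is hypothetical. The sketch assumes a wrapper exception that holds one field whose type is not Serializable, and shows that marking the field transient lets the wrapper and its cause chain survive a serialize/deserialize round trip. It does not model how Spark reacts when serializing a task failure goes wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class TransientFieldDemo {

    // Hypothetical stand-in for a field type that does not implement Serializable.
    static class RecordSupplier {
        String get() { return "a,b,\"c"; }
    }

    // Hypothetical wrapper WITHOUT the fix: the non-serializable field is a regular field.
    static class BrokenBadRecord extends RuntimeException {
        final RecordSupplier record;
        BrokenBadRecord(RecordSupplier record, Throwable cause) {
            super(cause);
            this.record = record;
        }
    }

    // Hypothetical wrapper WITH the fix: the field is transient, so serialization skips it.
    static class FixedBadRecord extends RuntimeException {
        final transient RecordSupplier record;
        FixedBadRecord(RecordSupplier record, Throwable cause) {
            super(cause);
            this.record = record;
        }
    }

    // Serialize a throwable to bytes and read it back, roughly as happens when an
    // exception crosses a process boundary.
    static Throwable roundTrip(Throwable t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Throwable) in.readObject();
        }
    }

    // Walk the cause chain down to the last throwable that has no cause.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        Throwable cause = new RuntimeException("Malformed CSV record");

        // Without transient, writing the wrapper fails on the non-serializable field.
        try {
            roundTrip(new BrokenBadRecord(new RecordSupplier(), cause));
        } catch (NotSerializableException e) {
            System.out.println("broken wrapper could not be serialized: " + e.getMessage());
        }

        // With transient, the wrapper and its cause chain survive the round trip;
        // only the transient field comes back as null.
        Throwable copy = roundTrip(new FixedBadRecord(new RecordSupplier(), cause));
        System.out.println("root cause after round trip: " + rootCause(copy));
        System.out.println("transient field after round trip: " + ((FixedBadRecord) copy).record);
    }
}
```

The cost of this approach is that the transient field is null on the receiving side, so any code that reads it after deserialization has to tolerate its absence.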