xkrogen opened a new pull request #33308:
URL: https://github.com/apache/spark/pull/33308


   ### What changes were proposed in this pull request?
   This unifies struct schema mismatch-handling logic between `AvroSerializer` 
and `AvroDeserializer`, pushing it into `AvroUtils` which is used by both. The 
newly unified exception-handling logic is updated to provide more contextual 
information in error messages. Previously, when a schema mismatch was found, we 
would only report the first missing field, but there may be many others as 
well, which made it less clear what exactly was going wrong. Now, we report 
all missing fields.
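   
   As a rough illustration of the "report all missing fields" behavior, here 
is a minimal sketch. It is not the code from this PR: `SchemaMismatchSketch`, 
`missingFields`, and `mismatchMessage` are hypothetical names, schemas are 
reduced to plain lists of field names, and matching is by exact name only.

```scala
// Hypothetical sketch; NOT the actual AvroUtils implementation from this PR.
// Schemas are simplified to sequences of field names (an assumption).
object SchemaMismatchSketch {

  /** Returns every field in `required` that has no exact-name match in `available`. */
  def missingFields(required: Seq[String], available: Seq[String]): Seq[String] = {
    val availableSet = available.toSet
    required.filterNot(availableSet.contains)
  }

  /** Builds one message naming all missing fields, or None when nothing is missing. */
  def mismatchMessage(required: Seq[String], available: Seq[String]): Option[String] = {
    val missing = missingFields(required, available)
    if (missing.isEmpty) None
    else Some(s"Found ${missing.size} field(s) without a match: ${missing.mkString(", ")}")
  }
}
```

   Collecting the full list before building the message, rather than failing 
on the first unmatched field, is what lets a single error describe the whole 
mismatch.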
   
   ### Why are the changes needed?
   While working on #31490, we discussed that there is room for improvement in 
how schema mismatch errors are reported 
([comment1](https://github.com/apache/spark/pull/31490#discussion_r659970793), 
[comment2](https://github.com/apache/spark/pull/31490#issuecomment-869866848)). 
Additionally, the logic for handling these issues in `AvroSerializer` and 
`AvroDeserializer` was quite similar but shared no common code, causing 
duplication and making it harder to see exactly what differences existed 
between the two.
   
   ### Does this PR introduce _any_ user-facing change?
   Some error messages when matching Catalyst struct schemas against Avro 
record schemas now include more information.
   
   ### How was this patch tested?
   New unit tests added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org