aditiwari01 opened a new issue #2675: URL: https://github.com/apache/hudi/issues/2675
As per the Hudi confluence FAQ (https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory), as long as a schema change is backward compatible, Hudi should support seamless reads and writes. However, after adding a new column to my MOR table, I can keep writing successfully, but I can only read in the read-optimized manner, not in the snapshot manner. The snapshot query fails with **org.apache.avro.AvroTypeException: missing required field newCol**.

Attaching sample spark-shell commands to reproduce the issue on dummy data: [Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6139970/Hudi_sample_commands.txt)

With some debugging, the issue appears to be in: https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L127

When we try to deserialize the older payloads with the newer schema (which adds a nullable column), the read fails with the above error. I tried a workaround: if `readerSchema != writerSchema`, read the payload with the writer schema and then convert it to the reader schema. This approach is working fine in my POCs.

However, since Hudi guarantees schema evolution, I would like to know whether I'm missing some config or whether this is a bug. And if it is a bug, how does my workaround fit in? We have a use case where we do not want to be constrained to backward-compatible schema changes, and we see MOR as a viable fit.
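For context, here is a minimal, standalone sketch (not Hudi code, just plain Avro) of the resolution behavior the workaround relies on: a record written with the old schema is decoded with *both* the writer and the evolved reader schema, so Avro fills the missing nullable `newCol` from its default instead of throwing `missing required field newCol`. The schema and class names are illustrative only.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SchemaEvolutionSketch {

    // Writer schema: the original table schema, without the new column.
    static final Schema WRITER_SCHEMA = SchemaBuilder.record("Rec").fields()
        .requiredString("id")
        .endRecord();

    // Reader schema: evolved schema with a nullable newCol defaulting to null.
    static final Schema READER_SCHEMA = SchemaBuilder.record("Rec").fields()
        .requiredString("id")
        .name("newCol").type().unionOf().nullType().and().stringType().endUnion().nullDefault()
        .endRecord();

    // Serialize a record with the writer schema (simulating an older log payload),
    // then decode it with GenericDatumReader(writerSchema, readerSchema) so Avro
    // performs schema resolution and supplies the default for the missing field.
    static GenericRecord readWithEvolvedSchema(GenericRecord oldRecord) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER_SCHEMA).write(oldRecord, encoder);
        encoder.flush();

        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(WRITER_SCHEMA, READER_SCHEMA);
        BinaryDecoder decoder = DecoderFactory.get()
            .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord oldRecord = new GenericData.Record(WRITER_SCHEMA);
        oldRecord.put("id", "row1");
        System.out.println(readWithEvolvedSchema(oldRecord));
    }
}
```

Decoding with only the reader schema (a single-argument `GenericDatumReader`) reproduces the failure, which matches what the linked line in `HoodieAvroDataBlock` appears to do.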