aditiwari01 opened a new issue #2675:
URL: https://github.com/apache/hudi/issues/2675


   As per the Hudi Confluence FAQ (https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory), as long as the schema is backward-compatible, Hudi supports seamless reads/writes.
   
   However, when I add a new column to my MOR table, writes keep succeeding, but I can only read the table via read-optimized queries; snapshot queries fail.
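
   For clarity, these are the two read paths I mean in spark-shell (`basePath` is a placeholder for the table path; the option key/values follow the standard Hudi datasource read options):

```scala
// snapshot query: merges base files with log files -- this is the one that fails
val snapshotDF = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "snapshot").
  load(basePath)

// read-optimized query: reads only the compacted base files -- this one works
val readOptimizedDF = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "read_optimized").
  load(basePath)
```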
   
   The snapshot query fails with **org.apache.avro.AvroTypeException: missing required field newCol**.
   
   Attaching sample spark-shell commands to reproduce the issue on dummy data:
   
[Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6139970/Hudi_sample_commands.txt)
   
   With some debugging, the issue seems to be in:
   
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L127
   
   When we try to deserialize the older payloads into the newer schema (with the nullable new column), it fails with the above error.
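
   A minimal, self-contained sketch of the Avro behavior behind this error (plain Avro API outside Hudi; the schemas are hypothetical stand-ins): if the reader schema's new field is a nullable union but carries no `"default": null`, schema resolution throws exactly this exception.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// writer schema: what the older log-block payloads were serialized with
val writerSchema = new Schema.Parser().parse(
  """{"type":"record","name":"rec","fields":[{"name":"id","type":"string"}]}""")

// reader schema: adds newCol as a nullable union but WITHOUT "default": null
val readerSchema = new Schema.Parser().parse(
  """{"type":"record","name":"rec","fields":[
    |{"name":"id","type":"string"},
    |{"name":"newCol","type":["null","string"]}]}""".stripMargin)

// serialize one record with the writer schema
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
val record = new GenericData.Record(writerSchema)
record.put("id", "1")
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// resolving writerSchema -> readerSchema: because newCol has no default,
// read() throws AvroTypeException: ... missing required field newCol
val resolvingReader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
resolvingReader.read(null, decoder)
```

   Adding `"default": null` to `newCol` makes the same resolving read succeed, so it is the missing default (rather than nullability itself) that trips Avro's resolver.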
   
   I tried a workaround wherein, if `readerSchema != writerSchema`, we first read with the writerSchema and then convert the payload to the readerSchema. This approach has been working fine in my POCs.
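
   In case it helps, here is a hedged sketch of that workaround in plain Avro terms, continuing from the snippet above (`rewrite` is a hypothetical helper, not existing Hudi code):

```scala
import scala.collection.JavaConverters._

// hypothetical helper: copy a record decoded with the writer schema into the
// reader schema, leaving fields the writer never had (e.g. newCol) as null
def rewrite(old: GenericRecord, target: Schema): GenericRecord = {
  val result = new GenericData.Record(target)
  for (f <- target.getFields.asScala if old.getSchema.getField(f.name) != null) {
    result.put(f.name, old.get(f.name))
  }
  result
}

// read strictly with the writer schema (no resolution, so no failure) ...
val writerOnlyReader = new GenericDatumReader[GenericRecord](writerSchema)
val dec = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val oldRecord = writerOnlyReader.read(null, dec)

// ... then convert into the reader schema; newRecord.get("newCol") is null
val newRecord = rewrite(oldRecord, readerSchema)
```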
   
   However, since Hudi guarantees schema evolution, I would like to know whether I'm missing some config or this is a bug. And if it is a bug, how does my workaround fit in? We have a use case where we do not want to restrict backward-compatible schema changes, and we see MOR as a viable fit.

