aditiwari01 opened a new issue #2675:
URL: https://github.com/apache/hudi/issues/2675


   As per the Hudi Confluence FAQ (https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory), as long as the schema is backward-compatible, Hudi supports seamless reads/writes.
   
   However, when I add a new column to my MOR table, writes keep succeeding, but I can only read the table via read-optimized queries; snapshot queries fail.
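
   For clarity, these are the two read paths I mean in spark-shell (`basePath` is a placeholder for the table path; the option key/values follow the standard Hudi datasource read options):

```scala
// snapshot query: merges base files with log files -- this is the one that fails
val snapshotDF = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "snapshot").
  load(basePath)

// read-optimized query: reads only the compacted base files -- this one works
val readOptimizedDF = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "read_optimized").
  load(basePath)
```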
   
   The snapshot query fails with **org.apache.avro.AvroTypeException: missing required field newCol**.
   
   Attaching sample spark-shell commands to reproduce the issue on dummy data:
   
[Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6139970/Hudi_sample_commands.txt)
   
   With some debugging, the issue seems to be in:
   
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L127
   
   When we try to deserialize the older payloads into the newer schema (with the nullable new column), it fails with the above error.
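
   A minimal, self-contained sketch of the Avro behavior behind this error (plain Avro API outside Hudi; the schemas are hypothetical stand-ins): if the reader schema's new field is a nullable union but carries no `"default": null`, schema resolution throws exactly this exception.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// writer schema: what the older log-block payloads were serialized with
val writerSchema = new Schema.Parser().parse(
  """{"type":"record","name":"rec","fields":[{"name":"id","type":"string"}]}""")

// reader schema: adds newCol as a nullable union but WITHOUT "default": null
val readerSchema = new Schema.Parser().parse(
  """{"type":"record","name":"rec","fields":[
    |{"name":"id","type":"string"},
    |{"name":"newCol","type":["null","string"]}]}""".stripMargin)

// serialize one record with the writer schema
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
val record = new GenericData.Record(writerSchema)
record.put("id", "1")
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// resolving writerSchema -> readerSchema: because newCol has no default,
// read() throws AvroTypeException: ... missing required field newCol
val resolvingReader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
resolvingReader.read(null, decoder)
```

   Adding `"default": null` to `newCol` makes the same resolving read succeed, so it is the missing default (rather than nullability itself) that trips Avro's resolver.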
   
   I tried a workaround wherein, if `readerSchema != writerSchema`, we first read with the writerSchema and then convert the payload to the readerSchema. This approach has been working fine in my POCs.
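
   In case it helps, here is a hedged sketch of that workaround in plain Avro terms, continuing from the snippet above (`rewrite` is a hypothetical helper, not existing Hudi code):

```scala
import scala.collection.JavaConverters._

// hypothetical helper: copy a record decoded with the writer schema into the
// reader schema, leaving fields the writer never had (e.g. newCol) as null
def rewrite(old: GenericRecord, target: Schema): GenericRecord = {
  val result = new GenericData.Record(target)
  for (f <- target.getFields.asScala if old.getSchema.getField(f.name) != null) {
    result.put(f.name, old.get(f.name))
  }
  result
}

// read strictly with the writer schema (no resolution, so no failure) ...
val writerOnlyReader = new GenericDatumReader[GenericRecord](writerSchema)
val dec = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val oldRecord = writerOnlyReader.read(null, dec)

// ... then convert into the reader schema; newRecord.get("newCol") is null
val newRecord = rewrite(oldRecord, readerSchema)
```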
   
   However, since Hudi guarantees schema evolution, I would like to know whether I'm missing some config or this is a bug. And if it is a bug, how does my workaround fit in? We have a use case where we do not want to restrict backward-compatible schema changes, and we see MOR as a viable fit.

