kazdy commented on issue #8018:
URL: https://github.com/apache/hudi/issues/8018#issuecomment-1453414031

   @danny0405 isn't it the case that with `hoodie.schema.on.read.enable=false` Hudi falls back to the default "out of the box" schema evolution, which relies on Avro schema resolution (and allows new columns to be appended at the end of the schema)?
   
   When it comes to schema reconciliation, when I was experimenting with it in 0.10 and 0.11 it accepted wider schemas on write, but when the incoming batch was missing columns, those columns were added back so the batch matched the "current" target schema.
   So for me schema reconciliation worked like this (see the sketch after this list):
   wider schema -> accept as the new table schema
   missing columns -> add the missing columns to match the current table schema
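   
   For context, here is a minimal sketch of the second case, assuming the reconcile behaviour is controlled by `hoodie.datasource.write.reconcile.schema` (as in 0.10/0.11) and using placeholder table, field, and path names:
   
   ```scala
   import org.apache.spark.sql.{SaveMode, SparkSession}
   
   val spark = SparkSession.builder().appName("reconcile-schema-demo").getOrCreate()
   import spark.implicits._
   
   // Incoming batch that is *missing* a column the table already has (toy data).
   val incomingDf = Seq((1, "a", 1000L)).toDF("id", "name", "ts")
   
   incomingDf.write
     .format("hudi")
     .option("hoodie.table.name", "my_table")                    // placeholder table name
     .option("hoodie.datasource.write.recordkey.field", "id")    // placeholder key field
     .option("hoodie.datasource.write.precombine.field", "ts")   // placeholder precombine field
     .option("hoodie.datasource.write.reconcile.schema", "true") // missing columns are backfilled
                                                                 // to match the current table schema
     .mode(SaveMode.Append)
     .save("/tmp/hudi/my_table")                                 // placeholder base path
   ```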
   
   There's a hacky way to prevent schema evolution.
   One can fetch the table schema from the metastore, or from a file containing the table's Avro schema definition, read it in the Spark job, and pass it to https://hudi.apache.org/docs/configurations#hoodiewriteschema, which should override the writer schema and effectively drop any new columns.
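   
   A minimal sketch of that approach, assuming the Avro schema is kept in an `.avsc` file and using placeholder paths and field names (the incoming batch carries an extra column the table should not pick up):
   
   ```scala
   import org.apache.spark.sql.{SaveMode, SparkSession}
   import scala.io.Source
   
   val spark = SparkSession.builder().appName("pin-writer-schema").getOrCreate()
   import spark.implicits._
   
   // Incoming batch with an extra column ("new_col") we do not want in the table (toy data).
   val incomingDf = Seq((1, "a", 1000L, "surprise")).toDF("id", "name", "ts", "new_col")
   
   // Avro schema definition for the table, e.g. fetched from the metastore
   // or kept in a versioned .avsc file (placeholder path).
   val pinnedSchema = Source.fromFile("/schemas/my_table.avsc").mkString
   
   incomingDf.write
     .format("hudi")
     .option("hoodie.table.name", "my_table")                   // placeholder table name
     .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder key field
     .option("hoodie.datasource.write.precombine.field", "ts")  // placeholder precombine field
     .option("hoodie.write.schema", pinnedSchema)               // pin the writer schema; "new_col" should be dropped
     .mode(SaveMode.Append)
     .save("/tmp/hudi/my_table")                                // placeholder base path
   ```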
   
   Again, a MERGE INTO statement enforces the target table schema when writing records.

