prashantwason commented on issue #2331:
URL: https://github.com/apache/hudi/issues/2331#issuecomment-746959114


   I think the distinction lies in UPDATE use cases. Consider this scenario:
   
   t1: Insert with schema 1: file1.parquet is created, and its records use 
schema 1.
   
   t2: Update with schema 2: suppose schema 2 deletes one field, and a single 
record is updated. file1.parquet is read and rewritten, with that one record 
updated, into file2.parquet. Because the whole file is rewritten with schema 2, 
none of the records in file2.parquet have the deleted field, including the 
records that were not updated.
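A minimal sketch of the rewrite described above, assuming a Parquet file can be modelled as a plain Python list of dicts and a schema as a list of field names (Hudi itself works with Avro schemas and Parquet files). The function `rewrite_file_with_update` is hypothetical; it only shows why updating a single record drops the deleted field from every record in the new file.

```python
from typing import Dict, List

Record = Dict[str, object]


def rewrite_file_with_update(
    old_file: List[Record],
    update: Record,
    key_field: str,
    new_schema: List[str],
) -> List[Record]:
    """Hypothetical merge: read every record of the old file, apply the single
    update, and write ALL records out using the new schema."""
    new_file: List[Record] = []
    for record in old_file:
        # Use the updated version when the key matches, otherwise the old record.
        source = update if record[key_field] == update[key_field] else record
        # Project onto the new schema: fields absent from new_schema are dropped,
        # so the deleted field vanishes from every record, not just the updated one.
        new_file.append({field: source.get(field) for field in new_schema})
    return new_file


# t1: file1 written with schema 1 = [id, name, age]
file1 = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 40},
]

# t2: schema 2 deletes "age"; only record 2 is updated
schema2 = ["id", "name"]
file2 = rewrite_file_with_update(file1, {"id": 2, "name": "b2"}, "id", schema2)

print(file2)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b2'}]
```

Record 1 was never touched by the update, yet it also lost `age` in file2.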
   
   Another possible scenario: the deleted field is later added back with a 
different, incompatible type (e.g., an int field is deleted, and a field with 
the same name but of type string is added). A reader using this schema will 
have problems reading the historical data in the dataset that was written with 
the older schema, because the stored int values do not match the new string 
type.
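The type conflict can be sketched the same way. Here a schema is assumed to be a mapping from field name to Python type, and `read_with_schema` is a hypothetical stand-in for schema resolution (not Hudi's or Avro's actual API): a record written while `age` was an `int` cannot be read once the reader schema declares `age` as `str`.

```python
from typing import Dict

Record = Dict[str, object]
Schema = Dict[str, type]


def read_with_schema(record: Record, reader_schema: Schema) -> Record:
    """Hypothetical reader: project the record onto reader_schema, raising
    TypeError when a stored value does not match the declared type."""
    result: Record = {}
    for field, expected_type in reader_schema.items():
        value = record.get(field)  # missing fields are read as None
        if value is not None and not isinstance(value, expected_type):
            raise TypeError(
                f"field {field!r}: stored {type(value).__name__}, "
                f"schema expects {expected_type.__name__}"
            )
        result[field] = value
    return result


# Historical record written while "age" was an int.
old_record = {"id": 1, "age": 30}
# Record written after "age" was deleted and re-added as a string.
new_record = {"id": 2, "age": "thirty"}

schema3: Schema = {"id": int, "age": str}

print(read_with_schema(new_record, schema3))  # {'id': 2, 'age': 'thirty'}
try:
    read_with_schema(old_record, schema3)
except TypeError as err:
    print(err)  # field 'age': stored int, schema expects str
```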
   
   If you want to delete a field from a HUDI dataset, it may be simpler to 
copy the dataset into a new one that uses the new schema.
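A minimal sketch of the copy approach, again assuming a simplified model in which a dataset is a hypothetical mapping from file name to a list of records. `copy_dataset` is illustrative only; in practice this would be a bulk read-and-write job over the whole table.

```python
from typing import Dict, List

Record = Dict[str, object]
Dataset = Dict[str, List[Record]]


def copy_dataset(source: Dataset, new_schema: List[str]) -> Dataset:
    """Hypothetical full copy: rewrite every file so that all records use
    new_schema. The source dataset is left unmodified."""
    return {
        file_name: [{field: rec.get(field) for field in new_schema} for rec in records]
        for file_name, records in source.items()
    }


old_dataset: Dataset = {
    "file1.parquet": [{"id": 1, "age": 30}],
    "file3.parquet": [{"id": 3, "age": 50}],
}

# The new schema drops "age"; no file in the copy keeps the old "age" values.
new_dataset = copy_dataset(old_dataset, ["id"])
print(new_dataset)  # {'file1.parquet': [{'id': 1}], 'file3.parquet': [{'id': 3}]}
print(old_dataset["file1.parquet"])  # [{'id': 1, 'age': 30}]
```

In this model, every file in the new dataset is on the new schema from the start, so no historical data in it can conflict with a field that is later re-added under a different type.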
   
   


