prashantwason commented on issue #2331: URL: https://github.com/apache/hudi/issues/2331#issuecomment-746959114
I think the distinction is in UPDATE use-cases. Consider this scenario:

t1: Insert with Schema 1: file1.parquet is created and its records have schema 1.
t2: Update with Schema 2: Suppose schema 2 has one field deleted and a single record is being updated. This leads to file1.parquet being read and rewritten (after the single record is updated) into file2.parquet. But all records in file2.parquet, not just the updated one, would no longer have the deleted field.

Another scenario is possible where the deleted field is later added back with a different, incompatible type (e.g. an int field was deleted and another field with the same name but "string" type was added). Such a schema will have issues reading historical data within the dataset that was written with the older schema.

If you want to delete a field within a HUDI dataset, it may be simpler to copy the dataset using a new schema.
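
To make the two scenarios concrete, here is a toy sketch that uses plain Python dicts in place of Parquet files. It is not Hudi's API or its actual merge logic: `rewrite_with_schema` and `conforms` are hypothetical helpers, and the field names (`id`, `name`, `age`) are invented for illustration. It only assumes that a copy-on-write update rewrites the whole file using the writer's schema.

```python
# Toy model of a copy-on-write file; plain dicts stand in for Parquet records.
# All names here are hypothetical and only illustrate the scenarios described above.
SCHEMA_1 = {"id": int, "name": str, "age": int}
SCHEMA_2 = {"id": int, "name": str}               # "age" deleted
SCHEMA_3 = {"id": int, "name": str, "age": str}   # "age" re-added as a string


def rewrite_with_schema(records, updates, schema):
    """Merge updates by id, then project every record onto the writer schema."""
    merged = {r["id"]: r for r in records}
    for u in updates:
        merged[u["id"]] = u
    return [{k: r[k] for k in schema if k in r} for r in merged.values()]


def conforms(record, schema):
    """True if every field present in the record has the type the schema expects."""
    return all(isinstance(record[k], t) for k, t in schema.items() if k in record)


# t1: insert with schema 1
file1 = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 41},
]

# t2: update a single record with schema 2; the whole file is rewritten
file2 = rewrite_with_schema(file1, [{"id": 1, "name": "a2"}], SCHEMA_2)

untouched = next(r for r in file2 if r["id"] == 2)
print(untouched)                      # {'id': 2, 'name': 'b'}

# Later: "age" comes back as a string; records written under schema 1 no longer fit
print(conforms(file1[0], SCHEMA_3))   # False
```

Record 2 was never updated, yet it loses `age` because the rewrite projects every record onto schema 2. The final check shows why re-adding the field with a different type conflicts with any data still stored under schema 1.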