CTTY commented on PR #9071:
URL: https://github.com/apache/hudi/pull/9071#issuecomment-1613955565

   @danny0405 Hi Danny, thanks for taking a look
   
   We found that when Hudi uses `AwsGlueCatalogSyncTool` to sync schema changes 
to Glue, it only updates the table schema and does not cascade the change to 
the partition-level schemas. This is the current expected behavior, because 
cascading was never implemented in `AWSGlueCatalogSyncClient`: 
[LOC](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L333)
   
   This causes problems when users change their schema later on. Because 
the schema change is not cascaded, only newer partitions use the new schema, 
while older partitions still have the old schema in Glue. When users then 
query the Glue catalog with an engine such as Athena, which is aware of 
partition-level schemas, the older partitions can appear unreadable due to the 
failures described here: [Athena partition schema mismatch 
errors](https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html)
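
   To make the difference concrete, here is a minimal in-memory sketch of 
a table-only schema update versus a cascading one. It does not call the Glue 
API or any Hudi code; `CascadeSketch`, `Table`, `updateSchema`, and 
`stalePartitions` are hypothetical names used only for illustration, and the 
column and partition values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CascadeSketch {

    /** Hypothetical in-memory stand-in for a catalog table; not the Glue API. */
    static class Table {
        List<String> columns = new ArrayList<>();
        // Partition value -> columns recorded for that partition.
        Map<String, List<String>> partitionColumns = new LinkedHashMap<>();

        void addPartition(String value) {
            // A new partition copies the table schema as it is at creation time.
            partitionColumns.put(value, new ArrayList<>(columns));
        }
    }

    /** Updates the table schema and, only if cascade is true, every partition schema. */
    static void updateSchema(Table table, List<String> newColumns, boolean cascade) {
        table.columns = new ArrayList<>(newColumns);
        if (cascade) {
            for (Map.Entry<String, List<String>> entry : table.partitionColumns.entrySet()) {
                entry.setValue(new ArrayList<>(newColumns));
            }
        }
    }

    /** Returns the partitions whose recorded schema differs from the table schema. */
    static List<String> stalePartitions(Table table) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : table.partitionColumns.entrySet()) {
            if (!entry.getValue().equals(table.columns)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.columns = new ArrayList<>(Arrays.asList("id bigint", "name string"));
        table.addPartition("dt=2023-06-01");

        List<String> evolved = Arrays.asList("id bigint", "name string", "price double");

        // Table-only update: the existing partition keeps the old schema.
        updateSchema(table, evolved, false);
        table.addPartition("dt=2023-06-02");
        System.out.println("Stale without cascade: " + stalePartitions(table));

        // Cascading update: every partition is brought in line with the table.
        updateSchema(table, evolved, true);
        System.out.println("Stale with cascade: " + stalePartitions(table));
    }
}
```

   In this sketch, the partition created before the schema change is the 
only one reported as stale after the table-only update, which mirrors the 
old-partition mismatch described above.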
 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
