CTTY commented on PR #9071: URL: https://github.com/apache/hudi/pull/9071#issuecomment-1613955565
@danny0405 Hi Danny, thanks for taking a look.

We found that when Hudi uses `AwsGlueCatalogSyncTool` to sync schema changes to Glue, it only changes the table schema and does not cascade the change to the partition-level schemas. This behavior is actually expected, because we never implemented cascading for `AWSGlueCatalogSyncClient` ([LOC](https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L333)).

This causes problems when users change their schema later on. Because the schema change is not cascaded, only newer partitions use the new schema, while older partitions keep the old schema in Glue. Then, when users query the Glue catalog with an engine such as Athena, which is aware of partition-level schemas, the older partitions appear unreadable due to the failures described here: [Athena partition schema mismatch errors](https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html)
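
To illustrate the mismatch described above, here is a minimal in-memory sketch. It is a toy model only: `ToyTable`, `update_schema`, `mismatched_partitions`, and `demo` are hypothetical names and are not part of the Hudi or Glue APIs. It assumes that each partition stores its own copy of the column list, taken from the table schema when the partition is created, and that a schema update touches the partitions only when cascading is requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, column type)


@dataclass
class ToyTable:
    """Hypothetical stand-in for a catalog table; not the Glue or Hudi API."""

    columns: List[Column]
    # partition value -> that partition's own copy of the schema
    partitions: Dict[str, List[Column]] = field(default_factory=dict)

    def add_partition(self, value: str) -> None:
        # Assumption: a new partition snapshots the table schema at creation time.
        self.partitions[value] = list(self.columns)

    def update_schema(self, new_columns: List[Column], cascade: bool = False) -> None:
        # Without cascade, only the table-level schema changes.
        self.columns = list(new_columns)
        if cascade:
            # With cascade, every existing partition is rewritten as well.
            for value in self.partitions:
                self.partitions[value] = list(new_columns)

    def mismatched_partitions(self) -> List[str]:
        # Partitions whose stored schema differs from the table schema.
        return sorted(
            value for value, cols in self.partitions.items() if cols != self.columns
        )


def demo(cascade: bool) -> List[str]:
    table = ToyTable(columns=[("id", "int"), ("price", "int")])
    table.add_partition("dt=2023-06-01")
    # Schema evolution: widen `price` from int to bigint.
    table.update_schema([("id", "int"), ("price", "bigint")], cascade=cascade)
    # A partition added after the change picks up the new schema.
    table.add_partition("dt=2023-06-02")
    return table.mismatched_partitions()


if __name__ == "__main__":
    print(demo(cascade=False))
    print(demo(cascade=True))
```

In this model, the non-cascading update leaves the older partition with a schema that no longer matches the table, which is the situation an engine aware of partition-level schemas would trip over; the cascading update leaves no mismatched partitions.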