Takch02 commented on PR #4137:
URL: https://github.com/apache/amoro/pull/4137#issuecomment-4122314457

> Thanks for adding the non-partitioned coverage. I think the new assertions are still a bit weak for the expected-success cases.
>
> For unpartitioned tables, the current implementation still updates the Hive table location to only the directory of the first added file. So if the rewritten/overwritten files are actually under multiple directories, the commit may succeed, but Hive may still see only part of the data.
>
> Because of that, checking only `lastedAddedFiles()` and `planFiles()` does not seem sufficient here. These assertions show that the Iceberg commit succeeded, but they do not verify that the Hive-side result is actually correct.

Thank you for the review! You are right. Currently, I was only checking the Iceberg assertion and not looking at the Hive side.

I fully understand that in non-partitioned tables, if files exist in multiple directories, the Hive location might only point to the first folder, meaning only a portion of the data can be visible.

I will add the actual data validation code for the Hive side and push it soon.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
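
For illustration only, the concern raised in the review can be sketched in plain Java. This is not Amoro code: the class `HiveLocationCheck`, its methods, and the file paths are all hypothetical, and the sketch assumes that a Hive reader lists only the files directly under the table location (no recursive listing). Under those assumptions it shows how a location derived from the first added file can hide files that were committed under a different directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Amoro: models which files a reader would
// see if the table location is set to the directory of the first added file.
public class HiveLocationCheck {

    // Directory part of a file path (everything before the last '/').
    public static String parentDir(String filePath) {
        int idx = filePath.lastIndexOf('/');
        return idx < 0 ? "" : filePath.substring(0, idx);
    }

    // Files located directly under `location`.
    // Assumption: the reader does not list subdirectories or sibling directories.
    public static List<String> visibleUnder(String location, List<String> files) {
        List<String> visible = new ArrayList<>();
        for (String file : files) {
            if (parentDir(file).equals(location)) {
                visible.add(file);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical files added by one commit, spread over two directories.
        List<String> added = List.of(
            "/warehouse/db/tbl/dir_a/f1.parquet",
            "/warehouse/db/tbl/dir_a/f2.parquet",
            "/warehouse/db/tbl/dir_b/f3.parquet");

        // Location taken from the first added file only.
        String location = parentDir(added.get(0));
        List<String> visible = visibleUnder(location, added);

        System.out.println("location = " + location);
        System.out.println("visible " + visible.size() + " of " + added.size() + " files");
    }
}
```

A Hive-side assertion in the actual test would presumably compare the rows read through Hive with the rows read through Iceberg, rather than comparing paths as this sketch does.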
