Takch02 commented on PR #4137:
URL: https://github.com/apache/amoro/pull/4137#issuecomment-4122314457

   > Thanks for adding the non-partitioned coverage. I think the new
   > assertions are still a bit weak for the expected-success cases.
   > 
   > For unpartitioned tables, the current implementation still updates the
   > Hive table location to only the directory of the first added file. So if
   > the rewritten/overwritten files are actually under multiple directories,
   > the commit may succeed, but Hive may still see only part of the data.
   > 
   > Because of that, checking only `lastedAddedFiles()` and `planFiles()`
   > does not seem sufficient here. These assertions show that the Iceberg
   > commit succeeded, but they do not verify that the Hive-side result is
   > actually correct.
   
   Thank you for the review!
   
   You are right. So far I was only asserting on the Iceberg side and was not
   checking the Hive side.
   
   I understand that for non-partitioned tables, if the files are spread
   across multiple directories, the Hive table location may point only to the
   first directory, so only part of the data is visible to Hive.
   
   I will add validation of the actual data on the Hive side and push it
   soon.
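
   To make the concern concrete, here is a minimal, self-contained sketch of
   the kind of check being discussed. It is not code from the PR: the class
   `HiveLocationCheck`, its methods, and the example paths are hypothetical,
   and it assumes Hive only reads files that sit directly in the table
   location directory (no recursive listing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Amoro: illustrates why an unpartitioned
// table whose location is set to the first file's directory can hide data.
public class HiveLocationCheck {

  /** Returns the parent directory of a file path, without a trailing slash. */
  static String parentDir(String filePath) {
    int idx = filePath.lastIndexOf('/');
    return idx < 0 ? "" : filePath.substring(0, idx);
  }

  /**
   * Returns the files that do not sit directly in the table location.
   * Assumption: Hive does not list the location recursively, so these
   * files would not be visible to a Hive query.
   */
  static List<String> filesOutsideLocation(String tableLocation, List<String> filePaths) {
    String normalized = tableLocation.endsWith("/")
        ? tableLocation.substring(0, tableLocation.length() - 1)
        : tableLocation;
    List<String> outside = new ArrayList<>();
    for (String path : filePaths) {
      if (!parentDir(path).equals(normalized)) {
        outside.add(path);
      }
    }
    return outside;
  }

  public static void main(String[] args) {
    // Made-up paths: two files in one directory, one file in another.
    List<String> added = Arrays.asList(
        "/warehouse/t/dir1/a.parquet",
        "/warehouse/t/dir1/b.parquet",
        "/warehouse/t/dir2/c.parquet");

    // Mimics the behaviour described in the review: the location becomes
    // the directory of the first added file.
    String location = parentDir(added.get(0));

    System.out.println("Hive location: " + location);
    System.out.println("Not visible to Hive: " + filesOutsideLocation(location, added));
  }
}
```

   A test on the Hive side could assert that this list is empty, or, more
   directly, read the rows back through the Hive table and compare them with
   the expected records.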


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

