Takch02 commented on PR #4137:
URL: https://github.com/apache/amoro/pull/4137#issuecomment-4126678516

Hi @xxubai, I added Hive-side validation. As you mentioned, Hive can only see the files located in the directory of the first added file. Files in other directories are invisible to Hive, so validating against all of `afterFiles` fails.
   
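```java
// The base store should contain exactly the files that were added.
int totalLiveFiles = Lists.newArrayList(baseStore.newScan().planFiles()).size();
Assert.assertEquals(addFiles.size(), totalLiveFiles);

// Look up the location the Hive table currently points at.
Table hiveTable = TEST_HMS.getHiveClient()
    .getTable(getMixedTable().id().getDatabase(), getMixedTable().id().getTableName());
String hiveLocation = hiveTable.getSd().getLocation();
// Append a separator so sibling directories sharing the prefix (e.g. "<location>_2") do not match.
String hivePrefix = hiveLocation.endsWith("/") ? hiveLocation : hiveLocation + "/";

// Keep only the files that live under the Hive table location.
List<DataFile> filesVisibleToHive = afterFiles.stream()
    .filter(f -> f.path().toString().startsWith(hivePrefix))
    .collect(Collectors.toList());

// Passes: Hive sees every file under its current location.
UpdateHiveFilesTestHelpers.validateHiveTableValues(
    TEST_HMS.getHiveClient(), getMixedTable(), filesVisibleToHive); // OK

// Fails: files written to other directories are invisible to Hive.
UpdateHiveFilesTestHelpers.validateHiveTableValues(
    TEST_HMS.getHiveClient(), getMixedTable(), afterFiles); // Failed
```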
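For illustration only, the visibility rule described above can be expressed as a small standalone check. This is a minimal sketch, not Amoro code: the class `HiveVisibilitySketch`, its methods, and the example paths are all hypothetical, and it assumes Hive lists only the files directly inside the table location (i.e. recursive subdirectory reads are not enabled). Comparing each file's parent directory with the location is stricter than a bare `startsWith(hiveLocation)` check, which would also match a sibling directory such as `.../dir1_2`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Amoro): selects the file paths that an
// unpartitioned Hive table at `hiveLocation` would list, assuming Hive reads
// only files directly under the location and does not recurse into subdirectories.
public class HiveVisibilitySketch {

  // Drops one trailing '/' so "…/dir1/" and "…/dir1" compare equal.
  static String stripTrailingSlash(String s) {
    return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
  }

  // Returns everything before the last '/', i.e. the file's parent directory.
  static String parentDir(String filePath) {
    int idx = filePath.lastIndexOf('/');
    return idx < 0 ? "" : filePath.substring(0, idx);
  }

  // Keeps only the files whose parent directory is exactly the Hive location.
  static List<String> filesVisibleToHive(String hiveLocation, List<String> filePaths) {
    String location = stripTrailingSlash(hiveLocation);
    return filePaths.stream()
        .filter(p -> parentDir(p).equals(location))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Example paths are made up for illustration.
    String hiveLocation = "hdfs://nn:8020/warehouse/db/tbl/dir1";
    List<String> afterFiles = List.of(
        hiveLocation + "/file-a.parquet",
        hiveLocation + "/file-b.parquet",
        "hdfs://nn:8020/warehouse/db/tbl/dir2/file-c.parquet", // other directory
        hiveLocation + "_2/file-d.parquet");                   // sibling sharing the prefix
    List<String> visible = filesVisibleToHive(hiveLocation, afterFiles);
    System.out.println(visible.size() + " of " + afterFiles.size() + " files visible to Hive");
  }
}
```

Running `main` prints `2 of 4 files visible to Hive`, which mirrors the situation in the test: validation against the visible subset passes, while validation against every file in `afterFiles` cannot.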
   
Given this behavior, how should we handle this in the test? Should we expect a `CannotAlterHiveLocationException` here as well (similar to how partitioned tables behave when spanning multiple directories), or should we just assert that only a subset of the files is visible to Hive, to document the current limitation?

