Takch02 commented on PR #4137:
URL: https://github.com/apache/amoro/pull/4137#issuecomment-4126678516
Hi @xxubai, I added Hive-side validation. As you mentioned, Hive can only see
the files located in the directory of the first added file. Files in other
directories are invisible to Hive, so validating against all afterFiles
fails, while validating against only the files under the Hive location passes:
```
int totalLiveFiles = Lists.newArrayList(baseStore.newScan().planFiles()).size();
Assert.assertEquals(addFiles.size(), totalLiveFiles);

Table hiveTable =
    TEST_HMS.getHiveClient()
        .getTable(getMixedTable().id().getDatabase(), getMixedTable().id().getTableName());
String hiveLocation = hiveTable.getSd().getLocation();
List<DataFile> filesVisibleToHive =
    afterFiles.stream()
        .filter(f -> f.path().toString().startsWith(hiveLocation))
        .collect(Collectors.toList());

UpdateHiveFilesTestHelpers.validateHiveTableValues(
    TEST_HMS.getHiveClient(), getMixedTable(), filesVisibleToHive); // OK
UpdateHiveFilesTestHelpers.validateHiveTableValues(
    TEST_HMS.getHiveClient(), getMixedTable(), afterFiles); // Failed
```
Given this behavior, how should we handle this in the test? Should we expect
a CannotAlterHiveLocationException here as well (similar to how partitioned
tables behave when spanning multiple directories), or should we just assert
that only a subset of files is visible to Hive to document the current
limitation?
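
If we go with the second option, the split between visible and invisible
files could be asserted roughly as in the sketch below. It is self-contained
and uses plain path strings in place of DataFile; the class name, the helper
names isUnder and splitByVisibility, and the sample paths are all hypothetical
and not part of Amoro. It also appends a trailing "/" before the prefix check,
on the assumption that a bare startsWith(hiveLocation) could match a sibling
directory whose name merely begins with the same characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, standalone sketch; not Amoro code.
final class HiveVisibilitySketch {

  // True when `path` lies inside the directory `location`.
  // The trailing "/" keeps ".../hive/10/x" from matching ".../hive/1".
  static boolean isUnder(String location, String path) {
    String dir = location.endsWith("/") ? location : location + "/";
    return path.startsWith(dir);
  }

  // Key `true` holds paths under `location`, key `false` holds the rest.
  static Map<Boolean, List<String>> splitByVisibility(String location, List<String> paths) {
    return paths.stream().collect(Collectors.partitioningBy(p -> isUnder(location, p)));
  }

  public static void main(String[] args) {
    // Made-up paths standing in for hiveTable.getSd().getLocation() and afterFiles.
    String hiveLocation = "hdfs://ns/warehouse/db/tbl/hive/1";
    List<String> afterFiles =
        Arrays.asList(
            "hdfs://ns/warehouse/db/tbl/hive/1/a.parquet",
            "hdfs://ns/warehouse/db/tbl/hive/1/b.parquet",
            "hdfs://ns/warehouse/db/tbl/hive/2/c.parquet",
            "hdfs://ns/warehouse/db/tbl/hive/10/d.parquet");

    Map<Boolean, List<String>> split = splitByVisibility(hiveLocation, afterFiles);
    System.out.println("visible to Hive: " + split.get(true));
    System.out.println("not visible to Hive: " + split.get(false));
  }
}
```

In the real test, the visible list would be passed to validateHiveTableValues
and the other list would be asserted non-empty, so the limitation is documented
explicitly rather than silently filtered out.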