Takch02 commented on PR #4137: URL: https://github.com/apache/amoro/pull/4137#issuecomment-4154897514
> Hi @xxubai, I added Hive-side validation, and as you mentioned, Hive can only see the files located in the directory of the first added file. The files in other directories become invisible to Hive, which causes the validation with all `afterFiles` to fail.
>
> ```
> List<DataFile> afterFiles = HiveDataTestHelpers.lastedAddedFiles(baseStore);
>
> int totalLiveFiles = Lists.newArrayList(baseStore.newScan().planFiles()).size();
> Assert.assertEquals(addFiles.size(), totalLiveFiles);
>
> Table hiveTable = TEST_HMS.getHiveClient()
>     .getTable(getMixedTable().id().getDatabase(), getMixedTable().id().getTableName());
> String hiveLocation = hiveTable.getSd().getLocation();
>
> List<DataFile> filesVisibleToHive = afterFiles.stream()
>     .filter(f -> f.path().toString().startsWith(hiveLocation))
>     .collect(Collectors.toList());
>
> UpdateHiveFilesTestHelpers.validateHiveTableValues(
>     TEST_HMS.getHiveClient(), getMixedTable(), filesVisibleToHive); // OK
>
> UpdateHiveFilesTestHelpers.validateHiveTableValues(
>     TEST_HMS.getHiveClient(), getMixedTable(), afterFiles); // Failed
> ```
>
> Given this behavior, how should we handle this in the test? Should we expect a `CannotAlterHiveLocationException` here as well (similar to how partitioned tables behave when spanning multiple directories), or should we just assert that only a subset of files is visible to Hive to document the current limitation?

@xxubai Please review the PR! Thank you.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
