Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : FileMetadataLoader should skip listing files in hidden and tmp directories
......................................................................

IMPALA-8663 : FileMetadataLoader should skip listing files in hidden and tmp directories

The FileMetadataLoader is used to load file information when a table is loaded. By default, it lists all the files in the table/partition directory. Currently, it only skips invalid filenames (hidden files, ones starting with "_", etc.). However, it does not skip directories which are temporary or hidden.

When data is inserted into a table from Hive, Hive creates a temporary staging directory, which is a hidden directory under the table location. When the insert in Hive completes, such staging directories are removed. But if a refresh is issued during that time, FileMetadataLoader will add the files in the staging directory as well. Not only could this cause temporarily invalid results, it also causes the table to go into a bad state when these temporary directories are removed. The only workaround in such a case is to issue a refresh on the table again.

This patch adds logic to the FileMetadataLoader to ignore such temporary staging directories. Unfortunately, Hadoop does not provide an API which can recursively list files in a directory and skip certain directories. This patch adds this filtering logic to the existing RecursingIterator in FileSystemUtil.

Testing:
1. Added a new test in FileMetadataLoaderTest and modified existing ones in AcidUtilsTest.
2. Ran concurrent inserts from Hive while issuing refresh in a loop on the Impala side. Earlier this would cause the table to go into a bad state. Now, it works fine for the staging directories.
It still runs into a FileNotFoundException from the impalad when there are insert overwrite statements in Hive.

Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
3 files changed, 89 insertions(+), 18 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/13665/3
--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
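
Illustration of the approach described in the commit message: a recursive listing that prunes hidden and temporary directories instead of descending into them. This is only a sketch and not the code in this patch. It uses java.nio.file on a local filesystem rather than the Hadoop FileSystem API and the RecursingIterator in FileSystemUtil, and the class name, method names, and the prefix rule (names starting with "." or "_") are assumptions made for the example; the actual rules in the patch may differ.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Impala.
final class SkipHiddenDirsSketch {
  // Assumed rule for this sketch: a leading "." or "_" marks a name as
  // hidden or temporary.
  static boolean isIgnoredName(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  // Recursively lists regular files under root, skipping ignored files and
  // pruning whole subtrees rooted at ignored directories.
  static List<Path> listVisibleFiles(Path root) throws IOException {
    List<Path> result = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        // The root itself is always listed, whatever its name.
        if (!dir.equals(root) && isIgnoredName(dir.getFileName().toString())) {
          return FileVisitResult.SKIP_SUBTREE;
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (!isIgnoredName(file.getFileName().toString())) {
          result.add(file);
        }
        return FileVisitResult.CONTINUE;
      }
    });
    return result;
  }
}
```

Pruning at the directory level means files inside a staging directory are never recorded, so their later removal cannot leave stale entries behind. Filtering on file names alone would not achieve this, since a data file inside a hidden directory can have a perfectly valid name.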