Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories ...................................................................... Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS8, Line 545: : * If 'recursive' is true, all > Actually, I added this line in the doc since I noticed that the RecursingIt hrm, were the semantics already broken then? See the unit test AcidUtilsTest.testAcidStateNoBase: assertFiltering(new String[]{ "base_01.txt", "post_upgrade.txt", "delta_000005_000005_0000/", "delta_000005_000005_0000/lmn.txt", "base_000010/", "delta_0000012_0000012_0000/", "delta_0000012_0000012_0000/0000_0", "delta_0000012_0000012_0000/0000_1"}, "", // writeIdList that accepts all transactions as valid new String[]{ // Post upgrade files are ignored if there is a valid base. "delta_0000012_0000012_0000/0000_0", "delta_0000012_0000012_0000/0000_1"}); ie here the empty base directory has important semantics. Maybe we don't have a valid e2e test case to reproduce this yet because of this Hive bug: https://issues.apache.org/jira/browse/HIVE-21750 but it certainly seems like this is an issue. Let's file a JIRA to make sure we validate this case before releasing this feature. http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@731 PS11, Line 731: // we use listFiles API to natively recurse on S3 for performance reasons, so we Are we really gaining much from using this RecursiveIteratorWithFilter class for the S3 case now? It seems like this is getting to be a bit of spaghetti, with this being named "Recursive" but having a flag that makes it non-recursive. What about just having a separate class like 'FilteringIterator' which wraps RemoteIterator and does filtering, and keep this one as always-recursive, so the code path is easier to follow? http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@736 PS11, Line 736: LOG.info("Ignoring {} since its contained in a ignored directory", This probably shouldn't be at INFO level -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 11 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Mon, 01 Jul 2019 22:40:45 +0000 Gerrit-HasComments: Yes