Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545
PS8, Line 545:
             :    * If 'recursive' is true, all
> Actually, I added this line in the doc since I noticed that the RecursingIt
hrm, were the semantics already broken then? See the unit test 
AcidUtilsTest.testAcidStateNoBase:

    assertFiltering(new String[]{
            "base_01.txt",
            "post_upgrade.txt",
            "delta_000005_000005_0000/",
            "delta_000005_000005_0000/lmn.txt",
            "base_000010/",
            "delta_0000012_0000012_0000/",
            "delta_0000012_0000012_0000/0000_0",
            "delta_0000012_0000012_0000/0000_1"},
        "", // writeIdList that accepts all transactions as valid
        new String[]{
            // Post upgrade files are ignored if there is a valid base.
            "delta_0000012_0000012_0000/0000_0",
            "delta_0000012_0000012_0000/0000_1"});


ie here the empty base directory has important semantics. Maybe we don't have a 
valid e2e test case to reproduce this yet because of this Hive bug: 
https://issues.apache.org/jira/browse/HIVE-21750 but it certainly seems like 
this is an issue. Let's file a JIRA to make sure we validate this case before 
releasing this feature.


http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@731
PS11, Line 731:       // we use listFiles API to natively recurse on S3 for 
performance reasons, so we
Are we really gaining much from using this RecursiveIteratorWithFilter class 
for the S3 case now? It seems like this is getting to be a bit of spaghetti, 
with this being named "Recursive" but having a flag that makes it non-recursive.

What about just having a separate class like 'FilteringIterator' which wraps 
RemoteIterator and does filtering, and keep this one as always-recursive, so 
the code path is easier to follow?


http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@736
PS11, Line 736:         LOG.info("Ignoring {} since its contained in a ignored 
directory",
This probably shouldn't be at INFO level



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jul 2019 22:40:45 +0000
Gerrit-HasComments: Yes

Reply via email to