Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@541
PS8, Line 541: temporary directories
the code now seems to also skip any files that match this pattern, not just 
directories. I think that's probably OK but we should be clear on the semantics.


http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545
PS8, Line 545: all underlying files (except which are
             :    * in the ignored directories)
does this mean we no longer yield directories, and only yield files? does this 
not break the case of an empty base data dir, which actually has semantic value?


http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568
PS8, Line 568: isS3AFileSystem(p)
why'd you change from isS3AFileSystem(fs) to isS3AFileSystem(p)? In the case 
that 'p' isn't a fully qualified path, this will check the wrong filesystem, 
and we don't have any preconditions checking that 'p' must be fully qualified.


http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568
PS8, Line 568: isS3AFileSystem
perhaps we can add a @VisibleForTesting way we can make this path get used even 
for non-S3, so we can get test coverage here?


http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@626
PS8, Line 626: LISTING_TYPE
style nit: enums should be named like classes (ListingType)


http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@699
PS8, Line 699:       // if the current file is on a ignored path return early
             :       if (isIgnoredPath(fileStatus)) return;
how does this prevent recursion into tmp dirs in the recursive listFiles case? 
It seems like this is only checking the filename (last path component) and not 
intermediate path components, which we'd need to be doing to filter out 
contents of subdirectories for the "native recursion"



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 06:46:34 +0000
Gerrit-HasComments: Yes

Reply via email to