Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories ...................................................................... Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@541 PS8, Line 541: temporary directories the code now seems to also skip any files that match this pattern, not just directories. I think that's probably OK but we should be clear on the semantics. http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS8, Line 545: all underlying files (except which are : * in the ignored directories) does this mean we no longer yield directories, and only yield files? does this not break the case of an empty base data dir, which actually has semantic value? http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568 PS8, Line 568: isS3AFileSystem(p) why'd you change from isS3AFileSystem(fs) to isS3AFileSystem(p)? In the case that 'p' isn't a fully qualified path, this will check the wrong filesystem, and we don't have any preconditions checking that 'p' must be fully qualified. http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@568 PS8, Line 568: isS3AFileSystem perhaps we can add a @VisibleForTesting way we can make this path get used even for non-S3, so we can get test coverage here? http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@626 PS8, Line 626: LISTING_TYPE style nit: enums should be named like classes (ListingType) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@699 PS8, Line 699: // if the current file is on a ignored path return early : if (isIgnoredPath(fileStatus)) return; how does this prevent recursion into tmp dirs in the recursive listFiles case? It seems like this is only checking the filename (last path component) and not intermediate path components, which we'd need to be doing to filter out contents of subdirectories for the "native recursion" -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Thu, 27 Jun 2019 06:46:34 +0000 Gerrit-HasComments: Yes