Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories ...................................................................... Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS6, Line 545: * If 'recursive' is true, all underlying files (except which are > this javadoc should be updated to indicate that tmp/hidden files and direct Done http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@566 PS6, Line 566: // even though it returns Locate > It looks like this will break the above-mentioned optimization. In the case hmm.. yeah thats true. Thanks for catching that. I think this method is a bit confusing or rather the name of the method is confusing. In my opinion its cleaner if listStatus does just what it says and the caller invokes listStatus v/s listFiles dependending of if its S3 file system or not. However, I can see why it is done this way so that we don't inadvertently introduce a inefficient code path if we don't know about this S3 perf optimization. http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@576 PS6, Line 576: > The naming and this comment are a bit opposite from each other. The predica agreed. Changed this to a simple static method called isIgnoredDirectory http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@580 PS6, Line 580: * which tools like Hive create in the table/partition directories when a query is > is this ever used as a Predicate? It seems like in this file it's only used Done http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@592 PS6, Line 592: boolean recursive) throws IOException { > nit: add an inline comment like this: /*useListStatus=*/false Changed the input arguments to use a Enum for get the LISTING_TYPE http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@633 PS6, Line 633: > nit: weird spacing that's inconsistent with most of the java code in impala Done http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@681 PS6, Line 681: // state) : while (curFile_ == null) { : if (curIter_.hasNext()) { : // Consume the next file or directory from the current iterator. : handleFileStat(curIter_.next()); : } else if (!iters_.empty()) { : // We ran out of entries in the current one, but we might still have : > IMO this javadoc is a bit redundant with the implementation and maybe not n agreed. removed it http://gerrit.cloudera.org:8080/#/c/13665/6/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@694 PS6, Line 694: > this should say "Do not", right? Done -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Tue, 25 Jun 2019 22:50:40 +0000 Gerrit-HasComments: Yes