Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )
Change subject: IMPALA-8663 : FileMetadataLoader should skip listing files in hidden and tmp directories ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@580 PS2, Line 580: /** : * Predicate useful to filter out hidden directories and temporary staging directories : * which tools like Hive create in the table/partition directories when a query is : * inserting data into them. : */ : public static final Predicate <FileStatus> HIDDEN_DIRECTORIES_PREDICATE = fileStatus -> { : if (!fileStatus.isDirectory()) return false; : String filename = fileStatus.getPath().getName(); : return filename.startsWith(".") || filename.startsWith("_tmp."); : }; : : /** : * Wrapper around FileSystem.listFiles(), similar to the listStatus() wrapper above. : */ : public static RemoteIterator<? extends FileStatus> listFiles(FileSystem fs, Path p, : boolean recursive) throws IOException { : try { : return new RemoteIteratorWithFilter(fs, p, recursive, false); : } catch (FileNotFoundException e) { : if (LOG.isWarnEnabled()) LOG.warn("Path does not exist: " + p.toString(), e); : return null; : } : } : : /* > You could also implement a Predicate<FileStatus> overriding test(). Done http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@656 PS2, Line 656: private RemoteIteratorWithFilter(FileSystem fs, Path startPath, > I think the refactor is confusing. RecursiingIterator can also have isRecur Renamed it to RemoteIteratorWithFilter which hopefully is clearer http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@716 PS2, Line 716: > nit: no braces Done http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java: http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@93 PS2, Line 93: hdfs://localhost:20500/ > Do we need these qualifiers? I think the Configuration should resolve it to Don't think we need them. I just followed what was being done in the other tests of this class. I personally think its clearer than letting Configuration resolve it to the default FileSystem http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@100 PS2, Line 100: dstFs.deleteOnExit(tmpTestPath); > Do you need wrap the fs in try-with-resources for this to work? don't think so. This will clean up the directory when the JVM exits. The other way to do this would be a finally block to explicitly remove the directory before exiting. http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@112 PS2, Line 112: 24 > Instead, we could try loading sourcePath and substitute the values? I am vary of using load() to test load() method's behavior. Seems like a anti-pattern. -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Comment-Date: Wed, 19 Jun 2019 21:43:26 +0000 Gerrit-HasComments: Yes