Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip listing files in 
hidden and tmp directories
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@580
PS2, Line 580: /**
             :    * Predicate useful to filter out hidden directories and 
temporary staging directories
             :    * which tools like Hive create in the table/partition 
directories when a query is
             :    * inserting data into them.
             :    */
             :   public static final Predicate <FileStatus> 
HIDDEN_DIRECTORIES_PREDICATE = fileStatus -> {
             :     if (!fileStatus.isDirectory()) return false;
             :     String filename = fileStatus.getPath().getName();
             :     return filename.startsWith(".") || 
filename.startsWith("_tmp.");
             :   };
             :
             :   /**
             :    * Wrapper around FileSystem.listFiles(), similar to the 
listStatus() wrapper above.
             :    */
             :   public static RemoteIterator<? extends FileStatus> 
listFiles(FileSystem fs, Path p,
             :       boolean recursive) throws IOException {
             :     try {
             :       return new RemoteIteratorWithFilter(fs, p, recursive, 
false);
             :     } catch (FileNotFoundException e) {
             :       if (LOG.isWarnEnabled()) LOG.warn("Path does not exist: " 
+ p.toString(), e);
             :       return null;
             :     }
             :   }
             :
             :   /*
> You could also implement a Predicate<FileStatus> overriding test().
Done


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@656
PS2, Line 656:     private RemoteIteratorWithFilter(FileSystem fs, Path 
startPath,
> I think the refactor is confusing. RecursiingIterator can also have isRecur
Renamed it to RemoteIteratorWithFilter which hopefully is clearer


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@716
PS2, Line 716:
> nit: no braces
Done


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@93
PS2, Line 93: hdfs://localhost:20500/
> Do we need these qualifiers? I think the Configuration should resolve it to
Don't think we need them. I just followed what was being done in the other 
tests of this class. I personally think its clearer than letting Configuration 
resolve it to the default FileSystem


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@100
PS2, Line 100: dstFs.deleteOnExit(tmpTestPath);
> Do you need wrap the fs in try-with-resources for this to work?
don't think so. This will clean up the directory when the JVM exits. The other 
way to do this would be a finally block to explicitly remove the directory 
before exiting.


http://gerrit.cloudera.org:8080/#/c/13665/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@112
PS2, Line 112: 24
> Instead, we could try loading sourcePath and substitute the values?
I am vary of using load() to test load() method's behavior. Seems like a 
anti-pattern.



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Jun 2019 21:43:26 +0000
Gerrit-HasComments: Yes

Reply via email to