Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 )
Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file ...................................................................... Patch Set 4: (3 comments) Addressed "reversed polarity" issue Bharath pointed out. The tests look like they need a bit of attention. http://gerrit.cloudera.org:8080/#/c/11227/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11227/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@564 PS3, Line 564: // If there are no existing files in the partition, use the non-incremental path. : boolean useExistingFds = partition.hasFileDescriptors(); : FileMetadataLoadStats stats; : if (useExistingFds) { : > This looks incorrect. Shouldn't we swap the if and else? Agree. If we have at least one file status, then useExistingFds is true, and we can use the refresh version. http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@568 PS4, Line 568: stats = refreshFileMetadata(partDir, Collections.singletonList(partition)); Question about the implementation of refresh (that code is not in this patch.) It relies on an API that returns all statuses in one array. Do we ever have cases where that array could be too large? Should we revert to the (iterator-based) full load if the number of files in a directory is above some limit? 100K maybe? http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@371 PS4, Line 371: assertEquals(0L, (long)opsCounts.getLong("op_get_file_block_locations")); Something is odd. This test did not fail for the "reverse polarity" bug that Bharath pointed out. And, it still passed after fixing the bug. Is maybe this test not doing what we think it is? -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Tue, 27 Nov 2018 06:08:32 +0000 Gerrit-HasComments: Yes