Hello Bharath Vissapragada, Tianyi Wang, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11027

to look at the new patch set (#4).

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table. In
some testing, each such call took 1-2ms, so this could add many seconds
to the overall table load time for a table with thousands of partitions
and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 218 insertions(+), 59 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/4
--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
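
The batch pre-fetching described in the commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in FsPermissionCache.java: the names CountingFs, parentOf, and loadPermissions are hypothetical, permissions are modeled as plain strings, and an in-memory map stands in for the NameNode so that the number of calls of each kind can be observed, in the spirit of the new unit test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class PermissionPrefetchSketch {

  /** Hypothetical stand-in for a NameNode client that counts each call. */
  static final class CountingFs {
    private final Map<String, String> permsByPath = new HashMap<>();
    int getFileStatusCalls = 0;
    int listStatusCalls = 0;

    void addDir(String path, String perms) {
      permsByPath.put(path, perms);
    }

    /** One call per path, like getFileStatus(). */
    String getFileStatus(String path) {
      getFileStatusCalls++;
      String perms = permsByPath.get(path);
      if (perms == null) throw new NoSuchElementException(path);
      return perms;
    }

    /** One call returns every direct child of the parent, like listStatus(). */
    Map<String, String> listStatus(String parent) {
      listStatusCalls++;
      Map<String, String> children = new HashMap<>();
      for (Map.Entry<String, String> e : permsByPath.entrySet()) {
        String path = e.getKey();
        if (!path.equals(parent) && parentOf(path).equals(parent)) {
          children.put(path, e.getValue());
        }
      }
      return children;
    }
  }

  /** Returns the parent directory of a slash-separated absolute path. */
  static String parentOf(String path) {
    int i = path.lastIndexOf('/');
    return i <= 0 ? "/" : path.substring(0, i);
  }

  /**
   * Resolves permissions for every partition directory. Parents that hold
   * more than one partition are listed once in bulk; any partition not
   * covered by a bulk listing falls back to a single getFileStatus() call.
   */
  static Map<String, String> loadPermissions(CountingFs fs, List<String> partitionDirs) {
    // Count how many partitions live under each parent directory.
    Map<String, Integer> partitionsPerParent = new HashMap<>();
    for (String dir : partitionDirs) {
      partitionsPerParent.merge(parentOf(dir), 1, Integer::sum);
    }

    // Pre-fetch in bulk only where it saves calls.
    Map<String, String> cache = new HashMap<>();
    for (Map.Entry<String, Integer> e : partitionsPerParent.entrySet()) {
      if (e.getValue() > 1) cache.putAll(fs.listStatus(e.getKey()));
    }

    // Serve each partition from the cache, falling back to a direct call.
    Map<String, String> result = new LinkedHashMap<>();
    for (String dir : partitionDirs) {
      String perms = cache.get(dir);
      if (perms == null) perms = fs.getFileStatus(dir);
      result.put(dir, perms);
    }
    return result;
  }

  public static void main(String[] args) {
    CountingFs fs = new CountingFs();
    fs.addDir("/warehouse/t/p=1", "rwx");
    fs.addDir("/warehouse/t/p=2", "rwx");
    fs.addDir("/warehouse/t/p=3", "r-x");
    fs.addDir("/elsewhere/p=4", "rwx");
    loadPermissions(fs, Arrays.asList(
        "/warehouse/t/p=1", "/warehouse/t/p=2", "/warehouse/t/p=3", "/elsewhere/p=4"));
    System.out.println("listStatus calls: " + fs.listStatusCalls);
    System.out.println("getFileStatus calls: " + fs.getFileStatusCalls);
  }
}
```

In this toy setup, the three partitions under the shared parent cost one bulk listing instead of three separate calls, while the lone partition stored elsewhere still takes the single-path route. Whether the actual patch falls back in the same way is an assumption of the sketch.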