Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/11027

to review the following change.


Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is 
loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 200 insertions(+), 54 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/1
--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>

Reply via email to