[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is 
loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Reviewed-on: http://gerrit.cloudera.org:8080/11027
Tested-by: Impala Public Jenkins 
Reviewed-by: Todd Lipcon 
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 218 insertions(+), 59 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 8: Code-Review+2

Forwarding +2


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 09 Aug 2018 21:39:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:52:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2957/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 08 Aug 2018 21:39:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2943/


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 20:26:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2943/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 07 Aug 2018 17:10:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/143/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:11:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-08-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1566
PS3, Line 1566: void
> nit: Java-style comment: Returning FsPermissionCache  (vs) passing the outp
Done


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS3, Line 45: FsPermissionChecker checker = FsPermissionChecker.getInstance();
> Make it a static class member may be?
this is already just an accessor for a static instance in FsPermissionChecker, 
so there isn't really any benefit to "caching" the result



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 01 Aug 2018 22:20:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-27 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 3: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557:   }
> I added a simple heuristic here that compares size(msPartitions) vs the num
That looks reasonable to me.


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1566
PS3, Line 1566: void
nit: Java-style comment: Returning FsPermissionCache  (vs) passing the output 
param by reference.

I don't see this discussed in the Google Java-style guide, but I have a feeling 
that the former is preferred. The latter appears more C++ style to me.


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS3, Line 45: FsPermissionChecker checker = FsPermissionChecker.getInstance();
Make it a static class member may be?



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 27 Jul 2018 06:41:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557: preloadPermissionsCache(msPartitions, permCache);
> Same question as above. Could this be an overkill if size(msPartitions) is
I added a simple heuristic here that compares size(msPartitions) vs the number 
of existing partitions, and only uses this trick if the number of new 
partitions is 3x the number of existing partitions. That should catch RECOVER 
PARTITIONS use case when basically constructing the entire hierarchy from an 
existing table path, but not affect "alter table add" or RECOVER when it just 
adds a few. Let me know what you think



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Jul 2018 01:42:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 3:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/69/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Jul 2018 01:40:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@957
PS2, Line 957: preloadPermissionsCache(msPartitions, permCache);
> Looking at the consumers of this, alterTableAddPartitions() and alterTableR
Good point about the incremental updates. Let me take this back to the drawing 
board and think about that.



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Jul 2018 16:32:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/37/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 19:40:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/37/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:46:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Todd Lipcon (Code Review)
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11027

to look at the new patch set (#2).

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is 
loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 205 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/2
--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@857
PS1, Line 857: HdfsPartition partition = 
createPartition(msPartition.getSd(), msPartition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1005
PS1, Line 1005: );
> Can you add the table name too.
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1015
PS1, Line 1015:* Throws CatalogException if one of the supplied storage 
descriptors contains metadata that
> nit: longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1026
PS1, Line 1026:   HdfsPartition hdfsPartition = 
createPartition(partition.getSd(), partition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1625
PS1, Line 1625:   HdfsPartition partition = 
createPartition(msPartition.getSd(), msPartition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1675
PS1, Line 1675: );
> , ioe);
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
> Technically not all paths need to be fully-qualified. I'll see if we can re
I looked through the source for FileStatus.getPath() and it seems to "do the 
right thing" here -- if the path that we listed is qualified, it returns a 
similarly-qualified path on the resulting children, so things should match up 
appropriately without having to include the FileSystem here.


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS1, Line 45: if (perms != null) {
:   return perms;
: }
> single line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:38:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
> qq. What is the use of having this as a part of map's key? Isn't it already
Technically not all paths need to be fully-qualified. I'll see if we can 
restrict this function to only take fully-qualified paths, or ensure 
canonicalization, etc.



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 16:08:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-24 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 1:

(8 comments)

Patch lgtm overall. Bunch of nits and one clarification.

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@857
PS1, Line 857: HdfsPartition partition = 
createPartition(msPartition.getSd(), msPartition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1005
PS1, Line 1005: );
Can you add the table name too.


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1015
PS1, Line 1015:* Throws CatalogException if one of the supplied storage 
descriptors contains metadata that
nit: longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1026
PS1, Line 1026:   HdfsPartition hdfsPartition = 
createPartition(partition.getSd(), partition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1625
PS1, Line 1625:   HdfsPartition partition = 
createPartition(msPartition.getSd(), msPartition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1675
PS1, Line 1675: );
, ioe);


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
qq. What is the use of having this as a part of map's key? Isn't it already a 
part of Path's URI?


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS1, Line 45: if (perms != null) {
:   return perms;
: }
single line.



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 06:54:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/25/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 24 Jul 2018 00:53:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..


Patch Set 1:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/25/

Running initial code review checks. This is experimental - please report any 
issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 23 Jul 2018 23:58:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

2018-07-23 Thread Todd Lipcon (Code Review)
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/11027

to review the following change.


Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
..

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is 
loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 200 insertions(+), 54 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/1
--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac