Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition 
when table is loaded
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@897
PS2, Line 897: this
update?


http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@957
PS2, Line 957:     preloadPermissionsCache(msPartitions, permCache);
Looking at the consumers of this, alterTableAddPartitions() and 
alterTableRecoverPartitions(), I have a feeling that this optimization in this 
codepath is an overkill (for large tables, say 100k partitions and every 
partition under tableDir).

It is probably useful in alterTableRecoverPartitions() where we could add large 
number of partitions in a single batch, but probably not in 
alterTableAddPartitions().

Also, both these methods only add partitions in batches of 
MAX_PARTITION_UPDATES_PER_RPC (500). So we could ideally we repeating RPCs in 
preloadPermissionsCache for ceil(num_parts/ MAX_PARTITION_UPDATES_PER_RPC) 
times.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557:     preloadPermissionsCache(msPartitions, permCache);
Same question as above. Could this be an overkill if size(msPartitions) is 
small? Say we are refreshing 10 partitions in a 100k partition table?

Looking at the listStatusIterator() Impl, it seems to be using some 
RemoteIterator impl that does batching of RPCs. I couldn't figure out what is 
the basis for batching, but I get a feeling that the number of RPCs could be > 
1.

Additionally, we create a bunch of temporary objects for each partition and 
that could affect GC

Just curious if I'm getting it right and what your thoughts are.



--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 06:34:16 +0000
Gerrit-HasComments: Yes

Reply via email to