Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )
Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded ...................................................................... Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@897 PS2, Line 897: this update? http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@957 PS2, Line 957: preloadPermissionsCache(msPartitions, permCache); Looking at the consumers of this, alterTableAddPartitions() and alterTableRecoverPartitions(), I have a feeling that this optimization in this codepath is an overkill (for large tables, say 100k partitions and every partition under tableDir). It is probably useful in alterTableRecoverPartitions() where we could add large number of partitions in a single batch, but probably not in alterTableAddPartitions(). Also, both these methods only add partitions in batches of MAX_PARTITION_UPDATES_PER_RPC (500). So we could ideally we repeating RPCs in preloadPermissionsCache for ceil(num_parts/ MAX_PARTITION_UPDATES_PER_RPC) times. Thoughts? http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557 PS2, Line 1557: preloadPermissionsCache(msPartitions, permCache); Same question as above. Could this be an overkill if size(msPartitions) is small? Say we are refreshing 10 partitions in a 100k partition table? Looking at the listStatusIterator() Impl, it seems to be using some RemoteIterator impl that does batching of RPCs. I couldn't figure out what is the basis for batching, but I get a feeling that the number of RPCs could be > 1. Additionally, we create a bunch of temporary objects for each partition and that could affect GC Just curious if I'm getting it right and what your thoughts are. -- To view, visit http://gerrit.cloudera.org:8080/11027 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734 Gerrit-Change-Number: 11027 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Wed, 25 Jul 2018 06:34:16 +0000 Gerrit-HasComments: Yes