Bharath Vissapragada has uploaded a new patch set (#5). Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading ......................................................................
IMPALA-4172/IMPALA-3653: Improvements to block metadata loading This patch improves the block metadata loading (locations and disk storage IDs) for partitioned and un-partitioned tables in the Catalog server. Without this patch: ------------------ We loop throuh each and every file in the table/partition directories and call getFileBlockLocations() on it to obtain the block metadata. This results in large no. of RPC calls to namenode, especially with tables with large no. of files/partitions. With this patch: --------------- We move the block metadata querying to use listStatus() call which accepts a directory as input and fetches the 'BlockLocation' objects for every file recursively in that directory. This improves the metadata loading in the following ways. - For non-partitioned tables, we query all the BlockLocations in a single RPC call in the base table directory and load the corresponding disk IDs. - For partitioned tables, we query the BlockLocations for all the partitions residing under the base table directories in a single RPC and then load every partition with non-default partition directory separately. Also, this patch does away with VolumeIds returned by the HDFS NN and uses the new StorageIDs returned by the BlockLocation class. These StorageIDs are UUID strings and hence are mapped to a per-node 0-based index as expected by the backend. In the upcoming versions of Hadoop APIs, getFileBlockStorageLocations() is deprecated and instead the listStatus() returns BlockLocations with storage IDs embedded. This patch makes use of this improvement to reduce an additional RPC to the NN to fetch the storage locations. Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 --- A fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 3 files changed, 387 insertions(+), 250 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5148/5 -- To view, visit http://gerrit.cloudera.org:8080/5148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>