Hello Aman Sinha, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/17505

to look at the new patch set (#3).

Change subject: IMPALA-7501: Slim down partition meta in LocalCatalog mode
......................................................................

IMPALA-7501: Slim down partition meta in LocalCatalog mode

In LocalCatalog mode, the coordinator caches HMS partition objects in its
local cache. An HMS partition contains many fields that Impala doesn't use,
and some fields that are stored suboptimally. Most of them are in the
StorageDescriptor:
- partition-level schema (list<FieldSchema>), which is never used.
- location string, which can be prefix-compressed since the prefix is
  usually the table location.
- input/outputFormat strings, which are always duplicated and can be
  represented by an integer (enum).

An experiment on a large table with 478 columns and 87320 partitions (one
non-empty file per partition) shows that more than 90% of the memory space
(1.1GB) is occupied by these fields. The dominant part is the
partition-level schema, which consumed 76% of the cache.

In addition, these unused or suboptimal fields are all returned in a single
response from catalogd, wrapped in TPartialPartitionInfo, which in turn
belongs to a TGetPartialCatalogObjectResponse. They dramatically increase
the serialized thrift object size of the response, which is subject to the
2GB array size limit of the JVM. Fetching the metadata of many partitions
from catalogd could cause catalogd to run into an OOM error by hitting the
2GB limit (e.g. IMPALA-9896).

This patch extracts the HMS partition object and replaces it with the
fields that Impala actually uses. In the LocalCatalog cache, the HMS
partition object is replaced with
- HMS parameters
- write id
- HdfsStorageDescriptor, which represents the input/output format and
  some delimiters
- prefix-compressed location

The hms_partition field of TPartialPartitionInfo is likewise extracted into
the corresponding fields. However, CatalogHmsAPIHelper still requires the
whole HMS partition object.
So the hms_partition field is kept for that usage. To distinguish the
different requirements, we add a new field, want_hms_partition, to
TTableInfoSelector. The existing 'want_partition_metadata' field now means
returning the extracted fields, while the new 'want_hms_partition' field
means returning the whole HMS partition object.

Improvements in the above case:
- heap usage reduced from 1.1GB to 113.2MB, and the number of objects
  from 41m to 2.3m
- response size reduced from 1.7GB to 28.41MB

Tests:
- Run local-catalog related tests locally
- Run CORE tests

Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/GetPartialCatalogObjectRequestBuilder.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
18 files changed, 279 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/17505/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Gerrit-Change-Number: 17505
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
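The commit message above describes three ideas: keeping only the partition fields Impala uses, prefix-compressing each partition location against the table location, and letting callers choose between the extracted fields and the whole HMS partition through two TTableInfoSelector flags. Below is a minimal Java sketch of those ideas, under stated assumptions. Every name in it (SlimPartitionSketch, FileFormat, HmsPartition, Selector, PartitionInfo, fill, fullLocation) is hypothetical and is not Impala's actual HdfsPartition, LocalFsPartition, HdfsStorageDescriptor, or Thrift-generated code. The enum covers only three Hive input formats, and the location compression simply strips the table-location prefix, which may differ from what the patch does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: all names are illustrative, NOT Impala's real classes.
final class SlimPartitionSketch {
  // Small enum replacing the duplicated input-format class-name strings.
  enum FileFormat {
    TEXT("org.apache.hadoop.mapred.TextInputFormat"),
    PARQUET("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
    ORC("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat");
    private static final Map<String, FileFormat> BY_CLASS = new HashMap<>();
    static {
      for (FileFormat f : values()) BY_CLASS.put(f.inputFormatClass, f);
    }
    final String inputFormatClass;
    FileFormat(String inputFormatClass) { this.inputFormatClass = inputFormatClass; }
    static FileFormat fromInputFormat(String className) {
      FileFormat f = BY_CLASS.get(className);
      if (f == null) throw new IllegalArgumentException("Unsupported: " + className);
      return f;
    }
  }

  // Placeholder for the full HMS partition (the real one has a schema too).
  static final class HmsPartition {
    final String location;
    final String inputFormatClass;
    final Map<String, String> parameters;
    final long writeId;
    HmsPartition(String location, String inputFormatClass,
        Map<String, String> parameters, long writeId) {
      this.location = location;
      this.inputFormatClass = inputFormatClass;
      this.parameters = parameters;
      this.writeId = writeId;
    }
  }

  // Independent request flags, analogous to want_partition_metadata and
  // want_hms_partition.
  static final class Selector {
    boolean wantPartitionMetadata;
    boolean wantHmsPartition;
  }

  // Response: each group of fields is set only when it was requested.
  static final class PartitionInfo {
    Map<String, String> hmsParameters;
    Long writeId;
    FileFormat format;
    String locationSuffix;        // relative to the table dir when possible
    boolean locationIsRelative;
    HmsPartition hmsPartition;    // whole object, for callers that need it
  }

  static PartitionInfo fill(Selector sel, String tableLocation, HmsPartition p) {
    PartitionInfo info = new PartitionInfo();
    if (sel.wantPartitionMetadata) {
      info.hmsParameters = Collections.unmodifiableMap(new HashMap<>(p.parameters));
      info.writeId = p.writeId;
      info.format = FileFormat.fromInputFormat(p.inputFormatClass);
      // Strip the shared table-location prefix; keep outside locations whole.
      String prefix = withSlash(tableLocation);
      info.locationIsRelative = p.location.startsWith(prefix);
      info.locationSuffix = info.locationIsRelative
          ? p.location.substring(prefix.length()) : p.location;
    }
    if (sel.wantHmsPartition) info.hmsPartition = p;
    return info;
  }

  // Rebuilds the absolute location from the table location and the suffix.
  static String fullLocation(String tableLocation, PartitionInfo info) {
    return info.locationIsRelative
        ? withSlash(tableLocation) + info.locationSuffix : info.locationSuffix;
  }

  private static String withSlash(String dir) {
    return dir.endsWith("/") ? dir : dir + "/";
  }

  public static void main(String[] args) {
    String table = "hdfs://nn:8020/warehouse/sales";
    HmsPartition p = new HmsPartition(table + "/year=2021",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        Collections.singletonMap("numRows", "1000"), 42L);
    Selector sel = new Selector();
    sel.wantPartitionMetadata = true;
    PartitionInfo slim = fill(sel, table, p);
    System.out.println(slim.locationSuffix + " " + slim.format);  // year=2021 PARQUET
    System.out.println(fullLocation(table, slim));  // hdfs://nn:8020/warehouse/sales/year=2021
    System.out.println(slim.hmsPartition == null);  // true
  }
}
```

Checking the prefix with a trailing slash keeps a sibling directory such as `sales_old/` from being treated as a child of `sales/`, and locations outside the table directory are stored whole. The two flags are independent in this sketch, so a caller that needs the whole object (the commit message names CatalogHmsAPIHelper) can ask for it without every LocalCatalog fetch carrying the partition-level schema.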