Quanlong Huang created HIVE-26893:
-------------------------------------
Summary: Extend get partitions APIs to ignore partition schemas
Key: HIVE-26893
URL: https://issues.apache.org/jira/browse/HIVE-26893
Project: Hive
Issue Type: New Feature
Components: Metastore
Reporter: Quanlong Huang
Several HMS APIs return a list of partitions, e.g. get_partitions_ps(),
get_partitions_by_names(), and add_partitions_req() with needResult=true.
Each returned partition instance carries its own list of FieldSchemas as the
partition schema:
{code:java}
org.apache.hadoop.hive.metastore.api.Partition
-> org.apache.hadoop.hive.metastore.api.StorageDescriptor
-> cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema> {code}
These per-partition lists can take up a lot of memory for wide tables (e.g.
with 2k columns). See the heap histogram in IMPALA-11812 for an example.
Some engines, such as Impala, don't actually use or respect the partition-level
schema, so transmitting it wastes network and serialization/deserialization
resources. It would be nice if these APIs provided an optional boolean flag for
ignoring partition schemas, so that HMS clients (e.g. Impala) don't need to
clear them afterwards to save memory.
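
To illustrate the client-side workaround described above, here is a minimal, self-contained sketch. It does not use the real HMS classes: FieldSchema, StorageDescriptor and Partition below are simplified, hypothetical stand-ins for the Thrift-generated types in org.apache.hadoop.hive.metastore.api, and fetchPartitions(), countFieldSchemas() and clearPartitionSchemas() are made-up helper names, not existing APIs. The sketch only shows how the number of FieldSchema objects grows with partitions times columns, and what a client has to do today to drop them after the response has already been transmitted and deserialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the Thrift-generated HMS classes.
// They are NOT the real org.apache.hadoop.hive.metastore.api types.
class PartitionSchemaSketch {
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) { this.name = name; this.type = type; }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        final List<String> values;
        final StorageDescriptor sd = new StorageDescriptor();
        Partition(List<String> values) { this.values = values; }
    }

    // Mimics a deserialized "get partitions" response: every partition
    // gets its own copy of the column list, as described in this issue.
    static List<Partition> fetchPartitions(int numPartitions, int numCols) {
        List<Partition> parts = new ArrayList<>(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            Partition part = new Partition(List.of("p=" + p));
            for (int c = 0; c < numCols; c++) {
                part.sd.cols.add(new FieldSchema("col_" + c, "string"));
            }
            parts.add(part);
        }
        return parts;
    }

    // Counts the FieldSchema instances still reachable from the partitions.
    static long countFieldSchemas(List<Partition> parts) {
        long total = 0;
        for (Partition part : parts) {
            if (part.sd.cols != null) {
                total += part.sd.cols.size();
            }
        }
        return total;
    }

    // The workaround a client needs today: drop the schemas after the fact.
    // The network and serde cost has already been paid at this point.
    static void clearPartitionSchemas(List<Partition> parts) {
        for (Partition part : parts) {
            part.sd.cols = null;
        }
    }

    public static void main(String[] args) {
        // 100 partitions of a 2k-column table.
        List<Partition> parts = fetchPartitions(100, 2000);
        System.out.println("FieldSchemas before clearing: " + countFieldSchemas(parts));
        clearPartitionSchemas(parts);
        System.out.println("FieldSchemas after clearing: " + countFieldSchemas(parts));
    }
}
```

With the proposed flag, the clearing step would become unnecessary because the column lists would never be sent in the first place. How the flag would be named and which request objects would carry it is left open here.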