[ https://issues.apache.org/jira/browse/IMPALA-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-11969:
------------------------------------
    Labels: catalog-2024  (was: )

> Move incremental statistics out of the partition params to reduce partition's memory footprint
> ----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11969
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11969
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 4.2.0
>            Reporter: Aman Sinha
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: catalog-2024
>
> Impala stores incremental stats ([1] and [2]) within the partition params of
> each partition object. The total bytes consumed by the incremental stats is
> estimated as:
> {noformat}
> 200 bytes per column * num_columns * num_partitions bytes
> {noformat}
> This means that for wide tables the partition object can get bloated and for
> a few thousand partitions, this adds up. It also affects Hive because the
> same partition objects are fetched by Hive and can lead to memory pressure
> even though Hive does not need those incremental stats.
> The intent of the partition parameters in a Partition object was primarily to
> keep certain properties or small objects. We should move incremental stats
> out of the partition params and store them separately in HMS. There is a
> PART_COL_STATS in HMS that could potentially be used (with some redesign of
> the schema) or a separate table could be added.
> Additional notes:
> * Only the latest version of incremental stats is stored. They are stored in
> partition parameters using keys like "impala_intermediate_stats_num_chunks"
> and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1",
> "impala_intermediate_stats_chunk2", etc. The chunks are used to cap each
> value size to be < 4000. [3]
> * Impala compresses the incr stats of all columns into a byte array for each
> partition. It'd be nice if HMS can compress them as well to save space.
>
> [1] https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L261
> [2] https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L232
> [3] https://github.com/apache/impala/blob/419aa2e30db326f02e9b4ec563ef7864e82df86e/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java#L182

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
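For readers of the quoted description, the footprint estimate and the chunked-parameter layout can be sketched roughly as follows. This is an illustrative Python sketch, not Impala's actual code (the real logic lives in the Java class PartitionStatsUtil linked as [3]). The function names, the treatment of the encoded stats as a plain string, and the chunk length of 3999 (chosen only to satisfy the "< 4000" cap mentioned in the issue) are assumptions; the 200-bytes-per-column figure and the parameter key names are taken from the issue text.

```python
# Rough sketch only; names and chunk length are assumptions, not Impala's API.

BYTES_PER_COLUMN = 200          # per-column estimate quoted in the issue
MAX_VALUE_LEN = 3999            # assumed: keeps each param value "< 4000"

NUM_CHUNKS_KEY = "impala_intermediate_stats_num_chunks"
CHUNK_KEY_PREFIX = "impala_intermediate_stats_chunk"


def estimate_incremental_stats_bytes(num_columns, num_partitions,
                                     bytes_per_column=BYTES_PER_COLUMN):
    """Apply the issue's estimate: bytes_per_column * columns * partitions."""
    return bytes_per_column * num_columns * num_partitions


def split_into_partition_params(encoded_stats, max_value_len=MAX_VALUE_LEN):
    """Split an already-encoded stats string into chunked partition params."""
    chunks = [encoded_stats[i:i + max_value_len]
              for i in range(0, len(encoded_stats), max_value_len)]
    params = {NUM_CHUNKS_KEY: str(len(chunks))}
    for i, chunk in enumerate(chunks):
        params[CHUNK_KEY_PREFIX + str(i)] = chunk
    return params


def join_from_partition_params(params):
    """Reassemble the encoded stats string from the chunked params."""
    num_chunks = int(params.get(NUM_CHUNKS_KEY, "0"))
    return "".join(params[CHUNK_KEY_PREFIX + str(i)]
                   for i in range(num_chunks))


if __name__ == "__main__":
    # Hypothetical table shape: 500 columns, 5000 partitions.
    total = estimate_incremental_stats_bytes(500, 5000)
    print(f"estimated incremental stats size: {total / 1e6:.0f} MB")

    # Placeholder payload standing in for one partition's compressed stats.
    params = split_into_partition_params("x" * 10000)
    print(sorted(params))
```

Under these assumptions, a 500-column table with 5000 partitions carries about 500 MB of incremental stats inside partition params, which is the kind of bloat the issue proposes moving to separate HMS storage.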