[ https://issues.apache.org/jira/browse/IMPALA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dimitris Tsirogiannis reassigned IMPALA-3173:
---------------------------------------------

    Assignee:     (was: Dimitris Tsirogiannis)

> Reduce catalog's memory footprint
> ---------------------------------
>
>                 Key: IMPALA-3173
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3173
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.2.4
>            Reporter: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: catalog-server, performance, usability
>
> An initial analysis of the catalog's heap dumps shows that we can probably
> reduce its memory footprint by: a) not storing redundant information about
> catalog entities such as partitions, and b) using more compressed data
> structures.
> Currently, for a table with 2 int columns and 1 int partition column and
> without incremental stats, we use:
> * *~930B* per partition, of which ~500B are used by hmsParameters_
> (a Map<String, String>), ~190B by cachedMsPartitionDescriptor_, and ~200B
> (depending on the path) by location.
> * *~800B* per file descriptor, of which ~530B go to file_blocks and the
> rest is used for storing the file_name.
> * Every HdfsTable also uses two maps that replicate partition locations and
> file names (e.g. perPartitionFileDescMap_ and nameToPartitionMap_).
> A table like that with 100,000 partitions and 10 files per partition requires
> about 1GB of memory without incremental stats and 1.4GB with them.
> This is a parent JIRA of IMPALA-2840.
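
As a rough cross-check of the figures quoted in the issue, the per-partition and per-file-descriptor sizes can be multiplied out for the 100,000-partition example. This is only a back-of-the-envelope sketch: the function name and constants are hypothetical, the byte counts are the approximate values from the issue text (measured without incremental stats), and the two extra per-table maps are not modeled.

```python
# Approximate per-entity heap sizes quoted in the issue, in bytes.
# These are the figures for a table WITHOUT incremental stats.
PARTITION_BYTES = 930
FILE_DESCRIPTOR_BYTES = 800


def estimate_catalog_bytes(num_partitions, files_per_partition,
                           partition_bytes=PARTITION_BYTES,
                           file_descriptor_bytes=FILE_DESCRIPTOR_BYTES):
    """Estimate catalog heap usage for one table (hypothetical helper).

    Counts only partition objects and file descriptors; the duplicate
    per-table maps mentioned in the issue are not included.
    """
    partition_total = num_partitions * partition_bytes
    file_total = num_partitions * files_per_partition * file_descriptor_bytes
    return partition_total + file_total


if __name__ == "__main__":
    total = estimate_catalog_bytes(num_partitions=100_000, files_per_partition=10)
    print(f"estimated footprint: {total / 1e9:.2f} GB")
```

The result is dominated by the file descriptors (about 0.8GB of the total) and lands close to the lower of the two totals quoted in the issue, which suggests that file metadata, not partition metadata, is the larger target for this table shape.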