[ 
https://issues.apache.org/jira/browse/IMPALA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimitris Tsirogiannis reassigned IMPALA-3173:
---------------------------------------------

    Assignee:     (was: Dimitris Tsirogiannis)

> Reduce catalog's memory footprint
> ---------------------------------
>
>                 Key: IMPALA-3173
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3173
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.2.4
>            Reporter: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: catalog-server, performance, usability
>
> An initial analysis of the catalog's heap dumps shows that we can probably reduce 
> its memory footprint by: a) avoiding storing redundant information about 
> catalog entities such as partitions, and b) using more compressed data 
> structures.  
> Currently, for a table with 2 int columns and 1 int partition column and 
> without incremental stats, we use:
> * *~930B* per partition, of which ~500B are used by hmsParameters_ 
> (a Map<String, String>), ~190B by cachedMsPartitionDescriptor_, and ~200B 
> (depending on the path) by location.
> * *~800B* per file descriptor, of which ~530B go to file_blocks and the 
> rest is used for storing the file_name.
> * Every HdfsTable also uses two maps that replicate partition locations and 
> file names (e.g. perPartitionFileDescMap_ and nameToPartitionMap_). 
> Such a table with 100,000 partitions and 10 files per partition requires 
> 1GB and 1.4GB of memory without and with incremental stats, respectively. 
> This is a parent JIRA of IMPALA-2840.
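
As a rough cross-check of the figures quoted above, the per-partition and per-file-descriptor costs can be multiplied out. This is only a back-of-the-envelope sketch: the function name and constants are illustrative and not part of Impala, the sizes are the approximate values from the description for the case without incremental stats, and the two per-table maps that duplicate locations and file names are not counted, so the result is a lower bound.

```python
# Approximate per-object heap costs taken from the issue description
# (table with 2 int columns, 1 int partition column, no incremental stats).
PARTITION_BYTES = 930   # ~930B per partition
FILE_DESC_BYTES = 800   # ~800B per file descriptor


def estimate_catalog_bytes(num_partitions: int, files_per_partition: int) -> int:
    """Hypothetical helper: lower-bound heap estimate for one table's metadata."""
    partition_bytes = num_partitions * PARTITION_BYTES
    file_bytes = num_partitions * files_per_partition * FILE_DESC_BYTES
    return partition_bytes + file_bytes


if __name__ == "__main__":
    total = estimate_catalog_bytes(100_000, 10)
    print(f"{total / 1e9:.2f} GB")  # prints "0.89 GB"
```

The file descriptors account for about 800MB of the roughly 0.9GB total, which is in line with the roughly 1GB figure quoted above and suggests that file metadata, rather than partition metadata, dominates for a table of this shape.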



