[jira] [Assigned] (HIVE-20380) explore storing multiple CBs in a single cache buffer in LLAP cache

Sergey Shelukhin (JIRA) Tue, 21 Aug 2018 15:31:46 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sergey Shelukhin reassigned HIVE-20380:
---------------------------------------

    Assignee: Sergey Shelukhin

> explore storing multiple CBs in a single cache buffer in LLAP cache
> -------------------------------------------------------------------
>
>                 Key: HIVE-20380
>                 URL: https://issues.apache.org/jira/browse/HIVE-20380
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>
> Lately ORC CBs have been getting ridiculously small. First there's the 4KB 
> minimum (instead of 256KB); then, after we moved the metadata cache off-heap, 
> the index streams, which are all tiny, take up a lot of CBs and waste space. 
> Wasted space can require a larger cache and lead to cache OOMs on some 
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and 
> probably compute) overhead to track all these buffers. Arguably even the 4KB 
> min.alloc is too small.
> We should store contiguous CBs in the same buffer; to start, we can do it for 
> ROW_INDEX streams. That probably means reading all ROW_INDEX streams instead 
> of doing projection when we see that they are too small.
> We need to investigate what the pattern is for ORC data blocks. One option is 
> to increase min.alloc and then consolidate multiple 4-8KB CBs, but only for 
> the same stream. However, a larger min.alloc will waste space for really 
> small streams, so we can also consolidate multiple streams (potentially 
> across columns) if needed. This will result in some priority anomalies, but 
> they are probably OK.
> Another consideration is making tracking less object-oriented, in particular 
> passing around integer indexes instead of objects and storing state in giant 
> arrays somewhere (potentially with some optimizations for less common 
> things), instead of every buffer getting its own object. 
> cc [~gopalv] [~prasanth_j]
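
To make the space argument in the description concrete, here is a minimal Java sketch comparing one-buffer-per-CB allocation (each CB rounded up to min.alloc) with packing contiguous CBs back to back into shared buffers. The class and method names are hypothetical and are not part of the LLAP allocator; the packing policy (first fit in stream order, no CB split across buffers) is an assumption made for illustration only.

```java
/** Hypothetical sketch; not the LLAP allocator API. */
final class PackedBufferSketch {
    /** Bytes allocated when every CB gets its own buffer, rounded up to minAlloc. */
    static long separateAllocation(int[] cbSizes, int minAlloc) {
        long total = 0;
        for (int size : cbSizes) {
            long units = (size + (long) minAlloc - 1) / minAlloc; // ceiling division
            total += units * minAlloc;
        }
        return total;
    }

    /**
     * Packs CBs in order into shared buffers of bufferSize bytes.
     * Returns one {bufferIndex, offsetInBuffer} pair per CB.
     * Assumption: a CB is never split across two buffers.
     */
    static int[][] pack(int[] cbSizes, int bufferSize) {
        int[][] slots = new int[cbSizes.length][];
        int buffer = -1; // no buffer opened yet
        int used = 0;    // bytes used in the current buffer
        for (int i = 0; i < cbSizes.length; i++) {
            int size = cbSizes[i];
            if (size > bufferSize) {
                throw new IllegalArgumentException("CB larger than buffer: " + size);
            }
            if (buffer < 0 || used + size > bufferSize) {
                buffer++; // open a new shared buffer
                used = 0;
            }
            slots[i] = new int[] {buffer, used};
            used += size;
        }
        return slots;
    }

    /** Bytes allocated by the packed layout returned from pack(). */
    static long packedAllocation(int[][] slots, int bufferSize) {
        if (slots.length == 0) {
            return 0;
        }
        return (slots[slots.length - 1][0] + 1L) * bufferSize;
    }
}
```

For ten 300-byte index streams and a 4096-byte allocation unit, separateAllocation returns 40960 bytes while the packed layout needs a single 4096-byte buffer. The cost is that a reader needs the (buffer, offset) pair to find a CB, and eviction applies to the shared buffer as a whole, which is where the priority anomalies mentioned in the description would come from.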


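The last point in the description (integer indexes plus state in large arrays instead of one object per buffer) could look roughly like the following Java sketch. BufferTable and its fields are hypothetical names chosen for illustration; which per-buffer state LLAP would actually need to track is an assumption here.

```java
import java.util.Arrays;

/** Hypothetical struct-of-arrays tracker; not an LLAP class. */
final class BufferTable {
    private long[] offsets;  // start of each buffer's data within some arena
    private int[] lengths;   // length of each buffer in bytes
    private int[] refCounts; // number of active users of each buffer
    private int size;        // number of handles issued so far

    BufferTable(int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        offsets = new long[capacity];
        lengths = new int[capacity];
        refCounts = new int[capacity];
    }

    /** Registers a buffer and returns its integer handle. */
    int add(long offset, int length) {
        if (size == offsets.length) {
            int newCapacity = offsets.length * 2;
            offsets = Arrays.copyOf(offsets, newCapacity);
            lengths = Arrays.copyOf(lengths, newCapacity);
            refCounts = Arrays.copyOf(refCounts, newCapacity);
        }
        offsets[size] = offset;
        lengths[size] = length;
        refCounts[size] = 0;
        return size++;
    }

    long offset(int handle) { return offsets[check(handle)]; }

    int length(int handle) { return lengths[check(handle)]; }

    void incRef(int handle) { refCounts[check(handle)]++; }

    /** Returns true when the last reference was released. */
    boolean decRef(int handle) {
        check(handle);
        if (refCounts[handle] == 0) {
            throw new IllegalStateException("buffer " + handle + " is not referenced");
        }
        return --refCounts[handle] == 0;
    }

    private int check(int handle) {
        if (handle < 0 || handle >= size) {
            throw new IndexOutOfBoundsException("bad handle " + handle);
        }
        return handle;
    }
}
```

Callers pass the int handle around instead of an object reference, so the per-buffer heap cost is the array entries rather than an object header plus fields. This sketch is not thread-safe and never reuses handles; a real tracker would need both.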

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
