[
https://issues.apache.org/jira/browse/HIVE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-9269:
-----------------------------------
Fix Version/s: llap
> LLAP: introduce low-level cache for ORC
> ---------------------------------------
>
> Key: HIVE-9269
> URL: https://issues.apache.org/jira/browse/HIVE-9269
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> There are two distinct options for caching encoded data in a row-columnar
> format: caching logical chunks (e.g. for ORC, stripe x column or row group x
> column), or caching physical chunks (e.g. for ORC, compression buffers,
> entire stripes, ...). For highly selective queries, the former will probably
> result in better cache utilization and fewer undesirable priority phenomena.
> It will also be easier to use with different formats.
> However, given that logical chunks are variable-sized, it's harder to
> implement. The prototype has a cache of this form, but it has some serious
> shortcomings in its current state. Additionally, a high-level cache will
> operate above the ACID logic in the file format and would thus require cache
> invalidation, which is, as we know, one of the only hard things in CS.
> A low-level cache for the ORC case, however, is easier to implement due to
> the nearly fixed uncompressed size of compression buffers; these, at the 256k
> default, are also sufficiently granular. While it lacks the benefit of having
> ACID deltas already merged, as a high-level cache would, it will work with
> ACID out of the box.
> This JIRA is to implement the low-level cache.
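
To make the low-level option concrete, here is a minimal sketch of a cache keyed by file and byte offset that holds near-fixed-size buffers. It is not the LLAP implementation: the class name `LowLevelCacheSketch`, the `ChunkKey` type, the numeric `fileId`, and the LRU policy are all assumptions made for illustration. Because the buffers are assumed to be close to a fixed size (256k by default per the description), capacity is simply counted in buffers rather than bytes.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; names and eviction policy are not from Hive. */
final class LowLevelCacheSketch {

    /** Identifies one physical chunk by file and byte offset (assumed key shape). */
    static final class ChunkKey {
        final long fileId;
        final long offset;

        ChunkKey(long fileId, long offset) {
            this.fileId = fileId;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) {
                return false;
            }
            ChunkKey other = (ChunkKey) o;
            return fileId == other.fileId && offset == other.offset;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(fileId) + Long.hashCode(offset);
        }
    }

    // Access-ordered map: iteration order runs from least to most recently used.
    private final LinkedHashMap<ChunkKey, ByteBuffer> lru;

    LowLevelCacheSketch(final int maxBuffers) {
        this.lru = new LinkedHashMap<ChunkKey, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ChunkKey, ByteBuffer> eldest) {
                // Buffers are assumed near-fixed in size, so counting them bounds memory.
                return size() > maxBuffers;
            }
        };
    }

    /** Returns a read-only view of the cached buffer, or null on a miss. */
    synchronized ByteBuffer get(long fileId, long offset) {
        ByteBuffer cached = lru.get(new ChunkKey(fileId, offset));
        return cached == null ? null : cached.asReadOnlyBuffer();
    }

    /** Caches one buffer, evicting the least recently used one if over capacity. */
    synchronized void put(long fileId, long offset, ByteBuffer data) {
        lru.put(new ChunkKey(fileId, offset), data);
    }

    synchronized int bufferCount() {
        return lru.size();
    }
}
```

The sketch has no invalidation path. That relies on the assumption that entries hold raw file bytes below the ACID merge logic and that the underlying files are not rewritten in place.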
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)