Sergey Shelukhin created HIVE-9269:
--------------------------------------
Summary: LLAP: introduce low-level cache for ORC
Key: HIVE-9269
URL: https://issues.apache.org/jira/browse/HIVE-9269
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
There are two distinct options for caching encoded data in a row-columnar
format: caching logical chunks (e.g. for ORC, stripe x column or row group x
column), or caching physical chunks (e.g. for ORC, compression buffers, entire
stripes, ...). For highly selective queries, the former will probably result in
better cache utilization and fewer undesirable priority phenomena. It will also
be easier to use for different formats.
However, given that logical chunks are variable-sized, such a cache is harder
to implement. The prototype has a form of cache like that, but it has some
serious shortcomings in its current form. Additionally, a high-level cache
would operate above the ACID logic in the file format and would thus require
cache invalidation, which is, as we know, one of the only hard things in CS.
A low-level cache for the ORC case, however, is easier to implement due to the
nearly fixed uncompressed size of compression buffers; these, at the 256k
default, are also sufficiently granular. While it does not have the benefit of
having ACID deltas already merged, as a high-level cache would, it will work
with ACID out of the box.
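As a rough illustration of why near-fixed buffer sizes simplify things, here is
a minimal sketch of a low-level cache keyed by file and byte offset. All names
(BufferKey, LowLevelCacheSketch, fileId) are hypothetical and are not taken from
the prototype or from Hive; LRU eviction and a capacity counted in buffers are
assumptions made for the example, not a proposed design.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key: a compression buffer is identified by the file it belongs
// to and the byte offset at which it starts in that file.
final class BufferKey {
    final long fileId;
    final long offset;

    BufferKey(long fileId, long offset) {
        this.fileId = fileId;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BufferKey)) return false;
        BufferKey k = (BufferKey) o;
        return fileId == k.fileId && offset == k.offset;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fileId, offset);
    }
}

// Illustrative LRU cache whose capacity is counted in buffers. Counting
// entries instead of bytes is only reasonable because every entry has
// (nearly) the same uncompressed size.
final class LowLevelCacheSketch {
    static final int BUFFER_SIZE = 256 * 1024; // the 256k default from the text

    private final LinkedHashMap<BufferKey, ByteBuffer> buffers;

    LowLevelCacheSketch(long cacheBytes) {
        final int maxBuffers = (int) (cacheBytes / BUFFER_SIZE);
        // accessOrder = true keeps the least recently used entry eldest.
        this.buffers = new LinkedHashMap<BufferKey, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BufferKey, ByteBuffer> eldest) {
                return size() > maxBuffers;
            }
        };
    }

    // Returns the cached buffer, or null on a miss.
    synchronized ByteBuffer get(long fileId, long offset) {
        return buffers.get(new BufferKey(fileId, offset));
    }

    // Stores a buffer, evicting the least recently used one if over capacity.
    synchronized void put(long fileId, long offset, ByteBuffer data) {
        buffers.put(new BufferKey(fileId, offset), data);
    }

    synchronized int size() {
        return buffers.size();
    }
}
```

In this sketch each file is cached independently of any other, so base and
delta files would simply be separate entries and merging would still happen
above the cache on every read.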
This JIRA is to implement the low-level cache.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)