Sergey Shelukhin created HIVE-9805:
--------------------------------------
Summary: LLAP: consider specialized "transient" metadata cache
Key: HIVE-9805
URL: https://issues.apache.org/jira/browse/HIVE-9805
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Fix For: llap
Due to the nature of the cache now (metadata cache + disk cache), when data is
read from ORC, a whole bunch of processing is still done with metadata, columns,
streams, contexts, offsets, etc. to get the data that is in the cache.
Essentially only the disk reads are eliminated; everything else is done as if we
were reading an unknown file.
We could have a better metadata representation that is saved during the first
read - for example, (file, stripe) -> DiskRange[] (incl. cache buffers that are
not locked) + a multi-dimensional array per column per stream per RG (row group)
pointing to offsets in the DiskRange array.
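
To make the proposed shape concrete, here is a minimal sketch of such a
structure. All names (StripeKey, RangeRef, StripeLayout, ABSENT) are
hypothetical and only for illustration; RangeRef is a stand-in and not Hive's
actual DiskRange class, and a null buffer is assumed to mean "not in cache".

```java
import java.util.Objects;

/** Hypothetical cache key: one entry per (file, stripe). */
final class StripeKey {
    final long fileId;
    final int stripeIx;

    StripeKey(long fileId, int stripeIx) {
        this.fileId = fileId;
        this.stripeIx = stripeIx;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof StripeKey)) return false;
        StripeKey k = (StripeKey) o;
        return fileId == k.fileId && stripeIx == k.stripeIx;
    }

    @Override public int hashCode() {
        return Objects.hash(fileId, stripeIx);
    }
}

/** Hypothetical stand-in for a disk range covering bytes [offset, end). */
final class RangeRef {
    final long offset;
    final long end;
    final byte[] buffer; // assumption: null means the data is not in cache

    RangeRef(long offset, long end, byte[] buffer) {
        this.offset = offset;
        this.end = end;
        this.buffer = buffer;
    }
}

/** Per-stripe layout saved on first read. */
final class StripeLayout {
    static final int ABSENT = -1; // marks "no data for this slot"

    final RangeRef[] ranges;
    final int[][][] index; // [column][stream][row group] -> position in ranges

    StripeLayout(RangeRef[] ranges, int[][][] index) {
        this.ranges = ranges;
        this.index = index;
    }

    /** Returns the saved range for the slot, or null if none was recorded. */
    RangeRef rangeFor(int col, int stream, int rg) {
        int ix = index[col][stream][rg];
        return ix == ABSENT ? null : ranges[ix];
    }
}
```

A plain Map<StripeKey, StripeLayout> would then be enough to hold the entries;
the index is just integers, so looking up a range is an array access rather
than a recomputation from ORC metadata.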
That way, if such a structure is found in the cache, the reader can avoid all
the calculation and just do a dumb conversion into results to pass to the
decoder, plus disk reads for the missing parts.
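
The "dumb conversion" on a cache hit might look roughly like the sketch below,
for a single column and stream. ReadPlanner, Slot and Plan are hypothetical
names, not existing Hive classes; a null cached array is assumed to mean the
range has to be read from disk, and de-duplication of ranges shared by several
row groups is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class ReadPlanner {
    /** Hypothetical saved range covering bytes [offset, end). */
    static final class Slot {
        final long offset;
        final long end;
        final byte[] cached; // assumption: null means "read from disk"

        Slot(long offset, long end, byte[] cached) {
            this.offset = offset;
            this.end = end;
            this.cached = cached;
        }
    }

    /** Ranges in decoder order, plus the subset still missing from cache. */
    static final class Plan {
        final List<Slot> forDecoder = new ArrayList<>();
        final List<Slot> toRead = new ArrayList<>();
    }

    /**
     * @param slots     ranges saved for the stripe on first read
     * @param rgIndex   per row group, position in slots (negative = no data)
     * @param wantedRgs row groups the query needs
     */
    static Plan plan(Slot[] slots, int[] rgIndex, int[] wantedRgs) {
        Plan p = new Plan();
        for (int rg : wantedRgs) {
            int ix = rgIndex[rg];
            if (ix < 0) continue;        // nothing recorded for this row group
            Slot s = slots[ix];
            p.forDecoder.add(s);         // always handed to the decoder
            if (s.cached == null) {
                p.toRead.add(s);         // only these need a disk read
            }
        }
        return p;
    }
}
```

No offsets are recomputed here; the loop only follows saved indices and
separates cached ranges from missing ones.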
This Java cache cannot figure in the main data eviction policy, so it should be
small. With Java objects no cache locking is needed: we can evict while someone
is still using the structure, and it will be GCed once it is no longer
referenced.
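
One possible way to get a small on-heap cache with these properties is a
bounded LRU map built on java.util.LinkedHashMap. This is only a sketch under
that assumption, not a proposed implementation; TransientMetadataCache and
maxEntries are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Small bounded LRU cache; evicted values stay valid for current holders. */
final class TransientMetadataCache<K, V> {
    private final Map<K, V> map;

    TransientMetadataCache(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first;
        // removeEldestEntry drops the LRU entry once the bound is exceeded.
        this.map = Collections.synchronizedMap(
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }
}
```

Eviction only removes the map's reference. A reader that fetched the value
earlier keeps a strong reference to it, so it can finish its work without any
lock or refcount, and the object is collected after that reader drops it.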
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)