[
https://issues.apache.org/jira/browse/HIVE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294004#comment-14294004
]
Sergey Shelukhin commented on HIVE-9418:
----------------------------------------
[~prasanth_j] - can you review
https://github.com/apache/hive/commit/c7242290923fdfdcadf4408a7ba3970fefac8d7c
? This is the first part of this JIRA, where the "old" ORC path can
hypothetically use the cache.
Please look at the ORC changes especially.
The idea is that DiskRange now has two subclasses, CacheChunk and the old
BufferChunk. We create a LinkedList of DiskRanges to read, then pass it through
the cache and the disk reader, each of which replaces parts of the ranges with
CacheChunks and BufferChunks that actually hold the data. In the end, every
DiskRange in the list has some sort of data.
The cache has been changed accordingly.
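To make the flow above concrete, here is a minimal sketch of the list being
passed through a cache and then a disk reader. Only the names DiskRange,
CacheChunk and BufferChunk come from the commit; ToyCache, fillFromCache,
readFromDisk, hasData and the offsets are assumptions made up for
illustration, and no bytes are actually read. It is not the code in the
commit.

```java
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;
import java.util.TreeMap;

public class DiskRangeSketch {
    /** A requested byte range [offset, end) that holds no data yet. */
    static class DiskRange {
        final long offset;
        final long end;
        DiskRange(long offset, long end) { this.offset = offset; this.end = end; }
        boolean hasData() { return false; }   // hypothetical helper
        @Override public String toString() {
            return getClass().getSimpleName() + "[" + offset + "," + end + ")";
        }
    }

    /** A range whose bytes came from the cache. */
    static class CacheChunk extends DiskRange {
        CacheChunk(long offset, long end) { super(offset, end); }
        @Override boolean hasData() { return true; }
    }

    /** A range whose bytes were read from disk. */
    static class BufferChunk extends DiskRange {
        BufferChunk(long offset, long end) { super(offset, end); }
        @Override boolean hasData() { return true; }
    }

    /** Hypothetical cache: start -> end offset of cached, non-overlapping regions. */
    static class ToyCache {
        private final TreeMap<Long, Long> cached = new TreeMap<>();
        void put(long offset, long end) { cached.put(offset, end); }

        /** Splits each data-less range in place, swapping cached parts for CacheChunks. */
        void fillFromCache(LinkedList<DiskRange> ranges) {
            ListIterator<DiskRange> it = ranges.listIterator();
            while (it.hasNext()) {
                DiskRange r = it.next();
                if (r.hasData()) continue;
                it.remove();
                long pos = r.offset;
                for (Map.Entry<Long, Long> e : cached.entrySet()) {
                    long start = Math.max(e.getKey(), pos);
                    long stop = Math.min(e.getValue(), r.end);
                    if (start >= stop) continue;                        // no overlap
                    if (pos < start) it.add(new DiskRange(pos, start)); // gap, still missing
                    it.add(new CacheChunk(start, stop));
                    pos = stop;
                }
                if (pos < r.end) it.add(new DiskRange(pos, r.end));     // uncached tail
            }
        }
    }

    /** Hypothetical disk reader: every range still without data becomes a BufferChunk. */
    static void readFromDisk(LinkedList<DiskRange> ranges) {
        ListIterator<DiskRange> it = ranges.listIterator();
        while (it.hasNext()) {
            DiskRange r = it.next();
            if (!r.hasData()) it.set(new BufferChunk(r.offset, r.end));
        }
    }

    static LinkedList<DiskRange> demo() {
        ToyCache cache = new ToyCache();
        cache.put(20, 60);
        cache.put(200, 300);
        LinkedList<DiskRange> ranges = new LinkedList<>();
        ranges.add(new DiskRange(0, 100));
        ranges.add(new DiskRange(200, 300));
        cache.fillFromCache(ranges);   // cached parts become CacheChunks
        readFromDisk(ranges);          // everything else becomes BufferChunks
        return ranges;
    }

    public static void main(String[] args) {
        // [BufferChunk[0,20), CacheChunk[20,60), BufferChunk[60,100), CacheChunk[200,300)]
        System.out.println(demo());
    }
}
```

Editing the list in place through a ListIterator keeps the ranges in file
order, so a later stage can walk the list once and find data in every entry.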
I wonder if we even need the metadata cache then. Whenever ORC reads DiskRanges
for metadata (see where I pass null as the cache for the footer, index, etc.), we
could use the cache there too, and add a feature that gives these blocks much
higher priority so they are less likely to be evicted. The metadata would then
have to be parsed on every read, though, so for now a Java-side cache might be
better.
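As a rough illustration of the "higher priority" idea, the sketch below evicts
data blocks before metadata blocks and only falls back to the oldest metadata
block when nothing else is left. Everything in it (PriorityCache, Block, the
metadata flag, the insertion counter) is hypothetical and is not an existing
Hive or ORC API; it assumes a capacity of at least 1 and tracks insertion
order only, not real access recency.

```java
import java.util.ArrayList;
import java.util.List;

public class PriorityEvictionSketch {
    /** A cached block; metadata blocks (footer, index) get higher priority. */
    static class Block {
        final String name;
        final boolean metadata;
        final long insertedAt;   // set on insert only in this sketch
        Block(String name, boolean metadata, long insertedAt) {
            this.name = name;
            this.metadata = metadata;
            this.insertedAt = insertedAt;
        }
    }

    /** Hypothetical fixed-capacity cache with priority-aware eviction. */
    static class PriorityCache {
        private final int capacity;   // assumed to be >= 1
        private final List<Block> blocks = new ArrayList<>();
        private long clock = 0;
        PriorityCache(int capacity) { this.capacity = capacity; }

        /** Caches a block; returns the name of the block evicted to make room, or null. */
        String put(String name, boolean metadata) {
            String evicted = null;
            if (blocks.size() >= capacity) {
                Block victim = blocks.get(0);
                for (Block b : blocks) {
                    if (evictsBefore(b, victim)) victim = b;
                }
                blocks.remove(victim);
                evicted = victim.name;
            }
            blocks.add(new Block(name, metadata, ++clock));
            return evicted;
        }

        /** Data blocks go before metadata blocks; within a class, older goes first. */
        private static boolean evictsBefore(Block a, Block b) {
            if (a.metadata != b.metadata) return !a.metadata;
            return a.insertedAt < b.insertedAt;
        }
    }

    public static void main(String[] args) {
        PriorityCache cache = new PriorityCache(2);
        cache.put("footer", true);
        cache.put("data1", false);
        System.out.println(cache.put("data2", false)); // data1: older footer is kept
        System.out.println(cache.put("index", true));  // data2: still the only data block
        System.out.println(cache.put("data3", false)); // footer: only metadata is left
    }
}
```

This only covers eviction order. The cached blocks would still be raw bytes,
so the parsing cost mentioned above remains.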
> LLAP: ORC production of encoded data, cache usage
> -------------------------------------------------
>
> Key: HIVE-9418
> URL: https://issues.apache.org/jira/browse/HIVE-9418
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
>
> ORC needs to be able to read self-contained rowgroups and return them. It
> should use the low-level cache in the process. In the future, we may use a
> high-level cache to cache rowgroups instead.