[ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682158#comment-14682158
 ] 

Alan Gates commented on HIVE-11500:
-----------------------------------

I am guessing that this will be the first of several things we'd like to cache 
in the metastore.  Rather than building specific calls for each type of cache 
into the already seriously bloated RawStore interface, would it be possible to 
build an interface something like this in the metastore:

{code}
interface CacheFilter { boolean include(String key, byte[] value); }
putInCache(String cacheName, String key, byte[] serializedProtoBuf);
getFromCache(String cacheName, String key);
getFromCacheWithFilter(String cacheName, CacheFilter filter);
deleteFromCache(String cacheName, String key);
{code}
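
To make the shape of that proposal concrete, here is a minimal in-memory sketch. 
It is only an illustration under assumptions: the return types (byte[] for a 
single lookup, a list of values for the filtered lookup), the null-on-miss 
behaviour, the class name GenericCacheSketch, and the map-backed storage are 
all hypothetical and are not part of RawStore or any existing metastore code. 
A real version would presumably be backed by HBase rather than a Java map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory sketch of the generic cache calls proposed above. */
class GenericCacheSketch {
  /** Filter from the proposal: decides whether an entry is returned. */
  public interface CacheFilter {
    boolean include(String key, byte[] value);
  }

  // cacheName -> (key -> serialized value). Assumption: one flat map per named cache.
  private final Map<String, Map<String, byte[]>> caches = new ConcurrentHashMap<>();

  public void putInCache(String cacheName, String key, byte[] serializedProtoBuf) {
    caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<>())
        .put(key, serializedProtoBuf);
  }

  /** Returns the cached bytes, or null on a miss (return type is an assumption). */
  public byte[] getFromCache(String cacheName, String key) {
    Map<String, byte[]> cache = caches.get(cacheName);
    return cache == null ? null : cache.get(key);
  }

  /** Returns the values of all entries the filter accepts (return type is an assumption). */
  public List<byte[]> getFromCacheWithFilter(String cacheName, CacheFilter filter) {
    List<byte[]> result = new ArrayList<>();
    Map<String, byte[]> cache = caches.get(cacheName);
    if (cache != null) {
      for (Map.Entry<String, byte[]> e : cache.entrySet()) {
        if (filter.include(e.getKey(), e.getValue())) {
          result.add(e.getValue());
        }
      }
    }
    return result;
  }

  /** Removes the entry if present; a missing cache or key is a no-op. */
  public void deleteFromCache(String cacheName, String key) {
    Map<String, byte[]> cache = caches.get(cacheName);
    if (cache != null) {
      cache.remove(key);
    }
  }
}
```

With something like this, a footer cache could be one named cache (say 
"orc_footers", a made-up name) keyed by path or fileId, and a caller could 
pass a CacheFilter such as (key, value) -> key.startsWith(somePathPrefix) to 
fetch only the entries it needs.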

On the ACID incompatibility, is it just because of PPD not working with ACID?  
There is a plan to address that soon.  If there's more than that we need to 
discuss and see how to resolve this.

On cache cleaning, how can the cleaner know when an entry is stale?  We should 
be able to use the active threads in the ACID compactor to do tasks like cache 
cleaning and caching new entries.

> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
>                 Key: HIVE-11500
>                 URL: https://issues.apache.org/jira/browse/HIVE-11500
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation 
> (which, on FSes that support fileId, will be valid permanently and only needs 
> to be removed lazily when ORC file is erased or compacted), and potentially 
> even some information about splits (e.g. grouping based on location that 
> would be good for some short time), in HBase metastore.
> -It should be queryable by table. Partition predicate pushdown should be 
> supported. If bucket pruning is added, that too.- Given that we cannot cache 
> file lists (we have to check FS for new/changed files anyway), and the 
> difficulty of passing data about partitions/etc. to split generation 
> compared to paths, we will probably just filter by paths and fileIds. It 
> might be different for splits.
> In later phases, it would be nice to save the (first category above) results 
> of expensive work done by jobs, e.g. data size after decompression/decoding 
> per column, etc. to avoid surprises when ORC encoding is very good, or very 
> bad. Perhaps it can even be lazily generated. Here's a pony: 🐴



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
