Hi All,

Currently in carbondata we have an LRU caching feature for maintaining the BTree and
Dictionary caches. This feature is helpful on low-end systems where memory
is scarce, or where the user wants control over the memory used by the
carbondata system for caching.

In the LRU cache, an atomic access count variable is maintained for every key
in the cache map; it is incremented when a query accesses that key and
decremented once that query has finished using it.
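To make the bookkeeping concrete, here is a minimal sketch of the access-count mechanism described above. The class and method names are illustrative, not CarbonData's actual APIs:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache-entry wrapper: an atomic access count guards the
// entry against eviction while any query is still using it.
class CacheEntry {
    private final AtomicInteger accessCount = new AtomicInteger(0);

    // Called when a query starts using this key.
    void incrementAccessCount() { accessCount.incrementAndGet(); }

    // Called when that query is done with the key. A missed decrement
    // leaves the count above zero forever, pinning the entry in memory.
    void decrementAccessCount() { accessCount.decrementAndGet(); }

    // The evictor may only remove an entry whose count is zero.
    boolean isEvictable() { return accessCount.get() == 0; }
}
```

As the rest of this mail argues, the fragile part is that every access path must pair each increment with exactly one decrement.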

There are many places where we access dictionary columns, such as decoding
values for result preparation, filter operations, data loading, etc., and it
becomes a cumbersome process to maintain the access count at the entry and
exit of each such operation. If there is any inconsistency between
incrementing and decrementing the access count, the corresponding key will
never be cleared from the cache map, and if space is not freed, queries will
start failing with an unavailable-memory exception.

Therefore I suggest the following behavior.
1. Remove access-count-based removal from the caching framework and make
removal purely LRU based.
2. Ensure that for one query the BTree and dictionary caches are accessed at
most once by the driver and executor.
3. Fail the query if the size required by the dictionary column or BTree is
more than the size the user has configured for the LRU cache. This is
because the user should be made aware that the cache size needs to be
increased, rather than the carbondata system taking any run-time decision.
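The three points above could look roughly like the following sketch: a size-bounded, purely LRU map with no access counts, which fails fast when an entry cannot fit in the configured cache size. Class names and byte sizes are illustrative assumptions, not CarbonData code:

```java
import java.util.LinkedHashMap;

// Hypothetical size-bounded LRU cache with no access counts.
class LruCache<K> {
    private final long maxBytes;
    private long usedBytes = 0;
    private final LinkedHashMap<K, Long> sizes;

    LruCache(long maxBytes) {
        this.maxBytes = maxBytes;
        // accessOrder=true makes iteration order least-recently-used first.
        this.sizes = new LinkedHashMap<>(16, 0.75f, true);
    }

    // Point 3: fail immediately if the entry can never fit in the
    // configured cache size, instead of deciding anything at run time.
    void put(K key, long entryBytes) {
        if (entryBytes > maxBytes) {
            throw new IllegalStateException("Entry of " + entryBytes
                + " bytes exceeds configured LRU cache size " + maxBytes
                + "; increase the cache size");
        }
        // Point 1: eviction is purely LRU based; no key is pinned.
        while (usedBytes + entryBytes > maxBytes && !sizes.isEmpty()) {
            K eldest = sizes.keySet().iterator().next();
            usedBytes -= sizes.remove(eldest);
        }
        sizes.put(key, entryBytes);
        usedBytes += entryBytes;
    }

    // get() on an access-ordered LinkedHashMap refreshes recency.
    boolean contains(K key) { return sizes.get(key) != null; }
}
```

With point 2 (at most one cache access per query on driver and executor), there is no longer a window where a "live" entry needs protecting, so the access counts become unnecessary.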

Please share your inputs on this.

Regards
Manish Gupta
