Hello all,

In cloud scenarios, the index can be too big to store in the Spark driver,
since the VM may not have enough memory.
Currently in Carbon, we load all indexes into the cache on the first query.
Since the Carbon LRU cache does not support time-based expiration, indexes
are only evicted from the cache on a least-recently-used basis, when the
cache is full.

In some scenarios, where the user's table has many segments but the user
often queries only a few of them, we do not need to load all indexes into
the cache. For filter queries, if we prune segments and load only the
matched segments' indexes into the cache, the driver's memory will be saved.

For this purpose, I am planning to add block min/max to the segment metadata
file, prune segments based on the segment files, and load the index only for
the matched segments. As part of this, I will add a configurable carbon
property '*carbon.load.all.index.to.cache*' to allow the user to load all
indexes into the cache if needed. By default, the value will be true.
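Under this proposal, a user who wants segment-level pruning would turn the
property off in carbon.properties (note: this key is only proposed here and
does not exist yet):

```
# carbon.properties (proposed key, not yet available)
# false = prune segments by min/max first, load indexes only for matched segments
carbon.load.all.index.to.cache=false
```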

Currently, for each load, we write a segment metadata file, which holds the
information about the index file. During a query, we read each segment file
to get the index file info and then load all datamaps for that segment.
With this change, the min/max data will be encoded and stored in the segment
file.
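To make the pruning step concrete, here is a minimal sketch, assuming the
per-segment min/max for the filter column has already been decoded from the
segment metadata file. The class and method names below are illustrative
only, not CarbonData APIs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentPruneSketch {

  // Per-segment min/max for one filter column, as it might be decoded
  // from the segment metadata file. Illustrative type, not a Carbon class.
  static class SegmentMinMax {
    final String segmentId;
    final long min;
    final long max;

    SegmentMinMax(String segmentId, long min, long max) {
      this.segmentId = segmentId;
      this.min = min;
      this.max = max;
    }
  }

  // Keep only segments whose [min, max] range can contain the filter value.
  // Only these segments' indexes would then be loaded into the driver cache.
  static List<String> pruneSegments(List<SegmentMinMax> segments, long filterValue) {
    List<String> matched = new ArrayList<>();
    for (SegmentMinMax s : segments) {
      if (filterValue >= s.min && filterValue <= s.max) {
        matched.add(s.segmentId);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<SegmentMinMax> segments = Arrays.asList(
        new SegmentMinMax("segment_0", 1, 100),
        new SegmentMinMax("segment_1", 200, 300));
    // Only segment_1 can contain the value 250, so only its index is loaded.
    System.out.println(pruneSegments(segments, 250));
  }
}
```

The same idea extends to range filters by checking overlap of the filter
range with each segment's [min, max] instead of a single-value containment.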

Any suggestions/inputs from the community are appreciated.

Thanks
Indhumathi


