Hi, I have come up with the following ideas:

1. Although the index server can provide more memory to hold the cache for index
data, that memory is still limited.

So cache management (especially cache invalidation) needs attention if we
Pre-Prime during data load or at index server startup, since that can easily
fill up the index server's memory as time goes by.
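To illustrate the concern, here is a minimal sketch of a size-bounded cache with least-recently-used eviction plus explicit invalidation. It assumes entries are keyed by a hypothetical segment id string and that capacity is counted in entries rather than bytes; it is not the index server's actual cache, only an illustration that with a hard limit, eagerly pre-primed entries can push out entries that queries are actually using.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache for index data, keyed by segment id.
// Capacity is an entry count here; a real cache would track memory size.
public class BoundedIndexCache<V> {
    private final Map<String, V> entries;

    public BoundedIndexCache(final int maxEntries) {
        // accessOrder = true keeps iteration order least-recently-accessed first,
        // so removeEldestEntry evicts the LRU entry once the limit is exceeded.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public void put(String segmentId, V indexData) { entries.put(segmentId, indexData); }

    public V get(String segmentId) { return entries.get(segmentId); }

    // Explicit invalidation, e.g. when a segment is compacted or deleted.
    public void invalidate(String segmentId) { entries.remove(segmentId); }

    // containsKey does not change the access order.
    public boolean contains(String segmentId) { return entries.containsKey(segmentId); }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        BoundedIndexCache<String> cache = new BoundedIndexCache<>(2);
        cache.put("seg_0", "index-0");
        cache.put("seg_1", "index-1");
        cache.get("seg_0");              // seg_0 becomes most recently used
        cache.put("seg_2", "index-2");   // a pre-primed entry evicts LRU seg_1
        System.out.println(cache.contains("seg_1")); // false
        cache.invalidate("seg_0");
        System.out.println(cache.size());            // 1
    }
}
```

LRU is used here only because it is simple to show; the index server may use a different eviction policy.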

2. Pre-Prime is an extended optimization, so it should focus on what users
actually want to optimize.

So, regarding how to choose what to pre-prime, I think the configuration could
support a regex/wildcard match list:

- At index server startup, check and pre-prime the matching EXISTING tables;
- During data load, check and pre-prime the matching NEW tables or NEW segments.

This lightens the workload and keeps the targeted tables cached, so they are not
swapped out when many indexes are loaded into the cache.
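A minimal sketch of the match-list idea, assuming a hypothetical comma-separated property value such as `db1.sales_*, db2.*`, where `*` matches any run of characters and each entry is matched case-insensitively against a fully qualified `database.table` name. The property format and the `PrePrimeMatcher` class are illustrative only, not an existing CarbonData setting. The same predicate could be called at index server startup (for existing tables) and after each load (for new tables or segments).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-prime filter built from a wildcard match list,
// e.g. "db1.sales_*, db2.*". '*' matches any run of characters.
public class PrePrimeMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    public PrePrimeMatcher(String matchList) {
        for (String raw : matchList.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            // Quote every literal piece and turn each '*' into ".*".
            StringBuilder regex = new StringBuilder();
            String[] parts = entry.split("\\*", -1);
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*");
                }
                regex.append(Pattern.quote(parts[i]));
            }
            patterns.add(Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE));
        }
    }

    // True if the fully qualified "database.table" name should be pre-primed.
    public boolean shouldPrePrime(String dbName, String tableName) {
        String qualified = dbName + "." + tableName;
        for (Pattern p : patterns) {
            if (p.matcher(qualified).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PrePrimeMatcher m = new PrePrimeMatcher("db1.sales_*, db2.*");
        System.out.println(m.shouldPrePrime("db1", "sales_2019")); // true
        System.out.println(m.shouldPrePrime("db1", "customers"));  // false
        System.out.println(m.shouldPrePrime("db2", "orders"));     // true
    }
}
```

Only the wildcard form is shown; raw regex entries could instead be passed straight to `Pattern.compile`. An empty list matches nothing, so pre-priming stays off unless tables are listed explicitly.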

3. A cache command can be another way to Pre-Prime manually, e.g. for testing or
for embedding in code.



On 2019/08/16 10:56:33, Akash Nilugal <[email protected]> wrote: 
> Hi All,
> 
> I have raised a JIRA and attached the design doc there. Please refer to
> 
> CARBONDATA-3492
> 
> Regards,
> Akash
> 
> On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[email protected]> wrote:
> 
> > Hi Community,
> >
> > Currently, we have an index server which helps with distributed caching of
> > the datamaps in a separate Spark application.
> >
> > Caching of the datamaps in the index server starts only once a query is
> > fired on the table for the first time: all the datamaps are loaded if
> > count(*) is fired, and only the required ones are loaded for a filter query.
> >
> >
> > The problem, or bottleneck, here is that until a query is fired on the
> > table, its datamaps are not cached.
> >
> > Consider a scenario where we load data into a table for a whole day and
> > query it the next day: all the segments then start loading into the cache,
> > so the first query will be slow.
> >
> >
> > What if we load the datamaps into the cache, i.e. pre-prime the cache,
> > without waiting for any query on the table?
> >
> > That is, what if we load the cache after every data load is done, or load
> > the cache for all the segments at once, so that the first query does not
> > have to do all this work and is therefore faster?
> >
> >
> > Here I have attached the design document for pre-priming the cache in the
> > index server. Please have a look at it; any suggestions or inputs are most
> > welcome.
> >
> >
> >
> > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> >
> >
> >
> > Regards,
> >
> > Akash R Nilugal
> >
> 
