Re: [DISCUSSION] Cache Pre Priming

Akash r Fri, 23 Aug 2019 19:33:28 -0700

On 2019/08/19 09:53:10, Manhua <[email protected]> wrote: 
> Hi Akash,
> 
> 1. The cache will fill up if loading keeps running all the time. The reason
> I mentioned invalidation is to avoid one case in particular: the cache is
> full before all the targeted indexes are loaded.
> 
> When the server is just starting, it is not good to keep pre-priming and
> swap out the earliest loaded indexes.
> Maybe pre-prime needs to check the available cache capacity before loading
> an index, and otherwise stop pre-priming?
> 
> 2. I think regex/wildcard is more flexible to use, such as:
> *.* for all dbs and tables
> test.* for all tables in the test db
> test.day_table_201908* for tables with the targeted prefix
> 
> 3. Yes, you are right, firing a count(*) can do that.
> 
> 
> On 2019/08/19 09:23:06, Akash Nilugal <[email protected]> wrote: 
> > Hi manhua,
> > 
> > Thanks for the inputs.
> > 
> > 1. There is no need to invalidate the cache separately; I agree that it
> > has a limit. Since we already have an eviction policy, the next query
> > will evict entries whenever required and load the segments it needs, so
> > it is better not to have a separate mechanism to invalidate the cache
> > during pre-prime.
> > 
> > 2.
> > i. For configuration of pre-prime, we can already specify the database
> > name or table name. We will note the regex support, and I will update the
> > design document based on other use cases and impacts.
> > ii. During load there is no need to load the table or read any
> > configuration for pre-prime. Pre-prime during load just takes the new
> > segment and loads it into the cache.
> > 
> > 3. For command support, can you please explain with more use cases?
> > Index server startup will already load the cache, and as for a command,
> > even a count(*) will load all the segments. So I think a new command
> > won't be necessary.
> > 
> > Please get back for any clarifications or doubts.
> > 
> > Thanks
> > 
> > Regards,
> > Akash R Nilugal
> > 
> > On Fri, Aug 16, 2019, 4:26 PM Akash Nilugal <[email protected]> wrote:
> > 
> > > Hi All,
> > >
> > > I have raised a JIRA and attached the design doc there. Please refer to
> > >
> > > CARBONDATA-3492
> > >
> > > Regards,
> > > Akash
> > >
> > > On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[email protected]>
> > > wrote:
> > >
> > >> Hi Community,
> > >>
> > >> Currently, we have an index server which helps with distributed
> > >> caching of the datamaps in a separate Spark application.
> > >>
> > >> The caching of the datamaps in the index server starts once a query is
> > >> fired on the table for the first time. All the datamaps are loaded if
> > >> a count(*) is fired, and only the required ones are loaded for a
> > >> filter query.
> > >>
> > >>
> > >> The problem, or bottleneck, is that until a query is fired on the
> > >> table, the table's datamaps won’t be cached.
> > >>
> > >> So consider a scenario where we only load data into the table for a
> > >> whole day and query it the next day: all the segments start loading
> > >> into the cache at that point, so the first query will be slow.
> > >>
> > >>
> > >> What if we load the datamaps into the cache, or pre-prime the cache,
> > >> without waiting for any query on the table?
> > >>
> > >> What if we load the cache after every load is done, or load the cache
> > >> for all the segments at once, so that the first query need not do all
> > >> this work and is therefore faster?
> > >>
> > >>
> > >> I have attached the design document for pre-priming the cache in the
> > >> index server. Please have a look at it; any suggestions or inputs on
> > >> this are most welcome.
> > >>
> > >>
> > >>
> > >> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > >>
> > >>
> > >>
> > >> Regards,
> > >>
> > >> Akash R Nilugal
> > >>
> > >
> > 
Hi Manhua,

1. You are right that the cache will become full at some point. With your
suggestion, if we stop pre-priming, the next query will still try to load into
the cache, and if there is not enough space it will evict entries and then
load. Pre-priming would do the same thing, so LRU eviction handles that for us
either way. I will still think about this and let you know, and if it is
feasible I will update the design.

Maybe we can stop pre-priming once the cache is full; I'll update this once it
is finalised.
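
To make the two options in point 1 concrete, here is a minimal sketch in
Python. It is not CarbonData code: LruIndexCache, pre_prime and the
stop_when_full flag are hypothetical names used only for illustration, and
segment sizes are plain integers.

```python
from collections import OrderedDict


class LruIndexCache:
    """Toy size-bounded LRU cache (hypothetical, not the CarbonData cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # segment id -> size, least recent first

    def load(self, segment, size, evict=True):
        """Load a segment; return False if it cannot be placed."""
        if segment in self.entries:
            self.entries.move_to_end(segment)  # mark as recently used
            return True
        if size > self.capacity:
            return False  # can never fit, even in an empty cache
        while self.used + size > self.capacity:
            if not evict:
                return False  # caller asked us not to swap anything out
            _, freed = self.entries.popitem(last=False)  # drop least recent
            self.used -= freed
        self.entries[segment] = size
        self.used += size
        return True


def pre_prime(cache, segments, stop_when_full):
    """Load (segment, size) pairs into the cache and return what was loaded.

    stop_when_full=True  -> stop at the first segment that does not fit.
    stop_when_full=False -> keep going and let LRU eviction make room.
    """
    loaded = []
    for segment, size in segments:
        if cache.load(segment, size, evict=not stop_when_full):
            loaded.append(segment)
        elif stop_when_full:
            break
    return loaded
```

With a capacity of 10 and three segments of size 4 each, the stop_when_full
mode keeps the first two segments and never loads the third, while the
eviction mode swaps out the first segment to make room for the third.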


2. Wildcard support as you suggest is also fine. In the initial stage we will
do load and pre-prime first, and we can provide regex support after that.
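
For reference, the wildcard patterns from your mail could be matched roughly
as in the sketch below. This is only an illustration in Python using the
standard fnmatch module; select_tables and the sample table names are
hypothetical, and the actual configuration format is not decided yet.

```python
from fnmatch import fnmatchcase


def select_tables(tables, patterns):
    """Return the 'db.table' names that match at least one wildcard pattern."""
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


# Hypothetical table list, written as db.table.
tables = ["test.day_table_20190801", "test.day_table_20190715", "sales.orders"]

all_tables = select_tables(tables, ["*.*"])                     # every db and table
test_db = select_tables(tables, ["test.*"])                     # tables in the test db
august = select_tables(tables, ["test.day_table_201908*"])      # targeted prefix only
```

Note that fnmatchcase is case sensitive and that "*" also matches a dot, so
this relies on database and table names not containing dots themselves.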

Thank you for the suggestion