Re: [DISCUSSION] Cache Pre Priming

Akash Nilugal Wed, 21 Aug 2019 02:39:41 -0700

On 2019/08/21 02:39:45, tao li <litao_xid...@126.com> wrote: 
>   Hi Akash, 
>     I have a few questions. 
>     1. About the ways to pre-prime: there are two ways. One is to cache during 
> data loading; the other is to cache when the cache server starts. 
>         I think the latter is not desirable, because loading the cache may take 
> a long time and can leave the cache server unresponsive. The first way 
> needs some supporting data. Caching the index data during data loading may 
> take more time. Although threads are started, there will still be a lot of I/O 
> and computing overhead, which may slow down data loading. So the first way 
> needs some detailed figures: how big is the index file, and how much impact 
> does it have on loading? 
>         Should we provide a third way, triggered through an interface? 
> User-triggered cache loading can be provided. Users can trigger it when the 
> system is free, for example late at night. 
>      2. About Configuration 
>      Could you please give an example of how to use 
> carbon.index.server.pre.prime? 
>      3. About Datamap Table Loading or Child Table Loading to Cache 
>       I think this point is very important and needs a more detailed 
> description. For example, when an update or delete happens, how does the cache 
> change? When an MV is dropped or created, how does the cache change? etc. 
>       4. About Rebuild Command 
>       What do we need to do when this command is used: first clear the cached 
> data, then load the cache again? Can this command be executed many 
> times? 
>       5. About Compaction 
>       As with rebuild above, do we need to decide which cache entries should 
> be cleared and which other segments' cache needs to be loaded?
> On 2019/08/15 12:03:09, Akash Nilugal <akashnilu...@gmail.com> wrote: 
> > Hi Community,
> > 
> > Currently, we have an index server which basically helps in distributed
> > caching of the datamaps in a separate spark application.
> > 
> > The caching of the datamaps in the index server starts once a query is
> > fired on the table for the first time. All the datamaps will be loaded
> > if count(*) is fired, and only the required ones will be loaded for any
> > filter query.
> > 
> > 
> > Here the problem or the bottleneck is, until and unless the query is fired
> > on table, the caching won’t be done for the table datamaps.
> > 
> > So consider a scenario where we are just loading the data to table for
> > whole day and then next day we query,
> > 
> > so all the segments will start loading into cache. So first time the query
> > will be slow.
> > 
> > 
> > What if we load the datamaps into cache or pre-prime the cache without
> > waiting for any query on the table?
> > 
> > Yes, what if we load the cache after every load is done, what if we load
> > the cache for all the segments at once,
> > 
> > so that first time query need not do all this job, which makes it faster.
> > 
> > 
> > I have attached the design document for the pre-priming of the cache in
> > the index server. Please have a look at it;
> > 
> > any suggestions or inputs on this are most welcome.
> > 
> > 
> > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > 
> > 
> > 
> > Regards,
> > 
> > Akash R Nilugal
> > 
Hi Litao,

1. I don't think I understood your point about the first way. We load only 
the segment created by that load, not all the segments, so it should not 
affect load performance much. The second way is driven by configuration: the 
cache is loaded only if it is configured; otherwise you can leave it to the 
query to take care of.

You mentioned a third way, a user interface to trigger loading at night or at 
a low-traffic time. That is the same as running count(*) at night, so there is 
no need to expose an extra operation for it.

2. About the configuration: you set the value of this property in the way 
described in the main mail of this thread, and the cache is loaded based on 
that value.

3. I have updated the design document with a more detailed description of this 
point. Please refer to the JIRA for it and get back to me for any clarifications.

4. The rebuild command is only useful for building a lazy MV datamap. Currently 
only MV can be either lazy or non-lazy; all the remaining datamaps are non-lazy. 
Whenever rebuild is called, if the MV is not in sync with the main table 
segments, it loads that data into the MV and loads the new MV segment into the 
cache.

5. As already explained in the design document, once compaction is done, we 
invalidate the compacted segments in the cache and load the new segment into 
the cache.
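
The behaviour described in points 1 and 5 can be sketched roughly as below. This 
is only a toy model, not the index server code: the class SegmentCache, its 
method names, and the placeholder index payloads are all hypothetical, and the 
segment ids are just illustrative strings.

```python
class SegmentCache:
    """Toy stand-in (hypothetical) for the cached datamaps of one table."""

    def __init__(self):
        # segment id -> placeholder for that segment's cached index data
        self._cached = {}

    def on_load_finished(self, segment_id):
        # Point 1: pre-prime only the segment written by this load,
        # not every segment of the table.
        self._cached[segment_id] = f"index-of-{segment_id}"

    def on_compaction_finished(self, compacted_ids, new_segment_id):
        # Point 5: invalidate the segments that were compacted away...
        for seg in compacted_ids:
            self._cached.pop(seg, None)
        # ...and load the newly created segment into the cache.
        self._cached[new_segment_id] = f"index-of-{new_segment_id}"

    def cached_segments(self):
        return sorted(self._cached)


cache = SegmentCache()
for seg in ["0", "1", "2"]:
    cache.on_load_finished(seg)

# Segments "0" and "1" are merged into an illustrative new segment "0.1".
cache.on_compaction_finished(["0", "1"], "0.1")
print(cache.cached_segments())
```

After the compaction step only the merged segment and the untouched segment 
remain cached, so a later query does not have to load either of them.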

Please get back to me with any clarifications or inputs.

Thanks,

Akash R