Hi,

I have come up with the following ideas:

1. Although the index server can provide more memory to hold the cache for
   index data, its space is still limited. So cache management (especially
   cache invalidation) needs attention if we pre-prime during data load or
   at index server startup, since either can easily fill up the index
   server's memory as time goes by.

2. Pre-priming is an extended optimization, so it should focus on what we
   actually want to optimize. For the pre-prime caching approach, I think
   the configuration could support a regex/wildcard match list:
   - During index server startup, check and pre-prime matching EXISTING
     tables;
   - During data load, check and pre-prime matching NEW tables or NEW
     segments.
   This lightens the workload and keeps the targeted tables cached instead
   of being swapped out when many indexes are loaded into the cache.

3. A cache command could be another way to pre-prime, manually, either for
   testing or for embedding in code.

On 2019/08/16 10:56:33, Akash Nilugal <[email protected]> wrote:
> Hi All,
>
> I have raised a jira and attached the design doc there. Please refer
>
> CARBONDATA - 3492
>
> Regards,
> Akash
>
> On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <[email protected]> wrote:
>
> > Hi Community,
> >
> > Currently, we have an index server which basically helps in distributed
> > caching of the datamaps in a separate spark application.
> >
> > The caching of the datamaps in index server will start once the query is
> > fired on the table for the first time, all the datamaps will be loaded
> >
> > if the count(*) is fired and only required will be loaded for any filter
> > query.
> >
> >
> > Here the problem or the bottleneck is, until and unless the query is fired
> > on table, the caching won’t be done for the table datamaps.
> >
> > So consider a scenario where we are just loading the data to table for
> > whole day and then next day we query,
> >
> > so all the segments will start loading into cache. So first time the query
> > will be slow.
> >
> >
> > What if we load the datamaps into cache or preprime the cache without
> > waiting for any query on the table?
> >
> > Yes, what if we load the cache after every load is done, what if we load
> > the cache for all the segments at once,
> >
> > so that first time query need not do all this job, which makes it faster.
> >
> >
> > Here i have attached the design document for the pre-priming of cache into
> > index server. Please have a look at it
> >
> > and any suggestions or inputs on this are most welcomed.
> >
> >
> >
> > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> >
> >
> >
> > Regards,
> >
> > Akash R Nilugal
> >
>
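
P.S. To make the match-list idea from point 2 of my reply more concrete, here is a rough sketch in Python of how the filtering might work. It is only an illustration under my own assumptions: the pattern list, the function names, and the `preprime` callback are all hypothetical and are not taken from CarbonData or from the design doc.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard match list; in a real setup this would be read from
# a configuration property (the patterns below are made up for illustration).
PREPRIME_PATTERNS = ["sales.*", "default.fact_*"]


def should_preprime(db_name, table_name, patterns=PREPRIME_PATTERNS):
    """Return True if 'db.table' matches any wildcard pattern in the list."""
    qualified = f"{db_name}.{table_name}".lower()
    return any(fnmatchcase(qualified, p.lower()) for p in patterns)


def tables_to_preprime_on_startup(existing_tables, patterns=PREPRIME_PATTERNS):
    """At index server startup: keep only the EXISTING tables that match."""
    return [(db, t) for db, t in existing_tables if should_preprime(db, t, patterns)]


def on_load_finished(db_name, table_name, segment_id, preprime,
                     patterns=PREPRIME_PATTERNS):
    """After a data load: pre-prime the NEW segment only for matching tables.

    `preprime` is a hypothetical callback standing in for whatever actually
    loads the segment's index into the index server cache.
    """
    if should_preprime(db_name, table_name, patterns):
        preprime(db_name, table_name, segment_id)
        return True
    return False
```

Tables that do not match any pattern are simply skipped and would still be cached lazily on the first query, as they are today.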
