Hi Litao,

Basically, if the first query takes x time in total, and y of that is spent
connecting to the index server, loading the cache and returning, then with
pre-priming we can save that y time. If not all the segments have been loaded
in advance, the saving will be less than y, but we still get a benefit. We can
do the benchmark later.
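
A small sketch of the estimate above, only to make the arithmetic concrete. It
assumes (hypothetically, as a simplification) that the saving grows linearly
with the fraction of segments already pre-primed; the real numbers have to come
from the benchmark. The function name and parameters are illustrative, not part
of any existing API.

```python
def estimated_first_query_time(x: float, y: float, primed_fraction: float) -> float:
    """Estimate the first-query time after pre-priming (hypothetical model).

    x: total first-query time without pre-priming
    y: the part of x spent connecting to the index server, loading the cache
       and returning, i.e. the part pre-priming can save
    primed_fraction: fraction of segments already loaded in advance (0.0..1.0)
    """
    if not 0.0 <= primed_fraction <= 1.0:
        raise ValueError("primed_fraction must be between 0 and 1")
    if not 0.0 <= y <= x:
        raise ValueError("y must be between 0 and x")
    # Assumption: saving is linear in the primed fraction, at most y.
    saving = y * primed_fraction
    return x - saving
```

With illustrative values (not measurements), estimated_first_query_time(10.0,
6.0, 0.5) gives 7.0: half the segments pre-primed saves half of y.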

As for data loading time: since we will load into the cache asynchronously, it
won't affect the data load.
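
A minimal sketch of why asynchronous priming does not add to load time: the
load only submits the priming job and returns without waiting for it. All
names here (IndexCache, prime_cache, load_data) are hypothetical stand-ins,
not CarbonData or index server APIs, and the datamap is just a placeholder
string.

```python
import concurrent.futures
import threading


class IndexCache:
    """Hypothetical stand-in for the index server's datamap cache."""

    def __init__(self):
        self._lock = threading.Lock()
        self._segments = {}

    def put(self, segment_id, datamap):
        with self._lock:
            self._segments[segment_id] = datamap

    def contains(self, segment_id):
        with self._lock:
            return segment_id in self._segments


# Background pool used only for cache priming (hypothetical design choice).
_prime_executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def prime_cache(cache, segment_id):
    # Placeholder: a real implementation would build the segment's datamaps.
    cache.put(segment_id, f"datamap-for-{segment_id}")


def load_data(cache, segment_id):
    """Finish the data load, then schedule priming without blocking on it."""
    # ... writing the segment files is out of scope for this sketch ...
    # submit() returns a Future immediately; the load does not wait on it.
    return _prime_executor.submit(prime_cache, cache, segment_id)
```

In this sketch, if priming raises, the exception stays on the returned Future
and the load itself has already returned.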

I didn't understand what you said about the Hadoop cache. Can you please
elaborate on what exactly you mean?
 
About the command to load all tables: I will check its feasibility and then
include it in the design and implementation.
I will create JIRA sub-tasks for loading into the cache after data load, the
configuration type of load, and the command. Then we can take up the tasks
based on priority.

Regards,
Akash

On 2019/08/21 10:43:39, tao li <litao_xid...@126.com> wrote: 
> Hi Akash,
>     How much of the performance difference between the first and second
> queries is due to index caching, and how much is due to Hadoop caching?
>     We should break it down and look at the time-consumption analysis on
> the driver side.
> 
> On 2019/08/21 09:42:10, Akash Nilugal <akashnilu...@gmail.com> wrote: 
> > Hi Litao,
> > 
> > Initially, the first count(*) query took around 32 seconds because it had
> > to load the datamaps into the cache, while the second query takes around
> > 1.5 to 2 seconds, I think. So with pre-priming we can achieve a bigger
> > improvement in the first query.
> > 
> > Regards,
> > Akash
> > 
> > On 2019/08/21 03:03:55, tao li <litao_xid...@126.com> wrote: 
> > > Hi Akash,
> > >       Before development, we need to know how much caching part of the
> > > index in advance can improve queries.
> > >       We need to compare and analyze the first and second queries, and
> > > find the time differences for several important steps.
> > >       This analysis will show the performance improvement that caching
> > > part of the index in advance can bring.
> > > 
> > > On 2019/08/15 12:03:09, Akash Nilugal <akashnilu...@gmail.com> wrote: 
> > > > Hi Community,
> > > > 
> > > > Currently, we have an index server which basically helps in distributed
> > > > caching of the datamaps in a separate Spark application.
> > > > 
> > > > Caching of the datamaps in the index server starts once a query is fired
> > > > on the table for the first time: all the datamaps are loaded if count(*)
> > > > is fired, and only the required ones are loaded for a filter query.
> > > > 
> > > > 
> > > > The problem, or bottleneck, here is that until a query is fired on the
> > > > table, the table's datamaps are not cached.
> > > > 
> > > > Consider a scenario where we load data into a table for a whole day and
> > > > then query it the next day. All the segments only start loading into the
> > > > cache at that point, so the first query will be slow.
> > > > 
> > > > 
> > > > What if we load the datamaps into the cache, i.e. pre-prime the cache,
> > > > without waiting for any query on the table?
> > > > 
> > > > That is, what if we load the cache after every data load is done, or
> > > > load the cache for all the segments at once, so that the first query
> > > > does not have to do all this work and is therefore faster?
> > > > 
> > > > 
> > > > I have attached the design document for pre-priming the cache in the
> > > > index server. Please have a look at it; any suggestions or inputs are
> > > > most welcome.
> > > > 
> > > > 
> > > > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > > > 
> > > > 
> > > > 
> > > > Regards,
> > > > 
> > > > Akash R Nilugal
> > > > 
> > > 
> > 
> 
