Re: Dear community

2023-10-20 Thread Tao Li
Hi Liang,
AI technology has broad prospects, and large model technology is developing
rapidly.
If we apply AI to automatically tune Carbon's parameters, including some
predictive tuning, CarbonData will become more user-friendly.
It is a very visionary direction to invest effort here.

On 2023/10/19 07:58:14 Liang Chen wrote:
> As you know, CarbonData as a datastore and data format is already quite good
> and mature.
> I want to start this thread on the mailing list to openly discuss the next
> milestones of the CarbonData project.
> One proposal from my side: we should consider how to integrate with AI
> computing engines.
> 
> Regards
> Liang
> 


Re: Invite Bo Xu as new release manager . Re: [ANNOUNCE] Bo Xu as new PMC for Apache CarbonData

2023-10-20 Thread Tao Li
Of course, Bo Xu is great. He is well suited to this role!
I fully agree.

On 2023/10/18 07:25:39 Liang Chen wrote:
> Dear community
> 
> I would like to propose Bo Xu, our new PMC member, to take charge of the next release.
> 
> Regards
> Liang
> 
> Liang Chen wrote on Monday, April 24, 2023 at 20:57:
> 
> > Hi
> >
> > We are pleased to announce Bo Xu as a new PMC member for Apache CarbonData.
> >
> > Congrats to Bo Xu!
> >
> > Apache CarbonData PMC
> >
> 


Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-04-02 Thread Tao Li
Congratulations Kunal ~~

On 2020/03/29 07:07:04, Liang Chen  wrote: 
> Hi
> 
> 
> We are pleased to announce Kunal Kapoor as a new PMC member for Apache
> CarbonData.
> 
> 
> Congrats to Kunal Kapoor!
> 
> 
> Apache CarbonData PMC
> 


Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-04-02 Thread Tao Li
Congratulations Kunal.




【Web Issues】show datamaps command should be show datamap

2019-09-02 Thread tao li
Hi all,
On our web site, at 
https://carbondata.apache.org/datamap-management.html#datamap-management, the 
description of DataMap management may contain a mistake.

It says "There is a SHOW DATAMAPS command"; should that be SHOW DATAMAP?


Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread tao li
Hi Akash,
count(*) can only load the cache for one table at a time. If there are many 
tables, it would be better to have a command to trigger the cache load.
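
Until such a command exists, one workaround is to fire count(*) over every
table that should be warmed; a minimal sketch, assuming an existing
CarbonData-enabled SparkSession named spark and hypothetical table names:

  // Warm the index cache for several tables by running count(*) on each one,
  // since count(*) currently triggers loading of all the table's datamaps.
  val tablesToWarm = Seq("db.sales", "db.orders", "db.events")

  tablesToWarm.foreach { table =>
    val rows = spark.sql(s"SELECT COUNT(*) FROM $table").collect().head.getLong(0)
    println(s"Warmed cache for $table ($rows rows)")
  }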

On 2019/08/21 09:42:10, Akash Nilugal  wrote: 
> Hi Litao,
> 
> Initially, the first count(*) used to take around 32 seconds as it was loading 
> into the cache, and the second query takes about 1.5 to 2 seconds, I think, so 
> with pre-priming we can achieve a bigger improvement in the first query.
> 
> Regards,
> Akash
> 
> On 2019/08/21 03:03:55, tao li  wrote: 
> > Hi Akash,
> >   Before development, we need to know how much improvement caching part 
> > of the index in advance can bring to queries.
> >   We need to compare the first and second query and analyze them, finding 
> > the time differences for several important steps.
> >   From that we can analyze the performance improvement that caching part 
> > of the index in advance can bring.
> > 
> > On 2019/08/15 12:03:09, Akash Nilugal  wrote: 
> > > Hi Community,
> > > 
> > > Currently, we have an index server which basically helps in distributed
> > > caching of the datamaps in a separate Spark application.
> > > 
> > > The caching of the datamaps in the index server starts once a query is
> > > fired on the table for the first time: all the datamaps are loaded if
> > > count(*) is fired, and only the required ones are loaded for any filter
> > > query.
> > > 
> > > Here the problem, or the bottleneck, is that until a query is fired on
> > > the table, no caching is done for the table's datamaps.
> > > 
> > > So consider a scenario where we just load data into the table for a whole
> > > day and then query it the next day: all the segments will start loading
> > > into the cache, so the first query will be slow.
> > > 
> > > What if we load the datamaps into the cache, i.e. pre-prime the cache,
> > > without waiting for any query on the table? What if we load the cache
> > > after every data load, or load the cache for all the segments at once, so
> > > that the first query does not have to do all this work, which makes it
> > > faster?
> > > 
> > > Here I have attached the design document for pre-priming the cache in the
> > > index server. Please have a look at it, and any suggestions or inputs are
> > > most welcome.
> > > 
> > > 
> > > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > > 
> > > 
> > > 
> > > Regards,
> > > 
> > > Akash R Nilugal
> > > 
> > 
> 


Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread tao li
Hi Akash,
How much of the performance difference between the first and second queries 
comes from caching the index, and how much comes from Hadoop caching?
We should break it down and look at a time-cost analysis on the 
driver side.
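
As a first step, simply timing the same query twice gives the overall gap; a
minimal sketch, assuming an existing SparkSession named spark, with a
hypothetical table and filter (the gap still mixes index caching, Hadoop/OS
page caching and JVM warm-up, which is why the driver-side breakdown matters):

  // Measure a cold run and a warm run of the same query, in milliseconds.
  def timeQueryMs(sql: String): Long = {
    val start = System.nanoTime()
    spark.sql(sql).collect()
    (System.nanoTime() - start) / 1000000
  }

  val query = "SELECT COUNT(*) FROM db.sales WHERE country = 'CN'"
  println(s"first run : ${timeQueryMs(query)} ms")  // cold cache
  println(s"second run: ${timeQueryMs(query)} ms")  // warm cache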

On 2019/08/21 09:42:10, Akash Nilugal  wrote: 
> Hi Litao,
> 
> Initially, the first count(*) used to take around 32 seconds as it was loading 
> into the cache, and the second query takes about 1.5 to 2 seconds, I think, so 
> with pre-priming we can achieve a bigger improvement in the first query.
> 
> Regards,
> Akash
> 
> On 2019/08/21 03:03:55, tao li  wrote: 
> > Hi Akash,
> >   Before development, we need to know how much improvement caching part 
> > of the index in advance can bring to queries.
> >   We need to compare the first and second query and analyze them, finding 
> > the time differences for several important steps.
> >   From that we can analyze the performance improvement that caching part 
> > of the index in advance can bring.
> > 
> > On 2019/08/15 12:03:09, Akash Nilugal  wrote: 
> > > Hi Community,
> > > 
> > > Currently, we have an index server which basically helps in distributed
> > > caching of the datamaps in a separate Spark application.
> > > 
> > > The caching of the datamaps in the index server starts once a query is
> > > fired on the table for the first time: all the datamaps are loaded if
> > > count(*) is fired, and only the required ones are loaded for any filter
> > > query.
> > > 
> > > Here the problem, or the bottleneck, is that until a query is fired on
> > > the table, no caching is done for the table's datamaps.
> > > 
> > > So consider a scenario where we just load data into the table for a whole
> > > day and then query it the next day: all the segments will start loading
> > > into the cache, so the first query will be slow.
> > > 
> > > What if we load the datamaps into the cache, i.e. pre-prime the cache,
> > > without waiting for any query on the table? What if we load the cache
> > > after every data load, or load the cache for all the segments at once, so
> > > that the first query does not have to do all this work, which makes it
> > > faster?
> > > 
> > > Here I have attached the design document for pre-priming the cache in the
> > > index server. Please have a look at it, and any suggestions or inputs are
> > > most welcome.
> > > 
> > > 
> > > https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > > 
> > > 
> > > 
> > > Regards,
> > > 
> > > Akash R Nilugal
> > > 
> > 
> 


Re: [DISCUSSION] Cache Pre Priming

2019-08-20 Thread tao li
Hi Akash,
  Before development, we need to know how much improvement caching part of 
the index in advance can bring to queries.
  We need to compare the first and second query and analyze them, finding 
the time differences for several important steps.
  From that we can analyze the performance improvement that caching part of 
the index in advance can bring.

On 2019/08/15 12:03:09, Akash Nilugal  wrote: 
> Hi Community,
> 
> Currently, we have an index server which basically helps in distributed
> caching of the datamaps in a separate Spark application.
> 
> The caching of the datamaps in the index server starts once a query is
> fired on the table for the first time: all the datamaps are loaded if
> count(*) is fired, and only the required ones are loaded for any filter
> query.
> 
> Here the problem, or the bottleneck, is that until a query is fired on
> the table, no caching is done for the table's datamaps.
> 
> So consider a scenario where we just load data into the table for a whole
> day and then query it the next day: all the segments will start loading
> into the cache, so the first query will be slow.
> 
> What if we load the datamaps into the cache, i.e. pre-prime the cache,
> without waiting for any query on the table? What if we load the cache
> after every data load, or load the cache for all the segments at once, so
> that the first query does not have to do all this work, which makes it
> faster?
> 
> Here I have attached the design document for pre-priming the cache in the
> index server. Please have a look at it, and any suggestions or inputs are
> most welcome.
> 
> 
> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> 
> 
> 
> Regards,
> 
> Akash R Nilugal
> 


Re: [DISCUSSION] Cache Pre Priming

2019-08-20 Thread tao li
Hi Akash,
I have a few questions.
1. About the ways to pre-prime: there are two ways, one is to cache during data
loading, the other is to cache when the cache server is started.
I think the latter is not desirable, because loading the cache may take a long
time and can leave the cache server unresponsive for a long period. The first
way needs some supporting data: building the index cache during data loading
may take extra time, and even though separate threads are started there will
still be a lot of IO and computing overhead, which may slow down the data
loading speed. So for the first way we need some detailed numbers: how big is
the index data file, and how much impact does caching it have on loading?
Should we provide a third way, triggered through an interface?
User-triggered cache loading could be provided, so users can choose a time when
the system is free, such as triggering the loading late at night.
2. About configuration
Could you please give an example of how carbon.index.server.pre.prime is used?
(A sketch of what I imagine is included right after this list.)
3. About DataMap table loading or child table loading to cache
I think this point is very important and needs a more detailed description: for
example, when updates and deletes happen, how does the cache change? When a new
MV is dropped or created, how does the cache change? Etc.
4. About the rebuild command
What do we need to do when using this command: first clear the cached data and
then load the cache again? Can this command be executed many times?
5. About compaction
As with the rebuild case above, do we need to decide which cache entries should
be cleared and which other segments' cache needs to be loaded?
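
For question 2, here is a minimal sketch of how I imagine the switch being set,
assuming the property name from the design document; the final name and
accepted values may differ:

  import org.apache.carbondata.core.util.CarbonProperties

  // The property name "carbon.index.server.pre.prime" follows the design
  // document under discussion and is an assumption; it may change in the
  // final implementation.
  CarbonProperties.getInstance()
    .addProperty("carbon.index.server.pre.prime", "true")

  // Alternatively, the same key could be placed in carbon.properties:
  //   carbon.index.server.pre.prime = true
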
On 2019/08/15 12:03:09, Akash Nilugal  wrote: 
> Hi Community,
> 
> Currently, we have an index server which basically helps in distributed
> caching of the datamaps in a separate Spark application.
> 
> The caching of the datamaps in the index server starts once a query is
> fired on the table for the first time: all the datamaps are loaded if
> count(*) is fired, and only the required ones are loaded for any filter
> query.
> 
> Here the problem, or the bottleneck, is that until a query is fired on
> the table, no caching is done for the table's datamaps.
> 
> So consider a scenario where we just load data into the table for a whole
> day and then query it the next day: all the segments will start loading
> into the cache, so the first query will be slow.
> 
> What if we load the datamaps into the cache, i.e. pre-prime the cache,
> without waiting for any query on the table? What if we load the cache
> after every data load, or load the cache for all the segments at once, so
> that the first query does not have to do all this work, which makes it
> faster?
> 
> Here I have attached the design document for pre-priming the cache in the
> index server. Please have a look at it, and any suggestions or inputs are
> most welcome.
> 
> 
> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> 
> 
> 
> Regards,
> 
> Akash R Nilugal
>