Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-15 Thread Jacky Li
Hi Lu Cao,

In my previous experience with “cube” engines, whether ROLAP or MOLAP, the cube
is something above the SQL layer. Not only does the user need to establish a
cube schema by transforming metadata from the data warehouse star schema, but
the engine also defines its own query language, such as MDX. These languages
are often not standardized, so different vendors need to provide different BI
tools or adaptors for them.
So, although some vendors provide easy-to-use cube management tools, this
approach has at least two problems: vendor lock-in and the rigidity of the cube
model once it is defined. I think these problems are similar to those of other
vendor-specific solutions.

Currently, one of the strengths of the carbon store is that it complies with
standard SQL by integrating with SparkSQL, Hive, etc. The intention of
providing pre-aggregate table support is to let carbon improve OLAP query
performance while still sticking to standard SQL. This means all users can
keep using the same BI/JDBC applications and tools that connect to SparkSQL,
Hive, etc.
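
To make the idea concrete, here is a minimal sketch of a pre-aggregate table
built with plain CTAS. It uses Python's built-in sqlite3 module only as a
stand-in SQL engine, not carbon itself; the sales table, its columns, and the
sales_agg_country name are made up for illustration. The aggregate table is
queried explicitly here, and how carbon would use it to answer queries on the
main table is not shown.

```python
import sqlite3

# sqlite3 is only a stand-in SQL engine; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (country TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('US', 'phone', 100), ('US', 'laptop', 300),
        ('CN', 'phone', 200), ('CN', 'phone', 50);

    -- The pre-aggregate table is defined with ordinary CTAS syntax.
    CREATE TABLE sales_agg_country AS
        SELECT country, SUM(amount) AS total_amount, COUNT(*) AS row_count
        FROM sales
        GROUP BY country;
""")

# Aggregating the main table scans every detail row.
main_result = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()

# The same answer read from the much smaller pre-aggregate table.
agg_result = conn.execute(
    "SELECT country, total_amount FROM sales_agg_country ORDER BY country"
).fetchall()

print(main_result)
print(agg_result)
```

Both queries are standard SQL, so any BI/JDBC tool that can issue the first
can issue the second as well.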

If carbon were to support “cube”, it would not only need to define its
configuration, which may be very complex and non-standard, but would also force
users to use vendor-specific tools for management and visualization. So I think
that before taking on this complexity, it is better to provide pre-agg tables
as the first step.

Although we do not want the full complexity of “cube” on arbitrary data
schemas, one special case is timeseries data. Because the time dimension
hierarchy (year/month/day/hour/minute/second) is naturally understandable and
consistent across all scenarios, we can provide native support for
pre-aggregate tables on the time dimension. In effect it is a cube on time, and
we can do automatic rollup for all levels of time.
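
As a rough illustration of rollup over the time hierarchy, the sketch below
sums a measure at every level from year down to second. It is plain Python and
not carbon's implementation; the LEVELS mapping, the rollup function, and the
sample events are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical time hierarchy: each level maps to the strftime pattern
# that identifies its bucket.
LEVELS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
    "second": "%Y-%m-%d %H:%M:%S",
}


def rollup(events):
    """Sum (timestamp, value) pairs into one bucket table per time level."""
    tables = {level: defaultdict(int) for level in LEVELS}
    for ts, value in events:
        for level, pattern in LEVELS.items():
            tables[level][ts.strftime(pattern)] += value
    return {level: dict(buckets) for level, buckets in tables.items()}


events = [
    (datetime(2017, 10, 15, 9, 18, 5), 10),
    (datetime(2017, 10, 15, 9, 40, 0), 5),
    (datetime(2017, 10, 16, 14, 50, 0), 7),
]
tables = rollup(events)
print(tables["day"])   # {'2017-10-15': 15, '2017-10-16': 7}
print(tables["year"])  # {'2017': 22}
```

Because the hierarchy is fixed, the user only has to name the time column and
the measure; every level can be derived without a user-defined cube schema.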

Finally, please note that by using CTAS syntax we are not restricting carbon
to pre-aggregate tables only; it can also support arbitrary materialized views
if we want that in the future.

Hope this makes things clearer.

Regards,
Jacky



like mandarin provides. Actually, as you can see in the document, I am
avoiding calling this “cube”.


> On Oct 15, 2017, at 9:18 PM, Lu Cao wrote:
> 
> Hi Jacky,
> If a user wants to create a cube on the main table, does he/she have to
> create multiple pre-aggregate tables? Writing so many CTAS commands will be
> a heavy workload. If the user only needs to create a few pre-agg tables,
> current carbon can already support this requirement: the user can create the
> table first and then use an insert into select statement. The only
> difference is that the user needs to query the pre-agg table instead of the
> main table.
> 
> So maybe we can let the user create a cube model (in a schema or metafile?)
> that contains multiple pre-aggregation definitions, and carbon can create
> those pre-agg tables automatically according to the model. That would be
> easier to use and maintain.
> 
> Regards,
> Lionel





Re: [DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-15 Thread 北斗七
+1

2017-10-15 21:19 GMT+08:00 Lu Cao :

> Sure, will create the Jira ticket.
>
> Thanks,
> Lionel


Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-15 Thread Lu Cao
Hi Jacky,
If a user wants to create a cube on the main table, does he/she have to create
multiple pre-aggregate tables? Writing so many CTAS commands will be a heavy
workload. If the user only needs to create a few pre-agg tables, current
carbon can already support this requirement: the user can create the table
first and then use an insert into select statement. The only difference is
that the user needs to query the pre-agg table instead of the main table.

So maybe we can let the user create a cube model (in a schema or metafile?)
that contains multiple pre-aggregation definitions, and carbon can create
those pre-agg tables automatically according to the model. That would be
easier to use and maintain.
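
One way to picture such a model is a small generator that expands a list of
dimension combinations into CTAS statements. This is only a sketch of the idea
in plain Python; the ctas_statements function, the model layout, and the
table naming scheme are all hypothetical and not an existing carbon feature.

```python
def ctas_statements(main_table, measures, dimension_sets):
    """Expand a hypothetical cube model into one CTAS statement per dimension set.

    measures: list of (aggregate_function, column) pairs, e.g. ("SUM", "amount").
    dimension_sets: list of lists of group-by columns.
    """
    statements = []
    for dims in dimension_sets:
        name = f"{main_table}_agg_{'_'.join(dims)}"
        cols = ", ".join(dims)
        aggs = ", ".join(
            f"{func}({col}) AS {func.lower()}_{col}" for func, col in measures
        )
        statements.append(
            f"CREATE TABLE {name} AS SELECT {cols}, {aggs} "
            f"FROM {main_table} GROUP BY {cols}"
        )
    return statements


statements = ctas_statements(
    "sales",
    measures=[("SUM", "amount")],
    dimension_sets=[["country"], ["country", "product"]],
)
for sql in statements:
    print(sql)
```

The user would maintain one model definition instead of writing each CTAS
command by hand.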

Regards,
Lionel

On Sun, Oct 15, 2017 at 3:56 PM, Jacky Li  wrote:

> Hi Liang,
>
> Alter table, data update/delete, and delete segment are all handled the same
> way. That is why I wrote in the document: “User can manually perform this
> operation and rebuild pre-aggregate table as update scenario”.
> The user needs to drop the associated aggregate table, perform the alter
> table, data update/delete, or delete segment operation, and then create the
> pre-agg table again using the CTAS command; the pre-aggregate table will
> then be rebuilt.
>
> Regards,
> Jacky


Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-15 Thread Jacky Li
Hi Liang,

Alter table, data update/delete, and delete segment are all handled the same
way. That is why I wrote in the document: “User can manually perform this
operation and rebuild pre-aggregate table as update scenario”.
The user needs to drop the associated aggregate table, perform the alter
table, data update/delete, or delete segment operation, and then create the
pre-agg table again using the CTAS command; the pre-aggregate table will then
be rebuilt.
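
The three steps can be sketched as follows. Python's built-in sqlite3 module
is used only as a stand-in SQL engine, not carbon; the sales and sales_agg
tables and the read_agg helper are hypothetical names chosen for the example.

```python
import sqlite3

# CTAS definition of the pre-aggregate table, reused for the rebuild.
AGG_CTAS = """
    CREATE TABLE sales_agg AS
    SELECT country, SUM(amount) AS total_amount
    FROM sales
    GROUP BY country
"""


def read_agg(conn):
    """Return the rows of the pre-aggregate table in a stable order."""
    return conn.execute(
        "SELECT country, total_amount FROM sales_agg ORDER BY country"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("US", 100), ("CN", 200), ("CN", 50)]
)
conn.execute(AGG_CTAS)
before = read_agg(conn)

# Step 1: drop the associated aggregate table.
conn.execute("DROP TABLE sales_agg")

# Step 2: perform the update/delete on the main table.
conn.execute("UPDATE sales SET amount = 500 WHERE country = 'US'")
conn.execute("DELETE FROM sales WHERE country = 'CN' AND amount = 50")

# Step 3: run the same CTAS again, which rebuilds the aggregate from scratch.
conn.execute(AGG_CTAS)
after = read_agg(conn)

print(before)  # [('CN', 250), ('US', 100)]
print(after)   # [('CN', 200), ('US', 500)]
```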

Regards,
Jacky

> On Oct 15, 2017, at 2:50 PM, Liang Chen wrote:
> 
> Hi Jacky
> 
> Thanks for starting this discussion; this is a great feature for
> carbondata.
> 
> One question:
> For the sub-JIRA "Handle alter table scenarios for aggregation table", please
> give more details.
> I just viewed the PDF attachment, quoted below, and it looks like nothing
> needs to be handled for the agg table if users alter the main table. So can
> you provide more detail on which scenarios need to be handled?
> --
> Adding a new column will not impact the agg table.
> Deleting or renaming an existing column may invalidate agg tables; if it
> does, the operation will be rejected.
> The user can manually perform this operation and rebuild the pre-aggregate
> table as in the update scenario.
> 
> Regards
> Liang
> 
> 
> --
> Sent from: 
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/





Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-15 Thread Liang Chen
Hi Jacky

Thanks for starting this discussion; this is a great feature for
carbondata.

One question:
For the sub-JIRA "Handle alter table scenarios for aggregation table", please
give more details.
I just viewed the PDF attachment, quoted below, and it looks like nothing
needs to be handled for the agg table if users alter the main table. So can
you provide more detail on which scenarios need to be handled?
--
Adding a new column will not impact the agg table.
Deleting or renaming an existing column may invalidate agg tables; if it
does, the operation will be rejected.
The user can manually perform this operation and rebuild the pre-aggregate
table as in the update scenario.

Regards
Liang



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-15 Thread Liang Chen
Hi Lionel

As per the mailing list discussion, there is no objection, so can you create
an umbrella JIRA to remove the spark 1.5 & 1.6 code in 1.3.0?

Regards
Liang 


lionel061201 wrote
> Hi community,
> Currently we have three spark-related modules in carbondata (spark 1.5, 1.6,
> 2.1); the project has become more and more difficult to maintain and has
> a lot of redundant code.
> I propose to stop supporting spark 1.5 & 1.6 and focus on spark 2.1 (2.2).
> That will keep the project clean and simple to maintain.
> Maybe we can provide some key patches for the old versions, but new features
> could support spark2 only.
> Any ideas?
> 
> 
> Thanks & Regards,
> Lionel Cao





--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/