Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-11-02 Thread bill.zhou
Hi Jacky & Ravindra, I have a few more questions about this design. Thank you
very much if you can clarify them.


1. If we support creating aggregation tables from a join of two or more tables,
how should aggregate.parent be set? Could it be like
'aggregate.parent'='fact1,dim1,dim1'?
2. What are the agg table column names? For the following create command, will
they be: user_id, name, c2, price?
CREATE TABLE agg_sales
STORED BY 'carbondata'
TBLPROPERTIES ('aggregate.parent'='sales')
AS SELECT user_id,user_name as name, sum(quantity) as c2, avg(price) FROM
sales GROUP BY user_id.
3. If we create a dictionary column in the agg table, will its dictionary
file be shared with the main table?

4. For rollup table main table creation: what do timeseries.eventtime and
granularity mean? Which columns can be used for them?
5. For rollup table main table creation: what does
'timeseries.aggtype'='quantity:sum, max' mean? Does it mean the column quantity
only supports sum and max?

6. In both the above cases carbon generates the 4 pre-aggregation tables
automatically for
year, month, day and hour. (their table name will be prefixed with
agg_sales). -- In the above case I only see the hour column; how are the
year, month and day tables generated?

7. In internal implementation, carbon will create these table with
SORT_COLUMNS='group by
column defined above', so that filter group by query on main table will be
faster because it
can leverage the index in pre-aggregate tables. -- I suggest letting the user
control the order of the sort columns.
8. Will merge index be supported for agg tables? -- It would be useful.
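
To make question 6 concrete, here is a minimal sketch of how coarser time buckets could in principle be derived from hour-level aggregates. This is only an illustration under my own assumptions, not carbon's implementation: the sample rows, the TRUNCATE table and the rollup function are all hypothetical names. It covers sum and max only; avg could not be re-aggregated this way without also keeping a count.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hour-level pre-aggregate rows:
# (event hour, sum(quantity), max(quantity)).
hourly = [
    (datetime(2017, 11, 1, 9), 10, 4),
    (datetime(2017, 11, 1, 17), 5, 5),
    (datetime(2017, 11, 2, 8), 7, 3),
    (datetime(2017, 12, 1, 0), 2, 2),
]

# Truncation rule for each coarser granularity (illustrative names only).
TRUNCATE = {
    "day": lambda t: t.replace(hour=0),
    "month": lambda t: t.replace(day=1, hour=0),
    "year": lambda t: t.replace(month=1, day=1, hour=0),
}


def rollup(rows, granularity):
    """Re-aggregate hour-level rows into a coarser time bucket."""
    trunc = TRUNCATE[granularity]
    buckets = defaultdict(lambda: [0, None])
    for ts, qty_sum, qty_max in rows:
        acc = buckets[trunc(ts)]
        acc[0] += qty_sum  # sums add up across hours
        acc[1] = qty_max if acc[1] is None else max(acc[1], qty_max)
    return {key: tuple(val) for key, val in sorted(buckets.items())}


for level in ("day", "month", "year"):
    print(level, rollup(hourly, level))
```

Whether carbon derives the year, month and day tables from the hour table like this, or from the main table directly, is exactly what I would like to have clarified.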


Jacky Li wrote
> Hi community,
> 
> In traditional data warehouses, a pre-aggregate table or cube is a common
> technique to improve OLAP query performance. To take carbondata's support
> for OLAP to the next level, I'd like to propose pre-aggregate table support
> in carbondata.
> 
> Please refer to CARBONDATA-1516
> (https://issues.apache.org/jira/browse/CARBONDATA-1516) and the
> design document attached in the JIRA ticket.
> 
> This design is still in its initial phase; the proposed usage and SQL syntax
> are subject to change. Please provide your comments to improve this feature.
> Any suggestion on the design from the community is welcome.
> 
> Regards,
> Jacky Li







Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-02 Thread yixu2001
dev
 Our platform runs HDP 2.4, but Spark 2.1 is not included in HDP 2.4, so we
are using a separately installed Apache release of Spark 2.1.


yixu2001
 
From: Naresh P R
Date: 2017-11-02 22:02
To: dev
Subject: Re: Re: "Delegation Token can be issued only with kerberos or web
authentication" will occur in yarn cluster
Hi yixu,

I am not able to see any attachment in your previous mail.
---
Regards,
Naresh P R

On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
dev
 Please refer to the attachment "cluster carbon error2.txt" for the log trace.
In this log, I tried 2 query statements:
select * from e_carbon.prod_inst_his     (prod_inst_his is a hive table; the query succeeded)
select * from e_carbon.prod_inst_his_c   (prod_inst_his_c is a carbon table; the query failed)

I pass the principal in my start script; please refer to the attachment
"testCluster.sh".

I set hive.server2.enable.doAs = false in the above test and printed it in
the log.
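
Since the contents of testCluster.sh are not visible in the thread, the following is only a minimal sketch of how a principal and keytab are commonly handed to spark-submit in yarn cluster mode. The flags --master, --deploy-mode, --principal, --keytab and --class are standard spark-submit options; the helper build_spark_submit and every value passed to it are hypothetical, and nothing here is confirmed to resolve the error discussed in this thread.

```python
import shlex


def build_spark_submit(principal, keytab, app_jar, main_class):
    """Assemble a spark-submit command line for yarn cluster mode with kerberos."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # With --principal/--keytab, Spark on YARN logs in from the keytab and
        # can periodically renew login tickets and delegation tokens.
        "--principal", principal,
        "--keytab", keytab,
        "--class", main_class,
        app_jar,
    ]


cmd = build_spark_submit(
    principal="user@EXAMPLE.COM",          # hypothetical principal
    keytab="/path/to/user.keytab",         # hypothetical keytab path
    app_jar="carbon-query-app.jar",        # hypothetical application jar
    main_class="com.example.CarbonQuery",  # hypothetical main class
)
print(shlex.join(cmd))
```

Comparing the actual start script against a command of this shape would show whether the keytab is passed along with the principal.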


yixu2001
 
From: Naresh P R
Date: 2017-11-01 19:40
To: dev
Subject: Re: "Delegation Token can be issued only with kerberos or web
authentication" will occur in yarn cluster
Hi,

Ideally, kerberos authentication should work with carbon tables. Can you
share the log trace with us so that we can analyze further?

How are you passing the principal in yarn cluster mode?

Can you try setting hive.server2.enable.doAs = false and running the query on
the carbon table?

Regards,
Naresh P R
 
On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
 
> dev
>  I submit a spark application in yarn cluster mode to a cluster with
> kerberos. The application successfully queries a hive table, but when it
> tries to query a carbon table, it fails with the message "Delegation
> Token can be issued only with kerberos or web authentication".
>
> If I submit this application in yarn client mode, queries on both the hive
> table and the carbon table succeed.
>
> And if I submit this application in yarn cluster mode on another cluster
> without kerberos, queries on both the hive table and the carbon table succeed.
>
>
> yixu2001
>



Re: Version upgrade for Presto Integration to 0.186

2017-11-02 Thread Raghunandan S
Any backward incompatibilities introduced?
+1 for the upgrade
On Thu, 2 Nov 2017 at 12:18 PM, Bhavya Aggarwal  wrote:

> Hi All,
>
> Presto version 0.186 has a lot of improvements that will increase
> performance and improve reliability. Some of the major fixes and
> improvements are listed below.
>
>
>    - Fix excessive GC overhead caused by map to map cast.
>    - Fix issue that may cause queries containing expensive functions, such
>    as regular expressions, to continue using CPU resources even after they
>    are killed.
>    - Fix performance issue caused by redundant casts.
>    - Fix leak in running query counter for failed queries. The counter
>    would increment but never decrement for queries that failed before
>    starting.
>    - Reduce memory usage when building data of VARCHAR or VARBINARY types.
>    - Estimate memory usage for GROUP BY more precisely to avoid out of
>    memory errors.
>    - Add Spill to Disk for joins.
>
> Currently the Presto version that we are using in Carbondata is 0.166. I
> would like to suggest upgrading it to 0.186. Please let me know what the
> group thinks about it.
>
>
> Regards
>
> Bhavya
>


Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-02 Thread Naresh P R
Hi yixu,

I am not able to see any attachment in your previous mail.
---
Regards,
Naresh P R

On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:

> dev
>  Please refer to the attachment "cluster carbon error2.txt" for the
> log trace.
> In this log, I tried 2 query statements:
> select * from e_carbon.prod_inst_his     (prod_inst_his is a hive table;
> the query succeeded)
> select * from e_carbon.prod_inst_his_c   (prod_inst_his_c is a carbon table;
> the query failed)
>
> I pass the principal in my start script; please refer to the attachment
> "testCluster.sh".
>
> I set hive.server2.enable.doAs = false in the above test and printed it
> in the log.
> --
> yixu2001
>
>
> *From:* Naresh P R 
> *Date:* 2017-11-01 19:40
> *To:* dev 
> *Subject:* Re: "Delegation Token can be issued only with kerberos or web
> authentication" will occur in yarn cluster
> Hi,
>
> Ideally, kerberos authentication should work with carbon tables. Can you
> share the log trace with us so that we can analyze further?
>
> How are you passing the principal in yarn cluster mode?
>
> Can you try setting hive.server2.enable.doAs = false and running the query
> on the carbon table?
> 
> Regards,
> Naresh P R
>
> On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
>
> > dev
> >  I submit a spark application in yarn cluster mode to a cluster with
> > kerberos. The application successfully queries a hive table, but when it
> > tries to query a carbon table, it fails with the message "Delegation
> > Token can be issued only with kerberos or web authentication".
> >
> > If I submit this application in yarn client mode, queries on both the hive
> > table and the carbon table succeed.
> >
> > And if I submit this application in yarn cluster mode on another cluster
> > without kerberos, queries on both the hive table and the carbon table
> > succeed.
> >
> >
> > yixu2001
> >
>
>


Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-02 Thread yixu2001
dev
 Please refer to the attachment "cluster carbon error2.txt" for the log trace.
In this log, I tried 2 query statements:
select * from e_carbon.prod_inst_his     (prod_inst_his is a hive table; the query succeeded)
select * from e_carbon.prod_inst_his_c   (prod_inst_his_c is a carbon table; the query failed)

I pass the principal in my start script; please refer to the attachment
"testCluster.sh".

I set hive.server2.enable.doAs = false in the above test and printed it in
the log.


yixu2001
 
From: Naresh P R
Date: 2017-11-01 19:40
To: dev
Subject: Re: "Delegation Token can be issued only with kerberos or web
authentication" will occur in yarn cluster
Hi,

Ideally, kerberos authentication should work with carbon tables. Can you
share the log trace with us so that we can analyze further?

How are you passing the principal in yarn cluster mode?

Can you try setting hive.server2.enable.doAs = false and running the query on
the carbon table?

Regards,
Naresh P R
 
On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
 
> dev
>  I submit a spark application in yarn cluster mode to a cluster with
> kerberos. The application successfully queries a hive table, but when it
> tries to query a carbon table, it fails with the message "Delegation
> Token can be issued only with kerberos or web authentication".
>
> If I submit this application in yarn client mode, queries on both the hive
> table and the carbon table succeed.
>
> And if I submit this application in yarn cluster mode on another cluster
> without kerberos, queries on both the hive table and the carbon table succeed.
>
>
> yixu2001
>


Version upgrade for Presto Integration to 0.186

2017-11-02 Thread Bhavya Aggarwal
Hi All,

Presto version 0.186 has a lot of improvements that will increase
performance and improve reliability. Some of the major fixes and
improvements are listed below.


   - Fix excessive GC overhead caused by map to map cast.
   - Fix issue that may cause queries containing expensive functions, such
   as regular expressions, to continue using CPU resources even after they are
   killed.
   - Fix performance issue caused by redundant casts.
   - Fix leak in running query counter for failed queries. The counter
   would increment but never decrement for queries that failed before starting.
   - Reduce memory usage when building data of VARCHAR or VARBINARY types.
   - Estimate memory usage for GROUP BY more precisely to avoid out of
   memory errors.
   - Add Spill to Disk for joins.

Currently the Presto version that we are using in Carbondata is 0.166. I
would like to suggest upgrading it to 0.186. Please let me know what the
group thinks about it.


Regards

Bhavya