[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-02-18 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152682#comment-15152682
 ] 

Rahul Challapalli commented on DRILL-4256:
--

[~dgu-atmapr] This is not closed because we have not yet automated 
verification of the fix in the performance framework.

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Assignee: Venki Korukanti
> Fix For: 1.5.0
>
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading Hive tables backed by HBase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and the 
> filter in the query ensures that only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.
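
For scale, the two timings above work out to roughly a 3.8x slowdown in planning time. A minimal sketch of that arithmetic, taking the approximate figures from the report as inputs (the function name is only illustrative):

```python
def slowdown(regressed_s, baseline_s):
    """Return (ratio, extra_seconds) between a regressed and a baseline timing."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return regressed_s / baseline_s, regressed_s - baseline_s


# Approximate planning times quoted in the report (~25 s vs ~6.5 s).
ratio, extra = slowdown(25.0, 6.5)
print(f"{ratio:.1f}x slower, {extra:.1f} s extra")
```

Because both inputs are approximate, the result is only a rough indicator of how much planning time the regression added.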



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-02-03 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131465#comment-15131465
 ] 

Rahul Challapalli commented on DRILL-4256:
--

I manually verified the fix and it looks good!



[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108665#comment-15108665
 ] 

ASF GitHub Bot commented on DRILL-4256:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/329#issuecomment-173229548
  
LGTM. +1



> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Assignee: Jinfeng Ni
> Attachments: jstack.tgz


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109128#comment-15109128
 ] 

ASF GitHub Bot commented on DRILL-4256:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/329


> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Assignee: Venki Korukanti
> Attachments: jstack.tgz


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105982#comment-15105982
 ] 

ASF GitHub Bot commented on DRILL-4256:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/329

DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.

Creating new instances of HiveConf is very costly; we should avoid 
creating new ones as much as possible.
Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need the HiveConf.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #329


commit 3769dada12dafc7cd9209551e96184c968d19f73
Author: vkorukanti 
Date:   2016-01-11T23:01:02Z

DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever 
needed.

Creating new instances of HiveConf is very costly; we should avoid 
creating new ones as much as possible.
Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need the HiveConf.
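
The idea in this change, building one expensive configuration object per storage plugin and handing the same instance to every caller, can be sketched as follows. This is an illustration in Python, not Drill's code: `ExpensiveConf` and `StoragePlugin` are hypothetical stand-ins for `HiveConf` and `HiveStoragePlugin`, and the construction counter exists only to make the reuse visible. The per-partition loop is also an assumption for illustration; the thread does not say where the extra instances were being created.

```python
class ExpensiveConf:
    """Hypothetical stand-in for a configuration object that is costly to build."""

    constructed = 0  # counts constructions so reuse can be observed

    def __init__(self, overrides):
        ExpensiveConf.constructed += 1
        # A real configuration class might load and parse files here.
        self.settings = dict(overrides)


class StoragePlugin:
    """Hypothetical plugin that builds its configuration once and reuses it."""

    def __init__(self, overrides):
        # Build the configuration a single time, when the plugin is created.
        self._conf = ExpensiveConf(overrides)

    def conf(self):
        # Every caller gets the same instance instead of a fresh one.
        return self._conf


def plan_partitions(plugin, partitions):
    """Read the shared configuration once per partition without rebuilding it."""
    return [(p, plugin.conf().settings.get("metastore")) for p in partitions]


plugin = StoragePlugin({"metastore": "example"})
plan_partitions(plugin, range(3700))
print(ExpensiveConf.constructed)  # the configuration was built only once
```

If each call constructed its own configuration instead, the construction cost would scale with the number of partitions touched during planning.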




> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Attachments: jstack.tgz


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-11 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092422#comment-15092422
 ] 

Venki Korukanti commented on DRILL-4256:


Commit ({{76f41e18}}) should only have an effect when the native reader is 
enabled. Not sure why it is causing a regression in this case. Is the 
environment the same for both tests? Also, can you run jstack while the query 
is running and check the call stack?
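
One way to act on the jstack suggestion is to take a few thread dumps while the query is planning and count which frames recur. The sketch below assumes `jstack` is on the PATH and that the Drillbit's process id is known; the helper names are hypothetical, and the parsing relies only on the `at package.Class.method(...)` frame lines that JVM thread dumps contain.

```python
import subprocess
from collections import Counter


def capture_dump(pid):
    """Run `jstack <pid>` and return the thread dump text."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_frames(dump):
    """Count how often each method appears in the stack frames of a dump."""
    frames = Counter()
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("at "):
            # Drop the "(File.java:123)" suffix and keep the method name.
            frames[line[3:].split("(", 1)[0]] += 1
    return frames


def hottest(dumps, n=5):
    """Merge frame counts from several dumps and return the n most common."""
    total = Counter()
    for dump in dumps:
        total.update(count_frames(dump))
    return total.most_common(n)
```

Frames that show up in most of the dumps taken during the slow planning window are the likeliest places to look first.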

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-08 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089878#comment-15089878
 ] 

Venki Korukanti commented on DRILL-4256:


Is the Hive native reader enabled? If so, can you try again after disabling it?

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-08 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089949#comment-15089949
 ] 

Rahul Challapalli commented on DRILL-4256:
--

Native reader is disabled

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli