[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152682#comment-15152682 ] Rahul Challapalli commented on DRILL-4256: --
[~dgu-atmapr] This is not closed because we have not yet automated verification of the fix in the performance framework.

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Assignee: Venki Korukanti
> Fix For: 1.5.0
>
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading Hive tables backed by HBase caused a performance
> regression. The data set used in the test below has ~3700 partitions, and the
> filter in the query ensures that only 1 partition is selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you
> need additional information.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
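The two timings in the report can be turned into rough numbers. The sketch below is only arithmetic on the approximate figures quoted in the description (~25 s vs ~6.5 s, ~3700 partitions). The per-partition figure additionally assumes, without any evidence from the jstack attachment, that the extra planning time is spread evenly across all partitions.

```python
# Approximate planning times reported in DRILL-4256 (the report gives them with "~").
REGRESSED_S = 25.0   # commit 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
BASELINE_S = 6.5     # commit 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
PARTITIONS = 3700    # "~3700 partitions" in the test data set


def regression_summary(regressed_s, baseline_s, partitions):
    """Return (slowdown factor, extra seconds, extra ms per partition).

    The per-partition value assumes the extra cost is spread evenly over
    every partition; that is an assumption, not a measurement.
    """
    slowdown = regressed_s / baseline_s
    extra_s = regressed_s - baseline_s
    extra_ms_per_partition = extra_s / partitions * 1000.0
    return slowdown, extra_s, extra_ms_per_partition


if __name__ == "__main__":
    slowdown, extra_s, per_part_ms = regression_summary(REGRESSED_S, BASELINE_S, PARTITIONS)
    print(f"slowdown: {slowdown:.2f}x")                   # slowdown: 3.85x
    print(f"extra planning time: {extra_s:.1f} s")        # extra planning time: 18.5 s
    print(f"extra per partition: {per_part_ms:.1f} ms")   # extra per partition: 5.0 ms
```

Under that even-spread assumption, a cost of only a few milliseconds paid per partition during planning would be enough to account for the whole gap on a table with thousands of partitions.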
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131465#comment-15131465 ] Rahul Challapalli commented on DRILL-4256: --
I manually verified the fix and it looks good!
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108665#comment-15108665 ] ASF GitHub Bot commented on DRILL-4256: ---
Github user jinfengni commented on the pull request: https://github.com/apache/drill/pull/329#issuecomment-173229548

LGTM. +1
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109128#comment-15109128 ] ASF GitHub Bot commented on DRILL-4256: ---
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/329
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105982#comment-15105982 ] ASF GitHub Bot commented on DRILL-4256: ---
GitHub user vkorukanti opened a pull request: https://github.com/apache/drill/pull/329

DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.

Creating new instances of HiveConf() is very costly, so we should avoid creating new ones as much as possible. Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin wherever we need a HiveConf.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vkorukanti/drill DRILL-4256

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/329.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #329

commit 3769dada12dafc7cd9209551e96184c968d19f73
Author: vkorukanti
Date: 2016-01-11T23:01:02Z
DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.
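The pull request's idea, building one HiveConf per HiveStoragePlugin and reusing it instead of constructing a new HiveConf wherever one is needed, can be illustrated with a small sketch. This is not Drill's code: `ExpensiveConf` is a hypothetical stand-in for HiveConf (a counter records every construction), `PerCallPlugin`, `CachedConfPlugin` and `plan_partitions` are hypothetical names, and the `hive.metastore.uris` value is a placeholder.

```python
class ExpensiveConf:
    """Hypothetical stand-in for HiveConf, whose construction the PR describes
    as very costly. A class-level counter records every construction."""

    instances_created = 0

    def __init__(self, overrides=None):
        ExpensiveConf.instances_created += 1
        # Placeholder defaults; a real HiveConf would load its configuration here.
        self._props = {"hive.metastore.uris": ""}
        self._props.update(overrides or {})

    def get(self, key, default=None):
        return self._props.get(key, default)


class PerCallPlugin:
    """Before (hypothetical): a fresh conf is built every time one is requested."""

    def __init__(self, overrides):
        self._overrides = dict(overrides)

    def hive_conf(self):
        return ExpensiveConf(self._overrides)


class CachedConfPlugin:
    """After (hypothetical): one conf per plugin instance, built once, then reused."""

    def __init__(self, overrides):
        self._conf = ExpensiveConf(overrides)

    def hive_conf(self):
        return self._conf


def plan_partitions(plugin, num_partitions):
    """Hypothetical planning loop that looks up a setting once per partition."""
    return [plugin.hive_conf().get("hive.metastore.uris") for _ in range(num_partitions)]


if __name__ == "__main__":
    overrides = {"hive.metastore.uris": "thrift://metastore.example:9083"}  # placeholder
    for plugin_cls in (PerCallPlugin, CachedConfPlugin):
        ExpensiveConf.instances_created = 0
        plan_partitions(plugin_cls(overrides), 3700)
        print(plugin_cls.__name__, ExpensiveConf.instances_created)
    # PerCallPlugin 3700
    # CachedConfPlugin 1
```

Sharing one conf object this way is only safe if callers treat it as read-only; code that needs different settings for a single use would have to work on its own copy rather than mutate the shared instance.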
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092422#comment-15092422 ] Venki Korukanti commented on DRILL-4256:
Commit ({{76f41e18}}) should only have an effect when the native reader is enabled, so I'm not sure why it causes a regression in this case. Is the environment the same for both tests? Also, can you run jstack while the query is running and check the call stack?
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089878#comment-15089878 ] Venki Korukanti commented on DRILL-4256:
Is the Hive native reader enabled? If so, can you try again after disabling it?
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089949#comment-15089949 ] Rahul Challapalli commented on DRILL-4256: --
The native reader is disabled.