[ 
https://issues.apache.org/jira/browse/HIVE-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-13634:
---------------------------
    Description: 
Hive-on-Spark performs worse than Hive-on-MR for queries with external 
scripts.

TPCx-BB Q2, Q3, and Q4 are Python streaming queries that call external 
scripts to handle reduce tasks. For these three queries, Hive-on-Spark (HoS) 
is slower than Hive-on-MR when processing reduce tasks with external (Python) 
scripts, so ‘Improve HoS performance for queries with external scripts’ looks 
like a performance optimization opportunity.
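
For readers unfamiliar with this kind of query, here is a minimal sketch of a reduce-side streaming script of the sort Hive runs through a TRANSFORM ... USING clause. It is not one of the TPCx-BB scripts: the per-key counting logic and the names parse, reduce_stream, and main are hypothetical, chosen only for illustration. It assumes the query has already clustered and sorted rows by key, and that rows arrive on stdin as tab-separated text, which is Hive's default for streaming scripts.

```python
import sys
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated input lines."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value


def reduce_stream(lines):
    """Emit one tab-separated output row per key.

    Assumption: rows sharing a key arrive consecutively, because the
    query clustered/sorted them before handing them to the script.
    The aggregation (a row count per key) is purely illustrative.
    """
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        count = sum(1 for _ in group)
        yield f"{key}\t{count}"


def main():
    # Hive feeds rows on stdin and reads result rows back from stdout.
    for row in reduce_stream(sys.stdin):
        sys.stdout.write(row + "\n")
```

Calling main() at the end of the file turns this into a usable streaming script. Every row crosses a process boundary through stdin and stdout, so the time spent in this phase depends on how the engine feeds the script as well as on the script itself.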

The following shows the Q2/Q3/Q4 test results on an 8-worker-node cluster 
with the TPCx-BB 3TB data size.

TPCx-BB Query 2
(1) Hive-on-MR
Total Query Execution Time (sec): 2172.180
Execution Time of External Scripts (sec): 736
(2) Hive-on-Spark
Total Query Execution Time (sec): 2283.604
Execution Time of External Scripts (sec): 1197

TPCx-BB Query 3
(1) Hive-on-MR
Total Query Execution Time (sec): 1070.632
Execution Time of External Scripts (sec): 513
(2) Hive-on-Spark
Total Query Execution Time (sec): 1287.679
Execution Time of External Scripts (sec): 919

TPCx-BB Query 4
(1) Hive-on-MR
Total Query Execution Time (sec): 1781.864
Execution Time of External Scripts (sec): 1518
(2) Hive-on-Spark
Total Query Execution Time (sec): 2028.023
Execution Time of External Scripts (sec): 1599
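
To make the gap easier to compare across queries, the numbers above can be turned into relative slowdowns. The following is a small sketch: the timings are copied from the results above, while the names RESULTS, slowdown_pct, and summarize are just illustrative helpers.

```python
# (hive_on_mr_sec, hive_on_spark_sec) pairs copied from the results above.
RESULTS = {
    "Q2": {"total": (2172.180, 2283.604), "script": (736, 1197)},
    "Q3": {"total": (1070.632, 1287.679), "script": (513, 919)},
    "Q4": {"total": (1781.864, 2028.023), "script": (1518, 1599)},
}


def slowdown_pct(mr_sec, spark_sec):
    """Percentage by which Hive-on-Spark is slower than Hive-on-MR."""
    return (spark_sec - mr_sec) / mr_sec * 100.0


def summarize(results):
    """Map each query to its slowdown for total time and script time."""
    return {
        query: {name: slowdown_pct(*pair) for name, pair in metrics.items()}
        for query, metrics in results.items()
    }


for query, metrics in summarize(RESULTS).items():
    print(f"{query}: total +{metrics['total']:.1f}%, "
          f"external scripts +{metrics['script']:.1f}%")
```

By this measure the external-script phase is roughly 63% and 79% slower on Hive-on-Spark for Q2 and Q3, and about 5% slower for Q4.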

  was:
Hive-on-Spark performs worse than Hive-on-MR for queries with external 
scripts.

TPCx-BB Q2, Q3, and Q4 are Python streaming queries that call external 
scripts to handle reduce tasks. For these three queries, Hive-on-Spark (HoS) 
is slower than Hive-on-MR when processing reduce tasks with external (Python) 
scripts, so ‘Improve HoS performance for queries with external scripts’ looks 
like a performance optimization opportunity.


> Hive-on-Spark performed worse than Hive-on-MR, for queries with external 
> scripts
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-13634
>                 URL: https://issues.apache.org/jira/browse/HIVE-13634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Xin Hao



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
