[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944685#comment-15944685
 ] 

Colin Ma edited comment on HIVE-16311 at 3/28/17 7:33 AM:
----------------------------------------------------------

The initial patch is uploaded.
In a simple test that evaluates 12345.67/123.45 with 
FastHiveDecimalImpl.fastDivide 500000 times, the run takes 1s without the 
patch vs. 0.1s with the patch.
I also tested the patch with q06 of TPCx-BB, which contains the following 
divide expression:
{code}
sum( case when (d_year = 2001)   THEN 
(((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
   ELSE 0 END) first_year_total
{code}
The cluster has 6 nodes with 128G of memory per node, Intel(R) Xeon(R) 
E5-2680 CPUs, and a 1G network.
With the 1T data scale and Spark as the execution engine, the results are 
as follows:
|| ||without patch||with patch||improvement||
|vectorization disabled|214s|164s|23.36%|
|vectorization enabled (Parquet file format)|252s|125s|50.4%|
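
For reference, the improvement column is consistent with (without patch - with patch) / without patch. A small Java sketch of that arithmetic follows; the class and method names are made up for illustration and are not part of the patch.

```java
// Illustrative helper only; names are hypothetical.
final class ImprovementSketch {
    // Relative runtime reduction, as a percentage of the unpatched runtime.
    static double improvementPercent(double withoutPatchSeconds, double withPatchSeconds) {
        return (withoutPatchSeconds - withPatchSeconds) / withoutPatchSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Runtimes taken from the table above.
        System.out.printf("%.2f%%%n", improvementPercent(214, 164));
        System.out.printf("%.2f%%%n", improvementPercent(252, 125));
    }
}
```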



> Improve the performance for FastHiveDecimalImpl.fastDivide
> ----------------------------------------------------------
>
>                 Key: HIVE-16311
>                 URL: https://issues.apache.org/jira/browse/HIVE-16311
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>             Fix For: 2.2.0
>
>         Attachments: HIVE-16311.001.patch
>
>
> FastHiveDecimalImpl.fastDivide performs poorly when evaluating an 
> expression such as 12345.67/123.45.
> Two points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as the scale when calling 
> BigDecimal.divide.
> 2. Get the precision of a BigInteger in a fast way when possible.
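
The two points in the description can be sketched roughly as below. This is not the code from HIVE-16311.001.patch: the class, the method names, and the scale heuristic max(6, s1 + p2 + 1) are assumptions chosen for illustration, and only the java.math calls are real APIs. The idea is to request fewer fractional digits from BigDecimal.divide than the maximum of 38, and to count the digits of a BigInteger with long arithmetic whenever its magnitude fits in a long.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Illustrative sketch only; not the actual patch.
final class FastDivideSketch {
    // Same value as HiveDecimal.MAX_SCALE.
    static final int MAX_SCALE = 38;

    // Point 1 (hypothetical heuristic): derive the quotient scale from the
    // operands instead of always asking for MAX_SCALE fractional digits.
    static int chooseScale(BigDecimal dividend, BigDecimal divisor) {
        int wanted = Math.max(6, dividend.scale() + divisor.precision() + 1);
        return Math.min(MAX_SCALE, wanted);
    }

    static BigDecimal divide(BigDecimal dividend, BigDecimal divisor) {
        return dividend.divide(divisor, chooseScale(dividend, divisor), RoundingMode.HALF_UP);
    }

    // Point 2: count decimal digits without building a String when the
    // magnitude fits in a signed long (bit length of at most 63).
    static int precision(BigInteger value) {
        BigInteger abs = value.abs();
        if (abs.bitLength() <= 63) {
            long x = abs.longValue();
            int digits = 1; // zero is treated as one digit
            while (x >= 10) {
                x /= 10;
                digits++;
            }
            return digits;
        }
        // Slow path for larger values.
        return abs.toString().length();
    }
}
```

With these assumptions, 12345.67/123.45 is divided at scale 8 rather than 38, so far fewer quotient digits are computed before any later rounding.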



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
