[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-13189:
------------------------------------
    Attachment: HIVE-13189.1.patch


On TPC-H at 1 TB scale, the patch cuts the runtime of the query below from 244 seconds to 169 seconds (about 31%).

Without Patch:
{noformat}
create temporary table x as select l_receiptdate, date_add(to_date(l_receiptdate), 3) from lineitem;

Status: Running (Executing on YARN cluster with App id application_1456147314798_24782)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 244.05 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 244.05 seconds


METHOD                         DURATION(ms)
parse                                    4
semanticAnalyze                        920
TezBuildDag                            328
TezSubmitToRunningDag                  410
TotalPrepTime                        2,168

VERTICES         TOTAL_TASKS  FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS    CPU_TIME_MILLIS     GC_TIME_MILLIS  INPUT_RECORDS   OUTPUT_RECORDS
Map 1                    262                0            0           240.99         31,358,960            306,167  5,999,989,709                0

{noformat}

With Patch:

{noformat}


Status: Running (Executing on YARN cluster with App id application_1456147314798_24788)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 169.15 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 169.15 seconds


METHOD                         DURATION(ms)
parse                                   24
semanticAnalyze                      1,545
TezBuildDag                            242
TezSubmitToRunningDag                  258
TotalPrepTime                        2,768

VERTICES         TOTAL_TASKS  FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS    CPU_TIME_MILLIS     GC_TIME_MILLIS  INPUT_RECORDS   OUTPUT_RECORDS
Map 1                    262                0            0           166.24         21,189,670            158,159  5,999,989,709                0
{noformat}

If the approach is fine, this can be extended to datediff as well.
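
For readers without the attachment at hand, the idea can be illustrated with a minimal sketch. This is not the code in HIVE-13189.1.patch. The class and method names below are hypothetical, and java.time's DateTimeFormatter is used as a stand-in for Joda's DateTimeFormatter only so that the example compiles against the JDK alone. It contrasts a per-row SimpleDateFormat/Calendar parse with a parse through a shared, immutable formatter, and makes no claim about relative speed.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;

// Hypothetical illustration only; not taken from the Hive source or the patch.
final class DateAddSketch {
    // Legacy style: SimpleDateFormat and Calendar are mutable and not
    // thread-safe, so each instance of this class owns its own copies.
    private final SimpleDateFormat legacyFormat = new SimpleDateFormat("yyyy-MM-dd");
    private final Calendar calendar = Calendar.getInstance();

    // Immutable, thread-safe formatter that can be shared freely.
    // Assumption: java.time stands in here for Joda's DateTimeFormatter.
    private static final DateTimeFormatter ISO_DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    // Parses a yyyy-MM-dd string with SimpleDateFormat and adds days via Calendar.
    String addDaysLegacy(String date, int days) {
        try {
            calendar.setTime(legacyFormat.parse(date));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + date, e);
        }
        calendar.add(Calendar.DAY_OF_MONTH, days);
        return legacyFormat.format(calendar.getTime());
    }

    // Parses the same string with the shared immutable formatter.
    static String addDaysImmutable(String date, int days) {
        return LocalDate.parse(date, ISO_DATE).plusDays(days).format(ISO_DATE);
    }
}
```

Both methods map "1995-03-15" plus 3 days to "1995-03-18". A datediff variant would reuse the same parsing step, again as an assumption rather than a description of the patch.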

> Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-13189
>                 URL: https://issues.apache.org/jira/browse/HIVE-13189
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: varun a kumar
>         Attachments: HIVE-13189.1.patch
>
>
> Tasks spent a significant amount of time parsing date strings in 
> GenericUDFDateAdd.
> {noformat}
>   java.lang.Thread.State: RUNNABLE
>         at java.text.DecimalFormat.subparse(DecimalFormat.java:1467)
>         at java.text.DecimalFormat.parse(DecimalFormat.java:1268)
>         at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
>         at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
>         at java.text.DateFormat.parse(DateFormat.java:355)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateAdd.evaluate(GenericUDFDateAdd.java:172)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:87)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:80)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>         at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:108)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> {noformat}
> Joda DateTimeFormatter can be considered for better performance.



