[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-13189:
------------------------------------
    Attachment: HIVE-13189.1.patch

At TPCH-1 TB, runtime with patch drops from 244 seconds to 169 seconds.

Without Patch:
{noformat}
create temporary table x as select l_receiptdate, date_add(to_date(l_receiptdate), 3) from lineitem;

Status: Running (Executing on YARN cluster with App id application_1456147314798_24782)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container    SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 244.05 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 244.05 seconds

METHOD                    DURATION(ms)
parse                                4
semanticAnalyze                    920
TezBuildDag                        328
TezSubmitToRunningDag              410
TotalPrepTime                    2,168

VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1             262                0             0            240.99       31,358,960         306,167  5,999,989,709               0
{noformat}

With Patch:
{noformat}
Status: Running (Executing on YARN cluster with App id application_1456147314798_24788)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container    SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 169.15 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 169.15 seconds

METHOD                    DURATION(ms)
parse                               24
semanticAnalyze                  1,545
TezBuildDag                        242
TezSubmitToRunningDag              258
TotalPrepTime                    2,768

VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1             262                0             0            166.24       21,189,670         158,159  5,999,989,709               0
{noformat}

If the approach is fine, this can be extended to datediff as well.

> Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-13189
>                 URL: https://issues.apache.org/jira/browse/HIVE-13189
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: varun a kumar
>         Attachments: HIVE-13189.1.patch
>
>
> Quite an amount of time was spent by tasks in trying to parse date strings in GenericUDFDateAdd.
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>       at java.text.DecimalFormat.subparse(DecimalFormat.java:1467)
>       at java.text.DecimalFormat.parse(DecimalFormat.java:1268)
>       at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
>       at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
>       at java.text.DateFormat.parse(DateFormat.java:355)
>       at org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateAdd.evaluate(GenericUDFDateAdd.java:172)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:87)
>       at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:80)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>       at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:108)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> {noformat}
> Joda DateTimeFormatter can be considered for better performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
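
For readers who want to see the shape of the change being discussed, here is a minimal sketch of a date_add-style computation done two ways: once through SimpleDateFormat, which is the path visible in the stack trace, and once through an immutable, thread-safe formatter. This is not the code from HIVE-13189.1.patch. To keep the example runnable with only the standard library, it assumes Java 8 or later and uses java.time.format.DateTimeFormatter as a stand-in for Joda's DateTimeFormatter. The class name DateAddSketch and both method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;

// Hypothetical illustration only; not the implementation in GenericUDFDateAdd.
class DateAddSketch {
    // SimpleDateFormat is mutable and not thread-safe, so it is kept per instance.
    private final SimpleDateFormat legacyFormat = new SimpleDateFormat("yyyy-MM-dd");
    private final Calendar calendar = Calendar.getInstance();

    // java.time formatters are immutable and can be shared freely.
    // Assumption: used here in place of Joda's DateTimeFormatter.
    private static final DateTimeFormatter ISO_DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    // Legacy path: SimpleDateFormat.parse followed by Calendar arithmetic.
    String addDaysLegacy(String date, int days) {
        try {
            calendar.setTime(legacyFormat.parse(date));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + date, e);
        }
        calendar.add(Calendar.DAY_OF_MONTH, days);
        return legacyFormat.format(calendar.getTime());
    }

    // Alternative path: parse to a LocalDate, add days, format back.
    static String addDays(String date, int days) {
        return LocalDate.parse(date, ISO_DATE).plusDays(days).format(ISO_DATE);
    }

    public static void main(String[] args) {
        DateAddSketch sketch = new DateAddSketch();
        System.out.println(sketch.addDaysLegacy("1998-09-02", 3));
        System.out.println(addDays("1998-09-02", 3));
    }
}
```

Both methods return the same string for the same input; the sketch makes no claim about relative speed, which would need to be measured on the actual workload as in the runs reported above.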