[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd

2016-03-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179583#comment-15179583
 ] 

Gopal V commented on HIVE-13189:


[~rajesh.balamohan]: can you add a test case for input with a leading space? 
I'm not sure that works with regular DateTime, and the indexOf() call looks 
rather odd.
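
A minimal sketch of the kind of leading-space check being asked for. It is not taken from the patch: the class and method names are hypothetical, and java.time's strict ISO parser is used only as a standard-library stand-in for a strict formatter (Joda is not on the classpath here). The sketch prints what each parser does with a padded string rather than assuming the outcome.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical test helper; none of these names come from the HIVE-13189 patch.
final class LeadingSpaceDateCheck {

    // Strict yyyy-MM-dd parse via java.time (assumed stand-in for a strict formatter).
    static boolean parsesStrictly(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // The parser the UDF currently relies on, per the stack trace in the issue.
    static boolean parsesWithSimpleDateFormat(String text) {
        try {
            new SimpleDateFormat("yyyy-MM-dd").parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String padded = " 2000-01-01";
        System.out.println("SimpleDateFormat, padded: " + parsesWithSimpleDateFormat(padded));
        System.out.println("strict, padded:           " + parsesStrictly(padded));
        System.out.println("strict, trimmed:          " + parsesStrictly(padded.trim()));
    }
}
```

If the two parsers disagree on the padded input, trimming before the strict parse is one way to keep the old behaviour; whether the patch should do that is a separate call.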

> Consider using Joda DateTimeFormatter instead of SimpleDateFormat in 
> GenericUDFDateAdd
> --
>
> Key: HIVE-13189
> URL: https://issues.apache.org/jira/browse/HIVE-13189
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Rajesh Balamohan
> Assignee: varun a kumar
> Attachments: HIVE-13189.1.patch
>
>
> Tasks spent a significant amount of time parsing date strings in 
> GenericUDFDateAdd.
> {noformat}
>   java.lang.Thread.State: RUNNABLE
>     at java.text.DecimalFormat.subparse(DecimalFormat.java:1467)
>     at java.text.DecimalFormat.parse(DecimalFormat.java:1268)
>     at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
>     at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
>     at java.text.DateFormat.parse(DateFormat.java:355)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateAdd.evaluate(GenericUDFDateAdd.java:172)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:87)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:80)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:108)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> {noformat}
> Joda DateTimeFormatter can be considered for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd

2016-03-02 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176579#comment-15176579
 ] 

Rajesh Balamohan commented on HIVE-13189:
-

[~varunk44] - I have a patch that has been tested against the lineitem table 
in the TPC-H dataset. Please let me know if I can post it as a first cut.



[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd

2016-03-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173975#comment-15173975
 ] 

Gopal V commented on HIVE-13189:


I suspect another fix worth adding would be to fold the args using preferred 
types.

{{date_add('2000-01-01', day_key);}} should not re-parse the date literal on 
every call (the implementation can compute it once from the constant object 
inspectors during initialize).
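
A rough sketch of that idea in plain Java, assuming nothing about the real code: {{ConstantDateAddSketch}}, its methods, and the {{parseCount}} counter are hypothetical, and the class deliberately does not use Hive's GenericUDF or ObjectInspector APIs. The point is only that a literal date argument can be parsed once up front and reused for every row.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Hypothetical stand-in for a UDF; not Hive's GenericUDFDateAdd.
final class ConstantDateAddSketch {
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    private final Calendar calendar = Calendar.getInstance();
    private Date constantStart; // non-null only when the date argument is a literal
    int parseCount = 0;         // instrumentation for the sketch only

    // Called once per query, like initialize(); pass null when the argument is a column.
    void initialize(String constantArg) {
        constantStart = (constantArg == null) ? null : parse(constantArg);
    }

    // Called once per row; parses only when no constant was folded in initialize().
    String evaluate(String dateArg, int days) {
        Date start = (constantStart != null) ? constantStart : parse(dateArg);
        calendar.setTime(start);
        calendar.add(Calendar.DAY_OF_MONTH, days);
        return format.format(calendar.getTime());
    }

    private Date parse(String text) {
        parseCount++;
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + text, e);
        }
    }

    public static void main(String[] args) {
        ConstantDateAddSketch udf = new ConstantDateAddSketch();
        udf.initialize("2000-01-01");
        for (int dayKey = 0; dayKey < 3; dayKey++) {
            System.out.println(udf.evaluate("2000-01-01", dayKey));
        }
        System.out.println("parse calls: " + udf.parseCount);
    }
}
```

With a literal first argument the parse count stays at one regardless of row count; with a column argument it still grows per row, which is where a faster formatter would matter.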



[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd

2016-03-01 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173947#comment-15173947
 ] 

Rajesh Balamohan commented on HIVE-13189:
-

JMH comparison of SimpleDateFormat vs Joda DateTimeFormatter

{noformat}
# JMH 1.11.2 (released 124 days ago, please consider updating!)
# VM version: JDK 1.8.0_05, VM 25.5-b02
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/bin/java
# VM options: 
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.jmh.TestJodaVsSimpleDateFormat.testWithJodaTime

# Run progress: 0.00% complete, ETA 00:03:20
# Fork: 1 of 1
# Warmup Iteration   1: 395.761 ns/op
# Warmup Iteration   2: 396.304 ns/op
# Warmup Iteration   3: 388.342 ns/op
# Warmup Iteration   4: 407.058 ns/op
# Warmup Iteration   5: 392.305 ns/op
Iteration   1: 387.758 ns/op
Iteration   2: 419.816 ns/op
Iteration   3: 444.825 ns/op
Iteration   4: 435.538 ns/op
Iteration   5: 431.213 ns/op


Result "testWithJodaTime":
  423.830 ±(99.9%) 85.014 ns/op [Average]
  (min, avg, max) = (387.758, 423.830, 444.825), stdev = 22.078
  CI (99.9%): [338.817, 508.844] (assumes normal distribution)


# JMH 1.11.2 (released 124 days ago, please consider updating!)
# VM version: JDK 1.8.0_05, VM 25.5-b02
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/bin/java
# VM options: 
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.jmh.TestJodaVsSimpleDateFormat.testWithSimpleDateFormat

# Run progress: 50.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 847.271 ns/op
# Warmup Iteration   2: 839.440 ns/op
# Warmup Iteration   3: 840.931 ns/op
# Warmup Iteration   4: 819.619 ns/op
# Warmup Iteration   5: 838.692 ns/op
Iteration   1: 845.421 ns/op
Iteration   2: 857.534 ns/op
Iteration   3: 857.405 ns/op
Iteration   4: 810.189 ns/op
Iteration   5: 808.703 ns/op


Result "testWithSimpleDateFormat":
  835.850 ±(99.9%) 94.750 ns/op [Average]
  (min, avg, max) = (808.703, 835.850, 857.534), stdev = 24.606
  CI (99.9%): [741.101, 930.600] (assumes normal distribution)


# Run complete. Total time: 00:03:20

Benchmark                                            Mode  Cnt    Score    Error  Units
TestJodaVsSimpleDateFormat.testWithJodaTime          avgt    5  423.830 ± 85.014  ns/op
TestJodaVsSimpleDateFormat.testWithSimpleDateFormat  avgt    5  835.850 ± 94.750  ns/op
{noformat}
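
To read the summary as a ratio, here is a small sketch that uses only the figures printed in the JMH output above; the class and method names are made up for illustration. Since this was a single fork with five measurement iterations, the ratio is best treated as indicative rather than precise.

```java
// Hypothetical helper; the constants are copied from the JMH summary, in ns/op.
final class JmhSpeedupSketch {
    static final double JODA_NS = 423.830;
    static final double SIMPLE_DATE_FORMAT_NS = 835.850;

    // How many times longer SimpleDateFormat takes per parse than Joda.
    static double speedup() {
        return SIMPLE_DATE_FORMAT_NS / JODA_NS;
    }

    // True when two [low, high] intervals share no point.
    static boolean disjoint(double lowA, double highA, double lowB, double highB) {
        return highA < lowB || highB < lowA;
    }

    public static void main(String[] args) {
        System.out.printf("speedup: %.2fx%n", speedup());
        // 99.9% confidence intervals as reported by JMH for the two benchmarks.
        System.out.println("CIs disjoint: " + disjoint(338.817, 508.844, 741.101, 930.600));
    }
}
```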
