[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179583#comment-15179583 ]

Gopal V commented on HIVE-13189:
--------------------------------

[~rajesh.balamohan]: can you add a case with a prefixed space? Not sure if that works with regular DateTime, but the indexOf() looks rather odd.

> Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-13189
>                 URL: https://issues.apache.org/jira/browse/HIVE-13189
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: varun a kumar
>         Attachments: HIVE-13189.1.patch
>
>
> Quite an amount of time was spent by tasks in trying to parse the date string in GenericUDFDateAdd.
> {noformat}
> java.lang.Thread.State: RUNNABLE
>     at java.text.DecimalFormat.subparse(DecimalFormat.java:1467)
>     at java.text.DecimalFormat.parse(DecimalFormat.java:1268)
>     at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
>     at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
>     at java.text.DateFormat.parse(DateFormat.java:355)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateAdd.evaluate(GenericUDFDateAdd.java:172)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:87)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:80)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:108)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> {noformat}
> Joda DateTimeFormatter can be considered for better performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
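The prefixed-space case requested above could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical, and it uses the JDK's java.time.LocalDate as a stand-in so that it runs without Joda or Hive on the classpath. It does not show what the attached patch or Joda's DateTimeFormatter actually do with a leading space; it only shows a strict parser rejecting such input and a trim-first variant accepting it.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Hive or of HIVE-13189.1.patch.
final class PrefixedSpaceCase {

    // Strict yyyy-MM-dd parse; returns null instead of throwing on bad input.
    static LocalDate parseStrict(String text) {
        try {
            return LocalDate.parse(text);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Same parse, but surrounding whitespace is removed first.
    static LocalDate parseTrimmed(String text) {
        return parseStrict(text.trim());
    }

    public static void main(String[] args) {
        String input = " 2000-01-01"; // note the leading space
        System.out.println("strict : " + parseStrict(input));
        System.out.println("trimmed: " + parseTrimmed(input));
    }
}
```

Whether a date UDF should accept such input at all is a compatibility question for the patch; the sketch only makes the two behaviours visible side by side.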
[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176579#comment-15176579 ]

Rajesh Balamohan commented on HIVE-13189:
-----------------------------------------

[~varunk44] - I have a patch that has been tested against the lineitem table in the TPC-H dataset. Please let me know if I can post it as a first cut.
[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173975#comment-15173975 ]

Gopal V commented on HIVE-13189:
--------------------------------

I suspect another fix worth adding would be to fold the arguments using preferred types.

{{date_add('2000-01-01', day_key);}} should not parse the date on every iteration (the implementation can compute it once from the constant object inspectors during initialize).
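The constant-folding idea in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not Hive's GenericUDFDateAdd: the class name, the simplified initialize/evaluate signatures, and the use of a plain String in place of constant object inspectors are all hypothetical, and java.time.LocalDate stands in for the date parser so the code runs on the JDK alone.

```java
import java.time.LocalDate;

// Hypothetical stand-in for a date_add-style UDF; not Hive's actual API.
final class DateAddSketch {

    // Set once when the first argument is a literal; null when it is a column.
    private LocalDate constantBase;

    // Plays the role of initialize(): runs once, before any row is processed.
    // constantDateOrNull is the literal first argument, or null for a column.
    void initialize(String constantDateOrNull) {
        constantBase = (constantDateOrNull == null)
                ? null
                : LocalDate.parse(constantDateOrNull);
    }

    // Plays the role of evaluate(): runs once per row.
    // The date string is parsed here only when no constant was folded.
    String evaluate(String dateText, int days) {
        LocalDate base = (constantBase != null)
                ? constantBase
                : LocalDate.parse(dateText);
        return base.plusDays(days).toString();
    }

    public static void main(String[] args) {
        DateAddSketch udf = new DateAddSketch();
        udf.initialize("2000-01-01");       // parsed once
        for (int dayKey = 0; dayKey < 3; dayKey++) {
            System.out.println(udf.evaluate(null, dayKey)); // no parsing per row
        }
    }
}
```

With a literal first argument the per-row cost drops to a single plusDays call, which is independent of which parser is eventually chosen.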
[jira] [Commented] (HIVE-13189) Consider using Joda DateTimeFormatter instead of SimpleDateFormat in GenericUDFDateAdd
[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173947#comment-15173947 ]

Rajesh Balamohan commented on HIVE-13189:
-----------------------------------------

JMH comparison of SimpleDateFormat vs Joda DateTimeFormatter

{noformat}
# JMH 1.11.2 (released 124 days ago, please consider updating!)
# VM version: JDK 1.8.0_05, VM 25.5-b02
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/bin/java
# VM options:
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.jmh.TestJodaVsSimpleDateFormat.testWithJodaTime

# Run progress: 0.00% complete, ETA 00:03:20
# Fork: 1 of 1
# Warmup Iteration   1: 395.761 ns/op
# Warmup Iteration   2: 396.304 ns/op
# Warmup Iteration   3: 388.342 ns/op
# Warmup Iteration   4: 407.058 ns/op
# Warmup Iteration   5: 392.305 ns/op
Iteration   1: 387.758 ns/op
Iteration   2: 419.816 ns/op
Iteration   3: 444.825 ns/op
Iteration   4: 435.538 ns/op
Iteration   5: 431.213 ns/op

Result "testWithJodaTime":
  423.830 ±(99.9%) 85.014 ns/op [Average]
  (min, avg, max) = (387.758, 423.830, 444.825), stdev = 22.078
  CI (99.9%): [338.817, 508.844] (assumes normal distribution)

# JMH 1.11.2 (released 124 days ago, please consider updating!)
# VM version: JDK 1.8.0_05, VM 25.5-b02
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/bin/java
# VM options:
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.jmh.TestJodaVsSimpleDateFormat.testWithSimpleDateFormat

# Run progress: 50.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 847.271 ns/op
# Warmup Iteration   2: 839.440 ns/op
# Warmup Iteration   3: 840.931 ns/op
# Warmup Iteration   4: 819.619 ns/op
# Warmup Iteration   5: 838.692 ns/op
Iteration   1: 845.421 ns/op
Iteration   2: 857.534 ns/op
Iteration   3: 857.405 ns/op
Iteration   4: 810.189 ns/op
Iteration   5: 808.703 ns/op

Result "testWithSimpleDateFormat":
  835.850 ±(99.9%) 94.750 ns/op [Average]
  (min, avg, max) = (808.703, 835.850, 857.534), stdev = 24.606
  CI (99.9%): [741.101, 930.600] (assumes normal distribution)

# Run complete. Total time: 00:03:20

Benchmark                                            Mode  Cnt    Score    Error  Units
TestJodaVsSimpleDateFormat.testWithJodaTime          avgt    5  423.830 ± 85.014  ns/op
TestJodaVsSimpleDateFormat.testWithSimpleDateFormat  avgt    5  835.850 ± 94.750  ns/op
{noformat}
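The benchmark source itself is not included in this thread. As a rough, JDK-only illustration of the kind of comparison being measured, the sketch below times SimpleDateFormat against java.time.LocalDate parsing in a plain loop. Everything in it is an assumption for illustration: the class and method names are hypothetical, java.time stands in for Joda so no extra dependency is needed, and a System.nanoTime loop is far less reliable than JMH, so its numbers should not be compared with the results above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.TimeZone;

// Hypothetical comparison harness; not the TestJodaVsSimpleDateFormat benchmark.
final class ParseComparisonSketch {

    private static final long MILLIS_PER_DAY = 86_400_000L;

    // SimpleDateFormat is mutable and not thread-safe, so each caller gets its own.
    static SimpleDateFormat utcFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt;
    }

    // Legacy path: parse to java.util.Date, then convert to days since the epoch.
    static long epochDayLegacy(SimpleDateFormat fmt, String text) {
        try {
            return Math.floorDiv(fmt.parse(text).getTime(), MILLIS_PER_DAY);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + text, e);
        }
    }

    // Stand-in for the formatter-based path, using the JDK's immutable java.time parser.
    static long epochDayModern(String text) {
        return LocalDate.parse(text).toEpochDay();
    }

    public static void main(String[] args) {
        SimpleDateFormat fmt = utcFormat();
        int n = 1_000_000;
        long sink = 0; // accumulated so the loops are not trivially dead code

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += epochDayLegacy(fmt, "2000-01-01");
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += epochDayModern("2000-01-01");
        long t2 = System.nanoTime();

        System.out.printf("SimpleDateFormat: %d ns/op%n", (t1 - t0) / n);
        System.out.printf("java.time       : %d ns/op%n", (t2 - t1) / n);
        System.out.println("checksum: " + sink);
    }
}
```

Both paths return the same epoch day for the same input, which is the property any replacement parser in the UDF would need to preserve.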