[ 
https://issues.apache.org/jira/browse/HIVE-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643274#comment-15643274
 ] 

Rajesh Balamohan commented on HIVE-15138:
-----------------------------------------

\cc [~jcamachorodriguez], [~gopalv]

> String + Integer gets converted to UDFToDouble causing number format 
> exceptions
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-15138
>                 URL: https://issues.apache.org/jira/browse/HIVE-15138
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> TPCDS Query 72 has {{"d3.d_date > d1.d_date + 5"}}, where d_date contains
> string data like {{2002-02-03, 2001-11-07}}. When running this query, the
> compiler converts the expression to UDFToDouble, which causes a large number of
> {{NumberFormatExceptions}} while trying to convert the string to a double. An
> example stack trace is given below. Filling in the stack trace for every row
> can be a significant performance hit, depending on the amount of data.
> {noformat}
> "TezTaskRunner" #41340 daemon prio=5 os_prio=0 tid=0x00007f7914745000 nid=0x9725 runnable [0x00007f787ee4a000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Throwable.fillInStackTrace(Native Method)
>         at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
>         - locked <0x00007f804b125ab0> (a java.lang.NumberFormatException)
>         at java.lang.Throwable.<init>(Throwable.java:265)
>         at java.lang.Exception.<init>(Exception.java:66)
>         at java.lang.RuntimeException.<init>(RuntimeException.java:62)
>         at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:52)
>         at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
>         at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>         at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>         at java.lang.Double.parseDouble(Double.java:538)
>         at org.apache.hadoop.hive.ql.udf.UDFToDouble.evaluate(UDFToDouble.java:172)
>         at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:967)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:194)
>         at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:194)
>         at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColGreaterDoubleColumn.evaluate(FilterDoubleColGreaterDoubleColumn.java:51)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:110)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:144)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> {noformat}
> A simple query to reproduce this issue is given below. It would be helpful if
> Hive gave explicit WARN messages so that the end user can add explicit casts to
> avoid such situations.
> {noformat}
> Latest Hive (master): (Check UDFToDouble for d_date field)
> ====================
> hive> explain select distinct d_date + 5 from date_dim limit 10;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: rbalamohan_20161107005816_1cc412bf-c19c-45c4-b468-236e4fc8ae09:8
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: (UDFToDouble(d_date) + 5.0) (type: double)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: _col0 (type: double)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: double)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: double)
>                         Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                         TopN Hash Memory Usage: 0.04
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: double)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 36524 Data size: 41016452 Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 10
>                   Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: 10
>       Processor Tree:
>         ListSink
> {noformat}
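
For illustration only, here is a minimal standalone Java sketch of what the stack trace above boils down to: a date-like string handed to Double.parseDouble, which throws a NumberFormatException for each such value. This is not Hive code. The class name DateStringCoercionDemo and both helper methods are hypothetical, and returning null on a failed parse is an assumption about how the result surfaces as NULL. The second helper shows the date arithmetic the query presumably intends.

```java
import java.time.LocalDate;

public class DateStringCoercionDemo {

    // Mirrors the failing step in the stack trace: a string is passed to
    // Double.parseDouble. A date-like string such as "2002-02-03" is not a
    // valid double literal, so a NumberFormatException is created and thrown.
    // Returning null here is an assumption made for this sketch.
    public static Double toDoubleOrNull(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // What "d_date + 5" presumably means to the query author:
    // parse the ISO date and add a number of days.
    public static String addDays(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days).toString();
    }

    public static void main(String[] args) {
        String[] sample = {"2002-02-03", "2001-11-07"};
        for (String d : sample) {
            System.out.println(d + " as double: " + toDoubleOrNull(d)
                    + ", plus 5 days: " + addDays(d, 5));
        }
    }
}
```

Every row that takes the first path pays for constructing an exception and filling in its stack trace, which is the per-row cost described above. An explicit cast to a date type in the query should avoid that path.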



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
