[ https://issues.apache.org/jira/browse/HIVE-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643274#comment-15643274 ]
Rajesh Balamohan commented on HIVE-15138:
-----------------------------------------

\cc [~jcamachorodriguez], [~gopalv]

> String + Integer gets converted to UDFToDouble causing number format exceptions
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-15138
>                 URL: https://issues.apache.org/jira/browse/HIVE-15138
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> TPCDS Query 72 has {{"d3.d_date > d1.d_date + 5"}}, where d_date contains data like {{2002-02-03, 2001-11-07}}. When running this query, the compiler converts the expression into UDFToDouble, causing a large number of {{NumberFormatException}}s while trying to convert strings to doubles. An example stack trace is given below; filling in the stack for every row can be a significant performance hit, depending on the amount of data.
> {noformat}
> "TezTaskRunner" #41340 daemon prio=5 os_prio=0 tid=0x00007f7914745000 nid=0x9725 runnable [0x00007f787ee4a000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Throwable.fillInStackTrace(Native Method)
> 	at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> 	- locked <0x00007f804b125ab0> (a java.lang.NumberFormatException)
> 	at java.lang.Throwable.<init>(Throwable.java:265)
> 	at java.lang.Exception.<init>(Exception.java:66)
> 	at java.lang.RuntimeException.<init>(RuntimeException.java:62)
> 	at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:52)
> 	at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
> 	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> 	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
> 	at java.lang.Double.parseDouble(Double.java:538)
> 	at org.apache.hadoop.hive.ql.udf.UDFToDouble.evaluate(UDFToDouble.java:172)
> 	at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:967)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:194)
> 	at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:194)
> 	at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
> 	at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
> 	at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColGreaterDoubleColumn.evaluate(FilterDoubleColGreaterDoubleColumn.java:51)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:110)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:144)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> {noformat}
> A simple query to reproduce this issue is given below. It would be helpful if Hive emitted an explicit WARN message so that the end user can add explicit casts to avoid such situations.
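> The per-row cost in the trace above comes from constructing a {{NumberFormatException}} (and filling in its stack trace) for every value that fails to parse. A minimal standalone Java sketch, not Hive code, illustrating that date strings like those in {{d_date}} always fail {{Double.parseDouble}} (the class and method names below are hypothetical, chosen for illustration):
> {noformat}
> // Standalone sketch: UDFToDouble ultimately calls Double.parseDouble,
> // which throws NumberFormatException for date strings such as
> // "2002-02-03" -- once per row evaluated.
> public class DateToDoubleDemo {
>     static int countParseFailures(String[] values) {
>         int failures = 0;
>         for (String v : values) {
>             try {
>                 Double.parseDouble(v); // same parse the UDF performs
>             } catch (NumberFormatException e) {
>                 failures++; // each throw fills in a stack trace
>             }
>         }
>         return failures;
>     }
>
>     public static void main(String[] args) {
>         String[] dates = {"2002-02-03", "2001-11-07"};
>         System.out.println("failures=" + countParseFailures(dates));
>     }
> }
> {noformat}
> As a possible user-side workaround, rewriting the predicate with an explicit date function, e.g. {{d3.d_date > date_add(d1.d_date, 5)}}, keeps the comparison out of the double domain and avoids the failing parse entirely.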
> {noformat}
> Latest Hive (master): (Check UDFToDouble for d_date field)
> ====================
> hive> explain select distinct d_date + 5 from date_dim limit 10;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: rbalamohan_20161107005816_1cc412bf-c19c-45c4-b468-236e4fc8ae09:8
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: (UDFToDouble(d_date) + 5.0) (type: double)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: _col0 (type: double)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: double)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: double)
>                         Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                         TopN Hash Memory Usage: 0.04
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: double)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 36524 Data size: 41016452 Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 10
>                   Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: 10
>       Processor Tree:
>         ListSink
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)