[ 
https://issues.apache.org/jira/browse/HIVE-20771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clemens Valiente updated HIVE-20771:
------------------------------------
    Affects Version/s: 3.1.0
           Attachment: HIVE-20771.patch
               Status: Patch Available  (was: Open)

> LazyBinarySerDe fails on empty structs.
> ---------------------------------------
>
>                 Key: HIVE-20771
>                 URL: https://issues.apache.org/jira/browse/HIVE-20771
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.0
>            Reporter: Clemens Valiente
>            Assignee: Clemens Valiente
>            Priority: Minor
>         Attachments: HIVE-20771.patch
>
>
> {code:sql}
> CREATE TABLE cvaliente.structtest AS
> SELECT named_struct();
> SHOW CREATE TABLE cvaliente.structtest;
> SELECT * FROM cvaliente.structtest ORDER BY rand();
> {code}
> The resulting schema is:
> {code:sql}
> CREATE TABLE `cvaliente.structtest`(
>   `_c0` struct<>)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/cvaliente/cvaliente/structtest2'
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='true', 
>   'numFiles'='1',       
>   'numRows'='1', 
>   'rawDataSize'='0', 
>   'totalSize'='1',      
>   'transient_lastDdlTime'='1539781607');
> {code}
> Between the map and reduce phases, Hive serializes the row as a 
> LazyBinaryStruct. When it tries to read the same object back, the 
> {{SELECT}} query above fails:
> {code}
> 2018-10-17 14:32:02,298 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":0.13508293503238622},"value":{"_col0":{}}}
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:338)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:259)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:169)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating VALUE._col0
>       at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:329)
>       ... 17 more
> Caused by: java.lang.RuntimeException: length should be positive!
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryNonPrimitive.init(LazyBinaryNonPrimitive.java:54)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.init(LazyBinaryStruct.java:95)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>       at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>       at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>       at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
>       ... 18 more
> {code}
> This happens because {{LazyBinaryNonPrimitive.init}} rejects any input whose 
> length is not positive, so the empty struct cannot be read back: 
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryNonPrimitive.java#L53
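
The failure mode described above can be illustrated with a small, self-contained sketch. It is a simplified model, not the Hive source and not the attached patch: the class `EmptyStructLengthCheck` and both methods are hypothetical, the strict check is inferred from the "length should be positive!" message in the stack trace, and it is assumed that `struct<>` serializes to zero bytes. The relaxed variant shows one possible way such a check could tolerate empty structs while still rejecting negative lengths.

```java
// Simplified, hypothetical model of the length check discussed in this issue.
// It does not use any Hive classes and is not the attached patch.
class EmptyStructLengthCheck {

    // Models the behaviour implied by the stack trace:
    // any length that is not strictly positive is rejected.
    static void strictInit(byte[] bytes, int start, int length) {
        if (bytes == null) {
            throw new RuntimeException("bytes cannot be null!");
        }
        if (length <= 0) {
            throw new RuntimeException("length should be positive!");
        }
    }

    // Hypothetical relaxed variant: a zero length (empty struct) is accepted,
    // a negative length is still treated as an error.
    static void relaxedInit(byte[] bytes, int start, int length) {
        if (bytes == null) {
            throw new RuntimeException("bytes cannot be null!");
        }
        if (length < 0) {
            throw new RuntimeException("length should be non-negative!");
        }
    }

    public static void main(String[] args) {
        // Assumption: an empty struct carries no serialized bytes.
        byte[] emptyStruct = new byte[0];

        try {
            strictInit(emptyStruct, 0, emptyStruct.length);
            System.out.println("strict check accepted the empty struct");
        } catch (RuntimeException e) {
            System.out.println("strict check failed: " + e.getMessage());
        }

        relaxedInit(emptyStruct, 0, emptyStruct.length);
        System.out.println("relaxed check accepted the empty struct");
    }
}
```

Whether loosening the check is safe for every non-primitive type (maps, lists, unions) is a separate question that this sketch does not address.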



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
