[jira] [Updated] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

Yi Zhang (Jira) Mon, 05 Feb 2024 14:38:04 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yi Zhang updated HIVE-20990:
----------------------------
    Labels: duplicate  (was: )

> ORC, group by, case: java.lang.AssertionError: Output column number expected 
> to be 0 when isRepeating
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20990
>                 URL: https://issues.apache.org/jira/browse/HIVE-20990
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0
>         Environment: * Hive 3.0.0 from HDP 3.0.1.
>  * CentOS 7
>  * 5 datanodes, 2 masters
>  
>            Reporter: Guillaume
>            Priority: Major
>              Labels: duplicate
>
> Run this to replicate:
> {code:sql}
> drop table if exists ds;
> create table ds stored as orc as select
>   inline(array(
>     struct('gmail.com'), -- when branch of the case statement
>     struct('apache.org') -- else branch of the case statement
>   )) as (domain)
> ;
> select
>   case
>     when domain='gmail.com' then 'gmail'
>     else coalesce(domain, 'other')
>   end as domaingroup
> from ds
> group by 1 -- useless (datawise) for this example, but triggers the bug.
> ;
> {code}
> The query fails with:
> {noformat}
> java.lang.RuntimeException: java.lang.AssertionError: Output column number 
> expected to be 0 when isRepeating
> {noformat}
>  
> Full exception is shown below.
> Of interest:
>  * if the case is removed (e.g. replaced by only the else clause), the query 
> works,
>  * if only the else branch of the case matches, the query works,
>  * replacing the case with nested ifs changes nothing (the query still fails),
>  * if the group by is removed, the query works,
>  * if the table (ds) is replaced by a CTE, the query works.
> Workaround, at the cost of performance: 
> {code:sql}
> set hive.vectorized.execution.enabled = false; 
> {code}
>  
> Full exception:
> {noformat}
> ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1543326624484_8258_45_00, diagnostics=[Task failed, 
> taskId=task_1543326624484_8258_45_00_000000, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> java.lang.RuntimeException: java.lang.AssertionError: Output column number 
> expected to be 0 when isRepeating
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: Output column number expected to be 0 
> when isRepeating
>       at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>       at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>       at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>       ... 19 more
> , errorMessage=Cannot recover from this error:java.lang.RuntimeException: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: Output column number expected to be 0 
> when isRepeating
>       at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>       at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>       at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>       ... 19 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:0, Vertex vertex_1543326624484_8258_45_00 [Map 1] killed/failed 
> due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, 
> vertexId=vertex_1543326624484_8258_45_01, diagnostics=[Vertex received Kill 
> while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, 
> failedTasks:0 killedTasks:2, Vertex vertex_1543326624484_8258_45_01 [Reducer 
> 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to 
> VERTEX_FAILURE. failedVertices:1 killedVertices:1
> {noformat}
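
The assertion message in the trace above says that when a column vector is flagged isRepeating, writes are expected only at output index 0. The following is a minimal sketch of that invariant, not Hive's code: ToyBytesColumn and writeRowOneFails are hypothetical names invented for illustration, and the "stale flag" scenario is only an assumed way of reaching the failing state. The issue itself does not state the root cause.

```java
import java.nio.charset.StandardCharsets;

/** Toy stand-in for a vectorized string column. NOT Hive's BytesColumnVector. */
class ToyBytesColumn {
    final byte[][] values;
    // When true, slot 0 is taken to hold the value for every row in the batch.
    boolean isRepeating;

    ToyBytesColumn(int size) {
        values = new byte[size][];
    }

    /** Copies one element from another column, enforcing the repeating invariant. */
    void setElement(int outIndex, int inIndex, ToyBytesColumn input) {
        if (isRepeating && outIndex != 0) {
            throw new AssertionError(
                "Output column number expected to be 0 when isRepeating");
        }
        // A repeating input only has slot 0 populated.
        int src = input.isRepeating ? 0 : inIndex;
        values[outIndex] = input.values[src];
    }

    /** Builds a column from strings, one per row. */
    static ToyBytesColumn of(boolean repeating, String... rows) {
        ToyBytesColumn c = new ToyBytesColumn(rows.length);
        for (int i = 0; i < rows.length; i++) {
            c.values[i] = rows[i].getBytes(StandardCharsets.UTF_8);
        }
        c.isRepeating = repeating;
        return c;
    }

    /** Returns true if filling a two-row output trips the assertion. */
    static boolean writeRowOneFails(boolean outputStillRepeating) {
        ToyBytesColumn input = of(false, "gmail.com", "apache.org");
        ToyBytesColumn output = new ToyBytesColumn(2);
        // Assumed scenario: the output flag was left set by an earlier step.
        output.isRepeating = outputStillRepeating;
        try {
            output.setElement(0, 0, input); // index 0 is always allowed
            output.setElement(1, 1, input); // index 1 is rejected while repeating
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("isRepeating left set: fails=" + writeRowOneFails(true));
        System.out.println("isRepeating cleared:  fails=" + writeRowOneFails(false));
    }
}
```

In this toy model the second write fails only while the output is still flagged as repeating. That fits the report that the query works when only one branch of the case matches, but it does not by itself explain why the group by or the ORC table is needed to trigger the error.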



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
