[ 
https://issues.apache.org/jira/browse/HIVE-23010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Klemke updated HIVE-23010:
------------------------------------
    Description: 
When executing a query in Hive that runs a filesink, mergejoin and two group by 
operators in a single reduce vertex (reducer 2 in [^simplified-explain.txt]), 
the following exception occurs non-deterministically:
{code:java}
java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting 
dummy store operator but found: FS[17]
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Was expecting dummy store operator 
but found: FS[17]
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
        ... 16 more
{code}
Looking at Yarn logs, IllegalStateException occurs in a container if and only if
 * the container has been running a task attempt of "Reducer 2" successfully 
before
 * the container is then being reused for another task attempt of the same 
"Reducer 2" vertex

The same query runs fine with tez.am.container.reuse.enabled=false.

Apparently, this error occurs deterministically within a container that is 
being reused for multiple task attempts of the same reduce vertex.

We have not been able to reproduce this error deterministically or with a 
smaller execution plan due low probability of container reuse for same vertex.

  was:
When executing a query in Hive that runs a filesink, mergejoin and two group by 
operators in a single reduce vertex (reducer 2 in [^simplified-explain.txt]), 
the following exception occurs non-deterministically:
{code:java}
java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting 
dummy store operator but found: FS[17]
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Was expecting dummy store operator 
but found: FS[17]
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
        ... 16 more
{code}
Looking at Yarn logs, IllegalStateException occurs in a container if and only if
 * the container has been running a task attempt of "Reducer 2" successfully 
before
 * the container is then being reused for another task attempt of the same 
"Reducer 2" vertex

The same query runs fine with tez.am.container.reuse.enabled=false.

Apparently, this error occurs deterministically within a container that is 
being reused for multiple task attempts of the same reduce vertex.

We have not been able to reproduce this error deterministically or with a 
smaller execution plan.


> IllegalStateException in tez.ReduceRecordProcessor when containers are being 
> reused
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-23010
>                 URL: https://issues.apache.org/jira/browse/HIVE-23010
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Sebastian Klemke
>            Priority: Major
>         Attachments: simplified-explain.txt
>
>
> When executing a query in Hive that runs a filesink, mergejoin and two group 
> by operators in a single reduce vertex (reducer 2 in 
> [^simplified-explain.txt]), the following exception occurs 
> non-deterministically:
> {code:java}
> java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting 
> dummy store operator but found: FS[17]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>         at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>         at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>         at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Was expecting dummy store 
> operator but found: FS[17]
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>         ... 16 more
> {code}
> Looking at Yarn logs, IllegalStateException occurs in a container if and only 
> if
>  * the container has been running a task attempt of "Reducer 2" successfully 
> before
>  * the container is then being reused for another task attempt of the same 
> "Reducer 2" vertex
> The same query runs fine with tez.am.container.reuse.enabled=false.
> Apparently, this error occurs deterministically within a container that is 
> being reused for multiple task attempts of the same reduce vertex.
> We have not been able to reproduce this error deterministically or with a 
> smaller execution plan due low probability of container reuse for same vertex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to