[jira] [Created] (YARN-10851) Tez session close does not interrupt yarn's async thread

Qihong Wu (Jira) Wed, 07 Jul 2021 15:25:09 -0700

Qihong Wu created YARN-10851:
--------------------------------

             Summary: Tez session close does not interrupt yarn's async thread
                 Key: YARN-10851
                 URL: https://issues.apache.org/jira/browse/YARN-10851
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.10.1, 2.8.5
         Environment: An HA cluster where RM1 is not the active RM,
running YARN 2.8.5 and configured with Tez
            Reporter: Qihong Wu
         Attachments: hive.log


Hi, I would like to ask for expert advice on how YARN behaves when 
handling `InterruptedIOException`. 

The issue occurs on an HA cluster where RM1 is NOT the active RM. If a YARN 
request made to RM1 fails, failover to the other RM is expected. However, if 
an interrupted exception is thrown while connecting to RM1, the thread should 
[bail 
out|https://dzone.com/articles/how-to-handle-the-interruptedexception] as soon 
as possible to [respect the interrupt 
request|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--],
 rather than moving on to another RM.

However, I found that my application (Hive), after an `InterruptedIOException` 
was thrown while connecting to RM1, continued on to RM2. How does YARN handle 
`InterruptedIOException`? Shouldn't the async thread be interrupted and shut 
down when Tez's close() triggers the interrupt request?
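
To make the expected behavior concrete, here is a minimal sketch of a failover loop that stops on an interrupt instead of moving to the next RM. It is only an illustration and not YARN's actual `RetryInvocationHandler` logic: the class `FailoverSketch`, the interface `RmCall`, and the method `callWithFailover` are hypothetical names. One detail to keep in mind is that `java.net.SocketTimeoutException` is a subclass of `InterruptedIOException`, so the sketch treats socket timeouts as ordinary retriable failures and only bails out on the other cases.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FailoverSketch {
    /** Hypothetical stand-in for an RPC made to a single ResourceManager. */
    interface RmCall {
        String invoke(String rmId) throws IOException;
    }

    /** RMs contacted so far, recorded only so the behavior can be inspected. */
    static final List<String> attempted = new ArrayList<>();

    static String callWithFailover(List<String> rmIds, RmCall call) throws IOException {
        IOException last = null;
        for (String rmId : rmIds) {
            attempted.add(rmId);
            try {
                return call.invoke(rmId);
            } catch (SocketTimeoutException e) {
                // A timeout is also an InterruptedIOException, but it is not an
                // interrupt request, so fail over to the next RM.
                last = e;
            } catch (InterruptedIOException e) {
                // Interrupt requested: restore the flag and bail out, no failover.
                Thread.currentThread().interrupt();
                throw e;
            } catch (IOException e) {
                // Ordinary failure (for example a standby RM): try the next RM.
                last = e;
            }
        }
        throw last != null ? last : new IOException("no RM configured");
    }

    public static void main(String[] args) {
        try {
            callWithFailover(Arrays.asList("rm1", "rm2"), rmId -> {
                throw new InterruptedIOException("interrupted while connecting to " + rmId);
            });
        } catch (IOException e) {
            System.out.println("stopped with: " + e.getMessage());
        }
        System.out.println("RMs attempted: " + attempted);
        Thread.interrupted(); // clear the flag that the demo set on the main thread
    }
}
```

In this sketch an interrupt while talking to rm1 ends the loop before rm2 is ever contacted, which is the behavior I expected to see in hive.log.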



*Steps to reproduce:*
 1. Use an HA cluster running YARN 2.8.5 and configured with Tez.
 2. Make sure RM1 is not the active RM by checking `yarn rmadmin 
-getAllServiceState`. If it is, manually [transition RM2 to be the active 
RM|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html#Admin_commands].
 3. Apply the following failover-retry properties to yarn-site.xml:
{quote}<property>
 <name>yarn.client.failover-retries</name>
 <value>4</value>
 </property>
 <property>
 <name>yarn.client.failover-retries-on-socket-timeouts</name>
 <value>4</value>
 </property>
 <property>
 <name>yarn.client.failover-max-attempts</name>
 <value>4</value>
 </property>
{quote}
 4. Run a simple application that goes through the YARN client (for example, a 
simple Hive DDL statement):
{quote}hive --hiveconf hive.root.logger=TRACE,console -e "create table tez_test 
(id int, name string);"
{quote}
 5. In the application's log (for example, hive.log), you can see that 
`RetryInvocationHandler` caught the `InterruptedIOException` while the request 
was talking to rm1, but the thread did not bail out immediately and instead 
moved on to rm2.



*More information:*
The interrupted exception is triggered via 
[TezSessionState#close|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java#L689]
 and 
[Future#cancel|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Future.html#cancel-boolean-].
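
For reference, the interrupt mechanism itself can be shown in isolation. The following is a small self-contained sketch, assuming nothing about Hive or Tez internals: the class `CancelSketch` and the method `cancelInterruptsWorker` are hypothetical names, and `Thread.sleep` stands in for a blocking call to an RM. It shows that `Future#cancel(true)` delivers an interrupt to the thread running the task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class CancelSketch {
    /** Returns true if the worker observed the interrupt sent by cancel(true). */
    static boolean cancelInterruptsWorker() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // stand-in for a blocking RM call
                } catch (InterruptedException e) {
                    sawInterrupt.set(true);
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                } finally {
                    finished.countDown();
                }
            });
            started.await();                      // wait until the task is running
            future.cancel(true);                  // true = interrupt the running thread
            finished.await(5, TimeUnit.SECONDS);  // give the worker time to react
            return sawInterrupt.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker interrupted: " + cancelInterruptsWorker());
    }
}
```

Whether the code running on the worker thread then stops or keeps going (for example by failing over to another RM) depends on how it handles that interrupt, which is the question raised in this issue.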



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
