[ https://issues.apache.org/jira/browse/SPARK-33669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Su Qilong updated SPARK-33669:
------------------------------
    Description:
For YarnClient mode, stopping YarnClientSchedulerBackend first tries to interrupt the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request.

But the client.monitorApplication method also throws InterruptedIOException when a Hadoop RPC call is in progress. In this case, MonitorThread does not know it was interrupted: a failed YARN application report is returned, and "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged at error level, which confuses users a lot.

We should take InterruptedIOException into account here so that it is handled the same way as InterruptedException.

{code:java}
private class MonitorThread extends Thread {
  private var allowInterrupt = true

  override def run() {
    try {
      val YarnAppReport(_, state, diags) =
        client.monitorApplication(appId.get, logApplicationReport = false)
      logError(s"YARN application has exited unexpectedly with state $state! " +
        "Check the YARN application logs for more details.")
      diags.foreach { err =>
        logError(s"Diagnostics message: $err")
      }
      allowInterrupt = false
      sc.stop()
    } catch {
      // Only InterruptedException is handled here; InterruptedIOException is not.
      case e: InterruptedException => logInfo("Interrupting monitor thread")
    }
  }
}
{code}

  was:
For YarnClient mode, stopping YarnClientSchedulerBackend first tries to interrupt the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request.

But the client.monitorApplication method also throws InterruptedIOException when a Hadoop RPC call is in progress. In this case, MonitorThread does not know it was interrupted: a failed YARN application report is returned, and "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged at error level, which confuses users a lot.
We should take InterruptedIOException into account here so that it is handled the same way as InterruptedException.

> Wrong error message from YARN application state monitor when sc.stop in yarn client mode
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33669
>                 URL: https://issues.apache.org/jira/browse/SPARK-33669
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.4.3, 3.0.1
>            Reporter: Su Qilong
>            Priority: Minor
>
> For YarnClient mode, stopping YarnClientSchedulerBackend first tries to
> interrupt the YARN application monitor thread. MonitorThread.run() catches
> InterruptedException so that it can respond gracefully to the stop request.
> But the client.monitorApplication method also throws InterruptedIOException
> when a Hadoop RPC call is in progress. In this case, MonitorThread does not
> know it was interrupted: a failed YARN application report is returned, and
> "Failed to contact YARN for application xxxxx; YARN application has exited
> unexpectedly with state xxxxx" is logged at error level, which confuses
> users a lot.
> We should take InterruptedIOException into account here so that it is
> handled the same way as InterruptedException.
> {code:java}
> private class MonitorThread extends Thread {
>   private var allowInterrupt = true
>
>   override def run() {
>     try {
>       val YarnAppReport(_, state, diags) =
>         client.monitorApplication(appId.get, logApplicationReport = false)
>       logError(s"YARN application has exited unexpectedly with state $state! " +
>         "Check the YARN application logs for more details.")
>       diags.foreach { err =>
>         logError(s"Diagnostics message: $err")
>       }
>       allowInterrupt = false
>       sc.stop()
>     } catch {
>       // Only InterruptedException is handled here; InterruptedIOException is not.
>       case e: InterruptedException => logInfo("Interrupting monitor thread")
>     }
>   }
> }
> {code}
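
A minimal sketch of the handling proposed in the description, not the actual Spark patch. It assumes the InterruptedIOException reaches MonitorThread.run(). The monitorApplication function below is a hypothetical stand-in for client.monitorApplication that turns an interrupt into an InterruptedIOException, and the log buffer is a hypothetical replacement for logInfo/logError so the example runs without Spark or YARN.

```scala
import java.io.InterruptedIOException
import java.util.concurrent.CountDownLatch
import scala.collection.mutable.ArrayBuffer

object MonitorInterruptSketch {
  // Hypothetical stand-in for logInfo/logError: collects log lines for inspection.
  private val log = ArrayBuffer.empty[String]

  // Hypothetical stand-in for client.monitorApplication. It blocks like a
  // long-running RPC and reports an interrupt as InterruptedIOException.
  private def monitorApplication(started: CountDownLatch): String = {
    started.countDown()
    try {
      Thread.sleep(60000)
      "FINISHED"
    } catch {
      case _: InterruptedException =>
        throw new InterruptedIOException("interrupted during RPC")
    }
  }

  private class MonitorThread(started: CountDownLatch) extends Thread {
    override def run(): Unit = {
      try {
        val state = monitorApplication(started)
        log.synchronized {
          log += s"ERROR YARN application has exited unexpectedly with state $state!"
        }
      } catch {
        // Treat both exception types as a stop request, not as an application failure.
        case _: InterruptedException | _: InterruptedIOException =>
          log.synchronized { log += "INFO Interrupting monitor thread" }
      }
    }
  }

  // Starts the monitor, interrupts it as a stop request would, and returns the log lines.
  def runOnce(): Seq[String] = {
    log.synchronized { log.clear() }
    val started = new CountDownLatch(1)
    val monitor = new MonitorThread(started)
    monitor.start()
    started.await()     // wait until the simulated RPC has begun
    monitor.interrupt() // what stopping the backend does to the monitor thread
    monitor.join()
    log.synchronized { log.toList }
  }

  def main(args: Array[String]): Unit = runOnce().foreach(println)
}
```

With the combined case clause, the interrupt is logged at info level only. With the original clause that matches only InterruptedException, the InterruptedIOException in this sketch would escape run() uncaught.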