[ https://issues.apache.org/jira/browse/HIVE-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301613#comment-15301613 ]
Rui Li commented on HIVE-13376:
-------------------------------

Hi [~xuefuz], I just briefly looked at the code. Although there are switches controlling whether the app state is logged, they are not exposed to users via configuration. So to disable the logging, we either need a log level higher than INFO, or we can disable {{spark.yarn.submit.waitAppCompletion}} (which only works for yarn-cluster). Otherwise we need the interval to avoid the verbose state logs. Let me know if there's another way to achieve it.

Related code in {{Client.scala}}:

{code}
def monitorApplication(
    appId: ApplicationId,
    returnOnRunning: Boolean = false,
    logApplicationReport: Boolean = true): (YarnApplicationState, FinalApplicationStatus) = {
  val interval = sparkConf.getLong("spark.yarn.report.interval", 1000)
  var lastState: YarnApplicationState = null
  while (true) {
    Thread.sleep(interval)
    val report: ApplicationReport =
      try {
        getApplicationReport(appId)
      } catch {
        case e: ApplicationNotFoundException =>
          logError(s"Application $appId not found.")
          return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
        case NonFatal(e) =>
          logError(s"Failed to contact YARN for application $appId.", e)
          return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
      }
    val state = report.getYarnApplicationState

    if (logApplicationReport) {
      logInfo(s"Application report for $appId (state: $state)")

      // If DEBUG is enabled, log report details every iteration
      // Otherwise, log them every time the application changes state
      if (log.isDebugEnabled) {
        logDebug(formatReportDetails(report))
      } else if (lastState != state) {
        logInfo(formatReportDetails(report))
      }
    }

    if (lastState != state) {
      state match {
        case YarnApplicationState.RUNNING =>
          reportLauncherState(SparkAppHandle.State.RUNNING)
        case YarnApplicationState.FINISHED =>
          reportLauncherState(SparkAppHandle.State.FINISHED)
        case YarnApplicationState.FAILED =>
          reportLauncherState(SparkAppHandle.State.FAILED)
        case YarnApplicationState.KILLED =>
          reportLauncherState(SparkAppHandle.State.KILLED)
        case _ =>
      }
    }

    if (state == YarnApplicationState.FINISHED ||
        state == YarnApplicationState.FAILED ||
        state == YarnApplicationState.KILLED) {
      cleanupStagingDir(appId)
      return (state, report.getFinalApplicationStatus)
    }

    if (returnOnRunning && state == YarnApplicationState.RUNNING) {
      return (state, report.getFinalApplicationStatus)
    }

    lastState = state
  }

  // Never reached, but keeps compiler happy
  throw new SparkException("While loop is depleted! This should never happen...")
}
{code}

> HoS emits too many logs with application state
> ----------------------------------------------
>
>                 Key: HIVE-13376
>                 URL: https://issues.apache.org/jira/browse/HIVE-13376
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13376.2.patch, HIVE-13376.patch
>
> The logs get flooded with lines like:
> Mar 28, 3:12:21.851 PM INFO org.apache.hive.spark.client.SparkClientImpl [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report for application_1458679386200_0161 (state: RUNNING)
> Mar 28, 3:12:21.912 PM INFO org.apache.hive.spark.client.SparkClientImpl [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report for application_1458679386200_0149 (state: RUNNING)
> Mar 28, 3:12:22.853 PM INFO org.apache.hive.spark.client.SparkClientImpl [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report for application_1458679386200_0161 (state: RUNNING)
> Mar 28, 3:12:22.913 PM INFO org.apache.hive.spark.client.SparkClientImpl [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report for application_1458679386200_0149 (state: RUNNING)
> Mar 28, 3:12:23.855 PM INFO org.apache.hive.spark.client.SparkClientImpl [stderr-redir-1]: 16/03/28 15:12:23 INFO yarn.Client: Application report for application_1458679386200_0161 (state: RUNNING)
> While this is good information, it is a bit much.
> SparkJobMonitor seems to hard-code its interval to 1 second. It should be higher, and perhaps made configurable.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
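For reference, the three workarounds discussed in the comment could be expressed as configuration along these lines. This is only a sketch: the logger name {{org.apache.spark.deploy.yarn.Client}} is an assumption based on where {{monitorApplication}} appears to live, and the interval value is illustrative, not a recommendation; {{spark.yarn.report.interval}} and {{spark.yarn.submit.waitAppCompletion}} come from the code and discussion above.

{code}
# Option 1 (log4j.properties): raise the log level above INFO for Spark's
# YARN client so the per-second "Application report" lines are suppressed.
# Logger name is an assumption based on Client.scala's apparent package.
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN

# Option 2 (spark-defaults.conf): return right after submission instead of
# polling YARN until the app completes. Per the comment above, this only
# works for yarn-cluster mode.
spark.yarn.submit.waitAppCompletion=false

# Option 3 (spark-defaults.conf): lengthen the polling interval that
# monitorApplication() reads, in milliseconds (default 1000, i.e. one
# state report per second). Value here is illustrative only.
spark.yarn.report.interval=5000
{code}

Note that options 2 and 3 change client behavior (when control returns, how quickly state changes are observed), while option 1 only filters the output.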