[ 
https://issues.apache.org/jira/browse/HIVE-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301613#comment-15301613
 ] 

Rui Li commented on HIVE-13376:
-------------------------------

Hi [~xuefuz], I just briefly looked at the code. Although there are switches to 
control whether the app state is logged, they are not exposed to users via 
configuration. So I think in order to disable the logging, we either need a 
log level higher than INFO, or we can disable 
{{spark.yarn.submit.waitAppCompletion}} (only works for yarn-cluster). 
Otherwise we need the interval to avoid the verbose state logs. Let me know if 
there's another way to achieve this.
Related code in {{Client.scala}}:
{code}
  def monitorApplication(
      appId: ApplicationId,
      returnOnRunning: Boolean = false,
      logApplicationReport: Boolean = true): (YarnApplicationState, FinalApplicationStatus) = {
    val interval = sparkConf.getLong("spark.yarn.report.interval", 1000)
    var lastState: YarnApplicationState = null
    while (true) {
      Thread.sleep(interval)
      val report: ApplicationReport =
        try {
          getApplicationReport(appId)
        } catch {
          case e: ApplicationNotFoundException =>
            logError(s"Application $appId not found.")
            return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
          case NonFatal(e) =>
            logError(s"Failed to contact YARN for application $appId.", e)
            return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
        }
      val state = report.getYarnApplicationState

      if (logApplicationReport) {
        logInfo(s"Application report for $appId (state: $state)")

        // If DEBUG is enabled, log report details every iteration
        // Otherwise, log them every time the application changes state
        if (log.isDebugEnabled) {
          logDebug(formatReportDetails(report))
        } else if (lastState != state) {
          logInfo(formatReportDetails(report))
        }
      }

      if (lastState != state) {
        state match {
          case YarnApplicationState.RUNNING =>
            reportLauncherState(SparkAppHandle.State.RUNNING)
          case YarnApplicationState.FINISHED =>
            reportLauncherState(SparkAppHandle.State.FINISHED)
          case YarnApplicationState.FAILED =>
            reportLauncherState(SparkAppHandle.State.FAILED)
          case YarnApplicationState.KILLED =>
            reportLauncherState(SparkAppHandle.State.KILLED)
          case _ =>
        }
      }

      if (state == YarnApplicationState.FINISHED ||
        state == YarnApplicationState.FAILED ||
        state == YarnApplicationState.KILLED) {
        cleanupStagingDir(appId)
        return (state, report.getFinalApplicationStatus)
      }

      if (returnOnRunning && state == YarnApplicationState.RUNNING) {
        return (state, report.getFinalApplicationStatus)
      }

      lastState = state
    }

    // Never reached, but keeps compiler happy
    throw new SparkException("While loop is depleted! This should never happen...")
  }
{code}
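
To make the effect of the interval concrete, here is a rough, self-contained sketch that models only the polling and logging part of the loop above. It is not the Spark code: {{MonitorSketch}}, {{simulate}}, {{stateAt}} and the {{State}} objects are hypothetical names made up for illustration, and the sleep is replaced by a simulated clock so it runs instantly.

```scala
object MonitorSketch {
  // Hypothetical stand-in for YarnApplicationState; not the real YARN enum.
  sealed trait State
  case object Running extends State
  case object Finished extends State

  /** Simulates the polling loop and returns the INFO lines it would emit.
    *
    * @param intervalMs           plays the role of spark.yarn.report.interval
    * @param stateAt              assumed mapping from elapsed ms to app state
    * @param logApplicationReport mirrors the switch in monitorApplication
    */
  def simulate(
      intervalMs: Long,
      stateAt: Long => State,
      logApplicationReport: Boolean = true): Vector[String] = {
    val lines = Vector.newBuilder[String]
    var elapsed = 0L
    var done = false
    while (!done) {
      elapsed += intervalMs // stands in for Thread.sleep(interval)
      val state = stateAt(elapsed)
      // One line per poll, regardless of whether the state changed.
      if (logApplicationReport) {
        lines += s"Application report (state: $state)"
      }
      done = state == Finished
    }
    lines.result()
  }
}
```

Under these assumptions, an app that finishes after 60 seconds produces 60 report lines with a 1000 ms interval and 6 lines with a 10000 ms interval, which is why a larger interval helps when the switch itself is not configurable.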

> HoS emits too many logs with application state
> ----------------------------------------------
>
>                 Key: HIVE-13376
>                 URL: https://issues.apache.org/jira/browse/HIVE-13376
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13376.2.patch, HIVE-13376.patch
>
>
> The logs get flooded with something like:
> > Mar 28, 3:12:21.851 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:21.912 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report 
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:22.853 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:22.913 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report 
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:23.855 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:23 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> While this is good information, it is a bit much.
> Seems like SparkJobMonitor hard-codes its interval to 1 second.  It should be 
> higher and perhaps made configurable.



