[ 
https://issues.apache.org/jira/browse/HIVE-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303379#comment-15303379
 ] 

Rui Li commented on HIVE-13376:
-------------------------------

[~xuefuz], [~szehon] - I ran some more tests on this and want to correct 
some of my previous comments:
# In yarn-cluster mode, {{SparkSubmit}} runs the {{Client}}. The {{Client}} 
keeps polling the app state and logging it. On the Hive side, we read from 
SparkSubmit's input and error streams and print them to the Hive log.
# In yarn-client mode, {{SparkSubmit}} runs our {{RemoteDriver}}. 
{{RemoteDriver}} waits for the app to start running and then serves the job 
requests from Hive. It doesn't report the app state after that.
# The verbose logging only happens with yarn-cluster mode.
# The long interval only affects yarn-client mode.
# To avoid the state reports in yarn-cluster mode, we can either raise the log 
level (e.g. WARN instead of INFO) or set 
{{spark.yarn.submit.waitAppCompletion=false}}, in which case {{SparkSubmit}} 
terminates after it submits the app to the RM.

I'd prefer disabling {{spark.yarn.submit.waitAppCompletion}}, if it doesn't 
cause any other trouble.
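
A rough sketch of the two options from the last item above. The placement is 
an assumption and has not been tested here: the first setting assumes the 
{{SparkSubmit}} process reads a log4j 1.x style properties file, with the 
logger name inferred from the {{yarn.Client}} prefix in the log lines quoted 
below; the second assumes the property is set wherever the Spark 
configuration for HoS is kept (e.g. hive-site.xml or spark-defaults.conf).

{noformat}
# Option 1 (assumed log4j 1.x properties syntax): log only WARN and above
# from the YARN client, which hides the INFO "Application report" lines.
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN

# Option 2: let SparkSubmit exit once the app has been submitted to the RM,
# so it no longer polls and prints the app state.
spark.yarn.submit.waitAppCompletion=false
{noformat}

Only one of the two should be needed; option 1 keeps {{SparkSubmit}} running 
but quiet, while option 2 removes the polling process altogether.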

> HoS emits too many logs with application state
> ----------------------------------------------
>
>                 Key: HIVE-13376
>                 URL: https://issues.apache.org/jira/browse/HIVE-13376
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13376.2.patch, HIVE-13376.patch
>
>
> The logs get flooded with something like:
> > Mar 28, 3:12:21.851 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:21.912 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:21 INFO yarn.Client: Application report 
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:22.853 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> > Mar 28, 3:12:22.913 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:22 INFO yarn.Client: Application report 
> > for application_1458679386200_0149 (state: RUNNING)
> > Mar 28, 3:12:23.855 PM        INFO    
> > org.apache.hive.spark.client.SparkClientImpl
> > [stderr-redir-1]: 16/03/28 15:12:23 INFO yarn.Client: Application report 
> > for application_1458679386200_0161 (state: RUNNING)
> While this is good information, it is a bit much.
> Seems like SparkJobMonitor hard-codes its interval to 1 second.  It should be 
> higher and perhaps made configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
