[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073369#comment-14073369 ]
Peng Zhang commented on SPARK-2668:
-----------------------------------

Yes, this is a common issue for long-running tasks on YARN. Our solution for Spark Streaming is to use a RollingFileAppender to keep only the latest 10 x 100M log files on disk. This helps log viewing in the YARN UI (no single file gets too big) and also avoids disk overflow.

Besides the file appender, we also send all log messages to a Scribe service, which writes them to HDFS (using a log4j appender for Scribe). This helps analyse all logs generated during the run.

> Support log4j log to yarn container log directory
> -------------------------------------------------
>
>                 Key: SPARK-2668
>                 URL: https://issues.apache.org/jira/browse/SPARK-2668
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Peng Zhang
>             Fix For: 1.0.0
>
>
> Assign the value of the YARN container log directory to the java opts
> property "spark.yarn.log.dir", so a user-defined log4j.properties can
> reference this value and write logs to the YARN container directory.
> Otherwise, a user-defined file appender will log to the CWD, the files will
> not be displayed in the YARN UI, and they cannot be aggregated to the HDFS
> log directory after the job finishes.
> User-defined log4j.properties reference example:
> {code}
> log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
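A minimal log4j.properties sketch of the rolling setup described in the comment, combined with the ${spark.yarn.log.dir} property the issue proposes. The appender name, log level, and pattern layout are illustrative assumptions; the 10 x 100M limits come from the comment above:

{code}
# Illustrative sketch: root logger writes to a size-based rolling file appender
log4j.rootLogger=INFO, rolling_file

# log4j 1.x RollingFileAppender rolls the file when it reaches MaxFileSize
# and keeps at most MaxBackupIndex old files (10 x 100M, per the comment)
log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
log4j.appender.rolling_file.MaxFileSize=100MB
log4j.appender.rolling_file.MaxBackupIndex=10

# Layout is an assumption; any PatternLayout works here
log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
{code}

Because the file path resolves under the YARN container log directory, each rolled file stays small enough to view in the YARN UI, and the files remain eligible for YARN log aggregation to HDFS after the job finishes.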