GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/13712
[SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn
## What changes were proposed in this pull request?
YARN has supported rolling log aggregation since 2.6. Previously, logs were only
aggregated to HDFS after the application finished, which is quite painful for
long-running applications like Spark Streaming or the thriftserver. Out-of-disk
problems can also occur when log files grow too large. So here I propose to add
support for rolling log aggregation for Spark on YARN.
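As a rough sketch, rolling log aggregation would be enabled with settings along the following lines. The exact property names and defaults depend on the final patch and the Hadoop version, so treat them as illustrative rather than definitive:

```properties
# YARN side (yarn-site.xml): how often the NodeManager uploads rolled
# logs while the application is still running (supported since Hadoop 2.6).
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=3600

# Spark side (spark-defaults.conf): illustrative properties for selecting
# which log files participate in rolling aggregation; the actual names
# are defined by this patch.
spark.yarn.rolledLog.includePattern=stdout*,stderr*
spark.yarn.rolledLog.excludePattern=gc.log
```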
One limitation is that log4j must be configured to use a file appender. Spark
itself uses a console appender by default, and with that setup the log file
will not be recreated once it is removed after aggregation. But I think most
production users have already changed their log4j configuration away from the
default, so this should not be a big problem.
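To illustrate the limitation above, a log4j.properties that routes logging to a rolling file appender (instead of the default console appender) might look roughly like this; the appender name, file name, and size limits here are hypothetical choices:

```properties
# Route root logging to a file appender so that log files exist on local
# disk for the NodeManager to pick up and aggregate while the app runs.
log4j.rootLogger=INFO,file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```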
## How was this patch tested?
Manually verified with Hadoop 2.7.1.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerryshao/apache-spark SPARK-15990
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13712.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13712