GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/13712

    [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn

    ## What changes were proposed in this pull request?
    
    YARN has supported rolling log aggregation since 2.6. Previously, logs 
were only aggregated to HDFS after the application finished, which is quite 
painful for long-running applications like Spark Streaming or the Thrift 
server. Out-of-disk problems can also occur when a log file grows too large. 
So this proposes adding rolling log aggregation support for Spark on YARN.
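    
    As a rough sketch of how this would be used (the `spark.yarn.rolledLog.*` 
names below are the configurations proposed in this patch, the YARN-side 
property exists since Hadoop 2.6, and the application class/jar are 
hypothetical placeholders; verify names against your versions):
    
    ```shell
    # yarn-site.xml (NodeManager side): upload rolled logs periodically, e.g.
    #   yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = 3600
    #
    # Spark side: restrict which container log files are rolled and aggregated.
    spark-submit \
      --master yarn \
      --conf spark.yarn.rolledLog.includePattern='spark*.log' \
      --conf spark.yarn.rolledLog.excludePattern='*.gz' \
      --class org.example.StreamingApp app.jar
    ```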
    
    One limitation is that log4j must be configured to use a file appender. 
Spark itself uses a console appender by default, with which the log file will 
not be created again once it is removed after aggregation. But I think most 
production users have already changed their log4j configuration away from the 
default, so this should not be a big problem.
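    
    For illustration, a minimal log4j 1.x configuration that routes driver 
and executor logs to a rolling file appender instead of the console (appender 
name, file name, and size limits here are illustrative choices, not part of 
this patch):
    
    ```properties
    # Route Spark logs to a rolling file so rolled files can be aggregated.
    log4j.rootCategory=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
    log4j.appender.rolling.MaxFileSize=50MB
    log4j.appender.rolling.MaxBackupIndex=5
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    ```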
    
    ## How was this patch tested?
    
    Manually verified with Hadoop 2.7.1.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-15990

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13712
    
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
