GitHub user Sherry302 commented on the issue:

    https://github.com/apache/spark/pull/14659
  
    Hi @srowen. Thank you so much for the review, and sorry for the test
    failure and the late update. The tests failed because `jobID` was `None`
    or `spark.app.name` was not set in the SparkConf. I have updated the PR
    to set default values for `jobID` and `spark.app.name` (roughly as
    sketched below). When a real application runs on Spark, it will always
    have a `jobID` and a `spark.app.name`.
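
    For reference, here is a minimal sketch of that kind of defaulting in
    Scala (the placeholder value "unknown" and the local variable names are
    illustrative, not the exact code in the PR):

        import org.apache.spark.SparkConf

        val conf = new SparkConf()
        // Fall back to a placeholder when no application name has been set;
        // in practice only test harnesses hit this path.
        val appName = conf.get("spark.app.name", "unknown")

        // Likewise, use a sentinel string when no job ID is available yet.
        val jobId: Option[Int] = None
        val jobIdStr = jobId.map(_.toString).getOrElse("unknown")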
    
    What's the use case for this?
    When users run Spark applications on YARN and those applications access
    HDFS, Spark's caller contexts are written into hdfs-audit.log. The Spark
    caller contexts are `JobID_stageID_stageAttemptId_taskID_attemptNumber`
    and the application's name.
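
    For context, here is a minimal sketch of how such a context string can be
    attached to HDFS calls through the `org.apache.hadoop.ipc.CallerContext`
    API associated with HDFS-9184 (available since Hadoop 2.8). The
    identifiers and separators below are illustrative only; the real string
    is built from the running task's state:

        import org.apache.hadoop.ipc.CallerContext

        // Illustrative values; in a running application these come from the
        // current job, stage, and task.
        val appName = "wordcount"
        val jobId = 0
        val stageId = 1
        val stageAttemptId = 0
        val taskId = 7L
        val attemptNumber = 0

        // Compose a JobID_stageID_stageAttemptId_taskID_attemptNumber style
        // string plus the application name, and register it so the NameNode
        // records it in hdfs-audit.log for every subsequent HDFS operation
        // made from this thread.
        val context =
          s"SPARK_${appName}_JobId_${jobId}_StageId_${stageId}_${stageAttemptId}" +
            s"_TID_${taskId}_${attemptNumber}"
        CallerContext.setCurrent(new CallerContext.Builder(context).build())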
    
    The caller context helps users better diagnose and understand how
    specific applications impact parts of the Hadoop system and what
    potential problems they may be creating (e.g. overloading the NameNode).
    As noted in HDFS-9184, for a given HDFS operation it is very helpful to
    track which upper-level job issued it.

