[ https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429237#comment-15429237 ]

Weiqing Yang commented on SPARK-16757:
--------------------------------------

Thanks, [~srowen]. When a Spark application runs against HDFS, each read from 
or write to HDFS produces an operation record in hdfs-audit.log that carries 
the Spark caller context. The caller context consists of 
JobID_stageID_stageAttemptId_taskID_attemptNumber plus the application's name. 
This helps users diagnose and understand how specific applications are 
impacting parts of the Hadoop system and what problems they may be creating 
(e.g. overloading the NameNode). As noted in HDFS-9184, for a given HDFS 
operation it is very helpful to track which upper-level job issued it.
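
To make the mechanism concrete, here is a minimal Scala sketch of the Hadoop 
caller-context API that HDFS-9184 introduced (org.apache.hadoop.ipc.CallerContext, 
available in Hadoop 2.8+). The ID values and the exact layout of the context 
string below are illustrative placeholders, not Spark's actual implementation:

import org.apache.hadoop.ipc.CallerContext

object CallerContextSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical values standing in for what a Spark task would supply.
    val appName        = "myApp"
    val jobId          = 0
    val stageId        = 1
    val stageAttemptId = 0
    val taskId         = 14L
    val attemptNumber  = 0

    // Assemble a context string in the JobID_stageID_... shape described
    // above (layout is illustrative), then attach it to the current thread.
    // Any HDFS RPC issued from this thread afterwards carries the string
    // into hdfs-audit.log.
    val context = s"SPARK_${appName}_${jobId}_${stageId}_" +
      s"${stageAttemptId}_${taskId}_${attemptNumber}"
    CallerContext.setCurrent(new CallerContext.Builder(context).build())
  }
}

Once set, the NameNode records the string with the corresponding audit-log 
entry, so an operator can map any HDFS operation back to the Spark 
job/stage/task that issued it.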

> Set up caller context to HDFS
> -----------------------------
>
>                 Key: SPARK-16757
>                 URL: https://issues.apache.org/jira/browse/SPARK-16757
>             Project: Spark
>          Issue Type: Sub-task
>            Reporter: Weiqing Yang
>
> In this JIRA, Spark will invoke the Hadoop caller context API to set up its 
> caller context in HDFS.


