[ https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiqing Yang resolved SPARK-15857.
----------------------------------
    Resolution: Fixed

> Add Caller Context in Spark
> ---------------------------
>
>                 Key: SPARK-15857
>                 URL: https://issues.apache.org/jira/browse/SPARK-15857
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Weiqing Yang
>
> Hadoop has implemented a log-tracing feature called caller context (JIRA: 
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
> how specific applications impact parts of the Hadoop system and the potential 
> problems they may be creating (e.g. overloading the NameNode). As mentioned 
> in HDFS-9184, for a given HDFS operation, it is very helpful to track which 
> upper-level job issued it. The upper-level callers may be specific Oozie 
> tasks, MR jobs, Hive queries, or Spark jobs. 
> Components of the Hadoop ecosystem such as MapReduce, Tez (TEZ-2851), Hive 
> (HIVE-12249, HIVE-12254), and Pig (PIG-4714) have already implemented their 
> caller contexts. Those systems invoke the HDFS client API and the YARN client 
> API to set up the caller context, and also expose an API for passing a caller 
> context in from upstream.
> Many Spark applications run on YARN/HDFS. Spark can likewise implement its 
> caller context by invoking the HDFS/YARN APIs, and also expose an API so that 
> its upstream applications can set up their own caller contexts. In the end, 
> the Spark caller context written into the YARN log / HDFS audit log can be 
> associated with the task ID, stage ID, job ID, and application ID. That also 
> makes it much easier for Spark users to identify tasks, especially if Spark 
> supports multi-tenant environments in the future. A minimal sketch of what 
> this could look like follows below.
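> As a rough illustration (not the final design), the description above maps 
> onto Hadoop's org.apache.hadoop.ipc.CallerContext API introduced by 
> HDFS-9184 and available in Hadoop 2.8+. The context string layout 
> (SPARK_<appId>_JId_<jobId>_...) and the helper name below are hypothetical:
>
>   import org.apache.hadoop.ipc.CallerContext
>
>   // Hypothetical helper: compose a caller context from Spark identifiers
>   // and install it for the current thread before issuing HDFS/YARN calls.
>   def setSparkCallerContext(
>       appId: String, jobId: Int, stageId: Int, taskId: Long): Unit = {
>     // Illustrative format only; the exact layout is part of this proposal.
>     val contextStr = s"SPARK_${appId}_JId_${jobId}_SId_${stageId}_TId_${taskId}"
>     // Hadoop truncates contexts longer than hadoop.caller.context.max.size
>     // (128 bytes by default), so the string should stay short.
>     CallerContext.setCurrent(new CallerContext.Builder(contextStr).build())
>   }
>
> With hadoop.caller.context.enabled=true set on the NameNode, the installed 
> context should then appear as a callerContext= field in the HDFS audit log, 
> letting an operator trace an NN operation back to the Spark task that issued 
> it.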



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
