[ https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiqing Yang resolved SPARK-15857.
----------------------------------
    Resolution: Fixed

> Add Caller Context in Spark
> ---------------------------
>
>                 Key: SPARK-15857
>                 URL: https://issues.apache.org/jira/browse/SPARK-15857
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Weiqing Yang
>
> Hadoop has implemented a log-tracing feature called caller context (JIRAs:
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand
> how specific applications are impacting parts of the Hadoop system and what
> potential problems they may be creating (e.g. overloading the NameNode). As
> noted in HDFS-9184, for a given HDFS operation it is very helpful to track
> which upper-level job issued it. The upper-level callers may be specific
> Oozie tasks, MR jobs, Hive queries, or Spark jobs.
>
> Hadoop ecosystem projects such as MapReduce, Tez (TEZ-2851), Hive (HIVE-12249,
> HIVE-12254), and Pig (PIG-4714) have implemented their own caller contexts.
> Those systems invoke the HDFS client API and the YARN client API to set up a
> caller context, and they also expose an API for passing a caller context in.
>
> Many Spark applications run on YARN/HDFS. Spark can likewise implement its
> caller context by invoking the HDFS/YARN APIs, and also expose an API so that
> its upstream applications can set up their own caller contexts. In the end,
> the Spark caller context written into the YARN log / HDFS audit log can be
> associated with the task id, stage id, job id, and app id. That will also
> help Spark users identify tasks, especially if Spark supports multi-tenant
> environments in the future.
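For reference, a minimal Scala sketch of how a caller context can be attached
through the Hadoop client API, assuming Hadoop 2.8+ is on the classpath (where
org.apache.hadoop.ipc.CallerContext became available via HDFS-9184). The
context-string layout and the setCallerContext helper below are illustrative
only, not a committed Spark format:

    import org.apache.hadoop.ipc.CallerContext

    object SparkCallerContextExample {
      // Hypothetical helper: builds a context string from Spark ids and
      // attaches it before issuing HDFS/YARN calls. Field names and
      // ordering here are an assumption, not Spark's final format.
      def setCallerContext(appId: String,
                           jobId: Int,
                           stageId: Int,
                           taskId: Long): Unit = {
        val context =
          s"SPARK_AppId_${appId}_JobId_${jobId}_StageId_${stageId}_TaskId_$taskId"
        // Attaches the context to the current thread; subsequent RPCs from
        // this thread carry it, and HDFS records it in the NameNode audit log.
        CallerContext.setCurrent(new CallerContext.Builder(context).build())
      }
    }

Because the context is thread-local, it has to be set on the thread that
actually performs the HDFS/YARN RPC (e.g. inside the task's run loop), which
is why each caller in the upper-level system sets it individually.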