[ https://issues.apache.org/jira/browse/FALCON-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152370#comment-14152370 ]

Venkatesh Seetharam commented on FALCON-510:
--------------------------------------------

From [~billie.rinaldi]:
{quote}
For RCA in Hadoop 1, we added properties to the job conf that allowed us to 
tell which MR jobs were part of a Pig or Hive job.  There were often multiple 
MR jobs per Hive query or Pig script.
https://cwiki.apache.org/confluence/display/AMBARI/HDP+1.x+DAG+Specification

In Hadoop 2 with Tez, we rely on the information that Hive and Tez write to the
YARN Timeline Server.  Each Tez job is used for more than one Hive query.
{quote}

> Inject falcon related properties to job conf
> --------------------------------------------
>
>                 Key: FALCON-510
>                 URL: https://issues.apache.org/jira/browse/FALCON-510
>             Project: Falcon
>          Issue Type: Improvement
>            Reporter: Shwetha G S
>            Assignee: Peeyush Bishnoi
>
> Currently no Falcon context is injected at the MR job level. The job conf
> has at most the Oozie workflow/action ID, either in the job name or
> sometimes in the job conf.
> Therefore a tool like hRaven, which relies entirely on job conf and job
> history data, has no way to identify that a particular job maps to a
> particular Falcon process or its instance time, etc. Right now hRaven does
> regex-based job name surgery on a best-effort basis before emitting metrics
> to Graphite.
> Requested feature in Falcon:
> Add the following properties to the job conf (for all jobs, be it a Pig
> action or an MR action):
> falcon.process.name
> falcon.process.instancetime
> While we're at it, we might as well add any other Falcon context as a job
> conf property (such as whether it was a rerun, the input/output feeds,
> cluster, validity, any process properties, etc.).
> This will, of course, inject the properties only at the first job level and
> cannot ensure that any child jobs inherit them (unless we can figure out a
> way to do that too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)