[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497708#comment-13497708 ]
Bill Graham commented on PIG-3048:
----------------------------------

I like the idea of adding more of this type of info, but we should make sure we define all the different namings and concepts first (existing and proposed) to make sure our terminology is clear and consistent. For example we already have these concepts:

- job name
- script id
- script submit time
- job submit time
- logical plan signature

So then what is a script and what is a workflow and how does versioning (i.e., logical plan signature) play into things, or does it?

I like the idea of adjacency lists, which we don't currently produce. What I'd love to see though is the full DAG. Being able to get the full DAG from any job in it would be pretty cool. For already executed jobs, their job ids could even be populated.

Thoughts?

> Add mapreduce workflow information to job configuration
> -------------------------------------------------------
>
>                 Key: PIG-3048
>                 URL: https://issues.apache.org/jira/browse/PIG-3048
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Billie Rinaldi
>         Attachments: PIG-3048.patch
>
>
> Adding workflow properties to the job configuration would enable logging and
> analysis of workflows in addition to individual MapReduce jobs. Suggested
> properties include a workflow ID, workflow name, adjacency list connecting
> nodes in the workflow, and the name of the current node in the workflow.
>
> mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with
> the application name
> e.g. pig_<pigScriptId>
>
> mapreduce.workflow.name - a name for the workflow, to distinguish this
> workflow from other workflows and to group different runs of the same workflow
> e.g. pig command line
>
> mapreduce.workflow.adjacency - an adjacency list for the workflow graph,
> encoded as mapreduce.workflow.adjacency.<source node> = <comma-separated list
> of target nodes>
>
> mapreduce.workflow.node.name - the name of the node corresponding to this
> MapReduce job in the workflow adjacency list
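
To make the proposed encoding concrete, here is a minimal sketch of how the four properties might be assembled for one job in a workflow. It is not taken from PIG-3048.patch: the helper name `workflow_properties`, the node names (`scope-1` etc.), the script id, and the command line are all hypothetical, and a plain dict stands in for the real job configuration. It also assumes that only nodes with outgoing edges get an adjacency entry, which the description does not specify.

```python
def workflow_properties(script_id, workflow_name, edges, current_node):
    """Build the proposed mapreduce.workflow.* properties for one job.

    edges maps each source node name to the list of its target node names.
    (Hypothetical helper; a dict stands in for the job configuration.)
    """
    props = {
        # Workflow ID prefixed with the application name, e.g. pig_<pigScriptId>
        "mapreduce.workflow.id": "pig_" + script_id,
        # Shared by all runs of the same workflow, e.g. the pig command line
        "mapreduce.workflow.name": workflow_name,
        # Which node of the adjacency list this MapReduce job corresponds to
        "mapreduce.workflow.node.name": current_node,
    }
    # One property per source node: adjacency.<source> = comma-separated targets
    for source, targets in edges.items():
        props["mapreduce.workflow.adjacency." + source] = ",".join(targets)
    return props


# Hypothetical diamond-shaped workflow: scope-1 feeds scope-2 and scope-3,
# which both feed scope-4.
edges = {
    "scope-1": ["scope-2", "scope-3"],
    "scope-2": ["scope-4"],
    "scope-3": ["scope-4"],
}
props = workflow_properties(
    "hypothetical-script-id", "pig -f example.pig", edges, "scope-2"
)
for key in sorted(props):
    print(key, "=", props[key])
```

Because every job carries the same adjacency entries and differs only in `mapreduce.workflow.node.name`, the whole graph could in principle be reconstructed from any single job's configuration, which is the property the comment above asks for.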