[ https://issues.apache.org/jira/browse/GOBBLIN-2166?focusedWorklogId=938747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-938747 ]
ASF GitHub Bot logged work on GOBBLIN-2166:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 17/Oct/24 17:39
Start Date: 17/Oct/24 17:39
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #4067:
URL: https://github.com/apache/gobblin/pull/4067#discussion_r1805165363
##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinYarnAppLauncher.java:
##########
@@ -173,6 +174,8 @@ public class GobblinYarnAppLauncher {
   private static final String GOBBLIN_YARN_APPLICATION_TYPE = "GOBBLIN_YARN";
+  private static final String APPLICATION_TAGS_KEY = "hadoop-inject.mapreduce.job.tags";
Review Comment:
to anticipate confusion, this may be worth a code comment: even though this
is named for an MR prop
(https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml),
it has essentially morphed into a general-purpose pass-through for Azkaban to
provide context about its execution.
the `hadoop-inject.` prefix is documented here:
https://azkaban.readthedocs.io/en/latest/jobTypes.html#id4
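a minimal sketch of what such a code comment might look like, alongside one plausible way to read the value. Only the constant name and its value come from the diff. The class name, the `parseApplicationTags` helper, the comma-separated parsing, and the example tag strings are assumptions for illustration and may not match how the PR actually consumes the property:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical standalone class; in the PR the constant lives in GobblinYarnAppLauncher.
class ApplicationTagsSketch {

  /**
   * Although this key is named for the MR property `mapreduce.job.tags`, it has
   * effectively become a general-purpose pass-through that Azkaban uses to provide
   * context about its execution. The `hadoop-inject.` prefix is an Azkaban convention
   * (see the Azkaban job types documentation).
   */
  static final String APPLICATION_TAGS_KEY = "hadoop-inject.mapreduce.job.tags";

  /**
   * Assumption: the value is a comma-separated list of tags. Blank entries are
   * dropped and duplicates collapsed, preserving first-seen order.
   */
  static Set<String> parseApplicationTags(Properties props) {
    String raw = props.getProperty(APPLICATION_TAGS_KEY, "");
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(tag -> !tag.isEmpty())
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder tag strings, not the real format Azkaban emits.
    props.setProperty(APPLICATION_TAGS_KEY, "example-tag-1, example-tag-2");
    System.out.println(parseApplicationTags(props));
  }
}
```

the resulting set could then be handed to whatever sets tags on the YARN application submission; that wiring is not shown in this hunk, so it is left out here.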
Issue Time Tracking
-------------------
Worklog Id: (was: 938747)
Time Spent: 20m (was: 10m)
> GoT must fill in info required for RMAppSummaryEvent fields - azkabanexecid,
> azkabanprojectname, azkabanflowid, azkabanjobid
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-2166
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2166
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Abhishek Jain
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> right now, it's not possible to find `RMAppSummaryEvent`s by any of the
> above-named fields, even though the `GaaS-Gobblin-Temporal-Azkaban` project is
> used by GoT execs.
> because `azkabanprojectname` is not populated in events for any GoT execution
> (the way it IS for GoMR executions), the only way to locate
> `RMAppSummaryEvent`s for GoT executions is `appid`.
> *why does this matter?*
> a significant consequence of missing these fields is that it thwarts joining
> `GaaSJobObservabilityEvent`s to `RMAppSummaryEvent`s. this severely
> complicates analysis, because the GaaS obs. event does NOT contain the YARN
> appid, only the AZ flow ID.
> since there is clearly an AZ execution involved, the solution is for GoT to
> set whatever props are required on the YARN app side, so YARN will emit
> fully-populated `RMAppSummaryEvent`s, with all of their `azkaban*` fields set.