[ https://issues.apache.org/jira/browse/GOBBLIN-2186?focusedWorklogId=950621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-950621 ]
ASF GitHub Bot logged work on GOBBLIN-2186:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Jan/25 05:06
Start Date: 02/Jan/25 05:06
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #4089:
URL: https://github.com/apache/gobblin/pull/4089#discussion_r1900542681
##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/activity/impl/GenerateWorkUnitsImpl.java:
##########
@@ -150,26 +156,28 @@ public GenerateWorkUnitsResult generateWorkUnits(Properties jobProps, EventSubmi
   protected List<WorkUnit> generateWorkUnitsForJobStateAndCollectCleanupPaths(JobState jobState, EventSubmitterContext eventSubmitterContext, Closer closer, Set<String> pathsToCleanUp)
       throws ReflectiveOperationException {
+    // report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc.
+    // IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one
Review Comment:
Originally, in `AbstractJobLauncher`, the "WU creation timer" measured only the planning:
https://github.com/apache/gobblin/blob/7dbeebf7fecc748ea3ef90cc318214cf26ba5afa/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java#L476
That is what's included in the `GaaSJobObservabilityEvent`.
The timer for WU prep starts a bit later:
https://github.com/apache/gobblin/blob/7dbeebf7fecc748ea3ef90cc318214cf26ba5afa/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java#L549
So in this comment:
> "Work Discovery", *planning only* - NOT including WU prep, like serialization, ...
I just meant that we're timing only planning/creation, not the preparation, such as serialization.
As for WU serialization, there is no existing, historical event strictly for that. Typically it takes a long time only when the process is memory-constrained and GC-bound. We could consider adding a new event to time it, but that's not at the top of my list. For purposes of right-sizing, GC stats are more interesting than the duration serialization happened to take.
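To illustrate the distinction between the two timers, here is a minimal, self-contained sketch of timing "planning" and "preparation" as separate phases. It does not use Gobblin's API: `PhaseTimingSketch`, `PhaseRecorder`, `PhaseTiming`, `planWorkUnits`, and `prepareWorkUnits` are all hypothetical names invented for this example, and the injected clock is an assumption made only so the timing can be checked deterministically.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class PhaseTimingSketch {

  /** Hypothetical record of one timed phase (NOT a Gobblin `TimingEvent`). */
  static final class PhaseTiming {
    final String name;
    final long startMillis;
    final long endMillis;

    PhaseTiming(String name, long startMillis, long endMillis) {
      this.name = name;
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }

    long durationMillis() {
      return endMillis - startMillis;
    }
  }

  /** Hypothetical recorder: each call to `time` records exactly one phase. */
  static final class PhaseRecorder {
    private final LongSupplier clock;
    private final List<PhaseTiming> timings = new ArrayList<>();

    PhaseRecorder(LongSupplier clock) {
      this.clock = clock;
    }

    <T> T time(String name, Supplier<T> work) {
      long start = clock.getAsLong();
      try {
        return work.get();
      } finally {
        // Record the phase even if the work throws.
        timings.add(new PhaseTiming(name, start, clock.getAsLong()));
      }
    }

    List<PhaseTiming> timings() {
      return timings;
    }
  }

  /** Stand-in for planning: decides which work units exist. */
  static List<String> planWorkUnits(int count) {
    List<String> workUnits = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      workUnits.add("work-unit-" + i);
    }
    return workUnits;
  }

  /** Stand-in for preparation: here, just serialization to bytes. */
  static List<byte[]> prepareWorkUnits(List<String> workUnits) {
    List<byte[]> serialized = new ArrayList<>();
    for (String workUnit : workUnits) {
      serialized.add(workUnit.getBytes(StandardCharsets.UTF_8));
    }
    return serialized;
  }

  public static void main(String[] args) {
    PhaseRecorder recorder = new PhaseRecorder(System::currentTimeMillis);
    // Planning is timed on its own, so its duration excludes serialization.
    List<String> workUnits = recorder.time("planning", () -> planWorkUnits(3));
    recorder.time("preparation", () -> prepareWorkUnits(workUnits));
    for (PhaseTiming t : recorder.timings()) {
      System.out.println(t.name + ": " + t.durationMillis() + " ms");
    }
  }
}
```

Because the "planning" timing closes before preparation begins, a slow, GC-bound serialization step would show up only in the second timing and would not inflate the planning duration.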
Issue Time Tracking
-------------------
Worklog Id: (was: 950621)
Time Spent: 40m (was: 0.5h)
> Ensure GoT jobs record Work Discovery planning timing for populating the
> `GaaSJobObservabilityEvent` fields `jobPlanning{Start,End}Timestamp`
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-2186
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2186
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-core
> Reporter: Kip Kohn
> Assignee: Abhishek Tiwari
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs have no values set
> for the fields `jobPlanningStartTimestamp` and `jobPlanningEndTimestamp`
> because no `TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE (to record
> those values) is emitted by `GenerateWorkUnitsImpl`