Blazer-007 commented on code in PR #4089:
URL: https://github.com/apache/gobblin/pull/4089#discussion_r1900532554
##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/activity/impl/GenerateWorkUnitsImpl.java:
##########
@@ -127,6 +130,9 @@ public GenerateWorkUnitsResult generateWorkUnits(Properties
jobProps, EventSubmi
int numSizeSummaryQuantiles =
getConfiguredNumSizeSummaryQuantiles(jobState);
WorkUnitsSizeSummary wuSizeSummary =
digestWorkUnitsSize(workUnits).asSizeSummary(numSizeSummaryQuantiles);
log.info("Discovered WorkUnits: {}", wuSizeSummary);
+ // IMPORTANT: send prior to `writeWorkUnits`, so the volume of work
discovered (and bin packed) gets durably measured. even if serialization were
to
+ // exceed available memory and this activity execution were to fail, a
subsequent re-attempt would know the amount of work, to guide re-config/attempt
+ createWorkPreparedSizeDistillationTimer(wuSizeSummary,
eventSubmitterContext).stop();
Review Comment:
I think, if serialization takes time and if it accounts for more OOM issue
then should we consider these both in separate activity and launched through
one parent workflow as IIUC activity retry will do everything from beginning
and discovered workunits will be generated again which can also lead to
duplication of GTE, whats your opinion on this ?
##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/activity/impl/GenerateWorkUnitsImpl.java:
##########
@@ -150,26 +156,28 @@ public GenerateWorkUnitsResult
generateWorkUnits(Properties jobProps, EventSubmi
protected List<WorkUnit>
generateWorkUnitsForJobStateAndCollectCleanupPaths(JobState jobState,
EventSubmitterContext eventSubmitterContext, Closer closer,
Set<String> pathsToCleanUp)
throws ReflectiveOperationException {
+ // report (timer) metrics for "Work Discovery", *planning only* - NOT
including WU prep, like serialization, `DestinationDatasetHandlerService`ing,
etc.
+ // IMPORTANT: for accurate timing, SEPARATELY emit
`.createWorkPreparationTimer`, to record time prior to measuring the WU size
required for that one
Review Comment:
Do you mean here one GTE event for serialization step or GTE event for
everything discovery serialization and all ?
and also would that GTE help us in anyway if we had , wdyt ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]