phet commented on code in PR #4084:
URL: https://github.com/apache/gobblin/pull/4084#discussion_r1884245212
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtils.java:
##########
@@ -139,12 +138,15 @@ public static void submitJobToExecutor(DagManagementStateStore dagManagementStat
dagManagementStateStore.updateDagNode(dagNode);
sendEnforceJobStartDeadlineDagAction(dagManagementStateStore, dagNode);
} catch (Exception e) {
- TimingEvent jobFailedTimer = DagProc.eventSubmitter.getTimingEvent(TimingEvent.LauncherTimings.JOB_FAILED);
String message = "Cannot submit job " + DagUtils.getFullyQualifiedJobName(dagNode) + " on executor " + specExecutorUri;
log.error(message, e);
- jobMetadata.put(TimingEvent.METADATA_MESSAGE, message + " due to " + e.getMessage());
- if (jobFailedTimer != null) {
- jobFailedTimer.stop(jobMetadata);
+ // Only mark the job as failed in case of non transient exceptions
+ if (!DagProcessingEngine.isTransientException(e)) {
+ TimingEvent jobFailedTimer = DagProc.eventSubmitter.getTimingEvent(TimingEvent.LauncherTimings.JOB_FAILED);
+ jobMetadata.put(TimingEvent.METADATA_MESSAGE, message + " due to " + e.getMessage());
+ if (jobFailedTimer != null) {
+ jobFailedTimer.stop(jobMetadata);
+ }
}
Review Comment:
Whoops, sorry. I got my braces confused and thought this was the end of the
`catch` block. I now see L157 throwing the `RuntimeException`.
--
This is an automated message from the Apache Git Service.