XComp commented on a change in pull request #13111:
URL: https://github.com/apache/flink/pull/13111#discussion_r469889173



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerRunnerImpl.java
##########
@@ -137,7 +138,12 @@ public JobManagerRunnerImpl(
                this.leaderGatewayFuture = new CompletableFuture<>();
 
                // now start the JobManager
-               this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
+               try {
+                       this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
+               } catch (Throwable t) {
+                       ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(t);
+                       throw t;
+               }

Review comment:
       I see. But wouldn't we then miss the error handling for a job rerun? 
The JobManager instantiation happens in `Dispatcher.persistAndRun(JobGraph)` 
(which is called from `Dispatcher.internalSubmitJob(JobGraph)`), but also in 
`Dispatcher.runRecoveredJob(JobGraph)`. The latter call path does not touch 
`Dispatcher.internalSubmitJob(JobGraph)`, so we wouldn't be able to handle 
the OOM for job reruns if the handling were moved into 
`Dispatcher.internalSubmitJob(JobGraph)`.
   
   Wouldn't `Dispatcher.runJob(JobGraph)` be the better location for the OOM 
handling?
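
   To illustrate the concern, here is a minimal sketch. It is not the actual Flink code. It assumes that both the submission path and the recovery path end up in a shared `runJob` method, and it leaves out `persistAndRun` for brevity. `DispatcherSketch`, `createJobManagerRunner`, and `handledFailures` are hypothetical names used only for this example, and job names stand in for `JobGraph` instances.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the Dispatcher call graph; not Flink code. */
class DispatcherSketch {

    /** Names of jobs whose startup failure passed through the shared handler. */
    final List<String> handledFailures = new ArrayList<>();

    /** Entry point for a freshly submitted job. */
    void internalSubmitJob(String jobName) {
        runJob(jobName);
    }

    /** Entry point for a job that is recovered and rerun. */
    void runRecoveredJob(String jobName) {
        runJob(jobName);
    }

    /** Shared by both entry points, so handling placed here also covers reruns. */
    private void runJob(String jobName) {
        try {
            createJobManagerRunner(jobName);
        } catch (OutOfMemoryError e) {
            // Placeholder for ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(e).
            handledFailures.add(jobName);
            throw e;
        }
    }

    /** Hypothetical helper: always fails, to simulate an OOM during JobManager startup. */
    private void createJobManagerRunner(String jobName) {
        throw new OutOfMemoryError("simulated OOM for " + jobName);
    }
}
```

   With the handling in `runJob`, a failure is recorded for both `internalSubmitJob("fresh")` and `runRecoveredJob("recovered")`. If the `try`/`catch` lived only in `internalSubmitJob`, the recovered job would bypass it.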




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]