Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]

2024-03-15 Thread via GitHub


pnowojski merged PR #24487:
URL: https://github.com/apache/flink/pull/24487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]

2024-03-15 Thread via GitHub


pnowojski commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1999835148

   Merging. Builds are failing due to unrelated test instabilities/bugs.





Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]

2024-03-12 Thread via GitHub


rkhachatryan commented on code in PR #24487:
URL: https://github.com/apache/flink/pull/24487#discussion_r1521803705


##
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java:
##
@@ -1046,27 +1046,32 @@ private void onTriggerFailure(
             CheckpointProperties checkpointProperties,
             Throwable throwable) {
         // beautify the stack trace a bit
-        throwable = ExceptionUtils.stripCompletionException(throwable);
-
         try {
-            coordinatorsToCheckpoint.forEach(
-                    OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
+            throwable = ExceptionUtils.stripCompletionException(throwable);
 
-            final CheckpointException cause =
-                    getCheckpointException(
-                            CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+            try {
+                coordinatorsToCheckpoint.forEach(
+                        OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
 
-            if (checkpoint != null && !checkpoint.isDisposed()) {
-                synchronized (lock) {
-                    abortPendingCheckpoint(checkpoint, cause);
+                final CheckpointException cause =
+                        getCheckpointException(
+                                CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+
+                if (checkpoint != null && !checkpoint.isDisposed()) {
+                    synchronized (lock) {
+                        abortPendingCheckpoint(checkpoint, cause);
+                    }
+                } else {
+                    failureManager.handleCheckpointException(
+                            checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
                 }
-            } else {
-                failureManager.handleCheckpointException(
-                        checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
+            } finally {
+                isTriggering = false;
+                executeQueuedRequest();
             }
-        } finally {
-            isTriggering = false;
-            executeQueuedRequest();
+        } catch (Throwable secondThrowable) {

Review Comment:
   Can't we have just one try/catch block?
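
   For context on the question, a minimal Java sketch of how the two shapes differ. The thread does not record the author's reasoning, so this only illustrates general language semantics; `TryNestingDemo` and `cleanup()` are hypothetical names, with `cleanup()` standing in for work done in a `finally` block. In a single try/catch/finally, an exception thrown from `finally` is not seen by the sibling `catch`; with the nested form, the outer `catch` also covers the inner `finally`.

```java
import java.util.ArrayList;
import java.util.List;

public class TryNestingDemo {

    /** Hypothetical cleanup step that fails, standing in for code run in a finally block. */
    static void cleanup() {
        throw new IllegalStateException("cleanup failed");
    }

    /** Single try/catch/finally: the catch clause does not cover the finally block. */
    static void singleBlock(List<String> log) {
        try {
            log.add("work");
        } catch (Throwable t) {
            log.add("caught: " + t.getMessage());
        } finally {
            cleanup(); // this exception escapes the method
        }
    }

    /** Nested form: the outer catch also covers exceptions from the inner finally. */
    static void nestedBlocks(List<String> log) {
        try {
            try {
                log.add("work");
            } finally {
                cleanup();
            }
        } catch (Throwable t) {
            log.add("caught: " + t.getMessage());
        }
    }

    public static void main(String[] args) {
        List<String> single = new ArrayList<>();
        try {
            singleBlock(single);
        } catch (IllegalStateException e) {
            single.add("escaped: " + e.getMessage());
        }

        List<String> nested = new ArrayList<>();
        nestedBlocks(nested);

        System.out.println(single);
        System.out.println(nested);
    }
}
```

   Whether a single block would suffice therefore depends on whether failures in the `finally` part also need to reach the `catch`.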






Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]

2024-03-12 Thread via GitHub


flinkbot commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1992068365

   
   ## CI report:
   
   * d07a37b06fc847c1f2c6ce148a918c2490f2490c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]

2024-03-12 Thread via GitHub


pnowojski opened a new pull request, #24487:
URL: https://github.com/apache/flink/pull/24487

   An unexpected error can be, for example, an NPE (`NullPointerException`).
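
   As an illustration of the idea in the title, here is a minimal Java sketch of keeping a primary failure when the code handling it hits an unexpected error of its own. The body of the `catch` block added by this PR is not shown in this thread, so this is not the PR's implementation; it uses `Throwable.addSuppressed`, one common way to keep both errors, and `PrimaryFailureDemo`, `handleFailure`, and `handleAndPreservePrimary` are hypothetical names.

```java
public class PrimaryFailureDemo {

    /** Hypothetical failure handler that itself hits an unexpected error. */
    static void handleFailure(Throwable primary) {
        String state = null;
        state.length(); // simulated bug: throws NullPointerException
    }

    /**
     * Runs the handler. If the handler fails, the secondary error is attached
     * to the primary failure as a suppressed exception instead of replacing it.
     */
    static Throwable handleAndPreservePrimary(Throwable primary) {
        try {
            handleFailure(primary);
        } catch (Throwable secondary) {
            primary.addSuppressed(secondary);
        }
        return primary;
    }

    public static void main(String[] args) {
        Throwable primary = new IllegalStateException("trigger failed");
        Throwable reported = handleAndPreservePrimary(primary);

        // The primary failure is still the one reported; the NPE travels with it.
        System.out.println(reported.getMessage());
        System.out.println(reported.getSuppressed()[0].getClass().getSimpleName());
    }
}
```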
   
   ## Verifying this change
   
   This
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
 - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   

