Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]
pnowojski merged PR #24487:
URL: https://github.com/apache/flink/pull/24487

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]
pnowojski commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1999835148

Merging. Builds are failing due to unrelated test instabilities/bugs.
Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]
rkhachatryan commented on code in PR #24487:
URL: https://github.com/apache/flink/pull/24487#discussion_r1521803705

## flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java: ##
@@ -1046,27 +1046,32 @@ private void onTriggerFailure(
             CheckpointProperties checkpointProperties,
             Throwable throwable) {
         // beautify the stack trace a bit
-        throwable = ExceptionUtils.stripCompletionException(throwable);
-
         try {
-            coordinatorsToCheckpoint.forEach(
-                    OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
+            throwable = ExceptionUtils.stripCompletionException(throwable);
 
-            final CheckpointException cause =
-                    getCheckpointException(
-                            CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+            try {
+                coordinatorsToCheckpoint.forEach(
+                        OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
 
-            if (checkpoint != null && !checkpoint.isDisposed()) {
-                synchronized (lock) {
-                    abortPendingCheckpoint(checkpoint, cause);
+                final CheckpointException cause =
+                        getCheckpointException(
+                                CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+
+                if (checkpoint != null && !checkpoint.isDisposed()) {
+                    synchronized (lock) {
+                        abortPendingCheckpoint(checkpoint, cause);
+                    }
+                } else {
+                    failureManager.handleCheckpointException(
+                            checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
                 }
-            } else {
-                failureManager.handleCheckpointException(
-                        checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
+            } finally {
+                isTriggering = false;
+                executeQueuedRequest();
             }
-        } finally {
-            isTriggering = false;
-            executeQueuedRequest();
+        } catch (Throwable secondThrowable) {

Review Comment:
   Can't we have just one try/catch block?
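For readers weighing the review question above, here is a minimal Java sketch of one difference between the two shapes. It is not taken from the PR. The quoted diff is cut off at the `catch` clause, so what the PR does with `secondThrowable` is not shown; the sketch assumes the original failure is attached via `Throwable.addSuppressed`. The class `NestedTryDemo` and the methods `cleanup`, `singleBlock`, and `nestedBlocks` are hypothetical names used only for illustration. The Java rule it relies on is that a `catch` clause guards only the body of its own `try`, not the `finally` block of the same statement.

```java
public class NestedTryDemo {

    // Hypothetical stand-in for cleanup work (e.g. running a queued request); always fails here.
    static void cleanup() {
        throw new NullPointerException("cleanup failed");
    }

    // One try/catch/finally: an exception thrown by the finally block bypasses the catch clause.
    static Throwable singleBlock(Throwable primary) {
        try {
            try {
                // handling of the primary failure would go here; it succeeds in this sketch
            } catch (RuntimeException second) {
                second.addSuppressed(primary); // assumption: how the primary failure is preserved
                throw second;
            } finally {
                cleanup();
            }
        } catch (RuntimeException escaped) {
            return escaped;
        }
        return null;
    }

    // try/finally nested inside try/catch: the outer catch also sees cleanup failures.
    static Throwable nestedBlocks(Throwable primary) {
        try {
            try {
                try {
                    // handling of the primary failure would go here; it succeeds in this sketch
                } finally {
                    cleanup();
                }
            } catch (RuntimeException second) {
                second.addSuppressed(primary); // assumption: how the primary failure is preserved
                throw second;
            }
        } catch (RuntimeException escaped) {
            return escaped;
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable primary = new IllegalStateException("primary failure");
        System.out.println("single: suppressed = " + singleBlock(primary).getSuppressed().length);
        System.out.println("nested: suppressed = " + nestedBlocks(primary).getSuppressed().length);
    }
}
```

In the single-block shape the cleanup exception escapes with no reference to the primary failure; in the nested shape the same exception carries the primary failure as a suppressed exception. Whether this was the motivation for the two blocks in the PR is not stated in the thread.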
Re: [PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]
flinkbot commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1992068365

## CI report:

* d07a37b06fc847c1f2c6ce148a918c2490f2490c UNKNOWN

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
[PR] [hotfix] In case of unexpected errors do not loose the primary failur [flink]
pnowojski opened a new pull request, #24487:
URL: https://github.com/apache/flink/pull/24487

An unexpected error can be, for example, an NPE.

## Verifying this change

This

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)

## Documentation

- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)