[ https://issues.apache.org/jira/browse/KAFKA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824207#comment-17824207 ]
Ismael Juma commented on KAFKA-16349: ------------------------------------- Good catch. > ShutdownableThread fails build by calling Exit with race condition > ------------------------------------------------------------------ > > Key: KAFKA-16349 > URL: https://issues.apache.org/jira/browse/KAFKA-16349 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 3.8.0 > Reporter: Greg Harris > Priority: Minor > > `ShutdownableThread` calls `Exit.exit()` when the thread's operation throws > FatalExitError. In normal operation, this calls System.exit, and exits the > process. In tests, the exit procedure is masked with Exit.setExitProcedure to > prevent tests that encounter a FatalExitError from crashing the test JVM. > Masking of exit procedures is usually done in BeforeEach/AfterEach > annotations, with the exit procedures cleaned up immediately after the test > finishes. If the body of the test creates a ShutdownableThread that outlives > the test, such as by omitting `ShutdownableThread#awaitShutdown`, by having > `ShutdownableThread#awaitShutdown` be interrupted by a test timeout, or by a > race condition between `Exit.resetExitProcedure` and `Exit.exit`, then > System.exit() can be called erroneously. > > {noformat} > // First, in the test thread: > Exit.setExitProcedure(...) > try { > new ShutdownableThread(...).start() > } finally { > Exit.resetExitProcedure() > } > // Second, in the ShutdownableThread: > try { > throw new FatalExitError(...) > } catch (FatalExitError e) { > Exit.exit(...) // Calls real System.exit() > }{noformat} > > This can be resolved by one of the following: > # Eliminate FatalExitError usages in code when setExitProcedure is in-use > # Eliminate the Exit.exit call from ShutdownableThread, and instead > propagate this error to another thread to handle without a race-condition > # Eliminate resetExitProcedure by refactoring Exit to be non-static > FatalExitError is in use in a small number of places, but may be difficult to > eliminate: > * FinalizedFeatureChangeListener > * InterBrokerSendThread > * TopicBasedRemoteLogMetadataManager > It appears that every other use of Exit.setExitProcedure/Exit.exit is done on > a single thread, so ShutdownableThread is the only place where this race > condition is present. > The effect of this bug is that the build is flaky, as race > conditions/timeouts in tests can cause the gradle executors to exit with > status code 1, which has happened 26 times in the last 28 days. I have not > yet been able to confirm this bug is happening in other tests, but I do have > a deterministic reproduction case with the exact same symptoms: > {noformat} > Gradle Test Run :core:test > Gradle Test Executor 38 > ShutdownableThreadTest > > testShutdownWhenTestTimesOut(boolean) > > "testShutdownWhenTestTimesOut(boolean).isInterruptible=true" SKIPPED > FAILURE: Build failed with an exception. > * What went wrong: > Execution failed for task ':core:test'. > > Process 'Gradle Test Executor 38' finished with non-zero exit value 1 > This problem might be caused by incorrect test process configuration. > For more on test execution, please refer to > https://docs.gradle.org/8.6/userguide/java_testing.html#sec:test_execution in > the Gradle documentation.{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)