[ https://issues.apache.org/jira/browse/SAMZA-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631379#comment-16631379 ]
ASF GitHub Bot commented on SAMZA-1832: --------------------------------------- Github user bharathkk closed the pull request at: https://github.com/apache/samza/pull/660 > Race condition between SamzaContainerListener.onContainerFailure(t) and > StreamProcessor.stop() > ---------------------------------------------------------------------------------------------- > > Key: SAMZA-1832 > URL: https://issues.apache.org/jira/browse/SAMZA-1832 > Project: Samza > Issue Type: Bug > Reporter: Yi Pan (Data Infrastructure) > Assignee: Bharath Kumarasubramanian > Priority: Major > Fix For: 1.0 > > > There is a race condition between > SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop() that > may mistakenly return a successful stopping state even there is an exception > happened in the SamzaContainer. > The sequence of events are: > Thread-1: SamzaContainer.run() -> exception happened -> status set to FAILED > -> start shutdown sequence > Thread-2: User called LocalApplicationRunner.kill() -> StreamProcessor.stop() > -> stopSamzaContainer(): since container status is failed, skipped waiting -> > jobCoordinator.stop() -> since callback > SamzaContainerListener.onContainerFailure() is only called after the shutdown > sequence, containerException is not set -> normal shutdown of StreamProcessor > -> appStatus = SuccessfulFinish > The issue here is when StreamProcessor.stop() calls stopSamzaContainer(), it > needs to wait till the callback of > SamzaContainerListener.onContainerFailure(t) finishes before making the > decision that the container is stopped successfully/failed. > Reproduce steps: > - Write an StreamApplication that throws exception in process() > - Using StreamApplicationIntegrationTestHarness to start the application by > runApplication() > - In the test, immediately call runner.kill(); runner.waitForFinish(); > runner.status() -- This message was sent by Atlassian JIRA (v7.6.3#76005)