[ https://issues.apache.org/jira/browse/SAMZA-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shanthoosh Venkataraman updated SAMZA-1568:
-------------------------------------------
Description:
When a ZooKeeper session failure occurs in a stream processor, the processor
leaves the group (zkClient is closed) and then joins the group again.
The last step in that shutdown sequence is zkClient.close(). In some scenarios,
it throws the following exception:
{code:java}
org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
    at org.I0Itec.zkclient.ZkClient.close(ZkClient.java:1278)
    at org.apache.samza.zk.ZkControllerImpl.stop(ZkControllerImpl.java:92)
    at org.apache.samza.zk.ZkJobCoordinator.stop(ZkJobCoordinator.java:141)
{code}
The existing implementation does not handle this exception, thereby killing the
stream processor. The following code path triggers it:
{code:java}
StreamProcessor.stop() -> ZkJobCoordinator.stop() -> zkController.stop() -> zkUtils.close()
{code}
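One way the unhandled exception described above could be contained is to catch it at the point where the client is closed, so that it does not propagate up the stop() chain. The sketch below is only an illustration under stated assumptions, not the actual fix: InterruptedCloseException and Closer are hypothetical stand-ins for ZkInterruptedException and the ZooKeeper client, and closeQuietly is an invented helper name.

```java
public class SafeCloseSketch {

    /** Hypothetical stand-in for org.I0Itec.zkclient.exception.ZkInterruptedException. */
    public static class InterruptedCloseException extends RuntimeException {
        public InterruptedCloseException(InterruptedException cause) {
            super(cause);
        }
    }

    /** Hypothetical stand-in for the client whose close() may be interrupted. */
    public interface Closer {
        void close();
    }

    /**
     * Closes the client and reports whether the close completed normally.
     * An interrupted close is logged and swallowed so that the caller's
     * shutdown sequence can continue. Other runtime exceptions still propagate.
     */
    public static boolean closeQuietly(Closer client) {
        try {
            client.close();
            return true;
        } catch (InterruptedCloseException e) {
            // Assumption: an interrupted close during shutdown is safe to ignore.
            // How to treat the thread's interrupt flag is left out of this sketch.
            System.err.println("close() was interrupted during shutdown: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulate a close() that is interrupted, as in the stack trace above.
        boolean clean = closeQuietly(() -> {
            throw new InterruptedCloseException(new InterruptedException());
        });
        System.out.println("closed cleanly: " + clean);
    }
}
```

Catching only the interruption-specific exception keeps other failures visible; whether a broader catch is appropriate depends on what else the real close() can throw.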
This exception causes the integration test to fail occasionally and can cause a
LocalApplicationRunner.waitForFinish call to block indefinitely, because the
latch that waitForFinish waits on is updated only when this callback completes
successfully.
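The blocking behaviour described above can be illustrated with a plain CountDownLatch. The following is a minimal sketch, not Samza's implementation: ShutdownLatchSketch, its stop method, and the timeout-based waitForFinish are all hypothetical names and signatures. It shows one way to ensure the latch is released even when the close step throws, by counting down in a finally block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownLatchSketch {

    // Released once shutdown has run, whether or not close() succeeded.
    private final CountDownLatch shutdownLatch = new CountDownLatch(1);

    /** Runs the (hypothetical) close step and always releases the latch. */
    public void stop(Runnable closeClient) {
        try {
            closeClient.run();
        } catch (RuntimeException e) {
            // Assumption: a failed close during shutdown is logged, not rethrown.
            System.err.println("close failed during shutdown: " + e);
        } finally {
            // Without this, a throwing close() would leave waiters blocked.
            shutdownLatch.countDown();
        }
    }

    /** Returns true if shutdown finished within the timeout, false otherwise. */
    public boolean waitForFinish(long timeoutMs) {
        try {
            return shutdownLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ShutdownLatchSketch runner = new ShutdownLatchSketch();
        runner.stop(() -> {
            throw new RuntimeException("simulated interrupted close");
        });
        System.out.println("finished: " + runner.waitForFinish(100));
    }
}
```

A bounded wait is used here only so the sketch cannot hang; the method in the report takes no such timeout as far as the description states.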
> Handle ZkInterruptedException in zkclient.close.
> ------------------------------------------------
>
> Key: SAMZA-1568
> URL: https://issues.apache.org/jira/browse/SAMZA-1568
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major