[jira] [Updated] (SAMZA-1692) Standalone stability fixes.

Shanthoosh Venkataraman (JIRA) Sun, 29 Apr 2018 22:54:06 -0700

     [ https://issues.apache.org/jira/browse/SAMZA-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shanthoosh Venkataraman updated SAMZA-1692:
-------------------------------------------
    Description: 
* Currently, on session expiration, a processorListener with an incorrect 
generationId is registered with ZooKeeper (the ZkUtils generationId is 
incremented on reconnect, but the generationId in processorListener is always 
zero). When this happens to the immediate successor of the leader, that 
processor skips the leader-expiration event. This prevents leader re-election 
when the current leader dies and stalls the processor group. The fix is to 
re-instantiate processorChangeListener and then register it on session 
expiration.
* Add processorId to the debounce thread name (this aids debugging when 
multiple processors run within a JVM).
* After the ScheduleAfterDebounceTime queue is shut down, don't accept new 
schedule requests. The current ZkJobCoordinator shutdown sequence consists of 
the following steps:
*# Shut down the ScheduleAfterDebounceTime queue.
*# Stop the zkClient and relinquish its resources.

After ScheduleAfterDebounceTime is shut down and before zkClient is stopped, 
zkClient can still schedule new operations in the ScheduleAfterDebounceTime 
queue. Each such request results in a RejectedExecutionException, since the 
underlying executorService has already been stopped.

Sample exception:
{code:java}
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@23f962a8 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@43408be8
{code}
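
A minimal Java sketch of the first fix, assuming a simplified model: SessionGenerations, ProcessorChangeListener and Coordinator below are hypothetical stand-ins, not Samza's actual ZkUtils or ZkJobCoordinator code. The listener captures the generation that is current when it is created and skips events delivered under any other generation, so a listener created before the reconnect silently drops events, while one re-created on session expiration handles them.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the generation counter kept by ZkUtils.
class SessionGenerations {
  private final AtomicInteger generation = new AtomicInteger(0);

  int current() {
    return generation.get();
  }

  // Called when the client reconnects after a session expiration.
  void incrementOnReconnect() {
    generation.incrementAndGet();
  }
}

// Hypothetical listener that remembers the generation it was created in
// and skips events delivered under any other generation.
class ProcessorChangeListener {
  private final SessionGenerations generations;
  private final int listenerGeneration;
  private int handledEvents = 0;

  ProcessorChangeListener(SessionGenerations generations) {
    this.generations = generations;
    this.listenerGeneration = generations.current();
  }

  // Returns true if the event was handled, false if it was skipped as stale.
  boolean onProcessorsChanged() {
    if (listenerGeneration != generations.current()) {
      return false;
    }
    handledEvents++;
    return true;
  }

  int handledEvents() {
    return handledEvents;
  }
}

// Hypothetical coordinator contrasting the buggy and fixed expiration handlers.
class Coordinator {
  private final SessionGenerations generations = new SessionGenerations();
  private ProcessorChangeListener listener = new ProcessorChangeListener(generations);

  // Buggy: the generation advances but the old listener (generation 0) stays in place.
  void onSessionExpiredBuggy() {
    generations.incrementOnReconnect();
  }

  // Fixed: re-instantiate the listener after the generation advances.
  void onSessionExpiredFixed() {
    generations.incrementOnReconnect();
    listener = new ProcessorChangeListener(generations);
    // A real coordinator would re-subscribe the new listener with ZooKeeper here.
  }

  ProcessorChangeListener listener() {
    return listener;
  }
}

class ListenerDemo {
  public static void main(String[] args) {
    Coordinator buggy = new Coordinator();
    buggy.onSessionExpiredBuggy();
    System.out.println("stale listener handled event: " + buggy.listener().onProcessorsChanged());

    Coordinator fixed = new Coordinator();
    fixed.onSessionExpiredFixed();
    System.out.println("re-created listener handled event: " + fixed.listener().onProcessorsChanged());
  }
}
{code}

In this sketch, ListenerDemo reports false for the stale listener and true for the re-created one. Re-creating the listener, rather than mutating its generation in place, keeps each registration tied to exactly one session generation.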

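One way to put processorId into the debounce thread name is a custom ThreadFactory handed to the scheduled executor. This is a sketch under assumptions: DebounceThreadFactory and the "debounce-thread-" prefix are hypothetical names, not necessarily what the actual patch uses.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: embeds processorId in each debounce thread's name so that
// thread dumps from a JVM hosting several processors can be told apart.
class DebounceThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger(0);

  DebounceThreadFactory(String processorId) {
    this.prefix = "debounce-thread-" + processorId + "-";
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + counter.getAndIncrement());
    t.setDaemon(true);
    return t;
  }
}

class DebounceThreadNameDemo {
  public static void main(String[] args) throws Exception {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(new DebounceThreadFactory("processor-1"));
    // Ask the worker thread for its own name.
    String name = scheduler.submit(() -> Thread.currentThread().getName()).get();
    System.out.println(name); // debounce-thread-processor-1-0
    scheduler.shutdown();
    scheduler.awaitTermination(1, TimeUnit.SECONDS);
  }
}
{code}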

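A minimal sketch of the third fix, assuming a simplified scheduler: DebounceScheduler, scheduleAfterDebounceTime and stopScheduler here are hypothetical and only approximate ScheduleAfterDebounceTime. A shutdown flag is set under the same lock that guards scheduling, so once the scheduler is stopped, late requests (for example from zkClient callbacks) are dropped and reported instead of reaching the terminated executor and raising RejectedExecutionException.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for ScheduleAfterDebounceTime.
class DebounceScheduler {
  private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
  private boolean isShuttingDown = false;

  // Schedules the action unless the scheduler has been stopped. Checking the flag and
  // scheduling under one lock means no request can slip in after stopScheduler() runs.
  synchronized boolean scheduleAfterDebounceTime(String actionName, long delayMs, Runnable action) {
    if (isShuttingDown) {
      System.out.println("Ignoring " + actionName + ": scheduler is shut down");
      return false;
    }
    executor.schedule(action, delayMs, TimeUnit.MILLISECONDS);
    return true;
  }

  synchronized void stopScheduler() {
    isShuttingDown = true;
    executor.shutdownNow();
  }
}

class ShutdownRaceDemo {
  public static void main(String[] args) {
    // Unguarded: scheduling on a shut-down executor throws, as in the sample exception.
    ScheduledExecutorService raw = Executors.newSingleThreadScheduledExecutor();
    raw.shutdown();
    try {
      raw.schedule(() -> { }, 10, TimeUnit.MILLISECONDS);
    } catch (RejectedExecutionException e) {
      System.out.println("unguarded: " + e.getClass().getSimpleName());
    }

    // Guarded: the late request is dropped instead of throwing.
    DebounceScheduler guarded = new DebounceScheduler();
    guarded.stopScheduler();
    boolean accepted = guarded.scheduleAfterDebounceTime("processor-change", 10, () -> { });
    System.out.println("guarded accepted: " + accepted);
  }
}
{code}

Dropping the request is a design choice for the shutdown path only: once the coordinator is stopping, there is no useful work left for those late tasks to do.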

> Standalone stability fixes.
> ---------------------------
>
>                 Key: SAMZA-1692
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1692
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
