Re: Curator listener issue

2020-02-26 Thread Jordan Zimmerman
That sleep won't solve the problem, because there is a single thread for 
connection state listeners: while your listener sleeps, the other listeners 
wait. Instead, you can pass a separate Executor when you register your 
listener, and your sleep would then have the desired effect.
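
A minimal sketch of why the sleep does not help. This is a toy model, not Curator's code. It assumes one dispatch thread that calls listeners in registration order and stops notifying once the client has been closed. The logs above show both listeners running on Curator-ConnectionStateManager-0. Listener "A" stands in for handleConnectionStateChange(), which sleeps and then shuts down. Listener "B" stands in for the listener that logs "ZOOKEEPER STATE CHANGED TO". ListenerDispatchDemo, run(), and the 300 ms sleep are hypothetical names chosen for illustration. In Curator itself, the two-argument form client.getConnectionStateListenable().addListener(listener, executor) is what gives a listener its own Executor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Toy model, NOT Curator code (assumption): one dispatch thread notifies listeners
// in registration order and stops once the "client" has been closed.
class ListenerDispatchDemo {

    // Delivers LOST to listeners A and B; returns the states listener B saw.
    static List<String> run(boolean perListenerExecutors) {
        AtomicBoolean closed = new AtomicBoolean(false);
        List<String> seenByB = Collections.synchronizedList(new ArrayList<>());

        // Listener A: like handleConnectionStateChange(), sleeps and then closes the client.
        Consumer<String> listenerA = state -> {
            sleepQuietly(300);  // the sleep added before shutting down
            closed.set(true);   // stands in for shutDownPlatform(1)
        };
        // Listener B: like the listener that logs "ZOOKEEPER STATE CHANGED TO : ..."
        Consumer<String> listenerB = seenByB::add;

        List<Consumer<String>> listeners = List.of(listenerA, listenerB);
        ExecutorService dispatchThread = Executors.newSingleThreadExecutor();
        List<ExecutorService> listenerExecutors =
                List.of(Executors.newSingleThreadExecutor(), Executors.newSingleThreadExecutor());

        dispatchThread.execute(() -> {
            for (int i = 0; i < listeners.size(); i++) {
                if (closed.get()) {
                    return; // client closed: remaining listeners are never called
                }
                Consumer<String> listener = listeners.get(i);
                if (perListenerExecutors) {
                    // hand the call off and move on to the next listener immediately
                    listenerExecutors.get(i).execute(() -> listener.accept("LOST"));
                } else {
                    // run the listener, including its sleep, on the single dispatch thread
                    listener.accept("LOST");
                }
            }
        });

        awaitShutdown(dispatchThread);
        listenerExecutors.forEach(ListenerDispatchDemo::awaitShutdown);
        return new ArrayList<>(seenByB);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops accepting new tasks, then waits for already-submitted tasks to finish.
    static void awaitShutdown(ExecutorService executor) {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared dispatch thread: B saw " + run(false));
        System.out.println("per-listener executors: B saw " + run(true));
    }
}
```

With the shared thread, B sees nothing. A's sleep runs on the dispatch thread, and the client is closed before B's turn. With one executor per listener, A sleeps on its own thread and B receives LOST straight away.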

-Jordan

> On Feb 26, 2020, at 10:37 AM, Arpit Jain wrote:
> 
> That's exactly what I thought too, so I added a 10-second sleep before 
> shutting down and closing the Curator instance, but it did not change the 
> behaviour.
> 
> On Wed, Feb 26, 2020 at 3:30 PM Jordan Zimmerman wrote:
> You are shutting down your server in the handleConnectionStateChange() 
> handler. So maybe your application shuts down/closes the Curator handle 
> before the other handler gets called, i.e. a kind of race.
> 
> -Jordan
> 
>> On Feb 26, 2020, at 10:15 AM, Arpit Jain wrote:
>> 
>> What is RetryLoopExecutor? Curator does all retries internally. You should 
>> not have your own retry mechanism.
>> It's just a retry loop for the submitted action. I will remove it if Curator 
>> already does that.
>> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this 
>> internally already.
>> I will remove it.
>> What do you do with the "isConnected" value? That seems suspicious to me.
>> It's not used anywhere; it's only used to log whether we got connected.
>> You do not get LOST until the session expires. How long is 
>> "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
>>   "ConnectionTimeout": 1,
>>   "Hosts": "localhost:2181",
>>   "MaxRetries": 3,
>>   "RetryTimeout": 3000,
>>   "SessionTimeout": 9,
>> As I said earlier, I am receiving LOST on both application instances. It's 
>> only that one specific listener that is not getting called. Here are the logs:
>> 
>> [L: WARN] [O: c.t.s.c.ZookeeperHelper] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] ZOOKEEPER STATE CHANGED TO : SUSPENDED
>> 2020-02-26 12:39:13.107+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40003. Adjusted session timeout ms: 4
>> 2020-02-26 12:39:13.109+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 4
>> 2020-02-26 12:39:13.109+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 4
>> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
>> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
>> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: ForkJoinPool.commonPool-worker-1-EventThread] Session expired event received
>> 2020-02-26 12:39:24.405+ [L: ERROR] [O: c.t.s.c.ClusteredModeCoordinator] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Could not connect to ZK. Shutting down server
>> 
>> As you can see in the logs above, the message "ZOOKEEPER STATE CHANGED TO : 
>> LOST" is missing.
>> 
>> private final void handleConnectionStateChange(final ConnectionState newConnectionState) {
>>
>>     if (ConnectionState.LOST.name().equalsIgnoreCase(newConnectionState.name())) {
>>         _logger.error("Could not connect to ZK. Shutting down server..");
>>         shutDownPlatform(1);
>>     } else if (_isSingletonServer &&
>>             ConnectionState.SUSPENDED.name().equalsIgnoreCase(newConnectionState.name())) {
>>         notifyClusterModeCoordinatorListenersOfSingletonRoleLoss();
>>     }
>> }
>>
>> private final void handleTakeSingletonRole() throws Exception {
>>
>>     _logger.info("handleTakeSingletonRole: Received Singleton Role.");
>>
>>     // We have just become the Singleton Server. Ha!
>>     try {
>>         while (true) {
>>             try {
>>                 Thread.sleep(1000);
>>             } catch (Exception e) {
>>                 // ignore interrupted exception
>>             }
>>         }
>>     }
>> }
>> 
>> Thanks
>> 
>> On Wed, Feb 26, 2020 at 2:53 PM Jordan Zimmerman wrote:
>> A few things:
>> 
>> What is RetryLoopExecutor? Curator does all retries internally. You should 
>> not have your own retry 

Re: Curator listener issue

2020-02-26 Thread Jordan Zimmerman
You are shutting down your server in the handleConnectionStateChange() handler. 
So maybe your application shuts down/closes the Curator handle before the other 
handler gets called, i.e. a kind of race.
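
One way to take this race out of the picture is to have the LOST handler return immediately and run the shutdown later on another thread. The sketch below is a toy model, not Curator's code. It assumes listeners are called in order on one thread and that a closed client notifies no further listeners. DeferredShutdownDemo, StateListener, deliver(), and the 100 ms delay are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model, NOT Curator code (assumption): listeners are called in order on one
// thread, and once the "client" is closed no further listeners are called.
class DeferredShutdownDemo {

    interface StateListener {
        void stateChanged(String newState);
    }

    static void deliver(List<StateListener> listeners, String state, AtomicBoolean closed) {
        for (StateListener listener : listeners) {
            if (closed.get()) {
                return; // the race: later listeners never see the state
            }
            listener.stateChanged(state);
        }
    }

    // Returns the states seen by the second (logging) listener.
    static List<String> run(boolean deferShutdown) {
        AtomicBoolean closed = new AtomicBoolean(false);
        List<String> logged = Collections.synchronizedList(new ArrayList<>());
        ScheduledExecutorService shutdownExecutor = Executors.newSingleThreadScheduledExecutor();

        StateListener shutdownListener = state -> {
            if (!"LOST".equals(state)) {
                return;
            }
            if (deferShutdown) {
                // return at once; close the client later on another thread
                shutdownExecutor.schedule(() -> closed.set(true), 100, TimeUnit.MILLISECONDS);
            } else {
                closed.set(true); // close inline, like calling shutDownPlatform(1) in the handler
            }
        };
        StateListener loggingListener = logged::add;

        deliver(List.of(shutdownListener, loggingListener), "LOST", closed);

        // by default, delayed tasks that are already scheduled still run after shutdown()
        shutdownExecutor.shutdown();
        try {
            shutdownExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(logged);
    }

    public static void main(String[] args) {
        System.out.println("inline shutdown:   logging listener saw " + run(false));
        System.out.println("deferred shutdown: logging listener saw " + run(true));
    }
}
```

The inline variant leaves the logging listener with nothing. The deferred variant lets it record LOST before the shutdown runs. A fixed delay is a heuristic: it assumes the remaining listeners are notified within that window.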

-Jordan

> On Feb 26, 2020, at 10:15 AM, Arpit Jain wrote:
> 
> What is RetryLoopExecutor? Curator does all retries internally. You should 
> not have your own retry mechanism.
> It's just a retry loop for the submitted action. I will remove it if Curator 
> already does that.
> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this 
> internally already.
> I will remove it.
> What do you do with the "isConnected" value? That seems suspicious to me.
> It's not used anywhere; it's only used to log whether we got connected.
> You do not get LOST until the session expires. How long is 
> "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
>   "ConnectionTimeout": 1,
>   "Hosts": "localhost:2181",
>   "MaxRetries": 3,
>   "RetryTimeout": 3000,
>   "SessionTimeout": 9,
> As I said earlier, I am receiving LOST on both application instances. It's 
> only that one specific listener that is not getting called. Here are the logs:
> 
> [L: WARN] [O: c.t.s.c.ZookeeperHelper] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] ZOOKEEPER STATE CHANGED TO : SUSPENDED
> 2020-02-26 12:39:13.107+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40003. Adjusted session timeout ms: 4
> 2020-02-26 12:39:13.109+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 4
> 2020-02-26 12:39:13.109+ [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 4
> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
> 2020-02-26 12:39:13.111+ [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: ForkJoinPool.commonPool-worker-1-EventThread] Session expired event received
> 2020-02-26 12:39:24.405+ [L: ERROR] [O: c.t.s.c.ClusteredModeCoordinator] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Could not connect to ZK. Shutting down server
> 
> As you can see in the logs above, the message "ZOOKEEPER STATE CHANGED TO : LOST" 
> is missing.
> 
> private final void handleConnectionStateChange(final ConnectionState newConnectionState) {
>
>     if (ConnectionState.LOST.name().equalsIgnoreCase(newConnectionState.name())) {
>         _logger.error("Could not connect to ZK. Shutting down server..");
>         shutDownPlatform(1);
>     } else if (_isSingletonServer &&
>             ConnectionState.SUSPENDED.name().equalsIgnoreCase(newConnectionState.name())) {
>         notifyClusterModeCoordinatorListenersOfSingletonRoleLoss();
>     }
> }
>
> private final void handleTakeSingletonRole() throws Exception {
>
>     _logger.info("handleTakeSingletonRole: Received Singleton Role.");
>
>     // We have just become the Singleton Server. Ha!
>     try {
>         while (true) {
>             try {
>                 Thread.sleep(1000);
>             } catch (Exception e) {
>                 // ignore interrupted exception
>             }
>         }
>     }
> }
> 
> Thanks
> 
> On Wed, Feb 26, 2020 at 2:53 PM Jordan Zimmerman wrote:
> A few things:
> 
> What is RetryLoopExecutor? Curator does all retries internally. You should 
> not have your own retry mechanism.
> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this 
> internally already.
> What do you do with the "isConnected" value? That seems suspicious to me.
> You do not get LOST until the session expires. How long is 
> "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
> 
> Other than that, I'd need to see the code for handleConnectionStateChange(), 
> handleTakeSingletonRole() and RetryLoopExecutor. Also, the logs from the 
> instance that doesn't get LOST would be helpful.
> 
> -Jordan
> 
>> On Feb 26, 2020, at 9:46 AM, Arpit Jain wrote:
>> 
>> Here is the setup I have:
>> 
>> 2 application instances