[ 
https://issues.apache.org/jira/browse/CURATOR-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525459#comment-16525459
 ] 

Martin Serrano commented on CURATOR-405:
----------------------------------------

We have also seen this occur in our environment with curator 4.0.0.

The issue appears to have been caused by the OS swapping out the ZK client for 
some period of time.   The server side may have also been swapped out during 
some of the periods.   The server does recover, but the client never does.
{noformat}
2018-06-27 06:35:37,948 WARN ClientCnxn [main-SendThread(localhost:16980)] - 
Client session timed out, have not heard from server in 885327ms for sessionid 
0x164381924ca0039
2018-06-27 06:35:38,011 WARN ClientCnxn [main-SendThread(localhost:16980)] - 
Client session timed out, have not heard from server in 876402ms for sessionid 
0x16438192ca0021
2018-06-27 06:35:38,015 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Client session timed out, have not heard from server in 876402ms for sessionid 
0x164381924ca0021, closing socket connection and attempting reconnect
2018-06-27 06:35:38,021 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Client session timed out, have not heard from server in 885327ms for sessionid 
0x164381924ca0039, closing socket connection and attempting reconnect
2018-06-27 06:35:38,122 INFO ConnectionStateManager [main-EventThread] - State 
change: SUSPENDED
2018-06-27 06:35:38,123 INFO CuratorHelper [Curator-ConnectionStateManager-0] - 
Connection state change: SUSPENDED
2018-06-27 06:39:27,109 WARN ConnectionStateManager 
[Curator-ConnectionStateManager-0] - Session timeout has elapsed while 
SUSPENDED. Injecting a session expiration. Elapsed ms: 155192. Adjusted session 
timeout ms: 60000
2018-06-27 06:39:27,120 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Opening socket connection to server localhost/127.0.0.1:16980. Will not attempt 
to authenticate using SASL (unknown error)
2018-06-27 06:39:27,120 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Opening socket connection to server localhost/127.0.0.1:16980. Will not attempt 
to authenticate using SASL (unknown error)
2018-06-27 06:39:27,299 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Socket connection established to localhost/127.0.0.1:16980, initiating session
2018-06-27 06:39:27,324 INFO ClientCnxn [main-SendThread(localhost:16980)] - 
Socket connection established to localhost/127.0.0.1:16980, initiating session
2018-06-27 06:39:27,326 WARN ClientCnxn [main-SendThread(localhost:16980)] - 
Unable to reconnect to ZooKeeper service, session 0x164381924ca0039 has expired
2018-06-27 06:39:27,326 INFO ClientCnxn [main-EventThread] - EventThread shut 
down for session: 0x164381924ca0039
2018-06-27 06:39:27,330 WARN ClientCnxn [main-SendThread(localhost:16980)] - 
Unable to reconnect to ZooKeeper service, session 0x164381924ca0021 has expired
2018-06-27 06:39:27,356 WARN ConnectionStateManager 
[Curator-ConnectionStateManager-0] - Session timeout has elapsed while 
SUSPENDED. Injecting a session expiration. Elapsed ms: 229233. Adjusted session 
timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager 
[Curator-ConnectionStateManager-0] - Session timeout has elapsed while 
SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted session 
timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager 
[Curator-ConnectionStateManager-0] - Session timeout has elapsed while 
SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted session 
timeout ms: 60000
2018-06-27 06:39:27,356 WARN ConnectionStateManager 
[Curator-ConnectionStateManager-0] - Session timeout has elapsed while 
SUSPENDED. Injecting a session expiration. Elapsed ms: 229234. Adjusted session 
timeout ms: 6000{noformat}
 

>From the server side during this time:
{noformat}
2018-06-26 12:42:50,572 INFO  ZooKeeperServer - Established session 
0x164381924ca0039 with negotiated timeout 60000 for client /127.0.0.1:51616
2018-06-27 06:35:38,042 INFO  ZooKeeperServer - Expiring session 
0x164381924ca0039, timeout of 60000ms exceeded
2018-06-27 06:35:38,060 INFO  PrepRequestProcessor - Processed session 
termination for sessionid: 0x164381924ca0039
EndOfStreamException: Unable to read additional data from client sessionid 
0x164381924ca0039, likely client has closed socket
2018-06-27 06:35:38,149 INFO  NIOServerCnxn - Closed socket connection for 
client /127.0.0.1:51616 which had sessionid 0x164381924ca0039
2018-06-27 06:39:27,326 INFO  ZooKeeperServer - Client attempting to renew 
session 0x164381924ca0039 at /127.0.0.1:32892
2018-06-27 06:39:27,326 INFO  ZooKeeperServer - Invalid session 
0x164381924ca0039 for client /127.0.0.1:32892, probably expired
2018-06-27 06:39:27,326 INFO  NIOServerCnxn - Closed socket connection for 
client /127.0.0.1:32892 which had sessionid 0x164381924ca0039{noformat}

> Connection loss not leading to ConnectionState.LOST and causing logs to fill
> ----------------------------------------------------------------------------
>
>                 Key: CURATOR-405
>                 URL: https://issues.apache.org/jira/browse/CURATOR-405
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 3.3.0
>            Reporter: Jordan Zimmerman
>            Priority: Major
>
> It's unclear how it got into this state, but a client started endlessly 
> logging "Session timeout has elapsed while SUSPENDED". Curator 3.0 is 
> supposed to go to LOST and inject a session expiration but it's not working 
> in this case.
> There are millions of these:
> {code}
> [2017-05-01 18:29:34,636][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368206. Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from 
> [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
>  to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with 
> local namespace [/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] 
> injectSessionExpiration() called {}
> [2017-05-01 18:29:34,637][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368207. Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from 
> [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
>  to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with 
> local namespace [/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] 
> injectSessionExpiration() called {}
> [2017-05-01 18:29:34,637][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368207. Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from 
> [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
>  to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with 
> local namespace [/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] 
> injectSessionExpiration() called {}
> [2017-05-01 18:29:34,637][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368207. Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,637][INFO ][XXXXX] Resolved connection string from 
> [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
>  to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with 
> local namespace [/v1] {}
> [2017-05-01 18:29:34,637][INFO ][org.apache.zookeeper.ZooKeeperTestable] 
> injectSessionExpiration() called {}
> [2017-05-01 18:29:34,637][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368207. Adjusted session timeout ms: 15000 {}
> [2017-05-01 18:29:34,638][INFO ][XXXXX] Resolved connection string from 
> [http://localhost:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1]
>  to [localhost:22191,localhost:22194,localhost:22192,localhost:22193/v1] with 
> local namespace [/v1] {}
> [2017-05-01 18:29:34,638][INFO ][org.apache.zookeeper.ZooKeeperTestable] 
> injectSessionExpiration() called {}
> [2017-05-01 18:29:34,638][WARN 
> ][org.apache.curator.framework.state.ConnectionStateManager] Session timeout 
> has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 
> 250368208. Adjusted session timeout ms: 15000 {}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to