from:"guoping.gp \(JIRA\)"

[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-28 Thread guoping.gp (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615182#comment-15615182
 ] 

guoping.gp commented on ZOOKEEPER-2615:
---

Is there any recommended fix option on this issue?

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>Assignee: Camille Fournier
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) still 
> can be found even with zookeeper 3.4.6.
> this issue cause our production zookeeper cluster leak about 1 million 
> watchs, after restart the server one by one, the watch count decrease to only 
> about 4.
> I can reproduce the issue on my mac,here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> as above 4-letter's report shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574315#comment-15574315
 ] 

guoping.gp edited comment on ZOOKEEPER-1382 at 10/14/16 5:47 AM:
-

the same issue is still can be found on 3.4.6, 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615.

when the client app reconnect to a server, the setWatchs and close still stay 
in the race condition as following possible:

Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745

and then

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74

the issue cause our production leak about 1 million watchs


was (Author: guoping.gp):
the same issue is still can be found on 3.4.6, 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615.

when the client app reconnect to a server, the setWatchs and close still stay 
in the race condition as following possible:

Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745

and then

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-1382
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
>Reporter: Neha Narkhede
>Assignee: Germán Blanco
>Priority: Critical
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382_3.3.4.patch
>
>
> I've observed that zookeeper server holds onto expired session ids in the 
> watcher data structures. The result is the wchp command reports session ids 
> that cannot be found through cons/dump and those expired session ids sit 
> there maybe until the server is restarted. Here are snippets from the client 
> and the server logs that lead to this state, for one particular session id 
> 0x134485fd7bcb26f -
> There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
> and I'm using ZkClient to connect to the cluster
> From the application log -
> application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application Session establishment complete 
> on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
> negotiated timeout = 6000
> application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application] Client session timed out, 
> have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
> socket connection and attempting reconnect
> application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
> [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
> ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
> connection
> On the leader zk, 225 -
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
> timeout of 6000ms exceeded
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
> for sessionid: 0x134485fd7bcb26f
> On the server, the client was initially connected to, 223 -
> zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
> 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
> zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
> /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
> Here are the log snippets from 226, which is the server, the client 
> reconnected to, before getting session expired event -
> 2012-01-27 09:52:38,190 - INFO

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574315#comment-15574315
 ] 

guoping.gp commented on ZOOKEEPER-1382:
---

the same issue is still can be found on 3.4.6, 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615.

when the client app reconnect to a server, the setWatchs and close still stay 
in the race condition as following possible:

Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745

and then

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-1382
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
>Reporter: Neha Narkhede
>Assignee: Germán Blanco
>Priority: Critical
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382_3.3.4.patch
>
>
> I've observed that zookeeper server holds onto expired session ids in the 
> watcher data structures. The result is the wchp command reports session ids 
> that cannot be found through cons/dump and those expired session ids sit 
> there maybe until the server is restarted. Here are snippets from the client 
> and the server logs that lead to this state, for one particular session id 
> 0x134485fd7bcb26f -
> There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
> and I'm using ZkClient to connect to the cluster
> From the application log -
> application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application Session establishment complete 
> on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
> negotiated timeout = 6000
> application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application] Client session timed out, 
> have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
> socket connection and attempting reconnect
> application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
> [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
> ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
> connection
> On the leader zk, 225 -
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
> timeout of 6000ms exceeded
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
> for sessionid: 0x134485fd7bcb26f
> On the server, the client was initially connected to, 223 -
> zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
> 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
> zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
> /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
> Here are the log snippets from 226, which is the server, the client 
> reconnected to, before getting session expired event -
> 2012-01-27 09:52:38,190 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
> attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
> 2012-01-27 09:52:38,191 - INFO  
> [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
> 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
> 2012-01-27 09:52:38,191 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.17.136.82:49367 which had sessionid 
> 0x134485fd7bcb26f
> wchp output from 226, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *226.*wchp* | wc -l
> 3
> wchp output from 223, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *223.*wchp* | wc -l
> 0
> cons output from

[jira] [Comment Edited] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574296#comment-15574296
 ] 

guoping.gp edited comment on ZOOKEEPER-2615 at 10/14/16 5:41 AM:
-

REPRODUCE TEST CASE STEP BY STEP

1).  start zookeeper cluster, says server1(localhost:6181), 
server2(localhost:4181), server3(localhost:2181)
let's assume now server2 is the LEADER.

2). use zkCli.sh create a persistence node /curator_exists_watch

3). use Curator Framework2.9.1 to create a simple Java Client Application,  as 
following:

*

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.api.CuratorWatcher;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.WatchedEvent;

public class ExistingWatchDemo {

public static void main(String[] args) throws Exception {

CuratorFrameworkFactory.Builder builder = 
CuratorFrameworkFactory.builder();
builder = builder.connectString("localhost:2181,localhost:6181")
 .sessionTimeoutMs(15000)
 .connectionTimeoutMs(1)
.retryPolicy(new 
ExponentialBackoffRetry(5000, Integer.MAX_VALUE));

final CuratorFramework zkClient = builder.build();

zkClient.start();

CuratorWatcher watcher = new CuratorWatcher() {

public void process(WatchedEvent event) throws 
Exception {
 System.out.println("received event " + event);
 System.out.println("watch again ...");

zkClient.checkExists().usingWatcher(this).forPath("/curator_exists_watch");
}

};


zkClient.checkExists().usingWatcher(watcher).forPath("/curator_exists_watch");

Thread.sleep(600 * 1000);

}

}

*

to stay us to reproduce in more convenience, here we provide only 2 servers in 
the connection string instead of 3 both, says as server1(FOLLOWING) 
server3(FOLLOWING)

4).  breakpoint at 2 places: 
 
FinalRequestProcessor.java: line 303 : 
zks.getZKDatabase().setWatches(relativeZxid, 
NIOServerCnxn.java: line 966: 
synchronized(factory.cnxns){


5).  running the client app ExistingWatchDemo.java, let say the client 
application connection to server3(2181)

6). use the IDE make a remote debug session connect to another server of the 
connection string which we provided, here server1(6181)

7). now kill -STOP server3(2181), the client app will find the server hang and 
reconnect to  server1(6181), we can confirm the reconnect in
the client's log:

-
016-10-14 13:31:21,505 [myid:] - INFO  
[main-SendThread(localhost:2181):ClientCnxn$SendThread@1096] - Client session 
timed out, have not heard from server in 3754ms for sessionid 
0x357c1a4462b000e, closing socket connection and attempting reconnect
2016-10-14 13:31:22,389 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@975] - Opening socket 
connection to server localhost/0:0:0:0:0:0:0:1:6181. Will not attempt to 
authenticate using SASL (unknown error)
2016-10-14 13:31:22,392 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@852] - Socket connection 
established to localhost/0:0:0:0:0:0:0:1:6181, initiating session
2016-10-14 13:31:22,392 [myid:] - DEBUG 
[main-SendThread(localhost:6181):ClientCnxn$SendThread@892] - Session 
establishment request sent on localhost/0:0:0:0:0:0:0:1:6181
2016-10-14 13:31:22,407 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@1235] - Session 
establishment complete on server localhost/0:0:0:0:0:0:0:1:6181, sessionid = 
0x357c1a4462b000e, negotiated timeout = 15000
-

the reconnect will hit the breakpoint FinalRequestProcessor.java: line 303 as 
following:

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74  

8). now kill the process of  the client app, kill -9 , this 
will let us another breakpoint hit, as following
 
Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745  

9).  now resume the breakpoint(NIOServerCnxn:966) in step 8, after that, then 
resume the breakpoint (FinalRequestProcessor:303) in step 7

10).

[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574296#comment-15574296
 ] 

guoping.gp commented on ZOOKEEPER-2615:
---

REPRODUCE TEST CASE STEP BY STEP

1. start zookeeper cluster, says server1(localhost:6181), 
server2(localhost:4181), server3(localhost:2181)
let's assume now server2 is the LEADER.

2. use zkCli.sh create a persistence node /curator_exists_watch

3. use Curator Framework2.9.1 to create a simple Java Client Application,  as 
following:

*

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.api.CuratorWatcher;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.WatchedEvent;

public class ExistingWatchDemo {

public static void main(String[] args) throws Exception {

CuratorFrameworkFactory.Builder builder = 
CuratorFrameworkFactory.builder();
builder = builder.connectString("localhost:2181,localhost:6181")
 .sessionTimeoutMs(15000)
 .connectionTimeoutMs(1)
.retryPolicy(new 
ExponentialBackoffRetry(5000, Integer.MAX_VALUE));

final CuratorFramework zkClient = builder.build();

zkClient.start();

CuratorWatcher watcher = new CuratorWatcher() {

public void process(WatchedEvent event) throws 
Exception {
 System.out.println("received event " + event);
 System.out.println("watch again ...");

zkClient.checkExists().usingWatcher(this).forPath("/curator_exists_watch");
}

};


zkClient.checkExists().usingWatcher(watcher).forPath("/curator_exists_watch");

Thread.sleep(600 * 1000);

}

}

*

to stay us to reproduce in more convenience, here we provide only 2 servers in 
the connection string instead of 3 both, says as server1(FOLLOWING) 
server3(FOLLOWING)

4.  breakpoint at 2 places: 
 
FinalRequestProcessor.java: line 303 : 
zks.getZKDatabase().setWatches(relativeZxid, 
NIOServerCnxn.java: line 966: 
synchronized(factory.cnxns){


5. running the client app ExistingWatchDemo.java, let say the client 
application connection to server3(2181)

6. use the IDE make a remote debug session connect to another server of the 
connection string which we provided, here server1(6181)

7. now kill -STOP server3(2181), the client app will find the server hang and 
reconnect to  server1(6181), we can confirm the reconnect in
the client's log:

-
016-10-14 13:31:21,505 [myid:] - INFO  
[main-SendThread(localhost:2181):ClientCnxn$SendThread@1096] - Client session 
timed out, have not heard from server in 3754ms for sessionid 
0x357c1a4462b000e, closing socket connection and attempting reconnect
2016-10-14 13:31:22,389 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@975] - Opening socket 
connection to server localhost/0:0:0:0:0:0:0:1:6181. Will not attempt to 
authenticate using SASL (unknown error)
2016-10-14 13:31:22,392 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@852] - Socket connection 
established to localhost/0:0:0:0:0:0:0:1:6181, initiating session
2016-10-14 13:31:22,392 [myid:] - DEBUG 
[main-SendThread(localhost:6181):ClientCnxn$SendThread@892] - Session 
establishment request sent on localhost/0:0:0:0:0:0:0:1:6181
2016-10-14 13:31:22,407 [myid:] - INFO  
[main-SendThread(localhost:6181):ClientCnxn$SendThread@1235] - Session 
establishment complete on server localhost/0:0:0:0:0:0:0:1:6181, sessionid = 
0x357c1a4462b000e, negotiated timeout = 15000
-

the reconnect will hit the breakpoint FinalRequestProcessor.java: line 303 as 
following:

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74  

8. now kill the process of  the client app, kill -9 , this 
will let us another breakpoint hit, as following
 
Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745  

9.  now resume the breakpoint(NIOServerCnxn:966) in step 8, after that, then 
resume the breakpoint (FinalRequestProcessor:303) in step 7

10. now use wchp and cons to check the leak watch, as following

[jira] [Created] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)

guoping.gp created ZOOKEEPER-2615:
-

 Summary: Zookeeper server holds onto dead/expired session ids in 
the watch data structures
 Key: ZOOKEEPER-2615
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6
Reporter: guoping.gp


The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) still can 
be found even with zookeeper 3.4.6.

this issue cause our production zookeeper cluster leak about 1 million watchs, 
after restart the server one by one, the watch count decrease to only about 
4.

I can reproduce the issue on my mac,here it is:


pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Latency min/avg/max: 0/1156/128513
Received: 539
Sent: 531
Connections: 1
Outstanding: 0
Zxid: 0x10037
Mode: follower
Node count: 5

pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
 
/127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
 /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)


pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
/curator_exists_watch
0x357be48e4d90007
0x357be48e4d90009
0x157be2732de


as above 4-letter's report shows,   0x357be48e4d90007 and 0x357be48e4d90009 
are leaked after the two sessions expired 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

[jira] [Comment Edited] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

[jira] [Comment Edited] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

[jira] [Created] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

6 matches

Site Navigation

Mail list logo

Footer information