Re: Design document

2015-06-22 Thread Jordan Zimmerman
https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ

http://web.stanford.edu/class/cs347/reading/zab.pdf

Start with these.

-Jordan



On June 22, 2015 at 3:34:24 PM, sajjad rizvi (sm3ri...@uwaterloo.ca) wrote:

Hi, 

I am just curious, is there any design document available to help in 
understanding the ZooKeeper code? In a research project, I have to make 
some significant changes in the quorum part of the code. Although the code 
is very elegant and self descriptive, any design document will be very 
helpful. 

Thanks, 
Sajjad Rizvi 


[jira] [Commented] (ZOOKEEPER-2210) clock_gettime is not available in os x

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595750#comment-14595750
 ] 

Hudson commented on ZOOKEEPER-2210:
---

SUCCESS: Integrated in ZooKeeper-trunk #2734 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/2734/])
ZOOKEEPER-2210: clock_gettime is not available in OS X
(Michi Mutsuzaki via rgs) (rgs: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1686767)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/c/src/zookeeper.c


 clock_gettime is not available in os x
 --

 Key: ZOOKEEPER-2210
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2210
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2210.patch, ZOOKEEPER-2210.patch


 {noformat}
 src/zookeeper.c:286:9: warning: implicit declaration of function 
 'clock_gettime' is invalid in C99 [-Wimplicit-function-declaration]
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 ^
 src/zookeeper.c:286:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Design document

2015-06-22 Thread sajjad rizvi
Hi,

I am just curious, is there any design document available to help in
understanding the ZooKeeper code? In a research project, I have to make
some significant changes in the quorum part of the code. Although the code
is very elegant and self descriptive, any design document will be very
helpful.

Thanks,
Sajjad Rizvi


Re: Design document

2015-06-22 Thread sajjad rizvi
Thank you Jordan, these are good pointers.

On Mon, Jun 22, 2015 at 4:37 PM, Jordan Zimmerman 
<jor...@jordanzimmerman.com> wrote:

 https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ

 http://web.stanford.edu/class/cs347/reading/zab.pdf

 Start with these.

 -Jordan



 On June 22, 2015 at 3:34:24 PM, sajjad rizvi (sm3ri...@uwaterloo.ca)
 wrote:

 Hi,

 I am just curious, is there any design document available to help in
 understanding the ZooKeeper code? In a research project, I have to make
 some significant changes in the quorum part of the code. Although the code
 is very elegant and self descriptive, any design document will be very
 helpful.

 Thanks,
 Sajjad Rizvi




[jira] [Commented] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597030#comment-14597030
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2218:
---

GitHub user sugartxy opened a pull request:

https://github.com/apache/zookeeper/pull/36

#ZOOKEEPER-2218 Close IO Streams in finally block

Place the close method in the finally clause, so we can ensure it always 
runs regardless of how the method exits.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sugartxy/zookeeper CloseRightly

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/36.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #36


commit 90745d7476504630c1d68772c26546b28639ba91
Author: sugartxy <tgt...@163.com>
Date:   2015-06-23T02:36:11Z

#ZOOKEEPER-2218 Close IO Streams in finally block

Place the close method in the finally clause, so we can ensure it always
runs regardless of how the method exits.




 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical

 If an exception is thrown during the read, the method exits without closing 
 the stream and therefore without releasing the file system resources, so the 
 process may run out of such resources before they are eventually reclaimed.
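
The pattern this issue asks for can be sketched roughly as follows. This is
an illustration only, not the code from pull request #36: the class
StreamCloseSketch, the method readFirstLine and the helper FailingReader are
hypothetical names introduced here to show that the finally block runs on
both the normal and the exceptional exit path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class StreamCloseSketch {

    /** Hypothetical reader that always fails on read and records close(). */
    static final class FailingReader extends Reader {
        boolean closed = false;

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads the first line; the finally block closes the reader on every exit path. */
    static String readFirstLine(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        try {
            return in.readLine();
        } finally {
            // Runs whether readLine() returned or threw, so the underlying
            // reader (and its file descriptor, for a file) is always released.
            in.close();
        }
    }
}
```

On Java 7 and later, a try-with-resources statement gives the same guarantee
with less code; the explicit finally form is shown because it matches the
wording of the issue.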





[jira] [Reopened] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales reopened ZOOKEEPER-2218:
---

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical

 If an exception is thrown during the read, the method exits without closing 
 the stream and therefore without releasing the file system resources, so the 
 process may run out of such resources before they are eventually reclaimed.





[jira] [Commented] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597125#comment-14597125
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2218:
---

Thanks for the patch [~tgttxy]! Let's reopen the issue though, since it hasn't 
been merged yet. 

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical

 If an exception is thrown during the read, the method exits without closing 
 the stream and therefore without releasing the file system resources, so the 
 process may run out of such resources before they are eventually reclaimed.





[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously

2015-06-22 Thread Yasuhito Fukuda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597155#comment-14597155
 ] 

Yasuhito Fukuda commented on ZOOKEEPER-2193:


Thank you for your review. I attached the v8 patch based on your comments,
and I posted a new diff on Review Board.
https://reviews.apache.org/r/35204/diff/4-5/

 reconfig command completes even if parameter is wrong obviously
 ---

 Key: ZOOKEEPER-2193
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.5.0
 Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
 Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, 
 ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, 
 ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch


 Even if a reconfig parameter is wrong, the command was confirmed to complete.
 Refer to the following.
 - Ensemble consists of four nodes
 {noformat}
 [zk: vm-101:2181(CONNECTED) 0] config
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 version=1
 {noformat}
 - add node by reconfig command
 {noformat}
 [zk: vm-101:2181(CONNECTED) 9] reconfig -add 
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 Committed new configuration:
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 version=30007
 {noformat}
 server.4 and server.5 have the same IP address.
 In this state, leader election will not work properly.
 In addition, the ensemble is presumably left in an undesirable state.
 I think parameter validation is needed on reconfig.
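
As a rough illustration of the kind of check requested here, the sketch below
rejects a configuration in which two servers share the same
host:quorumPort:electionPort triple. It is not taken from any of the attached
patches: ReconfigValidationSketch and findDuplicateAddress are hypothetical
names, and the parsing assumes IPv4 addresses or host names in the
server.N=host:port:port:role[;clientAddr] form shown above (IPv6 literals are
not handled).

```java
import java.util.HashMap;
import java.util.Map;

final class ReconfigValidationSketch {

    /**
     * Returns a message naming the first two servers that share the same
     * host:quorumPort:electionPort, or null when all servers are distinct.
     * The map goes from server id to a spec such as
     * "192.168.100.104:2888:3888:participant;0.0.0.0:2181".
     */
    static String findDuplicateAddress(Map<Integer, String> servers) {
        Map<String, Integer> seen = new HashMap<>();
        for (Map.Entry<Integer, String> entry : servers.entrySet()) {
            // Drop the optional ";clientAddress:clientPort" suffix.
            String quorumPart = entry.getValue().split(";", 2)[0];
            String[] fields = quorumPart.split(":");
            if (fields.length < 3) {
                throw new IllegalArgumentException(
                        "malformed server spec: " + entry.getValue());
            }
            String key = fields[0] + ":" + fields[1] + ":" + fields[2];
            Integer previous = seen.put(key, entry.getKey());
            if (previous != null) {
                return "server." + previous + " and server." + entry.getKey()
                        + " share " + key;
            }
        }
        return null;
    }
}
```

Applied to the configuration above, a check like this would flag server.4 and
server.5 before the new configuration is committed.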





[jira] [Created] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread Tang Xinye (JIRA)
Tang Xinye created ZOOKEEPER-2218:
-

 Summary: Close IO Streams in finally block
 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical


If an exception is thrown during the read, the method exits without closing 
the stream and therefore without releasing the file system resources, so the 
process may run out of such resources before they are eventually reclaimed.





[jira] [Resolved] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread Tang Xinye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tang Xinye resolved ZOOKEEPER-2218.
---
  Resolution: Fixed
Release Note: Place the close method in the finally clause, so we can 
ensure it always runs regardless of how the method exits.

Issue resolved by pull request https://github.com/apache/zookeeper/pull/36

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical

 If an exception is thrown during the read, the method exits without closing 
 the stream and therefore without releasing the file system resources, so the 
 process may run out of such resources before they are eventually reclaimed.





[GitHub] zookeeper pull request: #ZOOKEEPER-2218 Close IO Streams in finall...

2015-06-22 Thread sugartxy
GitHub user sugartxy opened a pull request:

https://github.com/apache/zookeeper/pull/36

#ZOOKEEPER-2218 Close IO Streams in finally block

Place the close method in the finally clause, so we can ensure it always 
runs regardless of how the method exits.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sugartxy/zookeeper CloseRightly

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/36.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #36


commit 90745d7476504630c1d68772c26546b28639ba91
Author: sugartxy <tgt...@163.com>
Date:   2015-06-23T02:36:11Z

#ZOOKEEPER-2218 Close IO Streams in finally block

Place the close method in the finally clause, so we can ensure it always
runs regardless of how the method exits.






[jira] [Updated] (ZOOKEEPER-1792) Observers don't need to keep an in-memory copy of last commited proposals

2015-06-22 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1792:
--
Summary: Observers don't need to keep an in-memory copy of last commited 
proposals   (was: Observers don't need to keep the an in-memory copy of last 
commited proposals )

 Observers don't need to keep an in-memory copy of last commited proposals 
 --

 Key: ZOOKEEPER-1792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1792
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Raul Gutierrez Segales
Priority: Minor

 In FinalRequestProcessor.java#processRequest we have:
 {noformat}
  if (request.isQuorum()) {
 zks.getZKDatabase().addCommittedProposal(request);
  }
 {noformat}
 but this is only useful to the leader since committed proposals are only used 
 from LearnerHandler to sync up followers. I presume followers do need it as 
 they might become a leader at any point. But observers have no need for them, 
 so we could probably special case this for them and optimize the path for 
 them.
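
The special case suggested here might look roughly like the sketch below. It
is a simplified stand-in, not ZooKeeper code: CommittedLogSketch, Role and
onCommitted are hypothetical names, and a plain zxid takes the place of the
real request object. The only point is that a server running as an observer
skips the bookkeeping that leaders and followers keep.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CommittedLogSketch {

    /** Hypothetical stand-in for the role a server currently plays. */
    enum Role { LEADER, FOLLOWER, OBSERVER }

    private final Role role;
    private final Deque<Long> committedZxids = new ArrayDeque<>();

    CommittedLogSketch(Role role) {
        this.role = role;
    }

    /** Called once per processed request; zxid stands in for the request. */
    void onCommitted(long zxid, boolean isQuorumRequest) {
        // Per the issue text: the leader uses this log to sync followers, and
        // followers keep it because they may become leader. Observers have no
        // need for it, so they skip the in-memory copy.
        if (isQuorumRequest && role != Role.OBSERVER) {
            committedZxids.addLast(zxid);
        }
    }

    /** Number of committed proposals currently held in memory. */
    int retained() {
        return committedZxids.size();
    }
}
```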





Re: Review Request 35204: ZOOKEEPER-2193: reconfig command completes even if parameter is wrong obviously

2015-06-22 Thread Yasuhito Fukuda

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35204/
---

(Updated June 23, 2015, 1:39 p.m.)


Review request for zookeeper.


Bugs: ZOOKEEPER-2193
https://issues.apache.org/jira/browse/ZOOKEEPER-2193


Repository: zookeeper-git


Description
---

See ZOOKEEPER-2193


Diffs (updated)
-

  src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 
eb045de19c9eeb632e5f2b98c5465abcaead7740 
  src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 
f15f831701f9c8514db5003ebd550cd3880b48c7 

Diff: https://reviews.apache.org/r/35204/diff/


Testing
---


Thanks,

Yasuhito Fukuda



[jira] [Commented] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-22 Thread Tang Xinye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597135#comment-14597135
 ] 

Tang Xinye commented on ZOOKEEPER-2218:
---

oops! still learning, sorry for the mistake!

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Priority: Critical

 If an exception is thrown during the read, the method exits without closing 
 the stream and therefore without releasing the file system resources, so the 
 process may run out of such resources before they are eventually reclaimed.





[jira] [Updated] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously

2015-06-22 Thread Yasuhito Fukuda (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuhito Fukuda updated ZOOKEEPER-2193:
---
Attachment: ZOOKEEPER-2193-v8.patch

 reconfig command completes even if parameter is wrong obviously
 ---

 Key: ZOOKEEPER-2193
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.5.0
 Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
 Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, 
 ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, 
 ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch


 Even if a reconfig parameter is wrong, the command was confirmed to complete.
 Refer to the following.
 - Ensemble consists of four nodes
 {noformat}
 [zk: vm-101:2181(CONNECTED) 0] config
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 version=1
 {noformat}
 - add node by reconfig command
 {noformat}
 [zk: vm-101:2181(CONNECTED) 9] reconfig -add 
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 Committed new configuration:
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 version=30007
 {noformat}
 server.4 and server.5 have the same IP address.
 In this state, leader election will not work properly.
 In addition, the ensemble is presumably left in an undesirable state.
 I think parameter validation is needed on reconfig.





[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-06-22 Thread Ziyou Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597203#comment-14597203
 ] 

Ziyou Wang commented on ZOOKEEPER-2172:
---

Thanks for looking into this. I have always suspected that this problem is 
related to the sync, because I need to wait longer to avoid it when the 
cluster is running on a slow disk.

I uploaded the log files after adding logging to record the quorum packet type.

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, zoo-1.log, 
 zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, zoo-2.log, zoo-2212-1.log, 
 zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, zoo-3-2.log, zoo-3-3.log, 
 zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, zoo.cfg.dynamic.1005d, 
 zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the  
 third one, the zk cluster may enter a weird state and cannot recover.
  
   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
 So the first node received the reconfig cmd at 12:53:48. Later, it logged 
 “2015-04-20  12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
  /10.0.0.2:55890 ”. From then on, the first node and second node 
 rejected all client connections and the third node didn’t join the cluster as 
 a participant. The whole cluster was down.
  
  When the problem happened, all three nodes just used the same dynamic 
 config file zoo.cfg.dynamic.1005d which only contained the first two 
 nodes. But there was another unused dynamic config file in node-1 directory 
 zoo.cfg.dynamic.next  which already contained three nodes.
  
  When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show again. So it should be a 
 race condition problem.





[jira] [Updated] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-06-22 Thread Ziyou Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziyou Wang updated ZOOKEEPER-2172:
--
Attachment: zoo-4-3.log
zoo-4-2.log
zoo-4-1.log

Added logging to record quorum packet types.

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, zoo-1.log, 
 zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, zoo-2.log, zoo-2212-1.log, 
 zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, zoo-3-2.log, zoo-3-3.log, 
 zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, zoo.cfg.dynamic.1005d, 
 zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the  
 third one, the zk cluster may enter a weird state and cannot recover.
  
   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
 So the first node received the reconfig cmd at 12:53:48. Later, it logged 
 “2015-04-20  12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
  /10.0.0.2:55890 ”. From then on, the first node and second node 
 rejected all client connections and the third node didn’t join the cluster as 
 a participant. The whole cluster was down.
  
  When the problem happened, all three nodes just used the same dynamic 
 config file zoo.cfg.dynamic.1005d which only contained the first two 
 nodes. But there was another unused dynamic config file in node-1 directory 
 zoo.cfg.dynamic.next  which already contained three nodes.
  
  When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show again. So it should be a 
 race condition problem.





[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595965#comment-14595965
 ] 

Rakesh R commented on ZOOKEEPER-1907:
-

As per the 
[discussion|https://issues.apache.org/jira/browse/ZOOKEEPER-602?focusedCommentId=14547208&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14547208],
 I am re-opening this jira to backport the changes to {{branch-3.4}}. I will 
prepare a patch some time later this week.

 Improve Thread handling
 ---

 Key: ZOOKEEPER-1907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.5.0
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch


 The server has many critical threads that run and coordinate with each other, 
 such as the RequestProcessor chains. Going through these threads, most of 
 them have a similar structure:
 {code}
 public void run() {
     try {
         while (running) {
             // processing logic
         }
     } catch (InterruptedException e) {
         LOG.error("Unexpected interruption", e);
     } catch (Exception e) {
         LOG.error("Unexpected exception", e);
     }
     LOG.info("...exited loop!");
 }
 {code}
 With this design, a thread can exit silently because the exception is caught, 
 logged, and swallowed. If this happens in production, the server would hang 
 forever and would not be able to perform its role, and it is hard for a 
 management tool to detect this.
 The idea of this JIRA is to discuss and improve this.
 Reference: [Community discussion 
 thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]
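
One possible direction, sketched under assumptions and not taken from the
attached patches: let the exception escape the run loop and route it to a
callback, so the server (or a management tool) learns that a critical thread
has died. CriticalThreadSketch and FailureListener are hypothetical names;
Thread.setUncaughtExceptionHandler is the standard JDK hook used here.

```java
final class CriticalThreadSketch {

    /** Hypothetical callback through which a server learns that a thread died. */
    interface FailureListener {
        void onFailure(String threadName, Throwable cause);
    }

    /**
     * Starts a named thread whose uncaught exceptions are reported to the
     * listener instead of being swallowed inside the run loop.
     */
    static Thread start(String name, Runnable body, FailureListener listener) {
        Thread thread = new Thread(body, name);
        // The handler runs on the dying thread, before the thread terminates.
        thread.setUncaughtExceptionHandler(
                (t, cause) -> listener.onFailure(t.getName(), cause));
        thread.start();
        return thread;
    }
}
```

What the listener does (shut the server down, flip a health flag, and so on)
is a policy decision that this sketch leaves open.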





[jira] [Reopened] (ZOOKEEPER-1907) Improve Thread handling

2015-06-22 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reopened ZOOKEEPER-1907:
-

 Improve Thread handling
 ---

 Key: ZOOKEEPER-1907
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.5.0
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
 ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch


 The server has many critical threads that run and coordinate with each other, 
 such as the RequestProcessor chains. Going through these threads, most of 
 them have a similar structure:
 {code}
 public void run() {
     try {
         while (running) {
             // processing logic
         }
     } catch (InterruptedException e) {
         LOG.error("Unexpected interruption", e);
     } catch (Exception e) {
         LOG.error("Unexpected exception", e);
     }
     LOG.info("...exited loop!");
 }
 {code}
 With this design, a thread can exit silently because the exception is caught, 
 logged, and swallowed. If this happens in production, the server would hang 
 forever and would not be able to perform its role, and it is hard for a 
 management tool to detect this.
 The idea of this JIRA is to discuss and improve this.
 Reference: [Community discussion 
 thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]





[jira] [Commented] (ZOOKEEPER-602) log all exceptions not caught by ZK threads

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595961#comment-14595961
 ] 

Rakesh R commented on ZOOKEEPER-602:


Thank you [~rgs] for the reviews and commit. Also, thank you [~fpj], [~hdeng] 
for the help in reviews. As per the [discussions in this 
jira|https://issues.apache.org/jira/browse/ZOOKEEPER-602?focusedCommentId=14547208&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14547208],
 I will re-open ZOOKEEPER-1907 for backporting it to {{branch-3.4}}.

 log all exceptions not caught by ZK threads
 ---

 Key: ZOOKEEPER-602
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-602
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.2.1
Reporter: Patrick Hunt
Assignee: Rakesh R
Priority: Blocker
 Fix For: 3.4.7, 3.5.0

 Attachments: ZOOKEEPER-602-br3-4.patch, ZOOKEEPER-602.patch, 
 ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, 
 ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch


 The Java code should add a ThreadGroup exception handler that logs, at ERROR 
 level, any uncaught exceptions thrown by Thread run methods.
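
A minimal sketch of such a handler, assuming nothing about the committed
patch: LoggingThreadGroup is a hypothetical name, and java.util.logging is
used only to keep the example self-contained, with Level.SEVERE standing in
for ERROR. The counter exists purely so the behaviour can be observed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingThreadGroup extends ThreadGroup {

    private static final Logger LOG =
            Logger.getLogger(LoggingThreadGroup.class.getName());

    private final AtomicInteger uncaught = new AtomicInteger();

    LoggingThreadGroup(String name) {
        super(name);
    }

    /**
     * Invoked by the JVM for any thread in this group that dies from an
     * uncaught exception and has no handler of its own.
     */
    @Override
    public void uncaughtException(Thread thread, Throwable cause) {
        uncaught.incrementAndGet();
        LOG.log(Level.SEVERE,
                "Uncaught exception in thread " + thread.getName(), cause);
    }

    /** Number of uncaught exceptions seen so far. */
    int uncaughtCount() {
        return uncaught.get();
    }
}
```

Threads opt in by being created with the group, for example
new Thread(group, runnable, "worker").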


