[jira] [Updated] (ZOOKEEPER-1105) c client zookeeper_close not send CLOSE_OP request to server

2011-06-22 Thread jiang guangran (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiang guangran updated ZOOKEEPER-1105:
--

Summary: c client zookeeper_close not send CLOSE_OP request to server  
(was: zookeeper_close not send CLOSE_OP request to server)

> c client zookeeper_close not send CLOSE_OP request to server
> 
>
> Key: ZOOKEEPER-1105
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.2
>Reporter: jiang guangran
>
> In the zookeeper_close function, adaptor_finish is called before the CLOSE_OP 
> request is sent to the server, so the CLOSE_OP request cannot be sent.
> The server's zookeeper.log contains many entries like the following:
> 2011-06-22 00:23:02,323 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1305970d66d2224, likely client has closed socket
> 2011-06-22 00:23:02,324 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.250.8.123:60257 which had sessionid 0x1305970d66d2224
> 2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
> at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
> at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
> at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> The Java client does not have this problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1105) zookeeper_close not send CLOSE_OP request to server

2011-06-22 Thread jiang guangran (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiang guangran updated ZOOKEEPER-1105:
--

Summary: zookeeper_close not send CLOSE_OP request to server  (was: 
zookeeper_close can send CLOSE_OP request to server)






[jira] [Created] (ZOOKEEPER-1106) mt c client core when create node

2011-06-22 Thread jiang guangran (JIRA)
mt c client core  when create node
--

 Key: ZOOKEEPER-1106
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1106
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2
Reporter: jiang guangran


In deserialize_CreateResponse:
   rc = rc ? : in->deserialize_String(in, "path", &v->path);
Inside deserialize_String, len = -1, so v->path is left uninitialised. It is 
then freed, which causes the core dump.

do_io thread
#0  0x0039fb030265 in raise () from /lib64/libc.so.6
#1  0x0039fb031d10 in abort () from /lib64/libc.so.6
#2  0x0039fb06a84b in __libc_message () from /lib64/libc.so.6
#3  0x0039fb0722ef in _int_free () from /lib64/libc.so.6
#4  0x0039fb07273b in free () from /lib64/libc.so.6
#5  0x2b0afd755dd1 in deallocate_String (s=0x5a490f40) at src/recordio.c:29
#6  0x2b0afd754ade in zookeeper_process (zh=0x131e3870, events=) at src/zookeeper.c:2071
#7  0x2b0afd75b2ef in do_io (v=) at src/mt_adaptor.c:310
#8  0x0039fb8064a7 in start_thread () from /lib64/libpthread.so.0
#9  0x0039fb0d3c2d in clone () from /lib64/libc.so.6

create_node thread
#0  0x0039fb80ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x2b0afd75af5c in wait_sync_completion (sc=0x131e4c90) at src/mt_adaptor.c:82
#2  0x2b0afd751750 in zoo_create (zh=0x131e3870, path=0x13206fa8 "/jsq/zr2/hb/10.250.8.139:8102", value=0x131e86a8 "\n\021\061\060.250.8.139:8102\022\035/home/shaoqiang/workdir2/qrs/\030\001 \001*%\n\020\n", valuelen=102, acl=0x2b0afd961700, flags=1, path_buffer=0x0, path_buffer_len=0) at src/zookeeper.c:3028
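The failure mode described in this report (an output pointer that is never written when the length is negative, and is later passed to free) can be illustrated outside the client. What follows is a minimal sketch, not the code in src/recordio.c: create_response, read_string, and free_response are hypothetical names, and the wire format (a 4-byte big-endian length followed by the bytes) is an assumption made for the example. The only point is that the reader sets its output to NULL before any early return, so a later free is harmless.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a response record with one heap-allocated field. */
struct create_response {
    char *path;
};

/*
 * Toy reader (assumed format: 4-byte big-endian length, then the bytes).
 * Returns 0 on success and -1 on error. On every path *out is either a
 * valid heap string or NULL, never an indeterminate pointer.
 */
static int read_string(const unsigned char *buf, size_t buflen, char **out)
{
    uint32_t ulen;
    char *s;

    *out = NULL;                      /* set before any early return */
    if (buflen < 4)
        return -1;
    ulen = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (ulen > (uint32_t)INT32_MAX)   /* a negative length such as -1 on the wire */
        return -1;
    if ((size_t)ulen > buflen - 4)    /* length runs past the end of the buffer */
        return -1;
    s = malloc((size_t)ulen + 1);
    if (s == NULL)
        return -1;
    memcpy(s, buf + 4, (size_t)ulen);
    s[ulen] = '\0';
    *out = s;
    return 0;
}

/* Safe to call after a failed read because free(NULL) is a no-op. */
static void free_response(struct create_response *r)
{
    assert(r != NULL);
    free(r->path);
    r->path = NULL;
}
```

A caller would pass &resp.path to read_string and call free_response(&resp) whether or not the read succeeded. Zero-initialising the record before deserialization would give the same protection.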






[jira] [Created] (ZOOKEEPER-1105) zookeeper_close can send CLOSE_OP request to server

2011-06-22 Thread jiang guangran (JIRA)
zookeeper_close can send CLOSE_OP request to server
---

 Key: ZOOKEEPER-1105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2
Reporter: jiang guangran


In the zookeeper_close function, adaptor_finish is called before the CLOSE_OP 
request is sent to the server, so the CLOSE_OP request cannot be sent.

The server's zookeeper.log contains many entries like the following:
2011-06-22 00:23:02,323 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1305970d66d2224, likely client has closed socket
2011-06-22 00:23:02,324 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.250.8.123:60257 which had sessionid 0x1305970d66d2224
2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

The Java client does not have this problem.





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053656#comment-13053656
 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
--

+1 good find. sorry i missed the contrib.

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, 
> zookeeper-1046-5.patch
>
>
> On several occasions, I've seen a create() with the sequential flag set fail 
> with a ZNODEEXISTS error, and I don't think that should ever be possible.  In 
> past runs, I've been able to closely inspect the state of the system with the 
> command line client, and saw that the parent znode's cversion is smaller than 
> the sequential number of an existing child znode under that parent.  In one 
> example:
> {noformat}
> [zk:(CONNECTED) 3] stat /zkrsm
> cZxid = 0x5
> ctime = Mon Jan 17 18:28:19 PST 2011
> mZxid = 0x5
> mtime = Mon Jan 17 18:28:19 PST 2011
> pZxid = 0x1d819
> cversion = 120710
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 2955
> {noformat}
> However, the znode /zkrsm/002d_record120804 existed on disk.
> In a recent run, I was able to capture the Zookeeper logs, and I will attach 
> them to this JIRA.  The logs are named as nodeX..log, and each 
> new log represents an application process restart.
> Here's the scenario:
> # There's a cluster with nodes 1,2,3 using zxid 0x3.
> # All three nodes restart, forming a cluster of zxid 0x4.
> # Node 3 restarts, leading to a cluster of 0x5.
> At this point, it seems like node 1 is the leader of the 0x5 epoch.  In its 
> log (node1.0x4-0x5.log) you can see the first (of many) instances of the 
> following message:
> {noformat}
> 2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO 
> org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level 
> KeeperException when processing sessionid:0x512f466bd44e0002 type:create 
> cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error 
> Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = 
> NodeExists for /zkrsm/00b2_record0001761440
> {noformat}
> This then repeats forever as my application isn't expecting to ever get this 
> error message on a sequential node create, and just continually retries.  The 
> message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes 
> into play.
> I don't see anything terribly fishy in the transition between the epochs; the 
> correct snapshots seem to be getting transferred, etc.  Unfortunately I don't 
> have a ZK snapshot/log that exhibits the problem when starting with a fresh 
> system.
> Some oddities you might notice in these logs:
> * Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a 
> bug in our application code.  (They are assigned randomly, but are supposed 
> to be consistent across restarts.)
> * We manage node membership dynamically, and our application restarts the 
> ZooKeeperServer classes whenever a new node wants to join (without restarting 
> the entire application process).  This is why you'll see messages like the 
> following in node1.0x4-0x5.log before a new election begins:
> {noformat}
> 2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.Learner  - shutdown called
> {noformat}
> * There is in fact one of these dynamic membership changes in 
> node1.0x4-0x5.log, just before the 0x4 epoch is formed.  I'm not sure how 
> this would be related though, as no transactions are done during this period.
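The cversion mismatch in the description matters because, as the report implies, the sequence suffix of a new sequential znode is derived from the parent's cversion. The following is a minimal sketch of why a lagging counter leads to a name collision, assuming the documented 10-digit zero-padded suffix. The path prefix /zkrsm/record and all function names are hypothetical, and this is C for illustration only, not the server's Java code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<prefix><counter as 10 zero-padded digits>" into out.
 * Returns 0 on success, -1 if the result would not fit. */
static int make_sequential_name(char *out, size_t outlen,
                                const char *prefix, int counter)
{
    int n;

    assert(out != NULL && prefix != NULL);
    n = snprintf(out, outlen, "%s%010d", prefix, counter);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Returns 1 if name matches one of the existing child names, else 0. */
static int name_exists(const char *name, const char *const *children,
                       size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(name, children[i]) == 0)
            return 1;
    }
    return 0;
}

/* Counts how many names generated from counter `start` upward are free
 * before one collides with an existing child. Returns -1 if no collision
 * occurs within `limit` names or a name does not fit the buffer. */
static int creates_until_collision(const char *prefix, int start, int limit,
                                   const char *const *children, size_t count)
{
    char name[256];
    int i;

    for (i = 0; i < limit; i++) {
        if (make_sequential_name(name, sizeof name, prefix, start + i) != 0)
            return -1;
        if (name_exists(name, children, count))
            return i;
    }
    return -1;
}
```

With a counter of 120710 and an existing child whose suffix is 0000120804, the first 94 generated names are free and the next one collides, which is the NodeExists situation the report describes.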





[jira] [Commented] (ZOOKEEPER-1064) Startup script needs more LSB compatability

2011-06-22 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053563#comment-13053563
 ] 

Ted Dunning commented on ZOOKEEPER-1064:


Awesome!



> Startup script needs more LSB compatability
> ---
>
> Key: ZOOKEEPER-1064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1064
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.3.2
>Reporter: Ted Dunning
> Fix For: 3.2.3, 3.3.3, 3.3.4
>
>
> The zkServer.sh script only partially implements the standard init.d style 
> of interaction.
> It lacks:
> - nice return codes
> - status method
> - standard output messages
> See 
> http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
> and
> http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html
> and
> http://wiki.debian.org/LSBInitScripts
> It is an open question how much zkServer should use these LSB scripts because 
> that may impair portability.  I
> think it should produce similar messages, however, and should return 
> standardized error codes.  If lsb functions
> are available, I think that they should be used so that ZK works as a first 
> class citizen.
> I will produce a proposed patch.





[jira] [Commented] (ZOOKEEPER-1064) Startup script needs more LSB compatability

2011-06-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053558#comment-13053558
 ] 

Eric Yang commented on ZOOKEEPER-1064:
--

In ZOOKEEPER-999, the src/packages/deb/init.d script supports LSB 
compatibility.  You probably don't need to do anything if that patch is 
committed.






[jira] [Commented] (ZOOKEEPER-999) Create an package integration project

2011-06-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053556#comment-13053556
 ] 

Eric Yang commented on ZOOKEEPER-999:
-

I am not sure why the build system's check of the ant tar target failed.  
Judging from the console output, it appears to be working.

> Create an package integration project
> -
>
> Key: ZOOKEEPER-999
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-999
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: build
> Environment: Java 6, RHEL/Ubuntu
>Reporter: Eric Yang
>Assignee: Eric Yang
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-999-1.patch, ZOOKEEPER-999-2.patch, 
> ZOOKEEPER-999-3.patch, ZOOKEEPER-999.patch
>
>
> The goal of this ticket is to generate a set of RPM/Debian packages that 
> integrate well with the RPM sets created by HADOOP-6255.





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053457#comment-13053457
 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12483497/zookeeper-1046-5.patch
  against trunk revision 1138595.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 35 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//console

This message is automatically generated.


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1046:


Attachment: zookeeper-1046-5.patch






[jira] [Commented] (ZOOKEEPER-1103) In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053430#comment-13053430
 ] 

Hudson commented on ZOOKEEPER-1103:
---

Integrated in ZooKeeper-trunk #1221 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1221/])
Fixed a problem introduced by the first patch for ZOOKEEPER-1103 (phunt)

phunt : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1138595
Files : 
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/QuorumTest.java


> In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in 
> testFollowersStartAfterLeaders as in testSessionMove.
> -
>
> Key: ZOOKEEPER-1103
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1103
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_2.patch, 
> ZOOKEEPER-1103_branch_3_3.patch, ZOOKEEPER-1103_branch_3_3_try2.patch
>
>
> Patrick Hunt writes: 
> "Such uses of sleep [used in testFollowersStartAfterLeader] are just asking 
> for trouble. Take a look at the use
> of sleep in testSessionMove in the same class for a better way to do
> this. I had gone through all the tests a while back, replacing all the
> "sleep(x)" with something like this testSessionMove pattern (retry
> with a max limit that's very long). During reviews we should look for
> anti-patterns like this and address them before commit."
> So, modify testFollowersStartAfterLeaders to use the same retrying approach 
> that testSessionMove uses.
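The "retry with a max limit that's very long" idea from the quoted message can be sketched in general form. The test in question is Java; the C below only illustrates the shape of the pattern and is not taken from QuorumTest. retry_until, condition_fn, and ready_on_third_call are hypothetical names, and nanosleep assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Caller-supplied check: returns true once the awaited condition holds. */
typedef bool (*condition_fn)(void *ctx);

/*
 * Polls `check` up to max_attempts times, pausing delay_ms between tries.
 * Returns the number of attempts used (>= 1) on success, or 0 if the
 * condition never held within the limit.
 */
static int retry_until(condition_fn check, void *ctx,
                       int max_attempts, long delay_ms)
{
    int attempt;

    assert(check != NULL && max_attempts > 0 && delay_ms >= 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (check(ctx))
            return attempt;            /* stop as soon as the condition holds */
        if (delay_ms > 0 && attempt < max_attempts) {
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);      /* short pause, then try again */
        }
    }
    return 0;
}

/* Example condition: becomes true on the third call. */
static bool ready_on_third_call(void *ctx)
{
    int *calls = ctx;

    *calls += 1;
    return *calls >= 3;
}
```

Unlike a single fixed sleep, the loop returns as soon as the condition holds on a fast machine, while a generous max_attempts still gives a slow machine time to catch up before the test is declared failed.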





[jira] [Commented] (ZOOKEEPER-1068) Documentation and default config suggest incorrect location for Zookeeper state

2011-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053429#comment-13053429
 ] 

Hudson commented on ZOOKEEPER-1068:
---

Integrated in ZooKeeper-trunk #1221 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1221/])


> Documentation and default config suggest incorrect location for Zookeeper 
> state
> ---
>
> Key: ZOOKEEPER-1068
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1068
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation, scripts
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1068.patch.txt
>
>
> Documentation and default config suggest /var/zookeeper as a value for 
> dataDir. This practice is, strictly speaking, incompatible with UNIX/Linux 
> filesystem layout standards (e.g. 
> http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?filesystem+5 , 
> http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/index.html  ). 
> Even though Zookeeper use is not limited to UNIX-like OSes I'd recommend that 
> we change references to /var/zookeeper to /var/lib/zookeeper





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053425#comment-13053425
 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12483488/zookeeper-1046-4.patch
  against trunk revision 1138213.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 30 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//console

This message is automatically generated.


[jira] [Updated] (ZOOKEEPER-1103) In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1103:


Attachment: ZOOKEEPER-1103_branch_3_3_try2.patch
ZOOKEEPER-1103_2.patch

> In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in 
> testFollowersStartAfterLeaders as in testSessionMove.
> -
>
> Key: ZOOKEEPER-1103
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1103
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_2.patch, 
> ZOOKEEPER-1103_branch_3_3.patch, ZOOKEEPER-1103_branch_3_3_try2.patch
>
>
> Patrick Hunt writes: 
> "Such uses of sleep [used in testFollowersStartAfterLeader] are just asking 
> for trouble. Take a look at the use
> of sleep in testSessionMove in the same class for a better way to do
> this. I had gone through all the tests a while back, replacing all the
> "sleep(x)" with something like this testSessionMove pattern (retry
> with a max limit that's very long). During reviews we should look for
> anti-patterns like this and address them before commit."
> So, modify testFollowersStartAfterLeaders to use the same retrying approach 
> that testSessionMove uses.
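
A minimal sketch of the retry-with-a-limit idea described above, for readers who have not seen testSessionMove. This is not the QuorumTest code: the class name, the waitFor helper, and its parameters are hypothetical, chosen only to show the "for ( .. try { break } catch { } )" shape with a bounded number of attempts.

```java
import java.util.concurrent.Callable;

public class RetryUntilReady {
    /**
     * Polls the check until it returns true or maxAttempts is used up.
     * An exception thrown by the check counts as "not ready yet".
     * (Hypothetical helper, not part of ZooKeeper.)
     */
    public static boolean waitFor(Callable<Boolean> check, int maxAttempts, long pauseMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                if (check.call()) {
                    return true; // condition met, stop early
                }
            } catch (Exception e) {
                // not ready yet; fall through to the pause and try again
            }
            try {
                Thread.sleep(pauseMillis); // short pause between attempts
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return false; // limit reached without success
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Stand-in for "server not up yet": fails twice, then succeeds.
        boolean ok = waitFor(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not connected yet");
            }
            return true;
        }, 30, 10);
        System.out.println("ok=" + ok + ", attempts=" + calls[0]);
    }
}
```

The caller still has to assert on the returned value; otherwise a loop that exhausts its attempts passes silently.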

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (ZOOKEEPER-1103) In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-1103.
-

Resolution: Fixed

Updated to check success/fail correctly. (Sorry for missing that!)

> In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in 
> testFollowersStartAfterLeaders as in testSessionMove.
> -
>
> Key: ZOOKEEPER-1103
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1103
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_2.patch, 
> ZOOKEEPER-1103_branch_3_3.patch, ZOOKEEPER-1103_branch_3_3_try2.patch
>
>





[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1046:


Attachment: zookeeper-1046-4.patch

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch
>
>
> On several occasions, I've seen a create() with the sequential flag set fail 
> with a ZNODEEXISTS error, and I don't think that should ever be possible.  In 
> past runs, I've been able to closely inspect the state of the system with the 
> command line client, and saw that the parent znode's cversion is smaller than 
> the sequential number of existing children znode under that parent.  In one 
> example:
> {noformat}
> [zk:(CONNECTED) 3] stat /zkrsm
> cZxid = 0x5
> ctime = Mon Jan 17 18:28:19 PST 2011
> mZxid = 0x5
> mtime = Mon Jan 17 18:28:19 PST 2011
> pZxid = 0x1d819
> cversion = 120710
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 2955
> {noformat}
> However, the znode /zkrsm/002d_record120804 existed on disk.
> In a recent run, I was able to capture the Zookeeper logs, and I will attach 
> them to this JIRA.  The logs are named as nodeX..log, and each 
> new log represents an application process restart.
> Here's the scenario:
> # There's a cluster with nodes 1,2,3 using zxid 0x3.
> # All three nodes restart, forming a cluster of zxid 0x4.
> # Node 3 restarts, leading to a cluster of 0x5.
> At this point, it seems like node 1 is the leader of the 0x5 epoch.  In its 
> log (node1.0x4-0x5.log) you can see the first (of many) instances of the 
> following message:
> {noformat}
> 2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO 
> org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level 
> KeeperException when processing sessionid:0x512f466bd44e0002 type:create 
> cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error 
> Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = 
> NodeExists for /zkrsm/00b2_record0001761440
> {noformat}
> This then repeats forever as my application isn't expecting to ever get this 
> error message on a sequential node create, and just continually retries.  The 
> message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes 
> into play.
> I don't see anything terribly fishy in the transition between the epochs; the 
> correct snapshots seem to be getting transferred, etc.  Unfortunately I don't 
> have a ZK snapshot/log that exhibits the problem when starting with a fresh 
> system.
> Some oddities you might notice in these logs:
> * Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a 
> bug in our application code.  (They are assigned randomly, but are supposed 
> to be consistent across restarts.)
> * We manage node membership dynamically, and our application restarts the 
> ZooKeeperServer classes whenever a new node wants to join (without restarting 
> the entire application process).  This is why you'll see messages like the 
> following in node1.0x4-0x5.log before a new election begins:
> {noformat}
> 2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.Learner  - shutdown called
> {noformat}
> * There is in fact one of these dynamic membership changes in 
> node1.0x4-0x5.log, just before the 0x4 epoch is formed.  I'm not sure how 
> this would be related though, as no transactions are done during this period.
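
To make the collision concrete, here is a minimal Java sketch. It assumes the server derives the sequential suffix from the parent's cversion, which is what the report's comparison of cversion against the children's sequence numbers implies. The class, the "record" prefix, and the in-memory set are hypothetical stand-ins, not ZooKeeper code.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SequentialNameSketch {
    /** Builds a child name by appending the counter as a zero-padded 10-digit suffix. */
    public static String sequentialName(String prefix, int parentCversion) {
        return prefix + String.format(Locale.ENGLISH, "%010d", parentCversion);
    }

    /** Models a sequential create; returns false where a server would answer NodeExists. */
    public static boolean create(Set<String> children, String prefix, int parentCversion) {
        return children.add(sequentialName(prefix, parentCversion));
    }

    public static void main(String[] args) {
        Set<String> children = new HashSet<>();
        // A child that already exists with a suffix ahead of the parent's counter.
        children.add(sequentialName("record", 120804));

        // The parent's counter lags behind (cversion = 120710 in the report).
        System.out.println(create(children, "record", 120710)); // fresh name
        // Once the counter catches up to the existing suffix, the name is taken.
        System.out.println(create(children, "record", 120804));
    }
}
```

Under that assumption, a parent counter that falls behind its existing children keeps handing out names that are already in use, which matches the repeated NodeExists messages in the log excerpt.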





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053412#comment-13053412
 ] 

Camille Fournier commented on ZOOKEEPER-1046:
-

Ah, here is the error:

zookeeperbuildcontrib.compile:
 [echo] contrib: loggraph
[javac] Compiling 34 source files to /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/contrib/loggraph/classes
[javac] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/TxnLogSource.java:185: deserializeTxn(byte[],org.apache.zookeeper.txn.TxnHeader) in org.apache.zookeeper.server.util.SerializeUtils cannot be applied to (org.apache.jute.InputArchive,org.apache.zookeeper.txn.TxnHeader)
[javac] Record r = SerializeUtils.deserializeTxn(iab, hdr);
[javac]  ^
[javac] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/TxnLogSource.java:333: deserializeTxn(byte[],org.apache.zookeeper.txn.TxnHeader) in org.apache.zookeeper.server.util.SerializeUtils cannot be applied to (org.apache.jute.InputArchive,org.apache.zookeeper.txn.TxnHeader)
[javac] Record r = SerializeUtils.deserializeTxn(iab, hdr);
[javac]  ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors



> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch, zookeeper-1046-3
>
>

[jira] [Reopened] (ZOOKEEPER-1103) In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reopened ZOOKEEPER-1103:
-


My bad, there's an issue with this change (it didn't show up in my testing, but 
I can see why it fails on Jenkins). I'll fix this.

> In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in 
> testFollowersStartAfterLeaders as in testSessionMove.
> -
>
> Key: ZOOKEEPER-1103
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1103
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_branch_3_3.patch
>
>





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053399#comment-13053399
 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483478/zookeeper-1046-3
  against trunk revision 1138213.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 30 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//console

This message is automatically generated.

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch, zookeeper-1046-3
>
>

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1046:


Attachment: zookeeper-1046-3

Removed printlns.

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch, zookeeper-1046-3
>
>





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053379#comment-13053379
 ] 

Camille Fournier commented on ZOOKEEPER-1046:
-

Checked into 3.3 last Thursday (1136440).

Should we be worried about the -1 javac from Jenkins? I don't know what that 
error means, since this clearly compiles if it can pass the tests.

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch
>
>





[jira] [Commented] (ZOOKEEPER-1102) Need update for programmer manual to cover multi operation

2011-06-22 Thread sreekanth (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053378#comment-13053378
 ] 

sreekanth commented on ZOOKEEPER-1102:
--

hi

> Need update for programmer manual to cover multi operation
> --
>
> Key: ZOOKEEPER-1102
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1102
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>
> The new multi operation is undocumented as yet.  Clearly it needs some doc to 
> cover:
> 1) the basic syntax
> 2) java code sample
> 3) C code sample





[jira] [Commented] (ZOOKEEPER-1104) CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread sreekanth (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053376#comment-13053376
 ] 

sreekanth commented on ZOOKEEPER-1104:
--

f

> CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" 
> pattern in testFollowersStartAfterLeaders as in testSessionMove.
> -
>
> Key: ZOOKEEPER-1104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1104
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0
>Reporter: sreekanth
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_branch_3_3.patch
>
>
> Patrick Hunt writes: 
> "Such uses of sleep [used in testFollowersStartAfterLeader] are just asking 
> for trouble. Take a look at the use
> of sleep in testSessionMove in the same class for a better way to do
> this. I had gone through all the tests a while back, replacing all the
> "sleep(x)" with something like this testSessionMove pattern (retry
> with a max limit that's very long). During reviews we should look for
> anti-patterns like this and address them before commit."
> So, modify testFollowersStartAfterLeaders to use the same retrying approach 
> that testSessionMove uses.





[jira] [Created] (ZOOKEEPER-1104) CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

2011-06-22 Thread sreekanth (JIRA)
CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" 
pattern in testFollowersStartAfterLeaders as in testSessionMove.
-

 Key: ZOOKEEPER-1104
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1104
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Affects Versions: 3.3.3, 3.4.0
Reporter: sreekanth
Assignee: Eugene Koontz
Priority: Minor
 Fix For: 3.3.4, 3.4.0
 Attachments: ZOOKEEPER-1103.patch, ZOOKEEPER-1103_branch_3_3.patch

Patrick Hunt writes: 

"Such uses of sleep [used in testFollowersStartAfterLeader] are just asking for 
trouble. Take a look at the use
of sleep in testSessionMove in the same class for a better way to do
this. I had gone through all the tests a while back, replacing all the
"sleep(x)" with something like this testSessionMove pattern (retry
with a max limit that's very long). During reviews we should look for
anti-patterns like this and address them before commit."

So, modify testFollowersStartAfterLeaders to use the same retrying approach 
that testSessionMove uses.
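
The retrying approach described above can be sketched roughly as follows. This
is only an illustration of the "for ( .. try { break } catch { } )" idea, not
the code from QuorumTest or from the attached patches; the class name
RetryUntil, the method retry and its parameters are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical helper (not from the ZooKeeper test suite): retry an action
// until it succeeds or a generous attempt limit is hit, instead of relying on
// a single fixed sleep(x) and hoping the cluster is ready afterwards.
final class RetryUntil {
    static <T> T retry(int maxAttempts, long pauseMillis, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                // Success leaves the loop; this plays the role of "break".
                return action.get();
            } catch (RuntimeException e) {
                last = e; // not ready yet: remember the failure and retry
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // short pause between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw last; // every attempt failed: surface the last failure
    }
}
```

With a small pause and a large attempt limit, a test returns as soon as the
operation succeeds and only fails after the long limit is exhausted, which is
the property the quoted comment asks for.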





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053371#comment-13053371
 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
--

oops, i missed those. yeah if you could remove and commit that would be great. 
btw, does the 3.3 patch still need to go in?

> Creating a new sequential node results in a ZNODEEXISTS error
> -
>
> Key: ZOOKEEPER-1046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.2, 3.3.3
> Environment: A 3 node-cluster running Debian squeeze.
>Reporter: Jeremy Stribling
>Assignee: Vishal K
>Priority: Blocker
>  Labels: sequence
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, 
> ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, 
> ZOOKEEPER-1046_2.patch
>
>
> On several occasions, I've seen a create() with the sequential flag set fail 
> with a ZNODEEXISTS error, and I don't think that should ever be possible.  In 
> past runs, I've been able to closely inspect the state of the system with the 
> command line client, and saw that the parent znode's cversion is smaller than 
> the sequence number of an existing child znode under that parent.  In one 
> example:
> {noformat}
> [zk:(CONNECTED) 3] stat /zkrsm
> cZxid = 0x5
> ctime = Mon Jan 17 18:28:19 PST 2011
> mZxid = 0x5
> mtime = Mon Jan 17 18:28:19 PST 2011
> pZxid = 0x1d819
> cversion = 120710
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 2955
> {noformat}
> However, the znode /zkrsm/002d_record120804 existed on disk.
> In a recent run, I was able to capture the Zookeeper logs, and I will attach 
> them to this JIRA.  The logs are named as nodeX..log, and each 
> new log represents an application process restart.
> Here's the scenario:
> # There's a cluster with nodes 1,2,3 using zxid 0x3.
> # All three nodes restart, forming a cluster of zxid 0x4.
> # Node 3 restarts, leading to a cluster of 0x5.
> At this point, it seems like node 1 is the leader of the 0x5 epoch.  In its 
> log (node1.0x4-0x5.log) you can see the first (of many) instances of the 
> following message:
> {noformat}
> 2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO 
> org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level 
> KeeperException when processing sessionid:0x512f466bd44e0002 type:create 
> cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error 
> Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = 
> NodeExists for /zkrsm/00b2_record0001761440
> {noformat}
> This then repeats forever as my application isn't expecting to ever get this 
> error message on a sequential node create, and just continually retries.  The 
> message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes 
> into play.
> I don't see anything terribly fishy in the transition between the epochs; the 
> correct snapshots seem to be getting transferred, etc.  Unfortunately I don't 
> have a ZK snapshot/log that exhibits the problem when starting with a fresh 
> system.
> Some oddities you might notice in these logs:
> * Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a 
> bug in our application code.  (They are assigned randomly, but are supposed 
> to be consistent across restarts.)
> * We manage node membership dynamically, and our application restarts the 
> ZooKeeperServer classes whenever a new node wants to join (without restarting 
> the entire application process).  This is why you'll see messages like the 
> following in node1.0x4-0x5.log before a new election begins:
> {noformat}
> 2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO 
> org.apache.zookeeper.server.quorum.Learner  - shutdown called
> {noformat}
> * There is in fact one of these dynamic membership changes in 
> node1.0x4-0x5.log, just before the 0x4 epoch is formed.  I'm not sure how 
> this would be related though, as no transactions are done during this period.
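
To make the reported symptom concrete, here is a small model of how a stale
parent cversion can lead to a NodeExists error on a sequential create. It
assumes the sequence suffix is the parent's cversion formatted as a 10-digit
decimal; the class SequentialNameSketch, its methods and the "record" prefix
are hypothetical and are not the server code or the fix under review.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical model, not ZooKeeper server code: the name of a sequential
// child is assumed to be the prefix plus the parent's cversion as 10 digits.
final class SequentialNameSketch {
    static String sequentialName(String prefix, int parentCversion) {
        return prefix + String.format(Locale.ENGLISH, "%010d", parentCversion);
    }

    // Simulates repeated sequential creates under one parent. Returns how many
    // creates succeed before the generated name collides with an existing
    // child, or -1 if no collision happens within maxCreates.
    static int createsUntilCollision(Set<String> children, String prefix,
                                     int parentCversion, int maxCreates) {
        int cversion = parentCversion;
        for (int i = 0; i < maxCreates; i++) {
            String name = sequentialName(prefix, cversion);
            if (!children.add(name)) {
                return i; // name already present: the NodeExists case
            }
            cversion++; // in this model a successful create bumps cversion
        }
        return -1;
    }
}
```

In this model a failed create does not advance the counter, so a client that
simply retries keeps generating the same colliding name, which is consistent
with the endless retries described in the report.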





[jira] [Commented] (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2011-06-22 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053348#comment-13053348
 ] 

Flavio Junqueira commented on ZOOKEEPER-702:


Your plan sounds reasonable to me, Mahadev.

I have reviewed this patch some time back, and I'm happy to review it again if 
necessary.

> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Abmar Barros
>  Labels: gsoc, mentor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, 
> phiaccrual-pseudo.txt, phiaccrual-pseudo.txt
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however, it may be too aggressive and is not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.
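
As a rough illustration of the difference between the 'timeout' method and an
accrual-style detector, the sketch below compares a fixed timeout check with a
suspicion level phi. It assumes, purely for simplicity, that heartbeat
inter-arrival times are exponentially distributed; the class
FailureDetectorSketch and its methods are hypothetical and do not come from the
ZOOKEEPER-702 patch.

```java
// Hypothetical sketch, not the ZOOKEEPER-702 implementation.
final class FailureDetectorSketch {
    // 'Timeout' method: suspect the peer once it has been silent for longer
    // than a fixed limit.
    static boolean timedOut(long nowMs, long lastHeartbeatMs, long timeoutMs) {
        return nowMs - lastHeartbeatMs > timeoutMs;
    }

    // Mean gap between consecutive heartbeat arrival times (at least two).
    static double meanInterval(long[] arrivalTimesMs) {
        if (arrivalTimesMs.length < 2) {
            throw new IllegalArgumentException("need at least two arrivals");
        }
        long span = arrivalTimesMs[arrivalTimesMs.length - 1] - arrivalTimesMs[0];
        return (double) span / (arrivalTimesMs.length - 1);
    }

    // Accrual-style suspicion level. Under the exponential assumption,
    // P(heartbeat still outstanding after t) = exp(-t / mean), so
    // phi = -log10(P) = t / (mean * ln 10). The caller picks a threshold.
    static double phi(long nowMs, long lastHeartbeatMs, double meanIntervalMs) {
        double elapsed = nowMs - lastHeartbeatMs;
        return elapsed / (meanIntervalMs * Math.log(10.0));
    }
}
```

Because phi scales with the observed mean interval, the same threshold adapts
to slow links such as a wide-area network, whereas a fixed timeout has to be
retuned per installation.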





[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053345#comment-13053345
 ] 

Camille Fournier commented on ZOOKEEPER-1046:
-

Looked at the patch. Besides a couple extraneous printlns it looks ok to me. 
Shall I remove these myself and commit this to trunk?






Re: Review Request: ZOOKEEPER-1046: Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/942/#review883
---



/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java


Should remove this println



/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java


Should remove this println


Otherwise looks good I think

- Camille


On 2011-06-22 16:55:12, Camille Fournier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/942/
> ---
> 
> (Updated 2011-06-22 16:55:12)
> 
> 
> Review request for zookeeper and Benjamin Reed.
> 
> 
> Summary
> ---
> 
> see https://issues.apache.org/jira/browse/ZOOKEEPER-1046
> 
> 
> Diffs
> -
> 
>   /src/java/main/org/apache/zookeeper/server/DataNode.java 1136231
>   /src/java/main/org/apache/zookeeper/server/DataTree.java 1136231
>   /src/java/main/org/apache/zookeeper/server/LogFormatter.java 1136231
>   /src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1136231
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1136231
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1136231
>   /src/java/main/org/apache/zookeeper/server/quorum/Follower.java 1136231
>   /src/java/main/org/apache/zookeeper/server/quorum/Learner.java 1136231
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1136231
>   /src/java/main/org/apache/zookeeper/server/quorum/Observer.java 1136231
>   /src/java/main/org/apache/zookeeper/server/upgrade/UpgradeSnapShotV1.java 1136231
>   /src/java/main/org/apache/zookeeper/server/util/SerializeUtils.java 1136231
>   /src/java/test/org/apache/zookeeper/server/DataTreeUnitTest.java 1136231
>   /src/java/test/org/apache/zookeeper/server/DeserializationPerfTest.java 1136231
>   /src/java/test/org/apache/zookeeper/server/SerializationPerfTest.java 1136231
>   /src/java/test/org/apache/zookeeper/test/DataTreeTest.java 1136231
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1136231
>   /src/zookeeper.jute 1136231
> 
> Diff: https://reviews.apache.org/r/942/diff
> 
> 
> Testing
> ---
> 
> unit testing
> 
> 
> Thanks,
> 
> Camille
> 
>



Review Request: ZOOKEEPER-1046: Creating a new sequential node results in a ZNODEEXISTS error

2011-06-22 Thread Camille Fournier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/942/
---

Review request for zookeeper and Benjamin Reed.


Summary
---

see https://issues.apache.org/jira/browse/ZOOKEEPER-1046


Diffs
-

  /src/java/main/org/apache/zookeeper/server/DataNode.java 1136231
  /src/java/main/org/apache/zookeeper/server/DataTree.java 1136231
  /src/java/main/org/apache/zookeeper/server/LogFormatter.java 1136231
  /src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1136231
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1136231
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1136231
  /src/java/main/org/apache/zookeeper/server/quorum/Follower.java 1136231
  /src/java/main/org/apache/zookeeper/server/quorum/Learner.java 1136231
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1136231
  /src/java/main/org/apache/zookeeper/server/quorum/Observer.java 1136231
  /src/java/main/org/apache/zookeeper/server/upgrade/UpgradeSnapShotV1.java 1136231
  /src/java/main/org/apache/zookeeper/server/util/SerializeUtils.java 1136231
  /src/java/test/org/apache/zookeeper/server/DataTreeUnitTest.java 1136231
  /src/java/test/org/apache/zookeeper/server/DeserializationPerfTest.java 1136231
  /src/java/test/org/apache/zookeeper/server/SerializationPerfTest.java 1136231
  /src/java/test/org/apache/zookeeper/test/DataTreeTest.java 1136231
  /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1136231
  /src/zookeeper.jute 1136231

Diff: https://reviews.apache.org/r/942/diff


Testing
---

unit testing


Thanks,

Camille



[jira] [Commented] (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2011-06-22 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053326#comment-13053326
 ] 

Mahadev konar commented on ZOOKEEPER-702:
-

camille/flavio,
 I am also a little hesitant to add this to 3.4. Can we do this: I'll cut a 
branch this weekend, and then we immediately check this into trunk to make sure 
we get it into 3.5? 3.4 already has quite a few features, and we'll need 
some effort to stabilize it. Any more features will just increase the effort 
needed to stabilize the release. What do you guys think? 






[jira] [Commented] (ZOOKEEPER-1034) perl bindings should automatically find the zookeeper c-client headers

2011-06-22 Thread Nicholas Harteau (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053318#comment-13053318
 ] 

Nicholas Harteau commented on ZOOKEEPER-1034:
-

pretty sure that failing core test isn't related to 1034.

> perl bindings should automatically find the zookeeper c-client headers
> --
>
> Key: ZOOKEEPER-1034
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1034
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 3.3.3
>Reporter: Nicholas Harteau
>Assignee: Nicholas Harteau
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1034-trunk.patch
>
>
> Installing Net::ZooKeeper from cpan or the zookeeper distribution tarballs 
> will always fail due to not finding c-client header files.  In conjunction 
> with ZOOKEEPER-1033 update perl bindings to look for c-client header files in 
> INCDIR/zookeeper/
> a.k.a. make installs of Net::ZooKeeper via cpan/cpanm/whatever *just work*, 
> assuming you've already got the zookeeper c client installed.





[jira] [Commented] (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2011-06-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053266#comment-13053266
 ] 

Camille Fournier commented on ZOOKEEPER-702:


Well, I finally got the rb updated with the latest diff. I am willing to be one 
of two reviewers for this, but I am definitely not comfortable signing off on 
such a large change by myself (and note that I am also rather blind to 
whitespace issues despite my best efforts). If one of you is willing to also do 
the final review, I will deal with checking it in, doc generation, etc. 

https://reviews.apache.org/r/483/







Re: Review Request: FD options in ZooKeeper

2011-06-22 Thread Camille Fournier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/483/
---

(Updated 2011-06-22 13:49:34.620179)


Review request for zookeeper.


Changes
---

Updating the diff for this review


Summary
---

https://issues.apache.org/jira/browse/ZOOKEEPER-702


Diffs (updated)
-

  /src/docs/src/documentation/content/xdocs/index.xml 1065709
  /src/docs/src/documentation/content/xdocs/zookeeperFailureDetector.xml PRE-CREATION
  /src/java/main/org/apache/zookeeper/ClientCnxn.java 1127985
  /src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 1127985
  /src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java 1127985
  /src/java/main/org/apache/zookeeper/ZooKeeper.java 1127985
  /src/java/main/org/apache/zookeeper/ZooKeeperMain.java 1127985
  /src/java/main/org/apache/zookeeper/common/fd/AbstractFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/BertierFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/ChenFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/FailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/FailureDetectorFactory.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/FailureDetectorOptParser.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/FixedPingFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/InterArrivalSamplingWindow.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/MessageType.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/Monitored.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/PhiAccrualFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/common/fd/SlicedPingFailureDetector.java PRE-CREATION
  /src/java/main/org/apache/zookeeper/server/ServerConfig.java 1065709
  /src/java/main/org/apache/zookeeper/server/SessionTracker.java 1065709
  /src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java 1095174
  /src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 1127985
  /src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/FollowerZooKeeperServer.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/Leader.java 1127985
  /src/java/main/org/apache/zookeeper/server/quorum/LeaderZooKeeperServer.java 1065709
  /src/java/main/org/apache/zookeeper/server/quorum/Learner.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerSessionTracker.java 1065709
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerZooKeeperServer.java 1065709
  /src/java/main/org/apache/zookeeper/server/quorum/ObserverZooKeeperServer.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1127985
  /src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java 1095174
  /src/java/main/org/apache/zookeeper/server/quorum/QuorumZooKeeperServer.java 1127985
  /src/java/main/org/apache/zookeeper/server/quorum/ReadOnlyZooKeeperServer.java 1125581
  /src/java/test/org/apache/zookeeper/TestableZooKeeper.java 1065709
  /src/java/test/org/apache/zookeeper/test/ClientBase.java 1127985
  /src/java/test/org/apache/zookeeper/test/DisconnectableZooKeeper.java 1091841
  /src/java/test/org/apache/zookeeper/test/QuorumBase.java 1127985
  /src/java/test/org/apache/zookeeper/test/QuorumFDHammerTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java 1125581
  /src/java/test/org/apache/zookeeper/test/RecoveryTest.java 1091841
  /src/java/test/org/apache/zookeeper/test/SessionTest.java 1091841
  /src/java/test/org/apache/zookeeper/test/fd/BertierClientHammerTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/BertierFDTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/BertierQuorumHammerTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/BertierRecoveryTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/BertierSessionTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/ChenClientHammerTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/ChenFDTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/ChenQuorumHammerTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/ChenRecoveryTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/ChenSessionTest.java PRE-CREATION
  /src/java/test/org/apache/zookeeper/test/fd/FixedPingFDTest.java PRE-CREATION
  
/src/java/test/org/ap

ZooKeeper-trunk - Build # 1219 - Failure

2011-06-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/1219/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 245272 lines...]
[junit] 2011-06-22 10:54:50,001 [myid:] - INFO  [main:Environment@98] - 
Client environment:java.home=/homes/hudson/tools/java/jdk1.6.0_23-32/jre
[junit] 2011-06-22 10:54:50,001 [myid:] - INFO  [main:Environment@98] - 
Client 
environment:java.class.path=/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/classes:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/antlr-2.7.6.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/checkstyle-5.0.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/commons-beanutils-core-1.7.0.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/commons-cli-1.0.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/commons-collections-2.0.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/commons-lang-1.0.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/commons-logging-1.0.3.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/google-collections-0.9.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/lib/junit-4.8.1.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/classes:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/java/lib/ivy-2.2.0.jar:/homes/hudson/tools/ant/latest/lib/ant.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/lib/jline-0.9.94.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/lib/log4j-1.2.15.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/lib/netty-3.2.2.Final.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/lib/slf4j-api-1.6.1.jar:/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/lib/slf4j-log4j12-1.6.1.jar:/homes/hudson/tools/clover/latest/lib/clover.jar:/homes/hudson/tools/ant/apache-ant-1.7.1/lib/ant-launcher.jar:/homes/hudson/tools/ant/latest/lib/ant-junit.jar
[junit] 2011-06-22 10:54:50,002 [myid:] - INFO  [main:Environment@98] - 
Client 
environment:java.library.path=/homes/hudson/tools/java/jdk1.6.0_23-32/jre/lib/i386/server:/homes/hudson/tools/java/jdk1.6.0_23-32/jre/lib/i386:/homes/hudson/tools/java/jdk1.6.0_23-32/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
[junit] 2011-06-22 10:54:50,002 [myid:] - INFO  [main:Environment@98] - 
Client environment:java.io.tmpdir=/tmp
[junit] 2011-06-22 10:54:50,003 [myid:] - INFO  [main:Environment@98] - 
Client environment:java.compiler=
[junit] 2011-06-22 10:54:50,003 [myid:] - INFO  [main:Environment@98] - 
Client environment:os.name=Linux
[junit] 2011-06-22 10:54:50,004 [myid:] - INFO  [main:Environment@98] - 
Client environment:os.arch=i386
[junit] 2011-06-22 10:54:50,004 [myid:] - INFO  [main:Environment@98] - 
Client environment:os.version=2.6.28-18-generic
[junit] 2011-06-22 10:54:50,004 [myid:] - INFO  [main:Environment@98] - 
Client environment:user.name=hudson
[junit] 2011-06-22 10:54:50,005 [myid:] - INFO  [main:Environment@98] - 
Client environment:user.home=/homes/hudson
[junit] 2011-06-22 10:54:50,005 [myid:] - INFO  [main:Environment@98] - 
Client 
environment:user.dir=/grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk
[junit] 2011-06-22 10:54:50,007 [myid:] - INFO  [main:ZooKeeper@447] - 
Initiating client connection, connectString=127.0.0.1:11221 
sessionTimeout=3 
watcher=org.apache.zookeeper.test.ClientBase$CountdownWatcher@1c8efd1
[junit] 2011-06-22 10:54:50,031 [myid:] - INFO  
[main-SendThread():ClientCnxn$SendThread@888] - Opening socket connection to 
server /127.0.0.1:11221
[junit] 2011-06-22 10:54:50,033 [myid:] - INFO  
[main-SendThread(localhost:11221):ClientCnxn$SendThread@814] - Socket 
connection established to localhost/127.0.0.1:11221, initiating session
[junit] 2011-06-22 10:54:50,034 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - 
Accepted socket connection from /127.0.0.1:55045
[junit] 2011-06-22 10:54:50,037 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@833] - Client 
attempting to establish new session at /127.0.0.1:55045
[junit] 2011-06-22 10:54:50,040 [myid:] - INFO  
[SyncThread:0:FileTxnLog@195] - Creating new log file: log.1
[junit] 2011-06-22 10:54:50,066 [myid:] - INFO  
[SyncThread:0:ZooKeeperServer@592] - Established session 0x130b6fcf848 with 
negotiated timeout 3 for client /127.0.0.1:55045
[junit] 2011-06-22 10:54:50,067 [myid:] - INFO  
[main-SendThread(localhost:11221):ClientCnxn$SendThread@1098] - Session 
establishment complete on server localhost/

[jira] [Commented] (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2011-06-22 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053170#comment-13053170
 ] 

Flavio Junqueira commented on ZOOKEEPER-702:


Should we keep this issue marked for 3.4? Camille, Abmar, could you give an 
update, please? 

> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Abmar Barros
>  Labels: gsoc, mentor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, 
> phiaccrual-pseudo.txt, phiaccrual-pseudo.txt
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.
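
The 'timeout' method in the description above amounts to counting missed ticks per peer. A minimal sketch of that idea follows, assuming hypothetical names (TickTimeoutDetector, maxMissedTicks, heartbeat, tick); it is not taken from the ZooKeeper code base and only illustrates the counting scheme.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of tick-counting failure detection; not ZooKeeper code.
class TickTimeoutDetector {
    private final int maxMissedTicks;
    // Number of ticks elapsed since the last heartbeat, per peer.
    private final Map<String, Integer> missed = new HashMap<>();

    TickTimeoutDetector(int maxMissedTicks) {
        this.maxMissedTicks = maxMissedTicks;
    }

    // A heartbeat resets the peer's missed-tick counter.
    void heartbeat(String peer) {
        missed.put(peer, 0);
    }

    // Each tick counts against every known peer until its next heartbeat.
    void tick() {
        missed.replaceAll((peer, n) -> n + 1);
    }

    // A peer is suspected once it has missed more than maxMissedTicks ticks.
    // Peers that never sent a heartbeat are unknown and not suspected here.
    boolean isSuspected(String peer) {
        Integer n = missed.get(peer);
        return n != null && n > maxMissedTicks;
    }

    public static void main(String[] args) {
        TickTimeoutDetector detector = new TickTimeoutDetector(2);
        detector.heartbeat("peer-1");
        for (int i = 0; i < 3; i++) {
            detector.tick();
        }
        System.out.println("suspected after 3 silent ticks: " + detector.isSuspected("peer-1"));
    }
}
```

The single threshold is the only tuning knob in this scheme, which is the limitation the description contrasts with accrual-style detectors.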

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Zookeeper service is down when Leader disk is full

2011-06-22 Thread Laxman
Hi Everyone,

  

We found an issue while testing the disk-full scenario. Please validate our
observations; we will log an issue if they are found to be valid.

 

Problem: ZooKeeper does not shut down completely when the dataDir disk is
full, and the ZK cluster goes into an unserviceable state.
Version: Zookeeper 3.3.3

 

Scenario
When the leader's disk is filled, ZooKeeper tries to shut down, but it waits
indefinitely while shutting down the SyncRequestProcessor thread.

Root Cause: this.join() is invoked on the same thread that triggered
System.exit(11).
When the disk fills up, the SyncRequestProcessor thread gets a 'No space left
on device' exception and invokes System.exit(11) (the logs below show this).
Before the JVM exits, it runs the ShutdownHook of QuorumPeerMain, and the flow
reaches SyncRequestProcessor.shutdown(). There, this.join() waits for the
SyncRequestProcessor thread to finish, but that thread is itself blocked
inside System.exit(11) waiting for the shutdown hook to complete, so neither
thread can make progress.
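
The circular wait described above can be reproduced in miniature. The sketch below is not ZooKeeper code: ExitJoinSketch, Worker and shutdown(long) are hypothetical names, System.exit() is simulated by a thread that starts a "hook" thread and joins it, and a bounded join stands in for the unbounded this.join() so that the program terminates instead of hanging.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical miniature of the reported hang; not ZooKeeper code.
class ExitJoinSketch {
    // Stand-in for a request-processor thread whose shutdown() joins the thread itself.
    static class Worker extends Thread {
        final AtomicBoolean stillAliveAfterJoin = new AtomicBoolean(false);

        @Override
        public void run() {
            // Simulates System.exit(): start the shutdown hook on another
            // thread and block until that hook has finished.
            Thread hook = new Thread(() -> shutdown(500));
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Called from the hook thread. The worker cannot finish until the hook
        // returns, so an unbounded join() here would never return. The timeout
        // exists only so that this sketch terminates.
        void shutdown(long timeoutMs) {
            try {
                join(timeoutMs);
                stillAliveAfterJoin.set(isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns true if the worker was still running when the bounded join expired.
    static boolean demo() {
        Worker worker = new Worker();
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.stillAliveAfterJoin.get();
    }

    public static void main(String[] args) {
        System.out.println("worker still alive after bounded join: " + demo());
    }
}
```

In the sketch the bounded join expires while the worker is still waiting on the hook, which is the state the two threads remain in forever when the join has no timeout.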



Thread dumps: 

The following thread dump shows the QuorumPeerMain shutdown hook thread
waiting indefinitely inside SyncRequestProcessor.shutdown().

"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000] 
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
at java.lang.Thread.join(Thread.java:1143)
- locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
at java.lang.Thread.join(Thread.java:1196)
at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
at java.lang.Thread.join(Thread.java:1143)
- locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
at java.lang.Thread.join(Thread.java:1196)
at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
at java.lang.Shutdown.runHooks(Shutdown.java:79)
at java.lang.Shutdown.sequence(Shutdown.java:123)
at java.lang.Shutdown.exit(Shutdown.java:168)
- locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown)
at java.lang.Runtime.exit(Runtime.java:90)
at java.lang.System.exit(System.java:904)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)



Logs :


2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRe

[jira] [Commented] (ZOOKEEPER-1034) perl bindings should automatically find the zookeeper c-client headers

2011-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053102#comment-13053102
 ] 

Hadoop QA commented on ZOOKEEPER-1034:
--

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12483405/ZOOKEEPER-1034-trunk.patch
against trunk revision 1138213.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/346//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/346//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/346//console

This message is automatically generated.

> perl bindings should automatically find the zookeeper c-client headers
> --
>
> Key: ZOOKEEPER-1034
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1034
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 3.3.3
>Reporter: Nicholas Harteau
>Assignee: Nicholas Harteau
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1034-trunk.patch
>
>
> Installing Net::ZooKeeper from cpan or the zookeeper distribution tarballs 
> will always fail due to not finding c-client header files.  In conjunction 
> with ZOOKEEPER-1033 update perl bindings to look for c-client header files in 
> INCDIR/zookeeper/
> a.k.a. make installs of Net::ZooKeeper via cpan/cpanm/whatever *just work*, 
> assuming you've already got the zookeeper c client installed.
