[jira] [Commented] (ZOOKEEPER-661) Add Ruby bindings

2011-07-27 Thread Grant Gardner (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071695#comment-13071695
 ] 

Grant Gardner commented on ZOOKEEPER-661:
-

Yet another approach: https://github.com/lwoggardner/zkruby. I've implemented 
the client TCP protocol directly, avoiding the threading and C Ruby vs. JRuby 
issues, but otherwise duplicating code from the C/Java bindings.  Consider it 
experimental.

> Add Ruby bindings
> -
>
> Key: ZOOKEEPER-661
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-661
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: contrib-bindings
> Environment: MRI Ruby 1.9
> JRuby 1.4
>Reporter: Andrew Reynhout
>Priority: Minor
>
> Add Ruby bindings to the ZooKeeper distribution.
> Ruby presents special threading difficulties for asynchronous ZK calls (aget, 
> watchers, etc).  It looks like the simplest workaround is to patch the ZK C 
> API.
> Proposed approach will be described in comment.
> Please use this ticket for discussion and suggestions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1137) AuthFLE is throwing NPE when servers are configured with different election ports.

2011-07-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071707#comment-13071707
 ] 

Laxman commented on ZOOKEEPER-1137:
---

The following piece of code in the AuthFLE class looks problematic:

{code}
for (QuorumServer server : self.getVotingView().values()) {
    InetSocketAddress saddr = new InetSocketAddress(server.addr.getAddress(), port);
    addrChallengeMap.put(saddr, new ConcurrentHashMap());
}
{code}

We are populating addrChallengeMap with the same port for every server, so when 
servers are configured with different election ports the map is keyed by the 
wrong addresses and lookups return null, triggering the NPE.

Ideally, this should be:

{code}
for (QuorumServer server : self.getVotingView().values()) {
    addrChallengeMap.put(server.electionAddr, new ConcurrentHashMap());
}
{code}

Any other thoughts?

> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> --
>
> Key: ZOOKEEPER-1137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1137
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.3.3
>Reporter: Laxman
>Assignee: Laxman
>Priority: Critical
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> *Configuration*
> {noformat}
> server.1 = 10.18.52.25:2888:3888
> server.2 = 10.18.52.205:2889:3889
> server.3 = 10.18.52.144:2899:3890
> {noformat}
> *Logs*
> {noformat}
> 2011-07-22 16:06:22,404 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election 
> tally
> 2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 
> 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] 
> died
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
>   at java.lang.Thread.run(Thread.java:619)
> 2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 
> 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] 
> died
> java.lang.NullPointerException
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1137) AuthFLE is throwing NPE when servers are configured with different election ports.

2011-07-27 Thread Laxman (JIRA)
AuthFLE is throwing NPE when servers are configured with different election 
ports.
--

 Key: ZOOKEEPER-1137
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1137
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.3.3
Reporter: Laxman
Assignee: Laxman
Priority: Critical


AuthFLE is throwing NPE when servers are configured with different election 
ports.

*Configuration*
{noformat}
server.1 = 10.18.52.25:2888:3888
server.2 = 10.18.52.205:2889:3889
server.3 = 10.18.52.144:2899:3890
{noformat}

*Logs*
{noformat}
2011-07-22 16:06:22,404 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election tally
2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 
6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] 
died
java.lang.NullPointerException
at 
org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
at 
org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
at java.lang.Thread.run(Thread.java:619)
2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 
1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] 
died
java.lang.NullPointerException
{noformat}



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly

2011-07-27 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071740#comment-13071740
 ] 

Vishal K commented on ZOOKEEPER-1090:
-

We could run into this if the JVM is running low on memory (a runtime exception) 
while modifying the tree. It is a very rare case, so we don't need to fix it right 
away. It sounds like if the modification to the data tree for x fails due to a 
runtime exception (and not due to exceptions like NoNode), then before applying 
x+1 to the tree we should attempt to apply x first. We should change 
lastProcessedZxid only if the modification to the tree succeeds. 
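
A rough sketch of that ordering (illustrative only; applyToTree() below is a 
hypothetical stand-in for DataTree's existing switch over transaction types):

{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    rc.zxid = header.getZxid();
    // ... populate the remaining rc fields from the header ...

    // Apply the change first; if this throws a runtime exception,
    // lastProcessedZxid is left untouched.
    applyToTree(header, txn, rc);

    // Advance lastProcessedZxid only after the tree change succeeded, so a
    // concurrent snapshot can never claim a zxid the tree does not contain.
    if (rc.zxid > lastProcessedZxid) {
        lastProcessedZxid = rc.zxid;
    }
    return rc;
}
{code}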

> Race condition while taking snapshot can lead to not restoring data tree 
> correctly
> --
>
> Key: ZOOKEEPER-1090
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal K
>Assignee: Vishal K
>Priority: Critical
>  Labels: persistence, server, snapshot
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1090
>
>
> I think I have found a bug in the snapshot mechanism.
> The problem occurs because dt.lastProcessedZxid is not synchronized (or 
> rather set before the data tree is modified):
> FileTxnSnapLog:
> {code}
> public void save(DataTree dataTree,
>         ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
>         throws IOException {
>     long lastZxid = dataTree.lastProcessedZxid;
>     LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
>     File snapshot = new File(snapDir, Util.makeSnapshotName(lastZxid));
>     snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);   <=== 
>     the DataTree may not have the modification for lastProcessedZxid
> }
> {code}
> DataTree:
> {code}
> public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
>     ProcessTxnResult rc = new ProcessTxnResult();
>     String debug = "";
>     try {
>         rc.clientId = header.getClientId();
>         rc.cxid = header.getCxid();
>         rc.zxid = header.getZxid();
>         rc.type = header.getType();
>         rc.err = 0;
>         if (rc.zxid > lastProcessedZxid) {
>             lastProcessedZxid = rc.zxid;
>         }
>         [...modify data tree...]
> }
> {code}
> The lastProcessedZxid must be set after the modification is done.
> As a result, if the server crashes after taking the snapshot (and the snapshot 
> does not contain the change corresponding to lastProcessedZxid), restore will 
> not rebuild the data tree correctly:
> {code}
> public long restore(DataTree dt, Map<Long, Integer> sessions,
>         PlayBackListener listener) throws IOException {
>     snapLog.deserialize(dt, sessions);
>     FileTxnLog txnLog = new FileTxnLog(dataDir);
>     TxnIterator itr = txnLog.read(dt.lastProcessedZxid + 1); <=== assumes 
>     lastProcessedZxid is deserialized
> }
> {code}
> I have had offline discussion with Ben and Camille on this. I will be posting 
> the discussion shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1137) AuthFLE is throwing NPE when servers are configured with different election ports.

2011-07-27 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071751#comment-13071751
 ] 

Flavio Junqueira commented on ZOOKEEPER-1137:
-

Hi Laxman, we haven't been maintaining the AuthFLE implementation, and we have 
been talking about deprecating it for a long time. Do you have a particular 
reason for wanting to fix it, such as a use case, or are you simply exploring? 

> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> --
>
> Key: ZOOKEEPER-1137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1137
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.3.3
>Reporter: Laxman
>Assignee: Laxman
>Priority: Critical
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> *Configuration*
> {noformat}
> server.1 = 10.18.52.25:2888:3888
> server.2 = 10.18.52.205:2889:3889
> server.3 = 10.18.52.144:2899:3890
> {noformat}
> *Logs*
> {noformat}
> 2011-07-22 16:06:22,404 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election 
> tally
> 2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 
> 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] 
> died
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
>   at java.lang.Thread.run(Thread.java:619)
> 2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 
> 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] 
> died
> java.lang.NullPointerException
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1136) NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the twiki

2011-07-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071788#comment-13071788
 ] 

Mahadev konar commented on ZOOKEEPER-1136:
--

Ben, are you working on this?

> NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the 
> twiki
> ---
>
> Key: ZOOKEEPER-1136
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1136
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0
>
>
> the NEW_LEADER message was sent at the beginning of the sync phase in Zab 
> pre-1.0, but it must be at the end in Zab 1.0. If the protocol version is 1.0 or 
> greater, we need to queue rather than send the packet.
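
(Sketch of the intended change, for illustration; queuePacket() and sendPacket() 
are hypothetical stand-ins for the leader's actual send paths:)

{code}
QuorumPacket newLeaderQP = new QuorumPacket(Leader.NEWLEADER, zxid, null, null);
if (followerProtocolVersion >= ZAB_1_0) {
    // Zab 1.0+: queue it so it is delivered at the end of the sync phase.
    queuePacket(newLeaderQP);
} else {
    // Pre-1.0: send it up front, before the sync.
    sendPacket(newLeaderQP);
}
{code}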

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1080:
-

Fix Version/s: (was: 3.4.0)
   3.5.0

Hari,
 You might want to check out ZOOKEEPER-1095; maybe this issue is a duplicate 
of it?

> Provide a Leader Election framework based on Zookeeper recipe
> --
>
> Key: ZOOKEEPER-1080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: 3.3.2
>Reporter: Hari A V
>Assignee: Hari A V
> Fix For: 3.5.0
>
> Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, 
> zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently, Hadoop components such as the NameNode and JobTracker are single 
> points of failure.
> If the NameNode or JobTracker goes down, its service will not be available 
> until it is up and running again. If a standby NameNode or JobTracker were 
> available and ready to serve when the active node goes down, we could reduce 
> the service downtime. Hadoop already provides a standby NameNode 
> implementation, but it is not fully a "hot" standby. 
> The common problems to be addressed in any such Active-Standby cluster are 
> leader election and failure detection. These can be solved using ZooKeeper, as 
> described in the ZooKeeper recipes:
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in leader election can use this service, 
> starting it with the required configuration. The service notifies each node 
> whether it should start in Active or Standby mode, and also signals any 
> change of mode at runtime. All other complexities are handled internally by 
> the LES.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1057:
-

Fix Version/s: (was: 3.3.4)
   (was: 3.4.0)
   3.5.0

Not a blocker.

> zookeeper c-client, connection to offline server fails to successfully 
> fallback to second zk host
> -
>
> Key: ZOOKEEPER-1057
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.1, 3.3.2, 3.3.3
> Environment: snowdutyrise-lm ~/-> uname -a
> Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
> PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
> also observed on:
> 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
>Reporter: Woody Anderson
> Fix For: 3.5.0
>
>
> Hello, I'm a contributor for the node.js zookeeper module: 
> https://github.com/yfinkelstein/node-zookeeper
> I'm using zk 3.3.3 for the purposes of this issue, but I have validated that it 
> fails on 3.3.1 and 3.3.2.
> I'm having an issue when trying to connect when one of my zookeeper servers 
> is offline.
> If the first server attempted is online, all is good.
> If the offline server is attempted first, then the client is never able to 
> connect to _any_ server.
> Inside zookeeper.c a connection loss (-4) is received, the socket is closed, 
> and buffers are cleaned up; it then attempts the next server in the list and 
> creates a new socket (which gets the same fd as the previously closed socket), 
> but connecting fails, and it continues to fail seemingly forever.
> The nature of this "fail" is not that it gets -4 connection loss errors, but 
> that zookeeper_interest doesn't find anything going on on the socket before 
> the user-provided timeout kicks things out. I don't want to have to wait 5 
> minutes, even if I could make myself.
> This is the message that follows the connection loss:
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket 
> [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection 
> timed out (exceeded timeout by 3ms)
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest 
> returned error: -7 - operation timeout
> While investigating, I decided to comment out close(zh->fd) in handle_error 
> (zookeeper.c#1153).
> Now everything works (obviously I'm leaking an fd), and connecting to the 
> second host works immediately.
> This is the behavior I'm looking for, though I clearly don't want to leak the 
> fd, so I'm wondering why the fd reuse is causing this issue.
> close() is not returning an error (I checked, even though the current code 
> assumes success).
> I'm on OS X 10.6.7.
> I tried adding a setsockopt SO_LINGER (though I didn't want that to be the 
> solution); it didn't work.
> Full debug traces are included in the issue here: 
> https://github.com/yfinkelstein/node-zookeeper/issues/6

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-781) provide a generalized "connection strategy" for ZooKeeper clients

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-781:


Fix Version/s: (was: 3.4.0)
   3.5.0

Moving this out. Not a blocker for 3.4.

> provide a generalized "connection strategy" for ZooKeeper clients
> -
>
> Key: ZOOKEEPER-781
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-781
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client
>Reporter: Patrick Hunt
>Assignee: Qian Ye
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-781.patch, ZOOKEEPER-781.patch, 
> ZOOKEEPER-781.patch, ZOOKEEPER-781.patch, ZOOKEEPER-781.patch, 
> ZOOKEEPER-781.patch, ZOOKEEPER-781.patch
>
>
> A connection strategy allows control over the way that ZooKeeper clients (we 
> would implement this for both c and java apis) connect to a serving ensemble. 
> Today we have two strategies, randomized round robin (default) and ordered 
> round robin, both of which are hard coded into the client implementation. We 
> would generalize this interface and allow users to create their own.
> See this page for more detail: 
> http://wiki.apache.org/hadoop/ZooKeeper/ConnectionStrategy
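
For illustration, the generalized interface might look roughly like this (the 
names below are assumptions, not the patch's actual API):

{code}
import java.net.InetSocketAddress;

public interface ConnectionStrategy {
    // Which server the client should try next; spinDelayMs lets an
    // implementation back off once the whole list has been exhausted.
    InetSocketAddress next(long spinDelayMs);

    // Called after a successful connect so the strategy can reset state.
    void onConnected();
}
{code}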

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1006) QuorumPeer "Address already in use" -- regression in 3.3.3

2011-07-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071797#comment-13071797
 ] 

Mahadev konar commented on ZOOKEEPER-1006:
--

Should this be marked resolved since ZOOKEEPER-880 is fixed?

> QuorumPeer "Address already in use" -- regression in 3.3.3
> --
>
> Key: ZOOKEEPER-1006
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1006
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: TEST-org.apache.zookeeper.test.CnxManagerTest.txt, 
> ZOOKEEPER-1006.patch, ZOOKEEPER-1006.patch, workerthreads_badtest.txt
>
>
> CnxManagerTest.testWorkerThreads 
> See the attachment; this is the first time I've seen this test fail, and it 
> has failed in two of the last three test runs.
> Notice (attachment) that once this happens the port never becomes available.
> {noformat}
> 2011-03-02 15:53:12,425 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn$Factory@251] - 
> Accepted socket connection from /172.29.6.162:51441
> 2011-03-02 15:53:12,430 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@639] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2011-03-02 15:53:12,430 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.29.6.162:51441 (no session established for 
> client)
> 2011-03-02 15:53:12,430 - WARN  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@82] - Exception when following 
> the leader
> java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>   at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>   at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>   at 
> org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:267)
>   at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:66)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2011-03-02 15:53:12,431 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>   at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:QuorumPeer@621] - LOOKING
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:FastLeaderElection@663] - New election. My 
> id =  0, Proposed zxid = 0
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:QuorumPeer@655] - LEADING
> 2011-03-02 15:53:12,636 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@54] 
> - TCP NoDelay set to: true
> 2011-03-02 15:53:12,638 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:ZooKeeperServer@151] - Created server with 
> tickTime 1000 minSessionTimeout 2000 maxSessionTimeout 2 datadir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
>  snapdir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
> 2011-03-02 15:53:12,639 - ERROR 
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@133] - Couldn't bind to port 11245
> java.net.BindException: Address already in use
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
>   at java.net.ServerSocket.bind(ServerSocket.java:319)
>   at java.net.ServerSocket.<init>(ServerSocket.java:185)
>  

[jira] [Updated] (ZOOKEEPER-195) Configuration information is spread across too many docs. Consolidate into one

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-195:


Fix Version/s: (was: 3.4.0)
   3.5.0

Moving out of 3.4

> Configuration information is spread across too many docs. Consolidate into one
> --
>
> Key: ZOOKEEPER-195
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-195
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Robbie Scott
>Priority: Minor
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-195.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> There are definition lists of the configuration parameters in both the 
> getting started guide and the admin guide. This information should probably 
> live only in the administration guide's configuration parameters section. 
> Note that in the getting started guide, definitions of config params can be 
> found in both: 
> - Installing and Running ZooKeeper in Single Server Mode
> - Running Replicated ZooKeeper

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1130) Java port of PHunt's zk-smoketest

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1130:
-

Fix Version/s: (was: 3.4.0)
   3.5.0

> Java port of PHunt's zk-smoketest
> -
>
> Key: ZOOKEEPER-1130
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1130
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: 3.4.0
>Reporter: Colin Goodheart-Smithe
>Assignee: Colin Goodheart-Smithe
> Fix For: 3.5.0
>
> Attachments: zk-smoketest.patch
>
>
> I have ported Patrick's zookeeper smoke test to Java so that it can be run on 
> Windows machines (since I couldn't find any way of getting the Python 
> bindings for Windows).  The port provides the same functionality as the 
> Python variant as of 21st June 2011.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1130) Java port of PHunt's zk-smoketest

2011-07-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071802#comment-13071802
 ] 

Mahadev konar commented on ZOOKEEPER-1130:
--

Moving this to 3.5. It's not a blocker.

> Java port of PHunt's zk-smoketest
> -
>
> Key: ZOOKEEPER-1130
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1130
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: contrib
>Affects Versions: 3.4.0
>Reporter: Colin Goodheart-Smithe
>Assignee: Colin Goodheart-Smithe
> Fix For: 3.5.0
>
> Attachments: zk-smoketest.patch
>
>
> I have ported Patrick's zookeeper smoke test to Java so that it can be run on 
> Windows machines (since I couldn't find any way of getting the Python 
> bindings for Windows).  The port provides the same functionality as the 
> Python variant as of 21st June 2011.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-999) Create an package integration project

2011-07-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071804#comment-13071804
 ] 

Mahadev konar commented on ZOOKEEPER-999:
-

Eric, can you please resubmit?

> Create an package integration project
> -
>
> Key: ZOOKEEPER-999
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-999
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: build
> Environment: Java 6, RHEL/Ubuntu
>Reporter: Eric Yang
>Assignee: Eric Yang
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-999-1.patch, ZOOKEEPER-999-10.patch, 
> ZOOKEEPER-999-2.patch, ZOOKEEPER-999-3.patch, ZOOKEEPER-999-4.patch, 
> ZOOKEEPER-999-5.patch, ZOOKEEPER-999-6.patch, ZOOKEEPER-999-7.patch, 
> ZOOKEEPER-999-8.patch, ZOOKEEPER-999-9.patch, ZOOKEEPER-999.patch
>
>
> The goal of this ticket is to generate a set of RPM/Debian packages which 
> integrate well with the RPM sets created by HADOOP-6255.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-837) cyclic dependency ClientCnxn, ZooKeeper

2011-07-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-837:


Fix Version/s: (was: 3.4.0)
   3.5.0

Moving this out to 3.5 for cleanup.

> cyclic dependency ClientCnxn, ZooKeeper
> ---
>
> Key: ZOOKEEPER-837
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-837
> Project: ZooKeeper
>  Issue Type: Sub-task
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-837.patch, ZOOKEEPER-837.patch, 
> ZOOKEEPER-837.patch, ZOOKEEPER-837.patch, ZOOKEEPER-837.patch
>
>
> ZooKeeper instantiates ClientCnxn in its ctor with "this" and therefore builds a 
> cyclic dependency graph between the two objects. This means you can't have the 
> one without the other, so why bother making them separate classes in the first 
> place?
> ClientCnxn accesses ZooKeeper.state; state should rather be a property of 
> ClientCnxn. ClientCnxn also accesses zooKeeper.get???Watches() in its method 
> primeConnection(). I have not yet checked how this dependency could be 
> resolved better.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1136) NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the twiki

2011-07-27 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071855#comment-13071855
 ] 

Benjamin Reed commented on ZOOKEEPER-1136:
--

Yes, hoping to have a patch today.

> NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the 
> twiki
> ---
>
> Key: ZOOKEEPER-1136
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1136
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0
>
>
> the NEW_LEADER message was sent at the beginning of the sync phase in Zab 
> pre-1.0, but it must be at the end in Zab 1.0. If the protocol version is 1.0 or 
> greater, we need to queue rather than send the packet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (ZOOKEEPER-1006) QuorumPeer "Address already in use" -- regression in 3.3.3

2011-07-27 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K resolved ZOOKEEPER-1006.
-

Resolution: Fixed

Patch committed to trunk as part of ZOOKEEPER-880.

> QuorumPeer "Address already in use" -- regression in 3.3.3
> --
>
> Key: ZOOKEEPER-1006
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1006
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.3.4, 3.4.0
>
> Attachments: TEST-org.apache.zookeeper.test.CnxManagerTest.txt, 
> ZOOKEEPER-1006.patch, ZOOKEEPER-1006.patch, workerthreads_badtest.txt
>
>
> CnxManagerTest.testWorkerThreads 
> See the attachment; this is the first time I've seen this test fail, and it 
> has failed in two of the last three test runs.
> Notice (attachment) that once this happens the port never becomes available.
> {noformat}
> 2011-03-02 15:53:12,425 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn$Factory@251] - 
> Accepted socket connection from /172.29.6.162:51441
> 2011-03-02 15:53:12,430 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@639] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2011-03-02 15:53:12,430 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.29.6.162:51441 (no session established for 
> client)
> 2011-03-02 15:53:12,430 - WARN  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@82] - Exception when following 
> the leader
> java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>   at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>   at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>   at 
> org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:267)
>   at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:66)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2011-03-02 15:53:12,431 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>   at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:QuorumPeer@621] - LOOKING
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:FastLeaderElection@663] - New election. My 
> id =  0, Proposed zxid = 0
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:QuorumPeer@655] - LEADING
> 2011-03-02 15:53:12,636 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@54] 
> - TCP NoDelay set to: true
> 2011-03-02 15:53:12,638 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:ZooKeeperServer@151] - Created server with 
> tickTime 1000 minSessionTimeout 2000 maxSessionTimeout 2 datadir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
>  snapdir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
> 2011-03-02 15:53:12,639 - ERROR 
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@133] - Couldn't bind to port 11245
> java.net.BindException: Address already in use
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
>   at java.net.ServerSocket.bind(ServerSocket.java:319)
>   at java.net.ServerSocket.<init>(ServerSocket.java:185)
>   at java.net.ServerSocket.<init>(ServerSocket.java:9

Re: FW: Does abrupt kill corrupts the datadir?

2011-07-27 Thread Patrick Hunt
ZK has been built around the "fail fast" approach. In order to
maintain high availability we want to ensure that restarting a server
will result in it attempting to rejoin the quorum. IMO we would not
want to change this (kill -9).

Patrick
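
For reference, the shutdown hook floated in the quoted thread below would be
wired up roughly like this. This is only a sketch: server.shutdown() stands in
for whatever cleanup the process needs, and a hook runs on SIGTERM but never
on SIGKILL.

    // Register once at startup; the JVM invokes this on normal exit and
    // on SIGTERM ("kill"), but not on SIGKILL ("kill -9").
    Runtime.getRuntime().addShutdownHook(new Thread() {
        @Override
        public void run() {
            server.shutdown(); // flush and close txn log / snapshots cleanly
        }
    });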

On Tue, Jul 26, 2011 at 2:02 AM, Laxman  wrote:
> Hi Everyone,
>
> Any thoughts?
> Do we need to consider changing the abrupt shutdown to a graceful one?
>
> Implementations in some other Hadoop ecosystem projects, for reference:
> Hadoop - kill [SIGTERM]
> HBase - kill [SIGTERM], then "kill -9" [SIGKILL] if the process hangs
> ZooKeeper - "kill -9" [SIGKILL]
>
>
> -Original Message-
> From: Laxman [mailto:lakshman...@huawei.com]
> Sent: Wednesday, July 13, 2011 12:36 PM
> To: 'dev@zookeeper.apache.org'
> Subject: RE: Does abrupt kill corrupts the datadir?
>
> Hi Mahadev,
>
> A shutdown hook is just a quick thought. Another approach could be simply to
> send a kill [SIGTERM], which the process can interpret.
>
> A first look at "kill -9" raised the following scenario:
>>In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>corrupted, there is a chance of data loss.
>
> How can ZooKeeper deal with this scenario gracefully?
>
> Also, I feel we should give the application a chance to shut down gracefully
> before an abrupt shutdown.
>
> http://en.wikipedia.org/wiki/SIGKILL
>
> Because SIGKILL gives the process no opportunity to do cleanup operations on
> terminating, in most system shutdown procedures an attempt is first made to
> terminate processes using SIGTERM, before resorting to SIGKILL.
>
> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>
> The application can determine what it wants to do once a SIGTERM is
> received. While most applications will clean up their resources and stop,
> some may not. An application may be configured to do something completely
> different when a SIGTERM is received. Also, if the application is in a bad
> state, such as waiting for disk I/O, it may not be able to act on the signal
> that was sent.
>
> Most system administrators will usually resort to the more abrupt signal
> when an application doesn't respond to a SIGTERM.
>
> -Original Message-
> From: Mahadev Konar [mailto:maha...@hortonworks.com]
> Sent: Wednesday, July 13, 2011 12:02 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Does abrupt kill corrupts the datadir?
>
> Hi Laxman,
>  The servers take care of all the issues with data integrity, so a kill
> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
> everything works reliably is to use kill -9 :).
>
> Thanks
> mahadev
>
> On 7/12/11 11:16 PM, "Laxman"  wrote:
>
>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>zookeeper process using "kill -9":
>>
>>stop)
>>    echo -n "Stopping zookeeper ... "
>>    if [ ! -f "$ZOOPIDFILE" ]
>>    then
>>      echo "error: could not find file $ZOOPIDFILE"
>>      exit 1
>>    else
>>      $KILL -9 $(cat "$ZOOPIDFILE")
>>      rm "$ZOOPIDFILE"
>>      echo STOPPED
>>      exit 0
>>    fi
>>    ;;
>>
>>This may corrupt the snapshot and transaction logs. Also, it's not
>>recommended to use "kill -9".
>>In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>corrupted, there is a chance of data loss.
>>
>>How about introducing a shutdown hook which ensures ZooKeeper is
>>shut down gracefully when we call stop?
>>
>>Note: This is just an observation; it was not found in a test.
>>
>>--
>>
>>Thanks,
>>
>>Laxman
>>
>
>
>


[jira] [Commented] (ZOOKEEPER-1128) Recipe wrong for Lock process.

2011-07-27 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071888#comment-13071888
 ] 

Patrick Hunt commented on ZOOKEEPER-1128:
-

bq. It shouldn't be "the next lowest sequence number". It should be the 
"current lowest path".

This seems incorrect to me, as the intent is to eliminate a herd effect on write 
locking. Using the "current lowest path" would result in all waiting writers 
waking up each time the lock is freed. By watching the next lowest, this doesn't 
happen:

A creates write-1 and holds lock
B creates write-2, watches 1 (next lowest to it's own write-2)
C creates write-3, watches 2
A releases the lock
B wakes up
B releases lock
C wakes up

If B drops out before A releases the lock, C will wake up and see that A still 
has the lock.

Is there a particular scenario that you can point out not handled here?
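
A minimal sketch of that step (zk, dir, myNode and watcher are assumed to be 
set up elsewhere; error handling omitted):

{code}
List<String> children = zk.getChildren(dir, false);
Collections.sort(children);
int me = children.indexOf(myNode);
if (me == 0) {
    // Lowest sequence number: we hold the lock.
} else {
    // Watch only the node immediately below ours (the "next lowest"),
    // so a release wakes exactly one waiter instead of the whole herd.
    String justBelow = children.get(me - 1);
    if (zk.exists(dir + "/" + justBelow, watcher) == null) {
        // It vanished between getChildren() and exists(): re-scan.
    }
}
{code}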


> Recipe wrong for Lock process.
> --
>
> Key: ZOOKEEPER-1128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.3.3
>Reporter: yynil
>
> http://zookeeper.apache.org/doc/trunk/recipes.html
> The current recipe for Lock has the wrong process.
> Specifically, for step 
> "4. The client calls exists( ) with the watch flag set on the path in the 
> lock directory with the next lowest sequence number."
> It shouldn't be "the next lowest sequence number". It should be the 
> "current lowest path". 
> If you're going to use "the next lowest sequence number", you'll never wait 
> for lock possession.
> The following is the test code:
> {code:title=LockTest.java|borderStyle=solid}
> ACL acl = new ACL(Perms.ALL, new Id("10.0.0.0/8", "1"));
> List<ACL> acls = new ArrayList<ACL>();
> acls.add(acl);
> String connectStr = "localhost:2181";
> final Semaphore sem = new Semaphore(0);
> ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new Watcher() {
>     @Override
>     public void process(WatchedEvent event) {
>         System.out.println("eventType:" + event.getType());
>         System.out.println("keeperState:" + event.getState());
>         if (event.getType() == Event.EventType.None) {
>             if (event.getState() == Event.KeeperState.SyncConnected) {
>                 sem.release();
>             }
>         }
>     }
> });
> System.out.println("state:" + zooKeeper.getState());
> System.out.println("Waiting for the state to be connected");
> try {
>     sem.acquire();
> } catch (InterruptedException ex) {
>     ex.printStackTrace();
> }
> System.out.println("Now state:" + zooKeeper.getState());
> String directory = "/_locknode_";
> Stat stat = zooKeeper.exists(directory, false);
> if (stat == null) {
>     zooKeeper.create(directory, new byte[]{},
>             ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> }
> String prefix = directory + "/lock-";
> String path = zooKeeper.create(prefix, new byte[]{},
>         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
> System.out.println("Create the path for " + path);
> while (true) {
>     List<String> children = zooKeeper.getChildren(directory, false);
>     Collections.sort(children);
>     System.out.println("The whole lock size is " + children.size());
>     String lowestPath = children.get(0);
>     DecimalFormat df = new DecimalFormat("00");
>     String currentSuffix = lowestPath.substring("lock-".length());
>     System.out.println("CurrentSuffix is " + currentSuffix);
>     int intIndex = Integer.parseInt(currentSuffix);
>     if (path.equals(directory + "/" + lowestPath)) {
>         // I've got the lock and release it
>         System.out.println("I've got the lock at " + new Date());
>         System.out.println("next index is " + intIndex);
>         Thread.sleep(1);
>         System.out.println("After sleep 3 seconds, I'm gonna release the lock");
>         zooKeeper.delete(path, -1);
>         break;
>     }
>     final Semaphore wakeupSem = new Semaphore(0);
>     stat = zooKeeper.exists(directory + "/" + lowestPath, new Watcher() {
>         @Override
>         public void process(WatchedEvent event) {
>             System.out.println("Event is " + event.getType());
>             System.out.println("State is " + event.getState());
>             if (event.getType() == Event.EventType.NodeDeleted) {
>                 wakeupSem.re

[jira] [Updated] (ZOOKEEPER-1076) some quorum tests are unnecessarily extending QuorumBase

2011-07-27 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1076:


Attachment: ZOOKEEPER-1076.patch

Just rebased the patch to the latest trunk.

> some quorum tests are unnecessarily extending QuorumBase
> 
>
> Key: ZOOKEEPER-1076
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1076
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1076.patch, ZOOKEEPER-1076.patch
>
>
> Some tests are unnecessarily extending QuorumBase. Typically this is not a 
> big issue, but it may cause more servers than necessary to be started (harder 
> to debug a failing test in particular).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1131) Transactions can be dropped because leader election uses last committed zxid instead of last acknowledged/received zxid

2011-07-27 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071916#comment-13071916
 ] 

Patrick Hunt commented on ZOOKEEPER-1131:
-

No worries, thanks for verifying this!

> Transactions can be dropped because leader election uses last committed zxid 
> instead of last acknowledged/received zxid
> ---
>
> Key: ZOOKEEPER-1131
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1131
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>
> Suppose we have 3 servers - A, B, C which have seen the same number of 
> commits. 
> - A is the leader and it sends out a new proposal.
> - B doesn't receive the proposal, but A and C receive and ACK it
> - A commits the proposal, but fails before anyone else sees the commit.
> - B and C start leader election. 
> - since both B and C saw the same number of commits, if B has a higher 
> server-id than C, leader election will elect B. Then, the last transaction 
> will be truncated from C's log, which is a bug since it was acked by a 
> majority.
>   
> This happens since servers propose their last committed zxid in leader 
> election, and not their last received / acked zxid (this is not being 
> tracked, AFAIK). See method
> FastLeaderElection.getInitLastLoggedZxid(), which calls 
> QuorumPeer.getLastLoggedZxid(), which is supposed to return the last logged 
> Zxid, but instead calls zkDb.getDataTreeLastProcessedZxid() which returns the 
> last committed zxid.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1119) zkServer stop command incorrectly reading comment lines in zoo.cfg

2011-07-27 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated ZOOKEEPER-1119:
-

Attachment: (was: ZOOKEEPER-999-10.patch)

> zkServer stop command incorrectly reading comment lines in zoo.cfg
> --
>
> Key: ZOOKEEPER-1119
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1119
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.3
> Environment: Ubuntu Linux 10.04, JDK 6
>Reporter: Glen Mazza
>Assignee: Patrick Hunt
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1119.patch
>
>
> Hello, adding the following commented-out dataDir to the zoo.cfg file 
> (keeping the default one provided active):
> {noformat}
> # the directory where the snapshot is stored.
> # dataDir=test123/data
> dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
> {noformat}
> and then running "sh zkServer.sh stop" shows that the script is incorrectly 
> reading the commented-out dataDir:
> {noformat}
> gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$ sh zkServer.sh stop
> JMX enabled by default
> Using config: /media/NewDriveExt3_/apps/zookeeper-3.3.3/bin/../conf/zoo.cfg
> Stopping zookeeper ... 
> error: could not find file test123/data
> /export/crawlspace/mahadev/zookeeper/server1/data/zookeeper_server.pid
> gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$ 
> {noformat}
> If I change the commented-out line in zoo.cfg to "test123456/data" and run 
> the stop command again I get:
> error: could not find file test123456/data
> showing that it's incorrectly doing a run-time read of the commented-out 
> lines.  (Difficult to completely confirm, but this problem  doesn't appear to 
> occur with the start command, only the stop one.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

2011-07-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071926#comment-13071926
 ] 

Mahadev konar commented on ZOOKEEPER-1057:
--

Ben,
 Can you take a look at this? Do you think this should be a blocker for 3.4?

> zookeeper c-client, connection to offline server fails to successfully 
> fallback to second zk host
> -
>
> Key: ZOOKEEPER-1057
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.1, 3.3.2, 3.3.3
> Environment: snowdutyrise-lm ~/-> uname -a
> Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 
> PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
> also observed on:
> 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
>Reporter: Woody Anderson
> Fix For: 3.5.0
>
>
> Hello, I'm a contributor for the node.js zookeeper module: 
> https://github.com/yfinkelstein/node-zookeeper
> I'm using zk 3.3.3 for the purposes of this issue, but I have validated that it 
> fails on 3.3.1 and 3.3.2.
> I'm having an issue when trying to connect when one of my zookeeper servers 
> is offline.
> If the first server attempted is online, all is good.
> If the offline server is attempted first, then the client is never able to 
> connect to _any_ server.
> Inside zookeeper.c a connection loss (-4) is received, the socket is closed, 
> and buffers are cleaned up; it then attempts the next server in the list and 
> creates a new socket (which gets the same fd as the previously closed socket), 
> but connecting fails, and it continues to fail seemingly forever.
> The nature of this "fail" is not that it gets -4 connection loss errors, but 
> that zookeeper_interest doesn't find anything going on on the socket before 
> the user-provided timeout kicks things out. I don't want to have to wait 5 
> minutes, even if I could make myself.
> This is the message that follows the connection loss:
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket 
> [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection 
> timed out (exceeded timeout by 3ms)
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest 
> returned error: -7 - operation timeout
> While investigating, I decided to comment out close(zh->fd) in handle_error 
> (zookeeper.c#1153).
> Now everything works (obviously I'm leaking an fd), and connecting to the 
> second host works immediately.
> This is the behavior I'm looking for, though I clearly don't want to leak the 
> fd, so I'm wondering why the fd reuse is causing this issue.
> close() is not returning an error (I checked, even though the current code 
> assumes success).
> I'm on OS X 10.6.7.
> I tried adding a setsockopt SO_LINGER (though I didn't want that to be the 
> solution); it didn't work.
> Full debug traces are included in the issue here: 
> https://github.com/yfinkelstein/node-zookeeper/issues/6

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1076) some quorum tests are unnecessarily extending QuorumBase

2011-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071929#comment-13071929
 ] 

Hadoop QA commented on ZOOKEEPER-1076:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12488009/ZOOKEEPER-1076.patch
  against trunk revision 1150937.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 26 release audit warnings 
(more than the trunk's current 24 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/413//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/413//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/413//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/413//console

This message is automatically generated.

> some quorum tests are unnecessarily extending QuorumBase
> 
>
> Key: ZOOKEEPER-1076
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1076
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1076.patch, ZOOKEEPER-1076.patch
>
>
> Some tests are unnecessarily extending QuorumBase. Typically this is not a 
> big issue, but it may cause more servers than necessary to be started (harder 
> to debug a failing test in particular).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1076) some quorum tests are unnecessarily extending QuorumBase

2011-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071948#comment-13071948
 ] 

Hadoop QA commented on ZOOKEEPER-1076:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12488009/ZOOKEEPER-1076.patch
  against trunk revision 1150937.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 26 release audit warnings 
(more than the trunk's current 24 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/414//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/414//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/414//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/414//console

This message is automatically generated.

> some quorum tests are unnecessarily extending QuorumBase
> 
>
> Key: ZOOKEEPER-1076
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1076
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1076.patch, ZOOKEEPER-1076.patch
>
>
> Some tests are unnecessarily extending QuorumBase. Typically this is not a 
> big issue, but it may cause more servers than necessary to be started (harder 
> to debug a failing test in particular).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1138) release audit failing for a number of new files

2011-07-27 Thread Patrick Hunt (JIRA)
release audit failing for a number of new files
---

 Key: ZOOKEEPER-1138
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1138
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.4.0


I'm seeing a number of problems in the release audit output for 3.4.0; these 
must be fixed before the 3.4.0 release:

{noformat}
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultConnectionSettings.cfg
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultNodeVeiwers.cfg
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/Cli.vcproj
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winconfig.h
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winstdint.h
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.sln
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.vcproj
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/help/index.html
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/js/package.yml
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultConnectionSettings.cfg
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultNodeVeiwers.cfg
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/licences/epl-v10.html
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/MultiTransactionRecordTest.java
[rat:report]  !? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java
Lines that start with !? in the release audit report indicate files that are 
missing license headers.

reminder to committers: new source files must have license headers

2011-07-27 Thread Patrick Hunt
In running the release audit for the 3.4.0 branch I see a number of new
files w/o licenses:
https://issues.apache.org/jira/browse/ZOOKEEPER-1138

In general, new files must have license headers; this is especially the case
for source code, scripts, etc.

When adding new files to SVN, be sure to review the license status of each.

Patrick
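
For reference, the standard ASF header that goes at the top of each Java
source file (as found in the project's existing sources) is:

{code}
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
{code}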


[jira] [Updated] (ZOOKEEPER-1138) release audit failing for a number of new files

2011-07-27 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1138:


Attachment: ZOOKEEPER-1138.patch

This patch addresses the identified issues; the js files do have license 
headers (MIT license).

> release audit failing for a number of new files
> ---
>
> Key: ZOOKEEPER-1138
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1138
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1138.patch
>
>
> I'm seeing a number of problems in the release audit output for 3.4.0; these 
> must be fixed before the 3.4.0 release:
> {noformat}
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultConnectionSettings.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultNodeVeiwers.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/Cli.vcproj
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winconfig.h
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winstdint.h
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.sln
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.vcproj
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/help/index.html
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/js/package.yml
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultConnectionSettings.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultNodeVeiwers.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build

[jira] [Commented] (ZOOKEEPER-1138) release audit failing for a number of new files

2011-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072017#comment-13072017
 ] 

Hadoop QA commented on ZOOKEEPER-1138:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12488026/ZOOKEEPER-1138.patch
  against trunk revision 1150937.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/415//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/415//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/415//console

This message is automatically generated.

> release audit failing for a number of new files
> ---
>
> Key: ZOOKEEPER-1138
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1138
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1138.patch
>
>
> I'm seeing a number of problems in the release audit output for 3.4.0; these 
> must be fixed before the 3.4.0 release:
> {noformat}
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultConnectionSettings.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultNodeVeiwers.cfg
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/Cli.vcproj
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winconfig.h
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winstdint.h
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.sln
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.vcproj
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/help/index.html
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/js/package.yml
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js
> [rat:report]  !? 
> /grid/0/hudson/hudson-slave/w

[jira] [Created] (ZOOKEEPER-1139) jenkins is reporting two warnings, fix these

2011-07-27 Thread Patrick Hunt (JIRA)
jenkins is reporting two warnings, fix these


 Key: ZOOKEEPER-1139
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1139
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Minor
 Fix For: 3.4.0


Clean up the jenkins report; currently 2 compiler warnings are being reported.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1139) jenkins is reporting two warnings, fix these

2011-07-27 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1139:


Attachment: ZOOKEEPER-1139.patch

Clean up the warnings.

> jenkins is reporting two warnings, fix these
> 
>
> Key: ZOOKEEPER-1139
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1139
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1139.patch
>
>
> Clean up the jenkins report; currently 2 compiler warnings are being reported.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1139) jenkins is reporting two warnings, fix these

2011-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072055#comment-13072055
 ] 

Hadoop QA commented on ZOOKEEPER-1139:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12488036/ZOOKEEPER-1139.patch
  against trunk revision 1150937.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 26 release audit warnings 
(more than the trunk's current 24 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/416//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/416//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/416//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/416//console

This message is automatically generated.

> jenkins is reporting two warnings, fix these
> 
>
> Key: ZOOKEEPER-1139
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1139
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1139.patch
>
>
> Clean up the jenkins report; currently 2 compiler warnings are being reported.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1133) allow for "clientPortAddress=host:port"

2011-07-27 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072088#comment-13072088
 ] 

Eugene Koontz commented on ZOOKEEPER-1133:
--

Thanks for your comments Patrick. I think I can remedy your point 4) by 
studying 
http://download.oracle.com/javase/1.4.2/docs/api/java/net/Inet6Address.html.
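
For illustration, a minimal sketch of the host:port split; this is an
assumption about the approach, not code from the attached ZOOKEEPER-1133.txt.
Splitting on the last colon keeps IPv6 literals such as [2001:db8::1]:2181
intact:

{code}
import java.net.InetSocketAddress;

public class HostPortSketch {
    // Hypothetical helper, not taken from the attached patch.
    static InetSocketAddress parse(String value) {
        int idx = value.lastIndexOf(':');   // last colon separates the port
        if (idx < 0) {
            throw new IllegalArgumentException("expected host:port: " + value);
        }
        String host = value.substring(0, idx);
        int port = Integer.parseInt(value.substring(idx + 1));
        // IPv6 (Inet6Address-style) literals are written in brackets; strip them
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);
        }
        return new InetSocketAddress(host, port);
    }
}
{code}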

> allow for "clientPortAddress=host:port"
> ---
>
> Key: ZOOKEEPER-1133
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1133
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1133.txt
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-642) "exceeded deadline by N ms" floods logs

2011-07-27 Thread Vishal Kathuria (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072101#comment-13072101
 ] 

Vishal Kathuria commented on ZOOKEEPER-642:
---

I have run into this issue as well, while trying to do some scale testing on 
ZooKeeper. I usually see it when the client machine is under heavy load, but 
the message doesn't have much diagnostic value unless we miss the deadline by 
a large margin. 

We could change the threshold to min(timeout/10, 200) instead of 10. In my 
experience I was routinely seeing delays of less than 200 ms without any 
connection loss, hence the suggestion of that value.
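
A sketch of that proposal, expressed in Java for brevity (the actual check
lives in the C client's zookeeper_interest; the method and parameter names
here are illustrative only):

{code}
public class DeadlineWarnSketch {
    // Illustration only: warn when the deadline was missed by more than
    // min(sessionTimeout/10, 200) ms rather than a flat 10 ms. With a 30 s
    // session timeout this silences anything under 200 ms.
    static boolean shouldWarn(long missedByMs, long sessionTimeoutMs) {
        long threshold = Math.min(sessionTimeoutMs / 10, 200);
        return missedByMs > threshold;
    }
}
{code}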

> "exceeded deadline by N ms" floods logs
> ---
>
> Key: ZOOKEEPER-642
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-642
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.1
> Environment: virtualized linux - ec2 - ubuntu
>Reporter: Dale Johnson
> Fix For: 3.5.0
>
>
> More important zookeeper warnings are drowned out by the following, which 
> appears several times per minute:
> 2010-01-12 17:39:57,227:22317(0x4147eb90):ZOO_WARN@zookeeper_interest@1335: 
> Exceeded deadline by 13ms
> Perhaps this is an issue with the way virtualized systems manage gettimeofday 
> results?
> Maybe the current 10ms threshold could be pushed up a bit.  I notice that 95% 
> of the messages are below 50ms.
> Is there an obvious configuration change that I can make to fix this?
> config file below:
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/mnt/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=hbase.1:2888:3888
> server.2=hbase.2:2888:3888
> server.3=hbase.3:2888:3888
> server.4=hbase.4:2888:3888
> server.5=hbase.5:2888:3888

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-642) "exceeded deadline by N ms" floods logs

2011-07-27 Thread Ashish Mishra (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072109#comment-13072109
 ] 

Ashish Mishra commented on ZOOKEEPER-642:
-

I'll work on the fix suggested by Vishal.

> "exceeded deadline by N ms" floods logs
> ---
>
> Key: ZOOKEEPER-642
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-642
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.1
> Environment: virtualized linux - ec2 - ubuntu
>Reporter: Dale Johnson
> Fix For: 3.5.0
>
>
> More important zookeeper warnings are drowned out by the following, which 
> appears several times per minute:
> 2010-01-12 17:39:57,227:22317(0x4147eb90):ZOO_WARN@zookeeper_interest@1335: 
> Exceeded deadline by 13ms
> Perhaps this is an issue with the way virtualized systems manage gettimeofday 
> results?
> Maybe the current 10ms threshold could be pushed up a bit.  I notice that 95% 
> of the messages are below 50ms.
> Is there an obvious configuration change that I can make to fix this?
> config file below:
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/mnt/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=hbase.1:2888:3888
> server.2=hbase.2:2888:3888
> server.3=hbase.3:2888:3888
> server.4=hbase.4:2888:3888
> server.5=hbase.5:2888:3888

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1128) Recipe wrong for Lock process.

2011-07-27 Thread yynil (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072135#comment-13072135
 ] 

yynil commented on ZOOKEEPER-1128:
--

Oops, I got it.
I just misunderstood the term "the next lowest sequence number". 
It means the number "next" to MY number, not "next" to the current lock 
holder's number.
That solves my problem, and it's way better than the "current lock" solution 
I'm using now.

One more suggestion: add these comments to the tutorial. I don't know if I'm 
the only one to misunderstand it. 
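
A minimal sketch of the clarified recipe, assuming the same zk handle, lock
directory and own sequential path as in the test code quoted below: watch only
the child immediately below your own sequence number, not the current lowest.

{code}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class LockCheckSketch {
    // Returns true when this client holds the lock; otherwise leaves a
    // watch on its immediate predecessor and returns false so the caller
    // can wait for the watch to fire before re-checking.
    static boolean checkLock(ZooKeeper zk, String dir, String myPath,
            Watcher w) throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(dir, false);
        Collections.sort(children);
        int i = children.indexOf(myPath.substring(dir.length() + 1));
        if (i == 0) {
            return true;                    // lowest sequence: lock is ours
        }
        String predecessor = dir + "/" + children.get(i - 1);
        if (zk.exists(predecessor, w) == null) {
            // predecessor vanished between getChildren and exists: re-check
            return checkLock(zk, dir, myPath, w);
        }
        return false;                       // wait for the watch to fire
    }
}
{code}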
 




> Recipe wrong for Lock process.
> --
>
> Key: ZOOKEEPER-1128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.3.3
>Reporter: yynil
>
> http://zookeeper.apache.org/doc/trunk/recipes.html
> The current recipe for Lock has the wrong process.
> Specifically, for the 
> "4. The client calls exists( ) with the watch flag set on the path in the 
> lock directory with the next lowest sequence number."
> It shouldn't be the "the next lowest sequence number". It should be the 
> "current lowest path". 
> If you're gonna use "the next lowest sequence number", you'll never wait for 
> the lock possession.
> The following is the test code:
> {code:title=LockTest.java|borderStyle=solid}
> ACL acl = new ACL(Perms.ALL, new Id("10.0.0.0/8", "1"));
> List<ACL> acls = new ArrayList<ACL>();
> acls.add(acl);
> String connectStr = "localhost:2181";
> final Semaphore sem = new Semaphore(0);
> ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("eventType:" + event.getType());
> System.out.println("keeperState:" + event.getState());
> if (event.getType() == Event.EventType.None) {
> if (event.getState() == Event.KeeperState.SyncConnected) {
> sem.release();
> }
> }
> }
> });
> System.out.println("state:" + zooKeeper.getState());
> System.out.println("Waiting for the state to be connected");
> try {
> sem.acquire();
> } catch (InterruptedException ex) {
> ex.printStackTrace();
> }
> System.out.println("Now state:" + zooKeeper.getState());
> String directory = "/_locknode_";
> Stat stat = zooKeeper.exists(directory, false);
> if (stat == null) {
> zooKeeper.create(directory, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> }
> String prefix = directory + "/lock-";
> String path = zooKeeper.create(prefix, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
> System.out.println("Create the path for " + path);
> while (true) {
> List<String> children = zooKeeper.getChildren(directory, false);
> Collections.sort(children);
> System.out.println("The whole lock size is " + children.size());
> String lowestPath = children.get(0);
> DecimalFormat df = new DecimalFormat("00");
> String currentSuffix = lowestPath.substring("lock-".length());
> System.out.println("CurrentSuffix is " + currentSuffix);
> int intIndex = Integer.parseInt(currentSuffix);
> if (path.equals(directory + "/" + lowestPath)) {
> //I've got the lock and release it
> System.out.println("I've got the lock at " + new Date());
> System.out.println("next index is " + intIndex);
> Thread.sleep(3000);
> System.out.println("After sleep 3 seconds, I'm gonna release 
> the lock");
> zooKeeper.delete(path, -1);
> break;
> }
> final Semaphore wakeupSem = new Semaphore(0);
> stat = zooKeeper.exists(directory + "/" + lowestPath, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("Event is " + event.getType());
> System.out.println("State is " + event.getState());
> if (event.getType() == Event.EventType.NodeDeleted) {
> wakeupSem.release();
> }
> }
> });
> if (stat != null) {
> System.out.println("Waiting for the delete of ");
> wakeupSem.acquire();
> } else {
> System.out.println("Continue to seek");
> }
> }
> {code} 

[jira] [Resolved] (ZOOKEEPER-1128) Recipe wrong for Lock process.

2011-07-27 Thread yynil (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yynil resolved ZOOKEEPER-1128.
--

Resolution: Fixed

With Hunt's comments, it's clear to me. I consider this issue resolved by the 
explanation. It would be good to add these comments to the tutorial to help 
newbies like me. 

> Recipe wrong for Lock process.
> --
>
> Key: ZOOKEEPER-1128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.3.3
>Reporter: yynil
>
> http://zookeeper.apache.org/doc/trunk/recipes.html
> The current recipe for Lock has the wrong process.
> Specifically, for the 
> "4. The client calls exists( ) with the watch flag set on the path in the 
> lock directory with the next lowest sequence number."
> It shouldn't be the "the next lowest sequence number". It should be the 
> "current lowest path". 
> If you're gonna use "the next lowest sequence number", you'll never wait for 
> the lock possession.
> The following is the test code:
> {code:title=LockTest.java|borderStyle=solid}
> ACL acl = new ACL(Perms.ALL, new Id("10.0.0.0/8", "1"));
> List<ACL> acls = new ArrayList<ACL>();
> acls.add(acl);
> String connectStr = "localhost:2181";
> final Semaphore sem = new Semaphore(0);
> ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("eventType:" + event.getType());
> System.out.println("keeperState:" + event.getState());
> if (event.getType() == Event.EventType.None) {
> if (event.getState() == Event.KeeperState.SyncConnected) {
> sem.release();
> }
> }
> }
> });
> System.out.println("state:" + zooKeeper.getState());
> System.out.println("Waiting for the state to be connected");
> try {
> sem.acquire();
> } catch (InterruptedException ex) {
> ex.printStackTrace();
> }
> System.out.println("Now state:" + zooKeeper.getState());
> String directory = "/_locknode_";
> Stat stat = zooKeeper.exists(directory, false);
> if (stat == null) {
> zooKeeper.create(directory, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> }
> String prefix = directory + "/lock-";
> String path = zooKeeper.create(prefix, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
> System.out.println("Create the path for " + path);
> while (true) {
> List<String> children = zooKeeper.getChildren(directory, false);
> Collections.sort(children);
> System.out.println("The whole lock size is " + children.size());
> String lowestPath = children.get(0);
> DecimalFormat df = new DecimalFormat("00");
> String currentSuffix = lowestPath.substring("lock-".length());
> System.out.println("CurrentSuffix is " + currentSuffix);
> int intIndex = Integer.parseInt(currentSuffix);
> if (path.equals(directory + "/" + lowestPath)) {
> //I've got the lock and release it
> System.out.println("I've got the lock at " + new Date());
> System.out.println("next index is " + intIndex);
> Thread.sleep(3000);
> System.out.println("After sleep 3 seconds, I'm gonna release 
> the lock");
> zooKeeper.delete(path, -1);
> break;
> }
> final Semaphore wakeupSem = new Semaphore(0);
> stat = zooKeeper.exists(directory + "/" + lowestPath, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("Event is " + event.getType());
> System.out.println("State is " + event.getState());
> if (event.getType() == Event.EventType.NodeDeleted) {
> wakeupSem.release();
> }
> }
> });
> if (stat != null) {
> System.out.println("Waiting for the delete of ");
> wakeupSem.acquire();
> } else {
> System.out.println("Continue to seek");
> }
> }
> {code} 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1137) AuthFLE is throwing NPE when servers are configured with different election ports.

2011-07-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072171#comment-13072171
 ] 

Laxman commented on ZOOKEEPER-1137:
---

Thanks for the info, Flavio. I don't really have a specific use case; we found 
this problem while exploring different features. As mentioned in my earlier 
comments, I have a patch to fix this problem and am currently verifying it.

Even if this is going to be deprecated, we can't have an NPE, no?

Please let me know if I can upload the patch.

> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> --
>
> Key: ZOOKEEPER-1137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1137
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.3.3
>Reporter: Laxman
>Assignee: Laxman
>Priority: Critical
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> *Configuration*
> {noformat}
> server.1 = 10.18.52.25:2888:3888
> server.2 = 10.18.52.205:2889:3889
> server.3 = 10.18.52.144:2899:3890
> {noformat}
> *Logs*
> {noformat}
> 2011-07-22 16:06:22,404 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election 
> tally
> 2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 
> 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] 
> died
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
>   at java.lang.Thread.run(Thread.java:619)
> 2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 
> 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] 
> died
> java.lang.NullPointerException
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1137) AuthFLE is throwing NPE when servers are configured with different election ports.

2011-07-27 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072177#comment-13072177
 ] 

Flavio Junqueira commented on ZOOKEEPER-1137:
-

Sure, go ahead and upload a patch. Having a test for AuthFLE would also be 
useful.

> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> --
>
> Key: ZOOKEEPER-1137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1137
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.3.3
>Reporter: Laxman
>Assignee: Laxman
>Priority: Critical
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> AuthFLE is throwing NPE when servers are configured with different election 
> ports.
> *Configuration*
> {noformat}
> server.1 = 10.18.52.25:2888:3888
> server.2 = 10.18.52.205:2889:3889
> server.3 = 10.18.52.144:2899:3890
> {noformat}
> *Logs*
> {noformat}
> 2011-07-22 16:06:22,404 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election 
> tally
> 2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 
> 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] 
> died
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
>   at 
> org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
>   at java.lang.Thread.run(Thread.java:619)
> 2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 
> 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] 
> died
> java.lang.NullPointerException
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly

2011-07-27 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-1090:
-

Hadoop Flags: [Reviewed]

+1 great find and fix vishal

> Race condition while taking snapshot can lead to not restoring data tree 
> correctly
> --
>
> Key: ZOOKEEPER-1090
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal K
>Assignee: Vishal K
>Priority: Critical
>  Labels: persistence, server, snapshot
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1090
>
>
> I think I have found a bug in the snapshot mechanism.
> The problem occurs because dt.lastProcessedZxid is not synchronized (or 
> rather set before the data tree is modified):
> FileTxnSnapLog:
> {code}
> public void save(DataTree dataTree,
> ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
> throws IOException {
> long lastZxid = dataTree.lastProcessedZxid;
> LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
> File snapshot=new File(
> snapDir, Util.makeSnapshotName(lastZxid));
> snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);   <=== 
> the Datatree may not have the modification for lastProcessedZxid
> }
> {code}
> DataTree:
> {code}
> public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
> ProcessTxnResult rc = new ProcessTxnResult();
> String debug = "";
> try {
> rc.clientId = header.getClientId();
> rc.cxid = header.getCxid();
> rc.zxid = header.getZxid();
> rc.type = header.getType();
> rc.err = 0;
> if (rc.zxid > lastProcessedZxid) {
> lastProcessedZxid = rc.zxid;
> }
> [...modify data tree...]   
>  }
> {code}
> The lastProcessedZxid must be set after the modification is done.
> As a result, if server crashes after taking the snapshot (and the snapshot 
> does not contain change corresponding to lastProcessedZxid) restore will not 
> restore the data tree correctly:
> {code}
> public long restore(DataTree dt, Map<Long, Integer> sessions,
> PlayBackListener listener) throws IOException {
> snapLog.deserialize(dt, sessions);
> FileTxnLog txnLog = new FileTxnLog(dataDir);
> TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes 
> lastProcessedZxid is deserialized
>  }
> {code}
> I have had offline discussion with Ben and Camille on this. I will be posting 
> the discussion shortly.
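
A minimal sketch of the ordering change the description calls for, under the
assumption that moving the assignment is the essence of the fix (the attached
patch may do more); the types are those of the DataTree code quoted above:

{code}
// Sketch only, not the attached patch: update lastProcessedZxid AFTER
// the mutation is applied, so a concurrent snapshot can never record a
// zxid whose change it does not contain.
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    rc.zxid = header.getZxid();
    // [...modify data tree...]
    if (rc.zxid > lastProcessedZxid) {
        lastProcessedZxid = rc.zxid;   // moved below the mutation
    }
    return rc;
}
{code}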

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly

2011-07-27 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072180#comment-13072180
 ] 

Benjamin Reed commented on ZOOKEEPER-1090:
--

oh, i also wanted to second camille's comment about Assert. Also, instead of 
Assert.assertTrue("message", false) you can use Assert.fail("message")
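
That is, the two JUnit calls below are equivalent, and fail() states the
intent directly (the message string here is illustrative):

{code}
import org.junit.Assert;

public class FailExample {
    static void reportBroken() {
        // equivalent, but the second form reads better:
        // Assert.assertTrue("restored tree missing last zxid", false);
        Assert.fail("restored tree missing last zxid");
    }
}
{code}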

> Race condition while taking snapshot can lead to not restoring data tree 
> correctly
> --
>
> Key: ZOOKEEPER-1090
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal K
>Assignee: Vishal K
>Priority: Critical
>  Labels: persistence, server, snapshot
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1090
>
>
> I think I have found a bug in the snapshot mechanism.
> The problem occurs because dt.lastProcessedZxid is not synchronized (or 
> rather set before the data tree is modified):
> FileTxnSnapLog:
> {code}
> public void save(DataTree dataTree,
> ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
> throws IOException {
> long lastZxid = dataTree.lastProcessedZxid;
> LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
> File snapshot=new File(
> snapDir, Util.makeSnapshotName(lastZxid));
> snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);   <=== 
> the Datatree may not have the modification for lastProcessedZxid
> }
> {code}
> DataTree:
> {code}
> public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
> ProcessTxnResult rc = new ProcessTxnResult();
> String debug = "";
> try {
> rc.clientId = header.getClientId();
> rc.cxid = header.getCxid();
> rc.zxid = header.getZxid();
> rc.type = header.getType();
> rc.err = 0;
> if (rc.zxid > lastProcessedZxid) {
> lastProcessedZxid = rc.zxid;
> }
> [...modify data tree...]   
>  }
> {code}
> The lastProcessedZxid must be set after the modification is done.
> As a result, if server crashes after taking the snapshot (and the snapshot 
> does not contain change corresponding to lastProcessedZxid) restore will not 
> restore the data tree correctly:
> {code}
> public long restore(DataTree dt, Map<Long, Integer> sessions,
> PlayBackListener listener) throws IOException {
> snapLog.deserialize(dt, sessions);
> FileTxnLog txnLog = new FileTxnLog(dataDir);
> TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes 
> lastProcessedZxid is deserialized
>  }
> {code}
> I have had offline discussion with Ben and Camille on this. I will be posting 
> the discussion shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly

2011-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072182#comment-13072182
 ] 

Camille Fournier commented on ZOOKEEPER-1090:
-

Ben, can you check this one in? Thanks!

> Race condition while taking snapshot can lead to not restoring data tree 
> correctly
> --
>
> Key: ZOOKEEPER-1090
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal K
>Assignee: Vishal K
>Priority: Critical
>  Labels: persistence, server, snapshot
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1090
>
>
> I think I have found a bug in the snapshot mechanism.
> The problem occurs because dt.lastProcessedZxid is not synchronized (or 
> rather set before the data tree is modified):
> FileTxnSnapLog:
> {code}
> public void save(DataTree dataTree,
> ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
> throws IOException {
> long lastZxid = dataTree.lastProcessedZxid;
> LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
> File snapshot=new File(
> snapDir, Util.makeSnapshotName(lastZxid));
> snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);   <=== 
> the Datatree may not have the modification for lastProcessedZxid
> }
> {code}
> DataTree:
> {code}
> public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
> ProcessTxnResult rc = new ProcessTxnResult();
> String debug = "";
> try {
> rc.clientId = header.getClientId();
> rc.cxid = header.getCxid();
> rc.zxid = header.getZxid();
> rc.type = header.getType();
> rc.err = 0;
> if (rc.zxid > lastProcessedZxid) {
> lastProcessedZxid = rc.zxid;
> }
> [...modify data tree...]   
>  }
> {code}
> The lastProcessedZxid must be set after the modification is done.
> As a result, if server crashes after taking the snapshot (and the snapshot 
> does not contain change corresponding to lastProcessedZxid) restore will not 
> restore the data tree correctly:
> {code}
> public long restore(DataTree dt, Map<Long, Integer> sessions,
> PlayBackListener listener) throws IOException {
> snapLog.deserialize(dt, sessions);
> FileTxnLog txnLog = new FileTxnLog(dataDir);
> TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes 
> lastProcessedZxid is deserialized
>  }
> {code}
> I have had offline discussion with Ben and Camille on this. I will be posting 
> the discussion shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: FW: Does abrupt kill corrupts the datadir?

2011-07-27 Thread Benjamin Reed
i agree with pat. if we use sigterm in the script, we would want to
put a timeout in to escalate to a -9 which makes the script a bit more
complicated without reason since we don't have any exit hooks that we
want to run. zookeeper is designed to recover well from hard failures,
much worse than a kill -9. i don't think we want to change that.

ben

On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt  wrote:
> ZK has been built around the "fail fast" approach. In order to
> maintain high availability we want to ensure that restarting a server
> will result in it attempting to rejoin the quorum. IMO we would not
> want to change this (kill -9).
>
> Patrick
>
> On Tue, Jul 26, 2011 at 2:02 AM, Laxman  wrote:
>> Hi Everyone,
>>
>> Any thoughts?
>> Do we need to consider changing the abrupt shutdown to a graceful one?
>>
>> Implementations in some other hadoop ecosystem projects, for your reference:
>> Hadoop - kill [SIGTERM]
>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>> ZooKeeper - "kill -9" [SIGKILL]
>>
>>
>> -Original Message-
>> From: Laxman [mailto:lakshman...@huawei.com]
>> Sent: Wednesday, July 13, 2011 12:36 PM
>> To: 'dev@zookeeper.apache.org'
>> Subject: RE: Does abrupt kill corrupts the datadir?
>>
>> Hi Mahadev,
>>
>> Shutdown hook is just a quick thought. Another approach could be to just
>> send a kill [SIGTERM], which the process can handle.
>>
>> A first look at the "kill -9" raised the following scenario:
>>>In the worst case, if the latest snapshots on all zookeeper nodes get
>>>corrupted, there is a chance of data loss.
>>
>> How can zookeeper deal with this scenario gracefully?
>>
>> Also, I feel we should give the application a chance to shut down gracefully
>> before an abrupt shutdown.
>>
>> http://en.wikipedia.org/wiki/SIGKILL
>>
>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>> terminating, in most system shutdown procedures an attempt is first made to
>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>
>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>
>> The application can determine what it wants to do once a SIGTERM is
>> received. While most applications will clean up their resources and stop,
>> some may not. An application may be configured to do something completely
>> different when a SIGTERM is received. Also, if the application is in a bad
>> state, such as waiting for disk I/O, it may not be able to act on the signal
>> that was sent.
>>
>> Most system administrators will usually resort to the more abrupt signal
>> when an application doesn't respond to a SIGTERM.
>>
>> -Original Message-
>> From: Mahadev Konar [mailto:maha...@hortonworks.com]
>> Sent: Wednesday, July 13, 2011 12:02 PM
>> To: dev@zookeeper.apache.org
>> Subject: Re: Does abrupt kill corrupts the datadir?
>>
>> Hi Laxman,
>>  The server takes care of all the issues with data integrity, so a kill
>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>> everything works reliably is to use kill -9 :).
>>
>> Thanks
>> mahadev
>>
>> On 7/12/11 11:16 PM, "Laxman"  wrote:
>>
>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>>zookeeper process using "kill -9".
>>>
>>>stop)
>>>    echo -n "Stopping zookeeper ... "
>>>    if [ ! -f "$ZOOPIDFILE" ]
>>>    then
>>>      echo "error: could not find file $ZOOPIDFILE"
>>>      exit 1
>>>    else
>>>      $KILL -9 $(cat "$ZOOPIDFILE")
>>>      rm "$ZOOPIDFILE"
>>>      echo STOPPED
>>>      exit 0
>>>    fi
>>>    ;;
>>>
>>>This may corrupt the snapshot and transaction logs. Also, it's not
>>>recommended to use "kill -9".
>>>
>>>In the worst case, if the latest snapshots on all zookeeper nodes get
>>>corrupted, there is a chance of data loss.
>>>
>>>How about introducing a shutdown hook which will ensure zookeeper is
>>>shut down gracefully when we call stop?
>>>
>>>Note: This is just an observation; it was not found in a test.
>>>
>>>--
>>>
>>>Thanks,
>>>
>>>Laxman
>>>
>>
>>
>>
>
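
For reference, a minimal sketch of the shutdown-hook idea Laxman floats above;
this is NOT how zkServer.sh behaves today. A plain kill (SIGTERM) lets the JVM
run registered hooks, while kill -9 (SIGKILL) always bypasses them, which is
why the server has to recover from hard kills regardless.

{code}
public class GracefulStopSketch {
    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                // close sockets, sync the txn log, etc. before the JVM exits
                System.out.println("shutting down cleanly");
            }
        });
        Thread.sleep(Long.MAX_VALUE);  // stand-in for the server's main loop
    }
}
{code}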


[jira] [Updated] (ZOOKEEPER-1025) zkCli is overly sensitive to spaces.

2011-07-27 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated ZOOKEEPER-1025:
--

Attachment: ZOOKEEPER-1025.patch

> zkCli is overly sensitive to spaces.
> ---
>
> Key: ZOOKEEPER-1025
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1025
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Jonathan Hsieh
>Assignee: Laxman
> Attachments: ZOOKEEPER-1025.patch
>
>
> Here's an example: 
> I do an ls to get znode names. I try to stat a znode.  
> {code}
> [zk: localhost:3181(CONNECTED) 1] ls /flume-nodes
> [nodes02, nodes01, nodes00, nodes05, 
> nodes04, nodes03]
> [zk: localhost:3181(CONNECTED) 3] stat /flume-nodes/nodes02 
> cZxid = 0xb
> ctime = Sun Mar 20 23:24:03 PDT 2011
> ... (success)
> {code}
> Here's something that almost looks the same.  Notice the extra space in 
> front of the znode name.
> {code}
> [zk: localhost:3181(CONNECTED) 2] stat  /flume-nodes/nodes02
> Command failed: java.lang.IllegalArgumentException: Path length must be > 0
> {code}
> This seems like unexpected behavior.
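
A sketch of the kind of fix this calls for; this is an assumption about the
approach, not necessarily what the attached patch does. Trimming the input and
splitting on runs of whitespace means a doubled space can no longer yield an
empty "" token that then fails the "Path length must be > 0" check.

{code}
public class CliTokenizeSketch {
    // Hypothetical tokenizer for the zkCli command line:
    static String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }
    // tokenize("stat  /flume-nodes/nodes02")
    //     -> ["stat", "/flume-nodes/nodes02"], regardless of extra spaces
}
{code}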

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1107) automating log and snapshot cleaning

2011-07-27 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated ZOOKEEPER-1107:
--

Attachment: ZOOKEEPER-1107.5.patch

> automating log and snapshot cleaning
> 
>
> Key: ZOOKEEPER-1107
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1107
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Jun Rao
>Assignee: Laxman
> Attachments: ZOOKEEPER-1107.1.patch, ZOOKEEPER-1107.2.patch, 
> ZOOKEEPER-1107.3.patch, ZOOKEEPER-1107.4.patch, ZOOKEEPER-1107.5.patch, 
> ZOOKEEPER-1107.patch
>
>
> I would like to have ZK itself manage the number of snapshots and logs 
> kept, instead of relying on the PurgeTxnLog utility.
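
A minimal sketch of the idea (not the attached patch; the paths, interval and
retain count are illustrative): drive the existing PurgeTxnLog logic from a
timer inside the server process instead of an external cron job.

{code}
import java.io.File;
import java.util.Timer;
import java.util.TimerTask;
import org.apache.zookeeper.server.PurgeTxnLog;

public class AutoPurgeSketch {
    public static void main(String[] args) {
        final File dataDir = new File("/mnt/zookeeper");
        final File snapDir = new File("/mnt/zookeeper");
        new Timer("PurgeTask").schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    PurgeTxnLog.purge(dataDir, snapDir, 3); // keep 3 snaps
                } catch (Exception e) {
                    e.printStackTrace(); // don't kill the timer thread
                }
            }
        }, 0, 24L * 60 * 60 * 1000);     // run once a day
    }
}
{code}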

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1128) Recipe wrong for Lock process.

2011-07-27 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072213#comment-13072213
 ] 

Patrick Hunt commented on ZOOKEEPER-1128:
-

Updating the docs to clarify this sounds reasonable to me. I'd encourage you to 
create a new jira for this (and a patch would be great too! :-) ). Perhaps as 
an example similar to what I gave in my comment? thanks.


> Recipe wrong for Lock process.
> --
>
> Key: ZOOKEEPER-1128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: recipes
>Affects Versions: 3.3.3
>Reporter: yynil
>
> http://zookeeper.apache.org/doc/trunk/recipes.html
> The current recipe for Lock has the wrong process.
> Specifically, for the 
> "4. The client calls exists( ) with the watch flag set on the path in the 
> lock directory with the next lowest sequence number."
> It shouldn't be the "the next lowest sequence number". It should be the 
> "current lowest path". 
> If you're gonna use "the next lowest sequence number", you'll never wait for 
> the lock possession.
> The following is the test code:
> {code:title=LockTest.java|borderStyle=solid}
> ACL acl = new ACL(Perms.ALL, new Id("10.0.0.0/8", "1"));
> List<ACL> acls = new ArrayList<ACL>();
> acls.add(acl);
> String connectStr = "localhost:2181";
> final Semaphore sem = new Semaphore(0);
> ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("eventType:" + event.getType());
> System.out.println("keeperState:" + event.getState());
> if (event.getType() == Event.EventType.None) {
> if (event.getState() == Event.KeeperState.SyncConnected) {
> sem.release();
> }
> }
> }
> });
> System.out.println("state:" + zooKeeper.getState());
> System.out.println("Waiting for the state to be connected");
> try {
> sem.acquire();
> } catch (InterruptedException ex) {
> ex.printStackTrace();
> }
> System.out.println("Now state:" + zooKeeper.getState());
> String directory = "/_locknode_";
> Stat stat = zooKeeper.exists(directory, false);
> if (stat == null) {
> zooKeeper.create(directory, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> }
> String prefix = directory + "/lock-";
> String path = zooKeeper.create(prefix, new byte[]{}, 
> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
> System.out.println("Create the path for " + path);
> while (true) {
> List<String> children = zooKeeper.getChildren(directory, false);
> Collections.sort(children);
> System.out.println("The whole lock size is " + children.size());
> String lowestPath = children.get(0);
> DecimalFormat df = new DecimalFormat("00");
> String currentSuffix = lowestPath.substring("lock-".length());
> System.out.println("CurrentSuffix is " + currentSuffix);
> int intIndex = Integer.parseInt(currentSuffix);
> if (path.equals(directory + "/" + lowestPath)) {
> //I've got the lock and release it
> System.out.println("I've got the lock at " + new Date());
> System.out.println("next index is " + intIndex);
> Thread.sleep(3000);
> System.out.println("After sleep 3 seconds, I'm gonna release 
> the lock");
> zooKeeper.delete(path, -1);
> break;
> }
> final Semaphore wakeupSem = new Semaphore(0);
> stat = zooKeeper.exists(directory + "/" + lowestPath, new 
> Watcher() {
> @Override
> public void process(WatchedEvent event) {
> System.out.println("Event is " + event.getType());
> System.out.println("State is " + event.getState());
> if (event.getType() == Event.EventType.NodeDeleted) {
> wakeupSem.release();
> }
> }
> });
> if (stat != null) {
> System.out.println("Waiting for the delete of ");
> wakeupSem.acquire();
> } else {
> System.out.println("Continue to seek");
> }
> }
> {code} 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira