[jira] [Created] (ZOOKEEPER-4220) Redundant connection attempts during leader election
Alex Mirgorodskiy created ZOOKEEPER-4220: Summary: Redundant connection attempts during leader election Key: ZOOKEEPER-4220 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4220 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.5 Reporter: Alex Mirgorodskiy

We've seen a few failures or long delays in electing a new leader when the previous one has a hard host reset (as opposed to just the service process going down, where connections don't need to wait for a timeout). Symptoms are similar to https://issues.apache.org/jira/browse/ZOOKEEPER-2164. Reducing cnxTimeout from 5 to 1.5 seconds makes the problem much less frequent, but doesn't fix it completely. We are still using an old ZooKeeper version (3.5.5), and the new async connect feature will presumably avoid it. But we noticed a pattern of twice the expected number of connection attempts to the same downed instance in the log, and it appears to be due to a code glitch in QuorumCnxManager.java:

{code:java}
synchronized void connectOne(long sid) {
    ...
    if (lastCommittedView.containsKey(sid)) {
        knownId = true;
        if (connectOne(sid, lastCommittedView.get(sid).electionAddr))
            return;
    }
    if (lastSeenQV != null && lastProposedView.containsKey(sid)
            && (!knownId
                || (lastProposedView.get(sid).electionAddr
                    != lastCommittedView.get(sid).electionAddr))) {
        knownId = true;
        if (connectOne(sid, lastProposedView.get(sid).electionAddr))
            return;
    }
{code}

Comparing electionAddrs should presumably be done with !equals; otherwise connectOne will be invoked an extra time even in the common case when the addresses do match. The code around it has changed recently, but the check itself still exists at the top of master. It might not matter as much with async connects, but perhaps it helps even then.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
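The glitch can be illustrated outside ZooKeeper. A minimal sketch, with class and method names of my own invention (not the project's): InetSocketAddress objects held by two different views are distinct instances even when they denote the same address, so a reference comparison with != always reports a difference, while an equals()-based check does not.

```java
import java.net.InetSocketAddress;

public class AddrCompareSketch {
    // The check connectOne() presumably intended: the proposed election
    // address only counts as "different" when it differs by value.
    static boolean addrChanged(InetSocketAddress committed, InetSocketAddress proposed) {
        return !proposed.equals(committed);
    }

    public static void main(String[] args) {
        // Two logically identical election addresses, as lastCommittedView and
        // lastProposedView would typically hold for the same sid.
        InetSocketAddress committed = InetSocketAddress.createUnresolved("10.0.0.1", 3888);
        InetSocketAddress proposed  = InetSocketAddress.createUnresolved("10.0.0.1", 3888);

        // Reference comparison (the current code): distinct objects, so this is
        // true and connectOne() gets invoked a second time.
        System.out.println(proposed != committed);            // true

        // equals()-based comparison: recognizes the match, skipping the retry.
        System.out.println(addrChanged(committed, proposed)); // false
    }
}
```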
[jira] [Created] (ZOOKEEPER-3757) Transaction log sync can take 20+ seconds after leader election when there is a large snapCount
Alex Kaiser created ZOOKEEPER-3757: -- Summary: Transaction log sync can take 20+ seconds after leader election when there is a large snapCount Key: ZOOKEEPER-3757 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3757 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.6 Reporter: Alex Kaiser

Short overview: If you have a large snapCount (we are using 10,000,000) you can end up with a very large transaction log (ours are between 1 GB and 1.5 GB), which can cause the sync between a newly elected leader and its followers to take 20+ seconds. This stems from the code (FileTxnIterator.getStorageSize()) in most cases returning 0 even if the transaction log is 1 GB.

Long explanation: A few years ago we had some trouble with our zookeeper cluster having many shortish (100-500ms) pauses during our peak traffic times. These turned out to result from the master taking a snapshot. To solve this we upped the snapCount to 10,000,000 so that we weren't taking snapshots nearly as often. We also made changes to reduce the size of our snapshots (from around 2.5 GB to ~500 MB). I don't remember what version of zookeeper we were using originally, but this was all working fine using 3.4.10; we started to have problems when we upgraded to 3.5.6 around 3 months ago. We have a fairly high transaction rate and thus hit the zxid overflow about once a month, which causes a leader election. When we were on 3.4.10, this was fine because leader election and syncing would happen within 2-4 seconds, which was low enough for us to basically ignore it. However, after we upgraded to 3.5.6 the pauses we saw took between 15 and 30 seconds, which was unacceptable for us. For now, to solve this I set zookeeper.forceSnapshotSync=true (yes, I know the comments say this is only supposed to be used for testing), which causes syncing using snapshots (only 10-50 MB) instead of the transaction log (1-1.5 GB).
Technical details: I tried taking a look at the code and I think I know why this happens. From what I learned, when a follower needs to sync with a leader, LearnerHandler.syncFollower() gets called on the leader. It goes through a big if statement, but at one point it will call db.getProposalsFromTxnLog(peerLastZxid, sizeLimit). That peerLastZxid could be some very old zxid if the follower hadn't taken a snapshot in a long time (i.e. has a large snapCount), and the sizeLimit will generally be 0.33 * snapshot size (in my case around 10 MB). Inside of getProposalsFromTxnLog it will create a TxnIterator and then call getStorageSize() on it. The problem comes from the fact that this call to getStorageSize() will usually return 0. The reason is that the FileTxnIterator class has a "current" log file that it is reading, this.logFile, and a list of files that it would still have to iterate through, this.storedFiles. The getStorageSize() function, though, only looks at the storedFiles list, so if the iterator has one large transaction log as the "current" log file and nothing in the storedFiles list, this method will return 0 even though there is a huge transaction log to sync. One other side effect of this problem is that even bouncing a follower can cause long (5-10 second) pauses, as the leader will hold a read lock on the transaction log while it syncs up with the follower. While I know what the problem is, I don't know what the best solution is. I'm willing to work on the solution, but I would appreciate suggestions. One possible solution would be to include this.logFile in the getStorageSize() calculation; however, this could cause the estimate to overestimate the amount of data that is in the iterator (possibly by a lot), and I don't know what the consequences of doing that are. I'm not quite sure what a good way to get an accurate estimate would be.
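The shape of the bug, and of the possible over-counting fix, can be sketched with file sizes alone. This is only an illustration under the description above, not FileTxnIterator's real code; the class, field, and method names here are invented:

```java
import java.util.Arrays;
import java.util.List;

public class TxnIterSizeSketch {
    // Hypothetical stand-ins for FileTxnIterator's state: the size in bytes of
    // the "current" log being iterated (this.logFile) and the sizes of the logs
    // still queued for iteration (this.storedFiles).
    final long currentLogSize;
    final List<Long> storedFileSizes;

    TxnIterSizeSketch(long currentLogSize, List<Long> storedFileSizes) {
        this.currentLogSize = currentLogSize;
        this.storedFileSizes = storedFileSizes;
    }

    // Behavior described above: only storedFiles are summed, so an iterator
    // whose only data is one large current log reports 0.
    long storageSizeStoredOnly() {
        return storedFileSizes.stream().mapToLong(Long::longValue).sum();
    }

    // Possible fix: also count the current log. Note this can overestimate,
    // since the iterator may already be positioned partway through that file.
    long storageSizeWithCurrent() {
        return storageSizeStoredOnly() + currentLogSize;
    }

    public static void main(String[] args) {
        // One 1.5 GB current log, nothing queued: the shape of the reported bug.
        TxnIterSizeSketch it = new TxnIterSizeSketch(1_500_000_000L, Arrays.asList());
        System.out.println(it.storageSizeStoredOnly());  // 0 -> syncFollower misjudges the cost
        System.out.println(it.storageSizeWithCurrent()); // 1500000000
    }
}
```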
[jira] [Created] (ZOOKEEPER-3642) Data inconsistency when the leader crashes right after sending SNAP sync
Alex Mirgorodskiy created ZOOKEEPER-3642: Summary: Data inconsistency when the leader crashes right after sending SNAP sync Key: ZOOKEEPER-3642 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3642 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.6, 3.5.5 Environment: Linux 4.19.29 x86_64 Reporter: Alex Mirgorodskiy

If the leader crashes after sending a SNAP sync to a learner, but before sending the NEWLEADER message, the learner will not save the snapshot to disk, but it will advance its lastProcessedZxid to that of the snapshot (call it Zxid X). A new leader will get elected, and it will resync our learner again immediately. But this time, it will use the incremental DIFF method, starting from Zxid X. A DIFF-based resync does not trigger snapshots, so the learner is still holding the original snapshot purely in memory. If the learner restarts after that, it will silently lose all the data up to Zxid X.

An easy way to reproduce is to insert System.exit into LearnerHandler.java right before sending the NEWLEADER message (on the one instance that is currently running the leader, but not the others):

{noformat}
 LOG.debug("Sending NEWLEADER message to " + sid);
+if (leader.self.getId() == 1 && sid == 3) {
+    LOG.debug("Bail when server.1 resyncs server.3");
+    System.exit(0);
+}
{noformat}

If I remember right, the repro steps are as follows. Run with that patch in a 4-instance ensemble where server.3 is an Observer, the rest are voting members, and server.1 is the current Leader. Start server.3 after the other instances are up. It will get the initial snapshot from server.1, and server.1 will stop immediately because of the patch. Say server.2 takes over as the new Leader. Server.3 will receive a DIFF resync from server.2, but will skip persisting the snapshot. A subsequent restart of server.3 will make that instance come up with a blank data tree.
The above steps assumed that server.3 is an Observer, but it can presumably happen for voting members too; you just need a 5-instance ensemble. Our workaround is to take the snapshot unconditionally on receiving NEWLEADER:

{noformat}
-        if (snapshotNeeded) {
+        // Take the snapshot unconditionally. The first leader may have crashed
+        // after sending us a SNAP, but before sending NEWLEADER. The second leader will
+        // send us a DIFF, and we'd still like to take a snapshot, even though
+        // the upstream code used to skip it.
+        if (true || snapshotNeeded) {
             zk.takeSnapshot();
         }
{noformat}

This is what the 3.4.x series used to do. But I assume it is not the ideal fix, since it essentially disables the "snapshotNeeded" optimization.
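A narrower alternative to the `if (true || snapshotNeeded)` workaround might keep the optimization but remember whether a received SNAP has not yet reached disk. This is only a sketch of that decision with invented names, not a proposed patch against the real Learner code:

```java
public class SnapshotDecisionSketch {
    // Hypothetical decision: persist when the current sync requires it, or when
    // an earlier SNAP from a since-crashed leader is still held only in memory.
    static boolean shouldSnapshot(boolean snapshotNeeded, boolean unpersistedSnapReceived) {
        return snapshotNeeded || unpersistedSnapReceived;
    }

    public static void main(String[] args) {
        // DIFF resync right after an in-memory-only SNAP: must persist.
        System.out.println(shouldSnapshot(false, true));  // true
        // Ordinary DIFF resync with nothing pending: optimization still applies.
        System.out.println(shouldSnapshot(false, false)); // false
    }
}
```

The point of the extra flag is that, unlike `if (true || ...)`, plain DIFF resyncs would still skip the snapshot.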
[jira] [Created] (ZOOKEEPER-3591) Inconsistent resync with dynamic reconfig
Alex Mirgorodskiy created ZOOKEEPER-3591: Summary: Inconsistent resync with dynamic reconfig Key: ZOOKEEPER-3591 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3591 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.5 Reporter: Alex Mirgorodskiy Attachments: instance1.log.gz, instance6.log.gz We've run into a problem where one of the zookeeper instances lost most of its data after its zk process has been restarted. We suspect an interaction between dynamic reconfiguration and snapshot-based resync of that instance. Details and some amateurish analysis are below. We can also upload transaction logs, if need be. We have a 6-instance ensemble running version 3.5.5 with 3 quorum members and 3 observers. One of the observers (Instance 6) saw its db shrink from 3162 znodes down to 10 after that instance restarted: > 2019-10-13T16:44:19.060+ [.zk-monitor-0] Monitor command mntr: zk_version > 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT > zk_znode_count 3162 > -- > 2019-10-13T16:48:32.713+ [.zk-monitor-0] Monitor command mntr: zk_version > 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT > zk_znode_count 10 Contrast it with Instance 1 that was the leader at the time, and whose znode_count remained stable around 3000: > 2019-10-13T16:44:48.625+ [.zk-monitor-0] Monitor command mntr: zk_version > 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT > zk_znode_count 3178 > -- > ... > -- > 2019-10-13T16:48:48.731+ [.zk-monitor-0] Monitor command mntr: zk_version > 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT > zk_znode_count 3223 It appears that the problem had happened 30 minutes earlier, when Instance 6 got resynced from the leader via the Snap method, yet skipped creating an on-disk snapshot. 
The end result was that the in-memory state was fine, but there was only the primordial snapshot.0 on disk, and transaction logs only started after the missing snapshot: $ ls -l version-2 > total 1766 > -rw-r--r-- 1 daautomation daautomation 1 Oct 13 09:14 acceptedEpoch > -rw-r--r-- 1 daautomation daautomation 1 Oct 13 10:12 currentEpoch > -rw-r--r-- 1 daautomation daautomation 2097168 Oct 13 09:44 log.602e0 > -rw-r--r-- 1 daautomation daautomation 1048592 Oct 13 10:09 log.61f1b > -rw-r--r-- 1 daautomation daautomation 4194320 Oct 13 12:16 log.63310 > -rw-r--r-- 1 daautomation daautomation 770 Oct 13 09:14 snapshot.0 So the zk reboot wiped out most of the state. Dynamic reconfig might be relevant here. Instance 6 started as an observer, got removed, and immediately re-added as a participant. Instance 2 went the other way, from participant to observer: > 2019-10-13T16:14:19.323+ ZK reconfig: removing node 6 > 2019-10-13T16:14:19.359+ ZK reconfig: adding > server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181 > 2019-10-13T16:14:19.399+ ZK reconfig: adding > server.2=10.80.209.131:2888:3888:observer;0.0.0.0:2181 Looking at the logs, Instance 6 started and received a resync snapshot from the leader right before the dynamic reconfig: > 2019-10-13T16:14:19.284+ > [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Getting a > snapshot from leader 0x602dd > ... > 2019-10-13T16:14:19.401+ > [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Got zxid > 0x602de expected 0x1 Had it processed the NEWLEADER packet afterwards, it would've persisted the snapshot locally. But there's no NEWLEADER message in the Instance 6 log. 
Instead, there's a "changes proposed in reconfig" exception, likely a result of the instance getting dynamically removed and re-added as a participant:

> 2019-10-13T16:14:19.467+ [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Becoming a non-voting participant
> 2019-10-13T16:14:19.467+ [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Exception when observing the leader
> java.lang.Exception: changes proposed in reconfig
>     at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:506)
>     at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:74)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1258)

Perhaps the NEWLEADER packet was still in the socket, but sitting behind INFORMANDACTIVATE, whose exception prevented us from processing NEWLEADER? Also, it may or may not be related, but this area got changed recently as part of https://issues.apache.org/jira/browse/ZOOKEEPER-3104.
[jira] [Updated] (ZOOKEEPER-3160) Custom User SSLContext
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rankin updated ZOOKEEPER-3160: --- Fix Version/s: (was: 3.5.5) > Custom User SSLContext > -- > > Key: ZOOKEEPER-3160 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3160 > Project: ZooKeeper > Issue Type: New Feature > Components: java client >Affects Versions: 3.5.4 >Reporter: Alex Rankin >Priority: Minor > Labels: features, ready-to-commit > > The Zookeeper libraries currently allow you to set up your SSL Context via > system properties such as "zookeeper.ssl.keyStore.location" in the X509Util. > This covers most simple use cases, where users have software keystores on > their harddrive. > There are, however, a few additional scenarios that this doesn't cover. Two > possible ones would be: > # The user has a hardware keystore, loaded in using PKCS11 or something > similar. > # The user has no access to the software keystore, but can retrieve an > already-constructed SSLContext from their container. > For this, I would propose that the X509Util be extended to allow a user to > set a property such as "zookeeper.ssl.client.context" to provide a class > which supplies a custom SSL context. This gives a lot more flexibility to the > ZK client, and allows the user to construct the SSLContext in whatever way > they please (which also future proofs the implementation somewhat). > I've already completed this feature, and will put in a PR soon for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3160) Custom User SSLContext
Alex Rankin created ZOOKEEPER-3160: -- Summary: Custom User SSLContext Key: ZOOKEEPER-3160 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3160 Project: ZooKeeper Issue Type: New Feature Components: java client Affects Versions: 3.5.4 Reporter: Alex Rankin Fix For: 3.5.5

The ZooKeeper libraries currently allow you to set up your SSLContext via system properties such as "zookeeper.ssl.keyStore.location" in the X509Util. This covers most simple use cases, where users have software keystores on their hard drive. There are, however, a few additional scenarios that this doesn't cover. Two possible ones would be:
# The user has a hardware keystore, loaded in using PKCS11 or something similar.
# The user has no access to the software keystore, but can retrieve an already-constructed SSLContext from their container.
For this, I would propose that the X509Util be extended to allow a user to set a property such as "zookeeper.ssl.client.context" to provide a class which supplies a custom SSLContext. This gives a lot more flexibility to the ZK client, and allows the user to construct the SSLContext in whatever way they please (which also future-proofs the implementation somewhat). I've already completed this feature, and will put in a PR soon for it.
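One way the proposal could look is sketched below. The property name "zookeeper.ssl.client.context" comes from the issue text; the supplier interface, its name, and the reflective loading are my assumptions about the eventual shape, not the actual X509Util API:

```java
import javax.net.ssl.SSLContext;

public class CustomSslContextSketch {
    // Hypothetical interface a user-supplied class would implement.
    public interface ZKClientSSLContextSupplier {
        SSLContext get() throws Exception;
    }

    // Sketch of how X509Util might load it: read the class name from the
    // proposed property, instantiate it reflectively, and ask it for a context.
    static SSLContext loadCustomContext() throws Exception {
        String cls = System.getProperty("zookeeper.ssl.client.context");
        if (cls == null) {
            return null; // fall back to the existing keystore-based setup
        }
        ZKClientSSLContextSupplier supplier =
            (ZKClientSSLContextSupplier) Class.forName(cls)
                .getDeclaredConstructor().newInstance();
        return supplier.get();
    }

    public static void main(String[] args) throws Exception {
        // Property unset: behaves like the current keystore path.
        System.out.println(loadCustomContext() == null); // true
    }
}
```

A PKCS11 user would then point the property at a class whose get() builds the SSLContext from the hardware token instead of a keystore file.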
[jira] [Created] (ZOOKEEPER-2950) Add keys for the Zxid from the stat command to check_zookeeper.py
Alex Bame created ZOOKEEPER-2950: Summary: Add keys for the Zxid from the stat command to check_zookeeper.py Key: ZOOKEEPER-2950 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2950 Project: ZooKeeper Issue Type: Improvement Components: scripts Affects Versions: 3.4.11, 3.5.3 Reporter: Alex Bame Priority: Trivial

Add keys for the zxid and its component pieces: epoch and transaction counter. These are not reported by the 'mntr' command, so they must be obtained from 'stat'. The counter is useful for tracking transaction rates, and the epoch is useful for tracking leader churn.

zk_zxid - the 64-bit zxid from ZK
zk_zxid_counter - the lower 32 bits, AKA the counter
zk_zxid_epoch - the upper 32 bits, AKA the epoch
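The epoch/counter split described above is just bit slicing of the 64-bit zxid. check_zookeeper.py is Python, but the split itself looks like this (class and method names are mine; the sample zxid value is made up for illustration):

```java
public class ZxidSplit {
    // zk_zxid_epoch: the upper 32 bits of the 64-bit zxid.
    static long epoch(long zxid) {
        return zxid >>> 32; // unsigned shift, so a high bit never sign-extends
    }

    // zk_zxid_counter: the lower 32 bits, incremented per transaction.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long zxid = 0x500000A2BL; // hypothetical value from 'stat': epoch 5, counter 0xA2B
        System.out.println(epoch(zxid));   // 5
        System.out.println(counter(zxid)); // 2603
    }
}
```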
[jira] [Updated] (ZOOKEEPER-2649) ZooKeeper does not log the session ID in which the client was authenticated.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Zhou updated ZOOKEEPER-2649: - Component/s: (was: quorum) server > The ZooKeeper do not write in log session ID in which the client has been > authenticated. > > > Key: ZOOKEEPER-2649 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2649 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.9, 3.5.2 >Reporter: Alex Zhou >Priority: Trivial > Fix For: 3.4.10, 3.5.3, 3.6.0 > > > The ZooKeeper do not write in log session ID in which the client has been > authenticated. This occurs for digest and for SASL authentications: > bq. 2016-12-09 15:46:34,808 [myid:] - INFO > [SyncThread:0:ZooKeeperServer@673] - Established session 0x158e39a0a960001 > with negotiated timeout 3 for client /0:0:0:0:0:0:0:1:52626 > bq. 2016-12-09 15:46:34,838 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - > Successfully authenticated client: authenticationID=bob; authorizationID=bob. > bq. 2016-12-09 15:46:34,848 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - > Setting authorizedID: bob > bq. 2016-12-09 15:46:34,848 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - adding > SASL authorization for authorizationID: bob > bq. 2016-12-13 10:52:54,915 [myid:] - INFO > [SyncThread:0:ZooKeeperServer@673] - Established session 0x158f72acaed0001 > with negotiated timeout 3 for client /172.20.97.175:52217 > bq. 2016-12-13 10:52:55,070 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - > Successfully authenticated client: authenticationID=u...@billab.ru; > authorizationID=u...@billab.ru. > bq. 2016-12-13 10:52:55,075 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - > Setting authorizedID: u...@billab.ru > bq. 
2016-12-13 10:52:55,075 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - adding > SASL authorization for authorizationID: u...@billab.ru > bq. 2016-12-19 17:43:01,395 [myid:] - INFO > [SyncThread:0:ZooKeeperServer@673] - Established session 0x158fd72521f > with negotiated timeout 3 for client /172.20.97.175:57633 > bq. 2016-12-19 17:45:53,497 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@952] - got auth > packet /172.20.97.175:57633 > bq. 2016-12-19 17:45:53,508 [myid:] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@986] - auth > success /172.20.97.175:57633 > So, it is difficult to determine which client made changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2649) ZooKeeper does not log the session ID in which the client was authenticated.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763608#comment-15763608 ] Alex Zhou commented on ZOOKEEPER-2649: --

Hi, I just expect to find the session ID in the log record about successful authentication. Without it, I can only find out that some session's data has been changed and that the session was opened for a client from some IP. So it would be great to have:

{quote}
2016-12-09 15:46:34,808 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158e39a0a960001 with negotiated timeout 3 for client /0:0:0:0:0:0:0:1:52626
2016-12-09 15:46:34,838 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=bob; authorizationID=bob.
2016-12-09 15:46:34,848 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: bob
2016-12-09 15:46:34,848 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - {color:red}Session 0x158e39a0a960001:{color} adding SASL authorization for authorizationID=bob
2016-12-13 10:52:54,915 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158f72acaed0001 with negotiated timeout 3 for client /172.20.97.175:52217
2016-12-13 10:52:55,070 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=u...@billab.ru; authorizationID=u...@billab.ru.
2016-12-13 10:52:55,075 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: u...@billab.ru
2016-12-13 10:52:55,075 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - {color:red}Session 0x158f72acaed0001:{color} adding SASL authorization for authorizationID=u...@billab.ru
2016-12-19 17:43:01,395 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158fd72521f with negotiated timeout 3 for client /172.20.97.175:57633
2016-12-19 17:45:53,497 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@952] - {color:red}Session 0x158fd72521f:{color} got auth packet /172.20.97.175:57633
2016-12-19 17:45:53,508 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@986] - {color:red}Session 0x158fd72521f:{color} auth success {color:red}for authorizationID=ann{color}/172.20.97.175:57633
{quote}

Apparently it would also be great to see the session ID in the log records about unsuccessful authentication, as well as the authenticationID in the records about digest authentication ('auth success').
[jira] [Created] (ZOOKEEPER-2649) ZooKeeper does not log the session ID in which the client was authenticated.
Alex Zhou created ZOOKEEPER-2649: Summary: ZooKeeper does not log the session ID in which the client was authenticated. Key: ZOOKEEPER-2649 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2649 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.5.2, 3.4.9 Reporter: Alex Zhou Priority: Trivial Fix For: 3.4.10, 3.5.3, 3.6.0

ZooKeeper does not log the session ID in which the client was authenticated. This occurs for both digest and SASL authentication:

bq. 2016-12-09 15:46:34,808 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158e39a0a960001 with negotiated timeout 3 for client /0:0:0:0:0:0:0:1:52626
bq. 2016-12-09 15:46:34,838 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=bob; authorizationID=bob.
bq. 2016-12-09 15:46:34,848 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: bob
bq. 2016-12-09 15:46:34,848 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - adding SASL authorization for authorizationID: bob
bq. 2016-12-13 10:52:54,915 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158f72acaed0001 with negotiated timeout 3 for client /172.20.97.175:52217
bq. 2016-12-13 10:52:55,070 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=u...@billab.ru; authorizationID=u...@billab.ru.
bq. 2016-12-13 10:52:55,075 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: u...@billab.ru
bq. 2016-12-13 10:52:55,075 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1024] - adding SASL authorization for authorizationID: u...@billab.ru
bq. 2016-12-19 17:43:01,395 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x158fd72521f with negotiated timeout 3 for client /172.20.97.175:57633
bq. 2016-12-19 17:45:53,497 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@952] - got auth packet /172.20.97.175:57633
bq. 2016-12-19 17:45:53,508 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@986] - auth success /172.20.97.175:57633

So, it is difficult to determine which client made changes.
[jira] [Commented] (ZOOKEEPER-2424) Detect and log possible GC churn in servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281520#comment-15281520 ] Alex Brasetvik commented on ZOOKEEPER-2424: --- +1 :-)

> Detect and log possible GC churn in servers.
> Key: ZOOKEEPER-2424
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2424
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Reporter: Chris Nauroth
> Labels: newbie
> Fix For: 3.5.3
>
> Excessive JVM garbage collection pauses can harm the stability of a ZooKeeper ensemble. If a stop-the-world GC pause in a server lasts long enough, then the node will drop out of the ensemble. If this happens on multiple nodes simultaneously, then there is a risk of loss of quorum. This issue proposes to detect long GC pauses, log warnings about them, and expose metrics about them.
[jira] [Commented] (ZOOKEEPER-2422) Improve health reporting by adding heap stats
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280275#comment-15280275 ] Alex Brasetvik commented on ZOOKEEPER-2422: --- It would be great if we could expose stats on the various memory pools, as well as GC stats. These will make it easier to pinpoint/predict problems than just how much total heap is left. These can differ between different JVMs/configurations, however. Sample suggested output of mntr:

```
zk_version 3.6.0-SNAPSHOT--1, built on 10/21/2015 20:34 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 6
zk_packets_sent 5
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state standalone
zk_znode_count 7
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 75
zk_max_memory 3817865216
zk_total_memory 257425408
zk_free_memory 220856512
zk_jvm_young_used_memory 36568896
zk_jvm_young_peak_used_memory 36568896
zk_jvm_young_max_memory 1409810432
zk_jvm_survivor_used_memory 0
zk_jvm_survivor_peak_used_memory 0
zk_jvm_survivor_max_memory 11010048
zk_jvm_old_used_memory 0
zk_jvm_old_peak_used_memory 0
zk_jvm_old_max_memory 2863136768
zk_jvm_threads_count 20
zk_jvm_threads_peak_count 20
zk_jvm_gc_young_count 0
zk_jvm_gc_young_time 0
zk_jvm_gc_old_count 0
zk_jvm_gc_old_time 0
zk_jvm_instance_id 9164@alex.local
zk_jvm_start_time 1462979784010
zk_open_file_descriptor_count 97
zk_max_file_descriptor_count 10240
```

> Improve health reporting by adding heap stats
> Key: ZOOKEEPER-2422
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2422
> Project: ZooKeeper
> Issue Type: Improvement
> Affects Versions: 3.4.8, 3.5.1, 3.6.0
> Reporter: Sergey Maslyakov
> Assignee: Sergey Maslyakov
> Fix For: 3.5.2, 3.6.0
>
> Attachments: zookeeper-2422-3.4.patch, zookeeper-2422-3.5.patch, zookeeper-2422-3.6.patch
>
> In order to improve remote monitoring of the ZooKeeper instance using tools like Icinga/NRPE, it is very desirable to expose JVM heap stats via a light-weight interface. The "mntr" 4lw is a good candidate for this.
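A monitoring check would consume output like the sample above by splitting each mntr line into a key and a value on the first run of whitespace. A minimal sketch (the class and method names are mine, not part of any ZooKeeper or NRPE API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MntrParseSketch {
    // Parse mntr's "key<whitespace>value" lines into an ordered map.
    // Lines without a value (or blank lines) are skipped.
    static Map<String, String> parseMntr(String output) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : output.split("\n")) {
            String[] kv = line.trim().split("\\s+", 2);
            if (kv.length == 2) {
                stats.put(kv[0], kv[1]);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        String sample = "zk_znode_count\t7\nzk_server_state standalone\nzk_jvm_gc_young_count 0\n";
        Map<String, String> stats = parseMntr(sample);
        System.out.println(stats.get("zk_znode_count"));       // 7
        System.out.println(stats.get("zk_jvm_gc_young_count")); // 0
    }
}
```

From the resulting map, a plugin can alert on thresholds such as the proposed zk_jvm_gc_* and zk_jvm_*_used_memory keys.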
Re: `winstdint.h` and MSVC
Hey folks. Not sure if someone responded and I just didn't see it because I'm not on the dev@ list, or if there just isn't that much interest in these questions. Can someone please at least confirm that I've not screwed up sending this message somehow? On Thu, Jan 28, 2016 at 3:24 PM, Alex Clemmer <clemmer.alexan...@gmail.com> wrote: > I should note also that I have looked closely at the issue tracker > (e.g., https://issues.apache.org/jira/browse/ZOOKEEPER-1953) and > various related changesets (e.g., > http://svn.apache.org/viewvc?view=revision=1148116) but being > young and raised mostly on git I could not find a discussion of why > this file is the way that it is. So it could well be that I am simply > missing simple guidance like "make sure not to include `stdint.h` in > any project consuming the ZK C client library." > > On Thu, Jan 28, 2016 at 3:18 PM, Alex Clemmer > <clemmer.alexan...@gmail.com> wrote: >> Hey folks. >> >> We (the Apache Mesos project) are building against the ZK C client on >> Windows, with VS 2015 (and MSVC v.1900). When we attempt a vanilla >> build, including both `zookeeper.h` and `stdint.h` causes the compiler >> to complain that we're re-typedef'ing a few types (such as >> `int_fast8_t` with different underlying types) in the file, >> `winstdint.h`. >> >> I have scoured the Internet to see if I'm missing a -D flag somewhere, >> but it does not appear that this is the case. >> >> The comments in this file state that it's meant to provide a >> C9X-compliant version of `stdint.h` for Windows, but for later >> versions of MSVC some of the definitions seem to be redundant or >> different. (For example, on VS 2013 `int_fast16_t` is redefined with a >> different underlying type.) >> >> My question for you all is: am I missing something obvious, or should >> I submit a bug and a patch to resolve this issue for you? >> >> >> -- >> Alex >> >> Theory is the first term in the Taylor series of practice. 
-- Thomas M >> Cover (1992) > > > > -- > Alex > > Theory is the first term in the Taylor series of practice. -- Thomas M > Cover (1992) -- Alex Theory is the first term in the Taylor series of practice. -- Thomas M Cover (1992)
Re: `winstdint.h` and MSVC
Ah, ok, I've subscribed to dev@ to avoid this mistake in the future. Thanks a lot for the heads up! Let me answer your questions directly, and then step back and address the broader scope of maintaining robust support for modern platforms. We currently have no specific reason to believe that the Mesos code base is suffering bugs from the `stdint.h` warnings, but it is nonetheless worrying to us, because it means that the platform is probably not well-trafficked for our target. That said, in the immediate term, I'm willing to start with the fairly modest goal of figuring out the "best way" to get the C client to integrate with realistic codebases, which I personally think includes getting the C client to compile in the presence of the totally-reasonable-to-include `stdint.h` header. For next steps, I think the best thing to do is to open a ticket to just get this to compile in the presence of `stdint.h`, and then incrementally build out more robust, modern support from there. From there I'm happy to do the legwork to fix these issues. Stepping back: given that we expect this code to run on tens of thousands of more-modern Windows nodes (via Mesos), I think we probably all agree that it would be a good outcome to have a high level of operational predictability in the client. I am guessing we are also in agreement that this probably involves cleaning up the warnings. In the longer term I think this also means exposing more levers to people consuming these libraries -- for example, right now the `.vcxproj` files only build DLLs, but for a lot of reasons we want to allow people to choose, if not actually make it static-by-default (for example, DLLs have totally separate heaps in win32, and this presents some interesting issues at scale). I'm happy to collate my suggestions into a list in the future. Let me know what would be useful for that conversation. 
I'm happy to make this an extended discussion and co-evolve the codebase as we find issues deploying it at scale. On Fri, Feb 5, 2016 at 3:58 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote: > Hi Alex, > > I replied to this on Monday, 2/1. I'm copy-pasting the same reply below, > and this time I've put your email address on the To: line explicitly. > > > > I looked into this, and I can repro. I think this is a bug, and I don't > think any of the recent Windows fixes addresses it. Although "make sure > you don't include stdint.h" is a viable workaround, it's not best for us > to put artificial constraints on our users. > > Modern versions of Visual Studio do ship a stdint.h header, which negates > the need for the winstdint.h compatibility shim. I think a stdint.h is > available in at least Visual Studio 2010 and later. One potential fix > would be to remove winstdint.h, transition completely to use of stdint.h, > and declare that our supported build tool chain is Visual Studio 2010 and > later. I'm reluctant to do that on the 3.4 maintenance line though on > grounds of backwards-compatibility. Another approach could be conditional > compilation to include either stdint.h or winstdint.h based on choice. I > see a flag named HAVE_STDINT_H mentioned in winconfig.h, so it seems like > someone was thinking of this. However, the flag is unused from what I can > tell, so it wouldn't actually help for a project to define it before > compiling. We'd need to make code changes to support it. > > On a side note, there are numerous other compilation warnings on Windows, > unrelated to stdint.h. > > Alex, do you think you are experiencing a bug from the stdint.h warnings? > I'm trying to decide if we should file a single JIRA for a mass cleanup of > warnings, or if the stdint.h warnings you reported are somehow more > critical than the rest and need tracking in their own separate issue. 
> > > --Chris Nauroth > > > > > On 2/5/16, 3:53 PM, "Alex Clemmer" <clemmer.alexan...@gmail.com> wrote: > >>Hey folks. Not sure if someone responded and I just didn't see it >>because I'm not on the dev@ list, or if there just isn't that much >>interest in these questions. Can someone please at least confirm that >>I've not screwed up sending this message somehow? >> >>On Thu, Jan 28, 2016 at 3:24 PM, Alex Clemmer >><clemmer.alexan...@gmail.com> wrote: >>> I should note also that I have looked closely at the issue tracker >>> (e.g., https://issues.apache.org/jira/browse/ZOOKEEPER-1953) and >>> various related changesets (e.g., >>> http://svn.apache.org/viewvc?view=revision=1148116) but being >>> young and raised mostly on git I could not find a discussion of why >>> this file is the way that it is. So it could well be that I am simply >>> missing simple guidance like "make sure not
`winstdint.h` and MSVC
Hey folks. We (the Apache Mesos project) are building against the ZK C client on Windows, with VS 2015 (and MSVC v.1900). When we attempt a vanilla build, including both `zookeeper.h` and `stdint.h` causes the compiler to complain that we're re-typedef'ing a few types (such as `int_fast8_t` with different underlying types) in the file, `winstdint.h`. I have scoured the Internet to see if I'm missing a -D flag somewhere, but it does not appear that this is the case. The comments in this file state that it's meant to provide a C9X-compliant version of `stdint.h` for Windows, but for later versions of MSVC some of the definitions seem to be redundant or different. (For example, on VS 2013 `int_fast16_t` is redefined with a different underlying type.) My question for you all is: am I missing something obvious, or should I submit a bug and a patch to resolve this issue for you? -- Alex Theory is the first term in the Taylor series of practice. -- Thomas M Cover (1992)
Re: `winstdint.h` and MSVC
I should note also that I have looked closely at the issue tracker (e.g., https://issues.apache.org/jira/browse/ZOOKEEPER-1953) and various related changesets (e.g., http://svn.apache.org/viewvc?view=revision=1148116) but being young and raised mostly on git I could not find a discussion of why this file is the way that it is. So it could well be that I am simply missing simple guidance like "make sure not to include `stdint.h` in any project consuming the ZK C client library." On Thu, Jan 28, 2016 at 3:18 PM, Alex Clemmer <clemmer.alexan...@gmail.com> wrote: > Hey folks. > > We (the Apache Mesos project) are building against the ZK C client on > Windows, with VS 2015 (and MSVC v.1900). When we attempt a vanilla > build, including both `zookeeper.h` and `stdint.h` causes the compiler > to complain that we're re-typedef'ing a few types (such as > `int_fast8_t` with different underlying types) in the file, > `winstdint.h`. > > I have scoured the Internet to see if I'm missing a -D flag somewhere, > but it does not appear that this is the case. > > The comments in this file state that it's meant to provide a > C9X-compliant version of `stdint.h` for Windows, but for later > versions of MSVC some of the definitions seem to be redundant or > different. (For example, on VS 2013 `int_fast16_t` is redefined with a > different underlying type.) > > My question for you all is: am I missing something obvious, or should > I submit a bug and a patch to resolve this issue for you? > > > -- > Alex > > Theory is the first term in the Taylor series of practice. -- Thomas M > Cover (1992) -- Alex Theory is the first term in the Taylor series of practice. -- Thomas M Cover (1992)
[jira] [Commented] (ZOOKEEPER-2095) Add Systemd startup/conf files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588467#comment-14588467 ] Alex Elent commented on ZOOKEEPER-2095: --- Not sure if relevant, but I was not able to get systemd working with Zookeeper without adding Type=forking. This is my final config:
{noformat}
[Unit]
Description=Apache Zookeeper
After=network.target

[Service]
Type=forking
User=zookeeper
Group=zookeeper
SyslogIdentifier=zookeeper
Restart=always
RestartSec=0s
ExecStart=/usr/bin/zookeeper-server start
ExecStop=/usr/bin/zookeeper-server stop
ExecReload=/usr/bin/zookeeper-server restart

[Install]
WantedBy=multi-user.target
{noformat}
Add Systemd startup/conf files -- Key: ZOOKEEPER-2095 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2095 Project: ZooKeeper Issue Type: Improvement Components: contrib Reporter: Guillaume ALAUX Priority: Minor Attachments: ZOOKEEPER-2095.patch As adoption of systemd by distributions grows, it would be nice to have systemd configuration and startup files for Zookeeper in the upstream tree. I would thus like to contribute the following patch, which brings the following systemd files:
- {{sysusers.d_zookeeper.conf}}: creates the {{zookeeper}} Linux system user to run Zookeeper
- {{tmpfiles.d_zookeeper.conf}}: creates the temporary {{/var/log/zookeeper}} and {{/var/lib/zookeeper}} directories
- {{zookeeper.service}}: regular systemd startup _script_
- {{zookeeper@.service}}: systemd startup _script_ for specific uses (for instance when Zookeeper is invoked to support some other piece of software – [example for Kafka|http://pkgbuild.com/git/aur-mirror.git/tree/kafka/systemd_kafka.service#n3], [example for Storm|http://pkgbuild.com/git/aur-mirror.git/tree/storm/systemd_storm-nimbus.service#n3])
[jira] [Created] (ZOOKEEPER-1671) Remove dependency on log4j 1.2.15
Alex Blewitt created ZOOKEEPER-1671: --- Summary: Remove dependency on log4j 1.2.15 Key: ZOOKEEPER-1671 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1671 Project: ZooKeeper Issue Type: Bug Reporter: Alex Blewitt Priority: Minor The zookeeper dependency 3.4.5 (latest) depends explicitly on log4j 1.2.15, which has dependencies on com.sun.jmx that can't be resolved from Maven central. Please change the dependency to either 1.2.16, which declares these as optional, or 1.2.14, which doesn't have them at all. http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom
{code:xml}
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <scope>compile</scope>
</dependency>
{code}
This should be modified to 1.2.14 or 1.2.16 as above. It's also not clear why log4j is a compile-scope dependency at all; it would be better for ZooKeeper to depend only on slf4j-api and let users determine the right slf4j logging implementation. As it stands, it's not possible to swap out log4j for something else. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
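Until the POM itself is changed, consumers can already sidestep the hard log4j dependency with a standard Maven exclusion and supply their own slf4j binding. A sketch (not taken from the ZooKeeper POM; coordinates are the ones named in this issue):

{code:xml}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5</version>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- then add slf4j-api plus the slf4j binding of your choice -->
{code}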
[jira] [Commented] (ZOOKEEPER-1634) A new feature proposal to ZooKeeper: authentication enforcement
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13580124#comment-13580124 ] Alex Guan commented on ZOOKEEPER-1634: -- I believe the idea is to reuse the existing authentication framework, which is extensible and much more flexible than just providing SSL-based authentication. A new feature proposal to ZooKeeper: authentication enforcement --- Key: ZOOKEEPER-1634 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1634 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.5 Reporter: Jaewoong Choi Fix For: 3.5.0 Attachments: zookeeper_3.4.5_patch_for_authentication_enforcement.patch Original Estimate: 72h Remaining Estimate: 72h Up to version 3.4.5, ZooKeeperServer doesn't force authentication if the client doesn't give any auth-info through a ZooKeeper#addAuthInfo method invocation. Hence, every znode should have at least one ACL assigned, otherwise any unauthenticated client can do anything to it. The current authentication/authorization mechanism of ZooKeeper described above has several points at issue: 1. From a security standpoint, a maleficent client can access a znode which doesn't have any proper authorization access control set. 2. From a runtime performance standpoint, authorization of every znode for every operation is unnecessarily but always evaluated against a client who bypassed the authentication phase. In other words, the current mechanism doesn't address the following requirement: We want to protect a ZK server by enforcing a simple authentication on every client, no matter which znode it is trying to access. Every connection (or operation) from the client won't be established but rejected if it doesn't come with valid authentication information. As we don't have any other distinction between znodes in terms of authorization, we don't want any ACLs on any znode. 
To address the issues mentioned above, we propose a feature called authentication enforcement for the ZK source. The idea is roughly but clearly described in the form of a patch in the attached file (zookeeper_3.4.5_patch_for_authentication_enforcement.patch), which makes ZooKeeperServer enforce authentication, given two configurations: authenticationEnforced (boolean) and enforcedAuthenticationScheme (string), against every operation coming through ZooKeeperServer#processPacket except for OpCode.auth operations. The repository base of the patch is http://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.5/
[jira] [Updated] (ZOOKEEPER-1263) fix handling of min/max session timeout value initialization
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baird updated ZOOKEEPER-1263: -- Description: This task rolls up the changes in subtasks for easier commit. (I'm about to submit the rolled up patch) (was: This task rolls up the changes in subtasks for easier commit. (I'm about to submit the rolled up patch) bla) fix handling of min/max session timeout value initialization Key: ZOOKEEPER-1263 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1263 Project: ZooKeeper Issue Type: Task Components: server Reporter: Patrick Hunt Assignee: Rakesh R Fix For: 3.5.0 Attachments: ZOOKEEPER-1263.patch, ZOOKEEPER-1263.patch This task rolls up the changes in subtasks for easier commit. (I'm about to submit the rolled up patch)
[jira] [Created] (ZOOKEEPER-1465) Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size
Alex Gvozdenovic created ZOOKEEPER-1465: --- Summary: Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size Key: ZOOKEEPER-1465 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1465 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.4.3 Reporter: Alex Gvozdenovic Fix For: 3.4.4 When re-electing a new leader of a cluster, it takes a long time for the cluster to become available if the dataset is large.
Test Data:
- 650mb snapshot size
- 20k nodes of varied size
- 3 member cluster
On the 3.4.x branch (http://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4?r=1244779):
- Takes 3-4 minutes to bring up a cluster from cold
- Takes 40-50 secs to recover from a leader failure
- Takes 10 secs for a new follower to join the cluster
Using the 3.3.5 release on the same hardware with the same dataset:
- Takes 10-20 secs to bring up a cluster from cold
- Takes 10 secs to recover from a leader failure
- Takes 10 secs for a new follower to join the cluster
I can see from the logs in 3.4.x that once a new leader is elected, it pushes a new snapshot to each of the followers, who need to save it before they ack the leader, who can then mark the cluster as available. The kit being used is a low-spec VM, so the times taken are not relevant per se - more the fact that a snapshot is always sent even though there is no difference between the persisted state on each peer. No data is being added to the cluster while the peers are being restarted.
Re: what happen in case of zxid overflow?
Mahadev, My question about leader election is who is elected when composing *the first* quorum; I found that the server myId is used in the updateProposal() method of FastLeaderElection.lookForLeader(). So, I guess that the first time around, the zkserver with the middle myId becomes the leader. (I had already read the attached paper.) And, I agree with your comments about both zxid and cversion; I had the same thought and just wanted to confirm it. Thanks, Alex 2011/10/14 Mahadev Konar maha...@hortonworks.com Alex, The zxid is a long and the likelihood of it overflowing is pretty low. The cversion though is an int. I think we had a jira to upgrade to long, but its not been fixed yet. Though being an int is not that terrible since you'd have to add/delete a billion children to get it to overflow, which probably is highly unlikely. About leader election its probably much more detailed than just the id. You can find more details in: http://research.yahoo.com/files/ZooKeeper.pdf thanks mahadev On Thu, Oct 13, 2011 at 6:27 PM, 박영근(Alex) alex.p...@nexr.com wrote: Hi, All Is there any defense logic for both zxid and znode Cversion overflow? And, does the server with the middle myId among the serverIds become the leader at the initial leader election, right? Thanks Alex
what happen in case of zxid overflow?
Hi, All Is there any defense logic for both zxid and znode Cversion overflow? And, does the server with the middle myId among the serverIds become the leader at the initial leader election, right? Thanks Alex
Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network
Hi, All I met a problem creating a znode with SEQUENTIAL_EPHEMERAL mode under unstable network conditions. While the client did not receive a message that a sequential node was created, the ensemble has the znode, which is checked at the zookeeper dashboard (https://github.com/phunt/zookeeper_dashboard). If the client receives a DISCONNECTED event, it tries to reconnect. Session timeout is 30 seconds. The unstable network condition is made as follows: The grinder agent sends a request to create a znode of CreateMode.SEQUENTIAL_EPHEMERAL. The ZK ensemble has three servers. Each server's NIC is taken down and brought up repeatedly:
- server1's NIC goes down every minute, stays down for 9 seconds, then comes back up
- server2's NIC goes down every 2 minutes, stays down for 9 seconds, then comes back up
- server3's NIC goes down every 3 minutes, stays down for 9 seconds, then comes back up
Is there any idea or related issue? Thanks in advance. Alex
Re: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network
Ted, Camille Thanks for your reply. The property that enables the creation of a znode with SEQUENTIAL_EPHEMERAL mode is used in a ReadWriteLock running on our analytics platform. This problem has caused hangs, so we need to look for another solution. Anyway, I will check out related issues. Thanks, Alex 2011/9/21 Ted Dunning ted.dunn...@gmail.com If you cannot tolerate this sort of situation, then the only solution is typically to avoid sequential ephemerals. The problem is that in the presence of a flaky network you cannot always tell if a failed create actually created the znode in question. This is because the network may have failed after the create succeeded, but before you got the result. In that case, since this is a sequential ephemeral, you can't know if your file got created because you don't even know the name. Moreover, scanning doesn't help because if you could scan, you probably could have used a fixed unique name in the first place. There is a very long standing proposed (nearly complete) solution for this that requires some difficult coding. See https://issues.apache.org/jira/browse/ZOOKEEPER-22 2011/9/21 Fournier, Camille F. camille.fourn...@gs.com This is expected. In cases where the network becomes unstable, it is the responsibility of the client writer to handle disconnected events appropriately and check to verify whether nodes they tried to write around the time of these events did or did not succeed. It makes writing a generic client for ZK very difficult (search the mailing list for zkclient and you'll read a bunch of convos around this topic). Fortunately, many things that rely on EPHEMERAL_SEQUENTIAL nodes can tolerate some duplication of data, so often it's not a huge problem. 
C -Original Message- From: 박영근(Alex) [mailto:alex.p...@nexr.com] Sent: Wednesday, September 21, 2011 9:16 AM To: dev@zookeeper.apache.org Cc: u...@zookeeper.apache.org Subject: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network Hi, All I met a problem in creating a znode with SEQUENTIAL_EPHEMERAL mode under unstable network condition. While a client did not receive a message that a sequential node was created, the ensemble has the znode, which is checked at zookeeper dashboard( https://github.com/phunt/zookeeper_dashboard). If the client receives a DISCONNECTED event, it tries to reconnect. Session timeout is 30 seconds. Unstable network condition is made as the following: The grinder agent sends a request of creating a znode of CreateMode. SEQUENTIAL_EPHEMERAL. ZK ensemble has three servers. Each NIC of server is down and up repeatedly; NIC of server1 become down every one minute and sleeps for 9 seconds, then up NIC of server2 become down every 2 minute and sleeps for 9 seconds, then up NIC of server3 become down every 3 minute and sleeps for 9 seconds, then up Is there any idea or related issue? Thanks in advance. Alex
Re: NodeExistsException when creating a znode with sequential and ephemeral mode
Camille We applied the patch (ZOOKEEPER-1046-for333) to our SUT. There was no error. Thanks alex 2011년 8월 30일 오전 11:19, 박영근(Alex) alex.p...@nexr.com님의 말: We used 3.3.3. We will check out the latest code. Thanks Camille. Alex 2011/8/30 Camille Fournier cami...@apache.org More specifically, we fixed this for the upcoming release: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 You can try checking out the latest code and building it, should fix your error. I believe 3.3.4 will be released in a week or two. c On Mon, Aug 29, 2011 at 9:56 PM, Camille Fournier cami...@apache.org wrote: What version of ZK were you using? On Mon, Aug 29, 2011 at 9:50 PM, 박영근(Alex) alex.p...@nexr.com wrote: Hi, all I met a problem of NodeExistsException when creating a znode with sequential and ephemeral mode. the number of total test was 6442314 and 797 errors had occurred. The related log message is as in the following: 2011-08-27 16:26:17,559 - INFO [ProcessThread:-1:PrepRequestProcessor@407][] - Got user-level KeeperException when processing sessionid:0x2320911802a0002 type:create cxid:0x1246d7 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 Error:KeeperErrorCode = NodeExists for /NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 The sequential number would be created by increasing parent's Cversion in the PrepRequestProcess. So, I guess that this problem was caused by inconsistency of parent znode. Our test scenario is very aggressive: The grinder agent sends a request of creating a znode of CreateMode. SEQUENTIAL_EPHEMERAL. three number of servers compose ensemble. 
each server's NIC is taken down and brought up repeatedly: server1's NIC goes down every minute, stays down for 9 seconds, then comes back up; server2's NIC goes down every 2 minutes, stays down for 9 seconds, then comes back up; server3's NIC goes down every 3 minutes, stays down for 9 seconds, then comes back up. While the probability of error occurrence is 0.0001 as mentioned above, if ZooKeeper cannot guarantee consistency, that is fatal. Is there any idea or related issue? thanks in advance. alex.
NodeExistsException when creating a znode with sequential and ephemeral mode
Hi, all I met a problem of NodeExistsException when creating a znode with sequential and ephemeral mode. The total number of tests was 6442314 and 797 errors occurred. The related log message is as follows: 2011-08-27 16:26:17,559 - INFO [ProcessThread:-1:PrepRequestProcessor@407][] - Got user-level KeeperException when processing sessionid:0x2320911802a0002 type:create cxid:0x1246d7 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 Error:KeeperErrorCode = NodeExists for /NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 The sequential number would be created by increasing the parent's Cversion in the PrepRequestProcessor. So, I guess this problem was caused by inconsistency of the parent znode. Our test scenario is very aggressive: The grinder agent sends a request to create a znode of CreateMode.SEQUENTIAL_EPHEMERAL. Three servers compose the ensemble. Each server's NIC is taken down and brought up repeatedly:
- server1's NIC goes down every minute, stays down for 9 seconds, then comes back up
- server2's NIC goes down every 2 minutes, stays down for 9 seconds, then comes back up
- server3's NIC goes down every 3 minutes, stays down for 9 seconds, then comes back up
While the probability of error occurrence is 0.0001 as mentioned above, if ZooKeeper cannot guarantee consistency, that is fatal. Is there any idea or related issue? Thanks in advance. alex.
Re: NodeExistsException when creating a znode with sequential and ephemeral mode
We used 3.3.3. We will check out the latest code. Thanks Camille. Alex 2011/8/30 Camille Fournier cami...@apache.org More specifically, we fixed this for the upcoming release: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 You can try checking out the latest code and building it, should fix your error. I believe 3.3.4 will be released in a week or two. c On Mon, Aug 29, 2011 at 9:56 PM, Camille Fournier cami...@apache.org wrote: What version of ZK were you using? On Mon, Aug 29, 2011 at 9:50 PM, 박영근(Alex) alex.p...@nexr.com wrote: Hi, all I met a problem of NodeExistsException when creating a znode with sequential and ephemeral mode. the number of total test was 6442314 and 797 errors had occurred. The related log message is as in the following: 2011-08-27 16:26:17,559 - INFO [ProcessThread:-1:PrepRequestProcessor@407][] - Got user-level KeeperException when processing sessionid:0x2320911802a0002 type:create cxid:0x1246d7 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 Error:KeeperErrorCode = NodeExists for /NexR/MasteElection/__rwLock/readLock-lssm07-0005967078 The sequential number would be created by increasing parent's Cversion in the PrepRequestProcess. So, I guess that this problem was caused by inconsistency of parent znode. Our test scenario is very aggressive: The grinder agent sends a request of creating a znode of CreateMode. SEQUENTIAL_EPHEMERAL. three number of servers compose ensemble. each NIC of server is down and up repeatedly; NIC of server1 become down every one minute and sleeping for 9 seconds, then up NIC of server2 become down every 2 minute and sleeping for 9 seconds, then up NIC of server3 become down every 3 minute and sleeping for 9 seconds, then up while the probability of error occurrence is 0.0001 as mentioned above, if the ZooKeeper cannot guarantee the consistency, it is a fatal. Is there any idea or related issue? thanks in advance. alex.
[jira] Created: (ZOOKEEPER-996) ZkClient: stat on non-existing node causes NPE
ZkClient: stat on non-existing node causes NPE -- Key: ZOOKEEPER-996 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-996 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.3.2 Environment: CentOS release 5.5 (Final) Reporter: Alex Priority: Trivial stat on a non-existing node causes an NPE, and the client quits:
{noformat}
stat /aa
Exception in thread "main" java.lang.NullPointerException
	at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:130)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
{noformat}
[jira] Created: (ZOOKEEPER-997) ZkClient ignores command if there are any space in front of it
ZkClient ignores command if there are any space in front of it -- Key: ZOOKEEPER-997 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-997 Project: ZooKeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.2 Environment: CentOS release 5.5 (Final) Reporter: Alex Priority: Trivial ZkClient ignores a command if there are any spaces in front of it. For example (note the space in front of ls):
{noformat}
 ls /
{noformat}
causes the following usage output:
{noformat}
ZooKeeper -server host:port cmd args
	connect host:port
	get path [watch]
	ls path [watch]
	...
{noformat}