[jira] [Commented] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049626#comment-13049626 ] Mahadev konar commented on ZOOKEEPER-1090: -- Vishal/Camille, Should this be a target for the 3.4 release? Race condition while taking snapshot can lead to not restoring data tree correctly -- Key: ZOOKEEPER-1090 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.3 Reporter: Vishal K Priority: Critical Labels: persistence, server, snapshot Fix For: 3.4.0
I think I have found a bug in the snapshot mechanism. The problem occurs because dt.lastProcessedZxid is not synchronized (or rather, it is set before the data tree is modified):
FileTxnSnapLog:
{code}
public void save(DataTree dataTree,
                 ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
    long lastZxid = dataTree.lastProcessedZxid;
    LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
    File snapshot = new File(snapDir, Util.makeSnapshotName(lastZxid));
    snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);
    // <=== the DataTree may not have the modification for lastProcessedZxid
}
{code}
DataTree:
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    String debug = "";
    try {
        rc.clientId = header.getClientId();
        rc.cxid = header.getCxid();
        rc.zxid = header.getZxid();
        rc.type = header.getType();
        rc.err = 0;
        if (rc.zxid > lastProcessedZxid) {
            lastProcessedZxid = rc.zxid;
        }
        [...modify data tree...]
}
{code}
lastProcessedZxid must be set after the modification is done. As a result, if the server crashes after taking the snapshot (and the snapshot does not contain the change corresponding to lastProcessedZxid), restore will not rebuild the data tree correctly:
{code}
public long restore(DataTree dt, Map<Long, Integer> sessions,
                    PlayBackListener listener) throws IOException {
    snapLog.deserialize(dt, sessions);
    FileTxnLog txnLog = new FileTxnLog(dataDir);
    TxnIterator itr = txnLog.read(dt.lastProcessedZxid + 1);
    // <=== Assumes lastProcessedZxid is deserialized
}
{code}
I have had an offline discussion with Ben and Camille on this. I will be posting the discussion shortly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
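For illustration, a minimal sketch of the ordering fix implied above: publish lastProcessedZxid only after the tree mutation has been applied, so a snapshot that reads it can never claim a change it does not contain. Names follow the quoted snippets; applyToTree() is a hypothetical helper, and this is not the committed patch.
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    rc.clientId = header.getClientId();
    rc.cxid = header.getCxid();
    rc.zxid = header.getZxid();
    rc.type = header.getType();
    rc.err = 0;

    applyToTree(header, txn); // hypothetical helper: the actual mutation

    // Publish the zxid only after the mutation is applied, so a concurrent
    // save() that reads lastProcessedZxid serializes a tree that already
    // contains this change.
    if (rc.zxid > lastProcessedZxid) {
        lastProcessedZxid = rc.zxid;
    }
    return rc;
}
{code}
Under-reporting the zxid is the safe direction here: transactions are converted to an idempotent form before logging, so replaying an already-applied change on restore is harmless.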
[jira] [Updated] (ZOOKEEPER-707) c client close can crash with cptr null
[ https://issues.apache.org/jira/browse/ZOOKEEPER-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-707: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. c client close can crash with cptr null --- Key: ZOOKEEPER-707 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-707 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Mahadev konar Priority: Critical Fix For: 3.5.0 saw this in the zktest_mt at the end of 3.3.0, seems unlikely to happen though as it only failed after running the test 10-15 times. Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 26011 : OK Zookeeper_simpleSystem::testHangingClientzktest-mt: src/zookeeper.c:1950: zookeeper_process: Assertion `cptr' failed. Aborted -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-545) investigate use of realtime gc as the recommended default for server vm
[ https://issues.apache.org/jira/browse/ZOOKEEPER-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-545: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. investigate use of realtime gc as the recommended default for server vm -- Key: ZOOKEEPER-545 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-545 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Priority: Critical Fix For: 3.5.0
We currently don't recommend that people use the realtime gc when running the server; we probably should. Before we do so we need to verify that it works. We should make it the default for all our tests.
* concurrent vs g2 or whatever it's called (new in 1.6_15 or something?)
* Update all scripts to specify this option
* update documentation to include this option and add a section in the dev/ops docs detailing its benefits (in particular the latency effects of gc)
Also, the -server option? Any benefit for us to recommend this as well? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-675) LETest thread fails to join
[ https://issues.apache.org/jira/browse/ZOOKEEPER-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-675: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. LETest thread fails to join --- Key: ZOOKEEPER-675 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-675 Project: ZooKeeper Issue Type: Bug Components: leaderElection Reporter: Flavio Junqueira Assignee: Henry Robinson Priority: Critical Fix For: 3.5.0 Attachments: TEST-org.apache.zookeeper.test.LETest.txt After applying the patch of ZOOKEEPER-569, I observed a failure of LETest. From a cursory inspection of the log, I can tell that a leader is being elected, but some thread is not joining. At this point I'm not sure if this is a problem with the leader election implementation or the test itself. Just to be clear, the patch of ZOOKEEPER-569 solved a real issue, but it seems that there is yet another problem with LETest. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-697) TestQuotaQuorum is failing on Hudson
[ https://issues.apache.org/jira/browse/ZOOKEEPER-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-697: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. TestQuotaQuorum is failing on Hudson Key: ZOOKEEPER-697 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-697 Project: ZooKeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Priority: Critical Fix For: 3.5.0 The hudson test build failed http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/729/testReport/junit/org.apache.zookeeper.test/QuorumQuotaTest/testQuotaWithQuorum/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-670) zkpython leading to segfault on zookeeper server restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-670: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zkpython leading to segfault on zookeeper server restart Key: ZOOKEEPER-670 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-670 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.2.1, 3.2.2 Environment: CentOS w/ Python 2.4 Reporter: Lei Zhang Assignee: Henry Robinson Priority: Critical Fix For: 3.5.0 Attachments: voyager.patch, zk.py A Zookeeper client using zkpython segfaults on zookeeper server restart. It is reliably reproducible using the attached script zk.py. I'm able to stop the segfault using the attached patch voyager.patch, but zkpython seems to have a deeper issue in its use of watcher_dispatch - on zookeeper server restart, I see up to 6 invocations of watcher_dispatch while my script is simply sleeping in the main thread. This can't be right. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-805: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. four letter words fail with latest ubuntu nc.openbsd Key: ZOOKEEPER-805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805 Project: ZooKeeper Issue Type: Bug Components: documentation, server Affects Versions: 3.3.1, 3.4.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.5.0 In both the 3.3 branch and trunk, "echo stat | nc localhost 2181" fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) vs nc traditional [v1.10-38], which works fine. Not sure if this is a bug in us or nc.openbsd, but it's currently not working for me. Ugh. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-517) NIO factory fails to close connections when the number of file handles run out.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-517: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. NIO factory fails to close connections when the number of file handles run out. --- Key: ZOOKEEPER-517 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-517 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Mahadev konar Assignee: Benjamin Reed Priority: Critical Fix For: 3.5.0 The code in the NIO factory is such that if we fail to accept a connection for some reason (too many file handles may be one of them), we do not close the connections that are in CLOSE_WAIT. We need to explicitly close these sockets. One of the solutions might be to move doIO before accept, so that we can still close connections even if we cannot accept new ones. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
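For illustration, a minimal sketch of the kind of explicit cleanup described; the method and helper names are assumptions, not the actual NIOServerCnxnFactory internals.
{code}
import java.io.IOException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch only: configureAndRegister() is a hypothetical stand-in for the
// factory's real per-connection setup.
void acceptOne(ServerSocketChannel ss) {
    SocketChannel sc = null;
    try {
        sc = ss.accept();
        if (sc != null) {
            configureAndRegister(sc);
        }
    } catch (IOException e) {
        // If setup fails (e.g. the process is out of file descriptors),
        // close whatever channel we did obtain instead of leaking it
        // into CLOSE_WAIT.
        if (sc != null) {
            try { sc.close(); } catch (IOException ignored) { }
        }
    }
}
{code}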
[jira] [Updated] (ZOOKEEPER-851) ZK lets any node become an observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-851: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. ZK lets any node become an observer -- Key: ZOOKEEPER-851 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.1 Reporter: Vishal K Priority: Critical Fix For: 3.5.0
I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as shown below:
{noformat}
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
{noformat}
I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I created another entry for that node and started the zk server. The zoo.cfg on the first 3 nodes was left unchanged. The fourth node was able to join the cluster even though the 3 nodes had no idea about the fourth node. zoo.cfg on the fourth node:
{noformat}
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
server.3=10.17.117.71:2888:3888
{noformat}
It looks like 10.17.117.71 is becoming an observer in this case. I was expecting that the leader would reject 10.17.117.71.
{noformat}
# telnet 10.17.117.71 2181
Trying 10.17.117.71...
Connected to 10.17.117.71.
Escape character is '^]'.
stat
Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
Clients:
 /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Outstanding: 0
Zxid: 0x20065
Mode: follower
Node count: 288
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-936) zkpython is leaking ACL_vector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-936: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zkpython is leaking ACL_vector -- Key: ZOOKEEPER-936 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-936 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Reporter: Gustavo Niemeyer Priority: Critical Fix For: 3.5.0 It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-955) Use Atomic(Integer|Long) for (Z)Xid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-955: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Use Atomic(Integer|Long) for (Z)Xid --- Key: ZOOKEEPER-955 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-955 Project: ZooKeeper Issue Type: Improvement Components: java client, server Reporter: Thomas Koch Assignee: Thomas Koch Priority: Trivial Fix For: 3.5.0 Attachments: ZOOKEEPER-955.patch As I read last weekend in the fantastic book Clean Code, it'd be much faster to use AtomicInteger or AtomicLong instead of synchronization blocks around each access to an int or long. The key difference is that a synchronization block will in any case acquire and release a lock. The atomic classes use optimistic locking, a CPU operation that only changes a value if it still has not changed since the last read. In most cases the value has not changed since the last visit, so the operation is just as fast as a normal operation. If it has changed, we read again and retry the change. [1] Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
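For illustration, a generic sketch of the two styles being compared (not ZooKeeper's actual xid code):
{code}
import java.util.concurrent.atomic.AtomicLong;

class XidCounter {
    // Lock-based style: every access acquires and releases a monitor,
    // even when there is no contention at all.
    private long lockedXid;
    synchronized long nextLocked() {
        return ++lockedXid;
    }

    // Atomic style: incrementAndGet() is a compare-and-swap loop; in the
    // uncontended common case it is a single CPU operation, and under
    // contention it simply re-reads and retries.
    private final AtomicLong atomicXid = new AtomicLong();
    long nextAtomic() {
        return atomicXid.incrementAndGet();
    }
}
{code}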
[jira] [Commented] (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049639#comment-13049639 ] Mahadev konar commented on ZOOKEEPER-740: - Any update on this? Should we try and get this into the 3.4 release? zkpython leading to segfault on zookeeper - Key: ZOOKEEPER-740 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.0 Reporter: Federico Assignee: Henry Robinson Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-740.patch
The program that we are implementing uses the python binding for zookeeper, but sometimes it crashes with a segfault; here is the bt from gdb:
{noformat}
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
2488 ../Objects/abstract.c: No such file or directory.
 in ../Objects/abstract.c
(gdb) bt
#0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
#1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
#3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
#4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
#5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317
#6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-973) bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
[ https://issues.apache.org/jira/browse/ZOOKEEPER-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-973: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. bind() could fail on Leader because it does not setReuseAddress on its ServerSocket Key: ZOOKEEPER-973 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-973 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Trivial Fix For: 3.5.0
setReuseAddress(true) should be used below.
{code}
Leader(QuorumPeer self, LeaderZooKeeperServer zk) throws IOException {
    this.self = self;
    try {
        ss = new ServerSocket(self.getQuorumAddress().getPort());
    } catch (BindException e) {
        LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e);
        throw e;
    }
    this.zk = zk;
}
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
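For illustration, a sketch of the suggested change inside the constructor shown above (not the actual patch): construct the socket unbound, set SO_REUSEADDR, then bind, so a restarting leader can rebind the quorum port while the old socket lingers in TIME_WAIT.
{code}
// Sketch only; "self" is the surrounding QuorumPeer from the snippet above,
// and java.net.InetSocketAddress must be imported.
ss = new ServerSocket();           // create unbound
ss.setReuseAddress(true);          // must be set before bind()
ss.bind(new InetSocketAddress(self.getQuorumAddress().getPort()));
{code}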
[jira] [Updated] (ZOOKEEPER-233) Create a slimmer jar for clients to reduce their disk footprint.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-233: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Create a slimmer jar for clients to reduce their disk footprint. --- Key: ZOOKEEPER-233 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-233 Project: ZooKeeper Issue Type: New Feature Components: build, java client Reporter: Hiram Chirino Priority: Trivial Fix For: 3.5.0 Patrick requested that I open this issue in this [email thread|http://n2.nabble.com/ActiveMQ-is-now-using-ZooKeeper-td1573272.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-277) Define PATH_SEPARATOR
[ https://issues.apache.org/jira/browse/ZOOKEEPER-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-277: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Define PATH_SEPARATOR - Key: ZOOKEEPER-277 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-277 Project: ZooKeeper Issue Type: Improvement Components: c client, documentation, java client, server, tests Reporter: Nitay Joffe Priority: Trivial Fix For: 3.5.0 We should define a constant PATH_SEPARATOR = "/" and use that throughout the code rather than the hardcoded "/". Users can be told to use this constant to be safe in case of future changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
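A trivial sketch of the idea; the class name and location are hypothetical:
{code}
// Hypothetical home for the constant; every hardcoded "/" would be
// replaced by a reference to it.
public final class PathConstants {
    public static final String PATH_SEPARATOR = "/";

    private PathConstants() { }
}

// e.g. String child = parent + PathConstants.PATH_SEPARATOR + name;
{code}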
[jira] [Updated] (ZOOKEEPER-860) Add alternative search-provider to ZK site
[ https://issues.apache.org/jira/browse/ZOOKEEPER-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-860: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Add alternative search-provider to ZK site -- Key: ZOOKEEPER-860 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-860 Project: ZooKeeper Issue Type: Improvement Components: documentation Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 3.5.0 Attachments: ZOOKEEPER-860.patch Use the search-hadoop.com service to make search available over ZK sources, MLs, wiki, etc. This was initially proposed on the user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added to the site's skin (common for all Hadoop related projects) before (as a part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626]), so this issue is about enabling it for ZK. The ultimate goal is to use it at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-522) zookeeper client should throttle if its not able to connect to any of the servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-522: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zookeeper client should throttle if its not able to connect to any of the servers. -- Key: ZOOKEEPER-522 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-522 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.2.0 Reporter: Mahadev konar Fix For: 3.5.0 Currently the zookeeper client library keeps trying to connect to servers even if all of them are unreachable. It will go through the list time and again and try to connect. Sometimes this might cause problems: too many clients retrying connections to servers (when there might be something wrong with, or a delay at, the servers), wherein the clients give up and try reconnecting to other servers. This causes a huge churn in client connections, sometimes leading to the zookeeper server running out of file handles. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
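For illustration, a sketch of the kind of throttling being asked for, using capped exponential backoff with jitter; this is not the actual ClientCnxn code, and connected()/tryNextServerInList() are hypothetical helpers.
{code}
// Sketch: retry the server list with a capped, jittered exponential
// backoff instead of hammering every host in a tight loop.
void connectWithBackoff() throws InterruptedException {
    long backoffMs = 50;                 // initial delay
    final long maxBackoffMs = 10_000;    // cap so clients keep retrying

    while (!connected()) {
        tryNextServerInList();           // hypothetical: attempt one host
        if (connected()) {
            return;
        }
        long jitter = (long) (Math.random() * backoffMs); // avoid stampedes
        Thread.sleep(backoffMs + jitter);
        backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
    }
}
{code}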
[jira] [Updated] (ZOOKEEPER-1005) Zookeeper servers fail to elect a leader successfully.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1005: - Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Zookeeper servers fail to elect a leader successfully. - Key: ZOOKEEPER-1005 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1005 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.2.2 Environment: zookeeper-3.2.2; debian Reporter: Alexandre Hardy Fix For: 3.5.0
We were running 3 zookeeper servers, and simulated a failure on one of the servers. The one zookeeper node follows the other, but has trouble connecting. It looks like the following exception is the cause:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Following zookeeper3/192.168.131.11:2888
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING java.net.ConnectException: -- Connection refused
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
{noformat}
The last exception while connecting was:
{noformat}
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: -- Connection refused
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Exception when following the leader
{noformat}
The leader started leading a bit later:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0, 94489312534, 25, 2, LOOKING, LOOKING, 0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
2011-03-01T14:02:50+02:00
{noformat}
[jira] [Updated] (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-823: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: ZooKeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.5.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, testDisconnectedAddAuth_FAILURE, testWatchAutoResetWithPending_FAILURE This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049669#comment-13049669 ] Ted Dunning commented on ZOOKEEPER-965: --- I can take a quick look, but I am having trouble getting reliable net access. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
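For illustration, a sketch of the list-based style described above, using Op/multi names in the shape the eventual 3.4 client API took; treat the exact signatures, paths, and arguments as assumptions rather than the final committed API.
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Atomic multi-update sketch: either every op commits, or none do.
List<OpResult> atomicUpdate(ZooKeeper zk, byte[] cfg, byte[] ver,
                            int expectedVersion)
        throws KeeperException, InterruptedException {
    return zk.multi(Arrays.asList(
            Op.create("/app/config", cfg, Ids.OPEN_ACL_UNSAFE, 0), // 0 = persistent
            Op.setData("/app/version", ver, expectedVersion),      // version guard
            Op.check("/app/lock", -1),                             // existence check
            Op.delete("/app/stale", -1)));                         // -1 = any version
}
{code}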
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049670#comment-13049670 ] Ted Dunning commented on ZOOKEEPER-965: --- OK. As a first step, I rebased our changes onto the current trunk. This will require the usual re-checkout due to non-fast-forward operations. Now to the problems you are seeing. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049673#comment-13049673 ] Ted Dunning commented on ZOOKEEPER-965: --- I see a clean compile on my mac. Looks like I don't understand the problem. I can't run all the tests just now, but last time I looked they ran.
{noformat}
BuzzBook-Pro:zookeeper[trunk*]$ git checkout multi
Switched to branch 'multi'
BuzzBook-Pro:zookeeper[multi*]$ ant clean
Buildfile: /Users/tdunning/Apache/zookeeper/build.xml
...
clean:
BUILD SUCCESSFUL
Total time: 0 seconds
BuzzBook-Pro:zookeeper[multi*]$ ant compile
...
version-info:
[java] Unknown REVISION number, using -1
...
[javac] Compiling 52 source files to /Users/tdunning/Apache/zookeeper/build/classes
...
[javac] Compiling 134 source files to /Users/tdunning/Apache/zookeeper/build/classes
BUILD SUCCESSFUL
Total time: 11 seconds
BuzzBook-Pro:zookeeper[multi*]$
{noformat}
Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049725#comment-13049725 ] Hari A V commented on ZOOKEEPER-1080: - hi Sameer, How about handling of Disconnected and Expired events from Zookeeper? In this case there will not be any exception propagated from the Zookeeper server; instead it will notify through the watcher as KeeperState.Disconnected (0). Please see the following case: Let's say Process1 is Leader and Process2 is in Ready state. Now, the network of Process1 goes down [Disconnected event] for more than the session timeout period. Then Process2 will get the NodeDeleted event and becomes Active. So finally both Process1 and Process2 will be in Active state. [Multiple Active processes will lead to inconsistencies if we use this framework to provide HA for NameNode.] Provide a Leader Election framework based on Zookeeper recipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single points of failure. If the Namenode or JobTracker goes down, their service will not be available until they are up and running again. If there was a Standby Namenode or JobTracker available and ready to serve when the Active nodes go down, we could have reduced the service down time. Hadoop already provides a Standby Namenode implementation, which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in Leader Election can use this service. They should start the service with the required configurations. The service will notify the nodes whether they should be started in Active or Standby mode. They also intimate any changes in the mode at runtime. All other complexities can be handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
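For illustration, a sketch of the defensive handling Hari is asking for; the stepDown()/resume() names are hypothetical, and this is not the attached framework's code. The key point is that a leader that loses its connection must stop acting as leader immediately, because its ephemeral node may already have been deleted and another process elected.
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class LeadershipWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected: // may still hold the session, but can't be sure
            case Expired:      // session gone: leadership definitely lost
                stepDown();
                break;
            case SyncConnected:
                resume();      // re-check the election znode on reconnect
                break;
            default:
                break;
        }
    }

    private void stepDown() { /* relinquish the active role */ }
    private void resume()   { /* re-evaluate leadership state */ }
}
{code}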
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049774#comment-13049774 ] Camille Fournier commented on ZOOKEEPER-1046: - It's ok with me to make the change and ignore deletes. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049797#comment-13049797 ] Marshall McMullen commented on ZOOKEEPER-965: - Ted, thanks for taking a look at this. Not sure if you noticed this, but I checked in a change to github to fix the compile error I was seeing. I just wanted you to look at it and see if/why the fix was necessary. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049814#comment-13049814 ] Flavio Junqueira commented on ZOOKEEPER-1046: - If cversion counts the number of created children, we can always learn the number of deleted children by subtracting the number of current children from cversion, no? I was also wondering if there is any use case you're aware of in which it needs to have both counted. So far the proposal of counting only creations seems good to me. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
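For illustration, a one-line rendering of Flavio's arithmetic, assuming the proposed semantics where cversion counts only child creations (a hypothetical helper, not existing API behavior):
{code}
import org.apache.zookeeper.data.Stat;

// Under the proposal, cversion counts creations only, so the number of
// deletions can be recovered by subtraction.
int deletedChildren(Stat stat) {
    return stat.getCversion() - stat.getNumChildren();
}
{code}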
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049817#comment-13049817 ] Benjamin Reed commented on ZOOKEEPER-1046: -- nice observation flavio! i haven't seen anyone using cversion outside of the sequence number on sequence znodes. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049820#comment-13049820 ] Camille Fournier commented on ZOOKEEPER-723: On the one hand I really would like this feature, but on the other hand I do not like the idea of having one of these created with no children and then floating out there for some indefinite period of time until someone finally decides to create children under it. It seems confusing and hard to manage from a client perspective. All of my use cases would be completely satisfied with the nodes as real ephemerals, aka session-based, only allowing children that are ephemeral containers/nodes from the same session. I'm curious to think of a really compelling use case where I would want this to cross sessions, and the email thread did not seem to provide one. Why don't we want this to be a true ephemeral? ephemeral parent znodes --- Key: ZOOKEEPER-723 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-723 Project: ZooKeeper Issue Type: New Feature Components: server Reporter: Benjamin Reed Assignee: Daniel Gómez Ferro Attachments: ZOOKEEPER-723.patch ephemeral znodes have the nice property of automatically cleaning up after themselves when the creator goes away, but since they can't have children it is hard to build subtrees that will clean up after the clients that are using them are gone. rather than changing the semantics of ephemeral nodes, i propose ephemeral parents: znodes that disappear when they have no more children. this cleanup would happen automatically when the last child is removed. an ephemeral parent is not tied to any particular session, so even if the creator goes away, the ephemeral parent will remain as long as there are children. when an ephemeral parent is created it will have an initial child, so that it doesn't get immediately removed. i think this child should be an ephemeral znode with a predefined name, firstChild. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049831#comment-13049831 ] Benjamin Reed commented on ZOOKEEPER-723: - yeah i don't like the floating out there indefinitely with no children part either. one use case for allowing different sessions is the barrier-like case in which you want to find out when everyone is done using a resource: you create a parent znode, /myresource, with a child called available. processes that use the resource will create children under /myresource. when the resource manager wants to stop providing the resource, it removes /myresource/available and then watches for /myresource to disappear. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
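To make that barrier use case concrete, here is a minimal sketch against today's client API; the /myresource paths come from the comment above, the method names are made up, and error/retry handling is omitted. The proposed ephemeral-parent flag would only automate the final removal of /myresource, which this sketch has to do by hand.
{code}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class ResourceBarrier {
    // Manager: publish the resource with its availability marker.
    static void publish(ZooKeeper zk) throws Exception {
        zk.create("/myresource", new byte[0], OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/myresource/available", new byte[0], OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // User: register under the parent for the lifetime of this session.
    static String acquire(ZooKeeper zk) throws Exception {
        return zk.create("/myresource/user-", new byte[0], OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Manager: retire the resource, then wait for the users to drain.
    // With an ephemeral parent, /myresource would vanish on its own once
    // the last child is gone; without it, we must watch the child list
    // (re-running this check from the watcher) and delete it ourselves.
    static void retire(ZooKeeper zk) throws Exception {
        zk.delete("/myresource/available", -1);
        List<String> children = zk.getChildren("/myresource", true); // leaves a watch
        if (children.isEmpty()) {
            zk.delete("/myresource", -1);
        }
    }
}
{code}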
[jira] [Created] (ZOOKEEPER-1095) Simple leader election recipe
Simple leader election recipe - Key: ZOOKEEPER-1095 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1095 Project: ZooKeeper Issue Type: Improvement Reporter: Henry Robinson Leader election recipe originally contributed to ZOOKEEPER-1080. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049887#comment-13049887 ] Henry Robinson commented on ZOOKEEPER-1080: --- What we've got here are two different, but equally valid, approaches to building leader election. Since this isn't a core framework issue, we're not making a decision that everyone has to live with. Therefore there's no need for the committers to play kingmaker by only committing one of these patches. We've got room for both, just not on this JIRA. Here's what I suggest we do. * Eric - I've opened ZOOKEEPER-1095 for your contribution. Can you attach your recipe (as a diff, with copyright headers) to that ticket, and we'll work on getting it committed there? * Hari - leave your patch here, and one of the committers will do a code review shortly. Provide a Leader Election framework based on Zookeeper recipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Fix For: 3.3.2 Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single points of failure. If the NameNode or JobTracker goes down, their service will not be available until they are up and running again. If there were a Standby NameNode or JobTracker available and ready to serve when the Active node goes down, we could reduce the service downtime. Hadoop already provides a Standby Namenode implementation which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in Leader Election can use this service. It should start the service with the required configuration. The service will notify the nodes whether they should start in Active or Standby mode, and will also notify them of any changes in mode at runtime. All other complexities can be handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
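For readers who want the shape of the recipe under discussion, here is a minimal sketch of the standard leader-election pattern from the recipes page linked above. The /election path and class name are illustrative; a real service would also handle session expiry and re-run the check from its watcher.
{code}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class ElectionSketch {
    // Returns true if this node should run as Active, false for Standby.
    static boolean elect(ZooKeeper zk) throws Exception {
        String me = zk.create("/election/n_", new byte[0], OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        String mySeq = me.substring("/election/".length());
        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);
        if (children.get(0).equals(mySeq)) {
            return true; // lowest sequence number leads
        }
        // Watch only the immediate predecessor to avoid a thundering herd;
        // its NodeDeleted event is the cue to re-run the election check.
        int i = children.indexOf(mySeq);
        zk.exists("/election/" + children.get(i - 1), true);
        return false;
    }
}
{code}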
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049895#comment-13049895 ] Camille Fournier commented on ZOOKEEPER-723: That would be ok. Now my next question is, would we ever want to have non-ephemeral/ephemeral-container children of an ephemeral container? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049931#comment-13049931 ] Patrick Hunt commented on ZOOKEEPER-723: Wow, my brain is in an infinite loop now. ;-) Yes, iirc that's one of the issues we had touched on way back when... and one of the reasons why we kept it simple: ephemeral nodes couldn't have children. I seem to also remember another related issue: once you start allowing arbitrarily large ephemeral trees to be built, there was a concern about cleanup and its effect on availability of the system as a whole. (still a concern I would have) note: If this znode really is ephemeral (strongly tied to the session lifetime) I don't have a problem calling it as such. However, if the znode can live beyond the session that created it, then Flavio's suggestion of solitary sounds good to me. (was that a typo or did you really mean solidary?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049953#comment-13049953 ] Ted Dunning commented on ZOOKEEPER-965: --- Ahhh... no. I didn't notice that. I will take a look. On Wed, Jun 15, 2011 at 4:45 PM, Marshall McMullen (JIRA) Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or initial state, in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include: - update to ZK clients to allow the new call - additional wire level request - on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form. - on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown. To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049955#comment-13049955 ] Ted Dunning commented on ZOOKEEPER-965: --- Marshall, I just tried with and without your patch. It compiles either way. My feeling is that excessive throws declarations are bad juju anyway so the current state (with your change) is better than the previous state (with the extra throws in processTxn). I would leave it as is. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
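As a usage illustration of the list-based style described in this issue, here is a sketch using the Op/multi API that ultimately shipped in 3.4; the paths, data, and version guard below are made up for the example.
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class MultiSketch {
    static List<OpResult> swap(ZooKeeper zk, byte[] data, int expectedVersion)
            throws Exception {
        // All four operations commit atomically, or none do: the check op
        // guards the whole batch on the version of /config.
        return zk.multi(Arrays.asList(
                Op.check("/config", expectedVersion),
                Op.setData("/config/primary", data, -1),   // -1 = any version
                Op.create("/config/updated", new byte[0], OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT.toFlag()),
                Op.delete("/config/stale", -1)));
    }
}
{code}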
[jira] [Assigned] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-1096: --- Assignee: Jared Cantwell Leader communication should listen on specified IP, not wildcard address Key: ZOOKEEPER-1096 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.3.3, 3.4.0 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Attachments: ZOOKEEPER-1096.patch Server should specify the local address that is used for leader communication (and not use the default of listening on all interfaces). This is similar to the clientPortAddress parameter that was added a year ago. After reviewing the code, we can't think of a reason why only the port would be used with the wildcard interface, when servers are already connecting specifically to that interface anyway. I have submitted a patch, but it does not account for all leader election algorithms. Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things. There is some more information about making it an option here: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
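The difference being proposed, shown at the socket level; the address and port below are examples only, not ZooKeeper configuration values.
{code}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindSketch {
    public static void main(String[] args) throws Exception {
        // Today: constructing with just a port binds the wildcard address,
        // so leader/election traffic is accepted on every interface.
        ServerSocket wildcard = new ServerSocket(3888);

        // Proposed: bind the address this server advertises to the ensemble,
        // so only that interface accepts connections.
        ServerSocket scoped = new ServerSocket();
        scoped.bind(new InetSocketAddress("10.0.0.5", 3888));

        wildcard.close();
        scoped.close();
    }
}
{code}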
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049980#comment-13049980 ] Patrick Hunt commented on ZOOKEEPER-1096: - with git use --no-prefix when creating the patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049982#comment-13049982 ] Patrick Hunt commented on ZOOKEEPER-1096: - bq. Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things. I agree on both counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jared Cantwell updated ZOOKEEPER-1096: -- Attachment: ZOOKEEPER-1096.patch Fixed some prefixes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Zookeeper code question
Hi Folks, I am looking at the code in CommitProcessor and I had a couple of questions. 1. When a request is ready to be processed, it goes into the toProcess list. Then subsequently, it is taken out of that list and we call nextProcessor.processRequest(toProcess.get(i)). Why does this intermediate toProcess list exist? Why couldn't we call nextProcessor.processRequest(r) directly wherever toProcess.add(r) is called? I gave it some thought and couldn't figure out a correctness issue either way. 2. There are a couple of data structures that are accessed by multiple threads but are not synchronized - the LinkedList<Request> queues queuedWriteRequests and committedRequests. That looks like a bug (or please let me know if I am missing something). Thanks! Vishal
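On question 1, one plausible answer (a sketch of the general pattern, not the actual CommitProcessor code): the intermediate list lets the thread hand requests to nextProcessor without holding the processor's monitor, so a slow downstream stage never blocks the threads that enqueue commits.
{code}
import java.util.ArrayList;
import java.util.LinkedList;

public class DrainSketch {
    interface RequestProcessor { void processRequest(Object r); }

    final LinkedList<Object> committedRequests = new LinkedList<Object>();

    void run(RequestProcessor nextProcessor) {
        ArrayList<Object> toProcess = new ArrayList<Object>();
        synchronized (this) {
            // Drain the shared queue quickly while holding the lock...
            while (!committedRequests.isEmpty()) {
                toProcess.add(committedRequests.removeFirst());
            }
        }
        // ...then dispatch outside the lock, where processRequest may be slow.
        for (Object r : toProcess) {
            nextProcessor.processRequest(r);
        }
    }
}
{code}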
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050006#comment-13050006 ] Hadoop QA commented on ZOOKEEPER-1096: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482704/ZOOKEEPER-1096.patch against trunk revision 1135515. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//console This message is automatically generated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-522) zookeeper client should throttle if its not able to connect to any of the servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050007#comment-13050007 ] Vishal Kathuria commented on ZOOKEEPER-522: --- Thanks for opening this Jira Mahadev. I haven't looked at the code - does this happen for both C and Java clients? zookeeper client should throttle if its not able to connect to any of the servers. -- Key: ZOOKEEPER-522 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-522 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.2.0 Reporter: Mahadev konar Fix For: 3.5.0 Currently the zookeeper client library keeps trying to connect even if all of the servers are unreachable. It will go through the list time and again and try to connect. When something is wrong or slow on the server side, this can mean many clients retrying connections, giving up, and trying to reconnect to other servers. This causes a huge churn in client connections, sometimes leading to the zookeeper server running out of file handles. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
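A minimal sketch of the throttling being requested: exponential backoff with jitter between full sweeps of the server list. The tryConnect helper and the delay bounds are hypothetical placeholders, not the real client handshake.
{code}
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Random;

public class BackoffSketch {
    static void connectWithBackoff(List<InetSocketAddress> servers)
            throws InterruptedException {
        Random rnd = new Random();
        long delayMs = 50;
        final long maxDelayMs = 10000;
        while (true) {
            for (InetSocketAddress server : servers) {
                if (tryConnect(server)) {
                    return; // connected; normal session handling takes over
                }
            }
            // A full sweep failed: sleep with jitter, then double the delay,
            // so thousands of clients do not hammer a recovering ensemble.
            Thread.sleep(delayMs + rnd.nextInt((int) delayMs));
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
    }

    static boolean tryConnect(InetSocketAddress server) {
        return false; // hypothetical stand-in for the real connection attempt
    }
}
{code}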
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050022#comment-13050022 ] Hadoop QA commented on ZOOKEEPER-1096: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482704/ZOOKEEPER-1096.patch against trunk revision 1135515. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//console This message is automatically generated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050078#comment-13050078 ] Benjamin Reed commented on ZOOKEEPER-1046: -- two clarifying points: * this is not for 3.3. this would be a 3.4 change. we will stick with camille's fix for 3.3 * we never get the cversion from the user. you can't do conditional ops with it or pass it in any of the calls. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050090#comment-13050090 ] Patrick Hunt commented on ZOOKEEPER-1046: - Ok, thanks for the clarification. In that case, what do you think about this for 3.4+? Is it going to be possible to do this right but also without too much overhead (i.e. simply), versus the gains of changing the API? For 3.4 I'm less worried about the semantic change, but I'd still like to avoid it if reasonably possible... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-5) Issue with Netty in BookKeeper
[ https://issues.apache.org/jira/browse/BOOKKEEPER-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated BOOKKEEPER-5: -- Attachment: BOOKKEEPER-5.patch Preliminary patch to fix this problem. It does not include a test yet. Issue with Netty in BookKeeper -- Key: BOOKKEEPER-5 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-5 Project: Bookkeeper Issue Type: Bug Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: BOOKKEEPER-5.patch, ZOOKEEPER-998.patch In one of my experiments, I found that a BookKeeper object was locked after I tried to halt it. By searching the Web, I found that the issue is described here: http://www.jboss.org/netty/community.html#nabble-td5492010 I'll upload a patch to fix it. For now, I'm marking it for 3.4.0, but if there is any chance we can get it in 3.3.3, it would be nice. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira