[jira] [Commented] (ZOOKEEPER-1090) Race condition while taking snapshot can lead to not restoring data tree correctly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049626#comment-13049626 ] Mahadev konar commented on ZOOKEEPER-1090: -- Vishal/Camille, Should this be a target for the 3.4 release? Race condition while taking snapshot can lead to not restoring data tree correctly -- Key: ZOOKEEPER-1090 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1090 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.3 Reporter: Vishal K Priority: Critical Labels: persistence, server, snapshot Fix For: 3.4.0
I think I have found a bug in the snapshot mechanism. The problem occurs because dt.lastProcessedZxid is not synchronized (or rather, it is set before the data tree is modified):
FileTxnSnapLog:
{code}
public void save(DataTree dataTree,
                 ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
    long lastZxid = dataTree.lastProcessedZxid;
    LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
    File snapshot = new File(snapDir, Util.makeSnapshotName(lastZxid));
    snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot);
    // <=== the DataTree may not have the modification for lastProcessedZxid
}
{code}
DataTree:
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    String debug = "";
    try {
        rc.clientId = header.getClientId();
        rc.cxid = header.getCxid();
        rc.zxid = header.getZxid();
        rc.type = header.getType();
        rc.err = 0;
        if (rc.zxid > lastProcessedZxid) {
            lastProcessedZxid = rc.zxid;
        }
        [...modify data tree...]
}
{code}
lastProcessedZxid must be set after the modification is done. As a result, if the server crashes after taking the snapshot (and the snapshot does not contain the change corresponding to lastProcessedZxid), restore will not rebuild the data tree correctly:
{code}
public long restore(DataTree dt, Map<Long, Integer> sessions,
                    PlayBackListener listener) throws IOException {
    snapLog.deserialize(dt, sessions);
    FileTxnLog txnLog = new FileTxnLog(dataDir);
    TxnIterator itr = txnLog.read(dt.lastProcessedZxid + 1);
    // <=== Assumes lastProcessedZxid is deserialized
}
{code}
I have had an offline discussion with Ben and Camille on this. I will be posting the discussion shortly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
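For illustration, a minimal sketch of the ordering fix implied above: publish lastProcessedZxid only after the tree mutation has been applied, so a snapshot that reads it can never claim a change it does not contain. Names follow the quoted snippets; applyToTree() is a hypothetical helper, and this is not the committed patch.
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();
    rc.clientId = header.getClientId();
    rc.cxid = header.getCxid();
    rc.zxid = header.getZxid();
    rc.type = header.getType();
    rc.err = 0;

    applyToTree(header, txn); // hypothetical helper: the actual mutation

    // Publish the zxid only after the mutation is applied, so a concurrent
    // save() that reads lastProcessedZxid serializes a tree that already
    // contains this change.
    if (rc.zxid > lastProcessedZxid) {
        lastProcessedZxid = rc.zxid;
    }
    return rc;
}
{code}
Under-reporting the zxid is the safe direction here: transactions are converted to an idempotent form before logging, so replaying an already-applied change on restore is harmless.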
[jira] [Updated] (ZOOKEEPER-707) c client close can crash with cptr null
[ https://issues.apache.org/jira/browse/ZOOKEEPER-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-707: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. c client close can crash with cptr null --- Key: ZOOKEEPER-707 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-707 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Mahadev konar Priority: Critical Fix For: 3.5.0 saw this in the zktest_mt at the end of 3.3.0, seems unlikely to happen though as it only failed after running the test 10-15 times. Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 26011 : OK Zookeeper_simpleSystem::testHangingClientzktest-mt: src/zookeeper.c:1950: zookeeper_process: Assertion `cptr' failed. Aborted -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-545) investigate use of realtime gc as the recommended default for server vm
[ https://issues.apache.org/jira/browse/ZOOKEEPER-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-545: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. investigate use of realtime gc as the recommended default for server vm -- Key: ZOOKEEPER-545 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-545 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Patrick Hunt Priority: Critical Fix For: 3.5.0
We currently don't recommend that people use the realtime gc when running the server; we probably should. Before we do so we need to verify that it works. We should make it the default for all our tests.
* concurrent vs g2 or whatever it's called (new in 1.6_15 or something?)
* Update all scripts to specify this option
* update documentation to include this option and add a section in the dev/ops docs detailing its benefits (in particular the latency effects of gc)
Also, the -server option? Any benefit for us to recommend this as well? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-675) LETest thread fails to join
[ https://issues.apache.org/jira/browse/ZOOKEEPER-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-675: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. LETest thread fails to join --- Key: ZOOKEEPER-675 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-675 Project: ZooKeeper Issue Type: Bug Components: leaderElection Reporter: Flavio Junqueira Assignee: Henry Robinson Priority: Critical Fix For: 3.5.0 Attachments: TEST-org.apache.zookeeper.test.LETest.txt After applying the patch of ZOOKEEPER-569, I observed a failure of LETest. From a cursory inspection of the log, I can tell that a leader is being elected, but some thread is not joining. At this point I'm not sure if this is a problem with the leader election implementation or the test itself. Just to be clear, the patch of ZOOKEEPER-569 solved a real issue, but it seems that there is yet another problem with LETest. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-697) TestQuotaQuorum is failing on Hudson
[ https://issues.apache.org/jira/browse/ZOOKEEPER-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-697: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. TestQuotaQuorum is failing on Hudson Key: ZOOKEEPER-697 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-697 Project: ZooKeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Priority: Critical Fix For: 3.5.0 The hudson test build failed http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/729/testReport/junit/org.apache.zookeeper.test/QuorumQuotaTest/testQuotaWithQuorum/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-670) zkpython leading to segfault on zookeeper server restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-670: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zkpython leading to segfault on zookeeper server restart Key: ZOOKEEPER-670 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-670 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.2.1, 3.2.2 Environment: CentOS w/ Python 2.4 Reporter: Lei Zhang Assignee: Henry Robinson Priority: Critical Fix For: 3.5.0 Attachments: voyager.patch, zk.py A Zookeeper client using zkpython segfaults on zookeeper server restart. It is reliably reproducible using the attached script zk.py. I'm able to stop the segfault using the attached patch voyager.patch, but zkpython seems to have a deeper issue in its use of watcher_dispatch - on zookeeper server restart, I see up to 6 invocations of watcher_dispatch while my script is simply sleeping in the main thread. This can't be right. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-805: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. four letter words fail with latest ubuntu nc.openbsd Key: ZOOKEEPER-805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805 Project: ZooKeeper Issue Type: Bug Components: documentation, server Affects Versions: 3.3.1, 3.4.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.5.0 In both the 3.3 branch and trunk, "echo stat | nc localhost 2181" fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) vs nc traditional [v1.10-38], which works fine. Not sure if this is a bug in us or nc.openbsd, but it's currently not working for me. Ugh. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-517) NIO factory fails to close connections when the number of file handles run out.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-517: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. NIO factory fails to close connections when the number of file handles run out. --- Key: ZOOKEEPER-517 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-517 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Mahadev konar Assignee: Benjamin Reed Priority: Critical Fix For: 3.5.0 The code in the NIO factory is such that if we fail to accept a connection for some reason (too many file handles may be one of them), we do not close the connections that are in CLOSE_WAIT. We need to explicitly close these sockets. One of the solutions might be to move doIO before accept, so that we can still close connections even if we cannot accept new ones. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
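For illustration, a minimal sketch of the kind of explicit cleanup described; the method and helper names are assumptions, not the actual NIOServerCnxnFactory internals.
{code}
import java.io.IOException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch only: configureAndRegister() is a hypothetical stand-in for the
// factory's real per-connection setup.
void acceptOne(ServerSocketChannel ss) {
    SocketChannel sc = null;
    try {
        sc = ss.accept();
        if (sc != null) {
            configureAndRegister(sc);
        }
    } catch (IOException e) {
        // If setup fails (e.g. the process is out of file descriptors),
        // close whatever channel we did obtain instead of leaking it
        // into CLOSE_WAIT.
        if (sc != null) {
            try { sc.close(); } catch (IOException ignored) { }
        }
    }
}
{code}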
[jira] [Updated] (ZOOKEEPER-851) ZK lets any node become an observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-851: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. ZK lets any node become an observer -- Key: ZOOKEEPER-851 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.1 Reporter: Vishal K Priority: Critical Fix For: 3.5.0
I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as shown below:
{noformat}
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
{noformat}
I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I created another entry for that node and started the zk server. The zoo.cfg on the first 3 nodes was left unchanged. The fourth node was able to join the cluster even though the 3 nodes had no idea about the fourth node. zoo.cfg on the fourth node:
{noformat}
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
server.3=10.17.117.71:2888:3888
{noformat}
It looks like 10.17.117.71 is becoming an observer in this case. I was expecting that the leader would reject 10.17.117.71.
{noformat}
# telnet 10.17.117.71 2181
Trying 10.17.117.71...
Connected to 10.17.117.71.
Escape character is '^]'.
stat
Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
Clients:
 /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Outstanding: 0
Zxid: 0x20065
Mode: follower
Node count: 288
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-936) zkpython is leaking ACL_vector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-936: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zkpython is leaking ACL_vector -- Key: ZOOKEEPER-936 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-936 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Reporter: Gustavo Niemeyer Priority: Critical Fix For: 3.5.0 It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-955) Use Atomic(Integer|Long) for (Z)Xid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-955: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Use Atomic(Integer|Long) for (Z)Xid --- Key: ZOOKEEPER-955 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-955 Project: ZooKeeper Issue Type: Improvement Components: java client, server Reporter: Thomas Koch Assignee: Thomas Koch Priority: Trivial Fix For: 3.5.0 Attachments: ZOOKEEPER-955.patch As I read last weekend in the fantastic book Clean Code, it'd be much faster to use AtomicInteger or AtomicLong instead of synchronization blocks around each access to an int or long. The key difference is that a synchronization block will in any case acquire and release a lock. The atomic classes use optimistic locking, a CPU operation that only changes a value if it still has not changed since the last read. In most cases the value has not changed since the last visit, so the operation is just as fast as a normal operation. If it has changed, we read again and retry the change. [1] Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
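For illustration, a generic sketch of the two styles being compared (not ZooKeeper's actual xid code):
{code}
import java.util.concurrent.atomic.AtomicLong;

class XidCounter {
    // Lock-based style: every access acquires and releases a monitor,
    // even when there is no contention at all.
    private long lockedXid;
    synchronized long nextLocked() {
        return ++lockedXid;
    }

    // Atomic style: incrementAndGet() is a compare-and-swap loop; in the
    // uncontended common case it is a single CPU operation, and under
    // contention it simply re-reads and retries.
    private final AtomicLong atomicXid = new AtomicLong();
    long nextAtomic() {
        return atomicXid.incrementAndGet();
    }
}
{code}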
[jira] [Commented] (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049639#comment-13049639 ] Mahadev konar commented on ZOOKEEPER-740: - Any update on this? Should we try and get this into the 3.4 release? zkpython leading to segfault on zookeeper - Key: ZOOKEEPER-740 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.0 Reporter: Federico Assignee: Henry Robinson Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-740.patch
The program that we are implementing uses the python binding for zookeeper, but sometimes it crashes with a segfault; here is the bt from gdb:
{noformat}
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
2488 ../Objects/abstract.c: No such file or directory.
 in ../Objects/abstract.c
(gdb) bt
#0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
#1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
#3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
#4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
#5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317
#6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-973) bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
[ https://issues.apache.org/jira/browse/ZOOKEEPER-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-973: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. bind() could fail on Leader because it does not setReuseAddress on its ServerSocket Key: ZOOKEEPER-973 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-973 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Trivial Fix For: 3.5.0
setReuseAddress(true) should be used below.
{code}
Leader(QuorumPeer self, LeaderZooKeeperServer zk) throws IOException {
    this.self = self;
    try {
        ss = new ServerSocket(self.getQuorumAddress().getPort());
    } catch (BindException e) {
        LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e);
        throw e;
    }
    this.zk = zk;
}
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
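For illustration, a sketch of the suggested change inside the constructor shown above (not the actual patch): construct the socket unbound, set SO_REUSEADDR, then bind, so a restarting leader can rebind the quorum port while the old socket lingers in TIME_WAIT.
{code}
// Sketch only; "self" is the surrounding QuorumPeer from the snippet above,
// and java.net.InetSocketAddress must be imported.
ss = new ServerSocket();           // create unbound
ss.setReuseAddress(true);          // must be set before bind()
ss.bind(new InetSocketAddress(self.getQuorumAddress().getPort()));
{code}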
[jira] [Updated] (ZOOKEEPER-233) Create a slimmer jar for clients to reduce their disk footprint.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-233: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Create a slimmer jar for clients to reduce their disk footprint. --- Key: ZOOKEEPER-233 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-233 Project: ZooKeeper Issue Type: New Feature Components: build, java client Reporter: Hiram Chirino Priority: Trivial Fix For: 3.5.0 Patrick requested that I open this issue in this [email thread|http://n2.nabble.com/ActiveMQ-is-now-using-ZooKeeper-td1573272.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-277) Define PATH_SEPARATOR
[ https://issues.apache.org/jira/browse/ZOOKEEPER-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-277: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Define PATH_SEPARATOR - Key: ZOOKEEPER-277 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-277 Project: ZooKeeper Issue Type: Improvement Components: c client, documentation, java client, server, tests Reporter: Nitay Joffe Priority: Trivial Fix For: 3.5.0 We should define a constant PATH_SEPARATOR = "/" and use that throughout the code rather than the hardcoded "/". Users can be told to use this constant to be safe in case of future changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
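A trivial sketch of the idea; the class name and location are hypothetical:
{code}
// Hypothetical home for the constant; every hardcoded "/" would be
// replaced by a reference to it.
public final class PathConstants {
    public static final String PATH_SEPARATOR = "/";

    private PathConstants() { }
}

// e.g. String child = parent + PathConstants.PATH_SEPARATOR + name;
{code}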
[jira] [Updated] (ZOOKEEPER-860) Add alternative search-provider to ZK site
[ https://issues.apache.org/jira/browse/ZOOKEEPER-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-860: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Add alternative search-provider to ZK site -- Key: ZOOKEEPER-860 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-860 Project: ZooKeeper Issue Type: Improvement Components: documentation Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 3.5.0 Attachments: ZOOKEEPER-860.patch Use the search-hadoop.com service to make search available over ZK sources, MLs, wiki, etc. This was initially proposed on the user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added to the site's skin (common for all Hadoop related projects) before (as a part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626]), so this issue is about enabling it for ZK. The ultimate goal is to use it at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-522) zookeeper client should throttle if its not able to connect to any of the servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-522: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. zookeeper client should throttle if its not able to connect to any of the servers. -- Key: ZOOKEEPER-522 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-522 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.2.0 Reporter: Mahadev konar Fix For: 3.5.0 Currently the zookeeper client library keeps trying to connect to servers even if all of them are unreachable. It will go through the list time and again and try to connect. Sometimes this might cause problems: too many clients retrying connections to servers (when there might be something wrong with, or a delay at, the servers), wherein the clients give up and try reconnecting to other servers. This causes a huge churn in client connections, sometimes leading to the zookeeper server running out of file handles. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
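For illustration, a sketch of the kind of throttling being asked for, using capped exponential backoff with jitter; this is not the actual ClientCnxn code, and connected()/tryNextServerInList() are hypothetical helpers.
{code}
// Sketch: retry the server list with a capped, jittered exponential
// backoff instead of hammering every host in a tight loop.
void connectWithBackoff() throws InterruptedException {
    long backoffMs = 50;                 // initial delay
    final long maxBackoffMs = 10_000;    // cap so clients keep retrying

    while (!connected()) {
        tryNextServerInList();           // hypothetical: attempt one host
        if (connected()) {
            return;
        }
        long jitter = (long) (Math.random() * backoffMs); // avoid stampedes
        Thread.sleep(backoffMs + jitter);
        backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
    }
}
{code}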
[jira] [Updated] (ZOOKEEPER-1005) Zookeeper servers fail to elect a leader successfully.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1005: - Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. Zookeeper servers fail to elect a leader successfully. - Key: ZOOKEEPER-1005 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1005 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.2.2 Environment: zookeeper-3.2.2; debian Reporter: Alexandre Hardy Fix For: 3.5.0
We were running 3 zookeeper servers, and simulated a failure on one of the servers. The one zookeeper node follows the other, but has trouble connecting. It looks like the following exception is the cause:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Following zookeeper3/192.168.131.11:2888
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING java.net.ConnectException: -- Connection refused
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
{noformat}
The last exception while connecting was:
{noformat}
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: -- Connection refused
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Exception when following the leader
{noformat}
The leader started leading a bit later:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0, 94489312534, 25, 2, LOOKING, LOOKING, 0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
2011-03-01T14:02:50+02:00
{noformat}
[jira] [Updated] (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-823: Fix Version/s: (was: 3.4.0) 3.5.0 not a blocker. Moving it out of 3.4 release. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: ZooKeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.5.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, testDisconnectedAddAuth_FAILURE, testWatchAutoResetWithPending_FAILURE This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049669#comment-13049669 ] Ted Dunning commented on ZOOKEEPER-965: --- I can take a quick look, but I am having trouble getting reliable net access. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
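For illustration, a sketch of the list-based style described above, using Op/multi names in the shape the eventual 3.4 client API took; treat the exact signatures, paths, and arguments as assumptions rather than the final committed API.
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Atomic multi-update sketch: either every op commits, or none do.
List<OpResult> atomicUpdate(ZooKeeper zk, byte[] cfg, byte[] ver,
                            int expectedVersion)
        throws KeeperException, InterruptedException {
    return zk.multi(Arrays.asList(
            Op.create("/app/config", cfg, Ids.OPEN_ACL_UNSAFE, 0), // 0 = persistent
            Op.setData("/app/version", ver, expectedVersion),      // version guard
            Op.check("/app/lock", -1),                             // existence check
            Op.delete("/app/stale", -1)));                         // -1 = any version
}
{code}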
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049670#comment-13049670 ] Ted Dunning commented on ZOOKEEPER-965: --- OK. As a first step, I rebased our changes onto the current trunk. This will require the usual re-checkout due to non-fast-forward operations. Now to the problems you are seeing. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049673#comment-13049673 ] Ted Dunning commented on ZOOKEEPER-965: --- I see a clean compile on my mac. Looks like I don't understand the problem. I can't run all the tests just now, but last time I looked they ran.
{noformat}
BuzzBook-Pro:zookeeper[trunk*]$ git checkout multi
Switched to branch 'multi'
BuzzBook-Pro:zookeeper[multi*]$ ant clean
Buildfile: /Users/tdunning/Apache/zookeeper/build.xml
...
clean:
BUILD SUCCESSFUL
Total time: 0 seconds
BuzzBook-Pro:zookeeper[multi*]$ ant compile
...
version-info:
[java] Unknown REVISION number, using -1
...
[javac] Compiling 52 source files to /Users/tdunning/Apache/zookeeper/build/classes
...
[javac] Compiling 134 source files to /Users/tdunning/Apache/zookeeper/build/classes
BUILD SUCCESSFUL
Total time: 11 seconds
BuzzBook-Pro:zookeeper[multi*]$
{noformat}
Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049725#comment-13049725 ] Hari A V commented on ZOOKEEPER-1080: - hi Sameer, How about handling of Disconnected and Expired events from Zookeeper? In this case there will not be any exception propagated from the Zookeeper server; instead it will notify through the watcher as KeeperState.Disconnected (0). Please see the following case: Let's say Process1 is Leader and Process2 is in Ready state. Now, the network of Process1 goes down [Disconnected event] for more than the session timeout period. Then Process2 will get the NodeDeleted event and becomes Active. So finally both Process1 and Process2 will be in Active state. [Multiple Active processes will lead to inconsistencies if we use this framework to provide HA for NameNode.] Provide a Leader Election framework based on Zookeeper recipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single points of failure. If the Namenode or JobTracker goes down, their service will not be available until they are up and running again. If there was a Standby Namenode or JobTracker available and ready to serve when the Active nodes go down, we could have reduced the service down time. Hadoop already provides a Standby Namenode implementation, which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in Leader Election can use this service. They should start the service with the required configurations. The service will notify the nodes whether they should be started in Active or Standby mode. They also intimate any changes in the mode at runtime. All other complexities can be handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
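For illustration, a sketch of the defensive handling Hari is asking for; the stepDown()/resume() names are hypothetical, and this is not the attached framework's code. The key point is that a leader that loses its connection must stop acting as leader immediately, because its ephemeral node may already have been deleted and another process elected.
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class LeadershipWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected: // may still hold the session, but can't be sure
            case Expired:      // session gone: leadership definitely lost
                stepDown();
                break;
            case SyncConnected:
                resume();      // re-check the election znode on reconnect
                break;
            default:
                break;
        }
    }

    private void stepDown() { /* relinquish the active role */ }
    private void resume()   { /* re-evaluate leadership state */ }
}
{code}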
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049774#comment-13049774 ] Camille Fournier commented on ZOOKEEPER-1046: - It's ok with me to make the change and ignore deletes. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049797#comment-13049797 ] Marshall McMullen commented on ZOOKEEPER-965: - Ted, thanks for taking a look at this. Not sure if you noticed this, but I checked in a change to github to fix the compile error I was seeing. I just wanted you to look at it and see if/why the fix was necessary. Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch
The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or file state in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above, and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include:
- update to ZK clients to allow the new call
- additional wire level request
- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.
- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.
To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049814#comment-13049814 ] Flavio Junqueira commented on ZOOKEEPER-1046: - If cversion counts the number of created children, we can always learn the number of deleted children by subtracting the number of current children from cversion, no? I was also wondering if there is any use case you're aware of in which it needs to have both counted. So far the proposal of counting only creations seems good to me. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
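For illustration, a one-line rendering of Flavio's arithmetic, assuming the proposed semantics where cversion counts only child creations (a hypothetical helper, not existing API behavior):
{code}
import org.apache.zookeeper.data.Stat;

// Under the proposal, cversion counts creations only, so the number of
// deletions can be recovered by subtraction.
int deletedChildren(Stat stat) {
    return stat.getCversion() - stat.getNumChildren();
}
{code}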
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049817#comment-13049817 ] Benjamin Reed commented on ZOOKEEPER-1046: -- nice observation flavio! i haven't seen anyone using cversion outside of the sequence number on sequence znodes. Creating a new sequential node results in a ZNODEEXISTS error - Key: ZOOKEEPER-1046 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.2, 3.3.3 Environment: A 3 node-cluster running Debian squeeze. Reporter: Jeremy Stribling Assignee: Vishal K Priority: Blocker Labels: sequence Fix For: 3.4.0 Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znodes under that parent. In one example:
{noformat}
[zk:ip:port(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}
However, the znode /zkrsm/002d_record120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.zxid_prefixes.log, and each new log represents an application process restart. Here's the scenario:
# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.
At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:
{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00b2_record0001761440
{noformat}
This then repeats forever, as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049820#comment-13049820 ] Camille Fournier commented on ZOOKEEPER-723: On the one hand I really would like this feature, but on the other hand I do not like the idea of having one of these created with no children and then floating out there for some indefinite period of time until someone finally decides to create children under it. It seems confusing and hard to manage from a client perspective. All of my use cases would be completely satisfied with the nodes as real ephemerals, aka session-based, only allowing children that are ephemeral containers/nodes from the same session. I'm curious to think of a really compelling use case where I would want this to cross sessions, and the email thread did not seem to provide one. Why don't we want this to be a true ephemeral? ephemeral parent znodes --- Key: ZOOKEEPER-723 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-723 Project: ZooKeeper Issue Type: New Feature Components: server Reporter: Benjamin Reed Assignee: Daniel Gómez Ferro Attachments: ZOOKEEPER-723.patch ephemeral znodes have the nice property of automatically cleaning up after themselves when the creator goes away, but since they can't have children it is hard to build subtrees that will clean up after the clients that are using them are gone. rather than changing the semantics of ephemeral nodes, i propose ephemeral parents: znodes that disappear when they have no more children. this cleanup would happen automatically when the last child is removed. an ephemeral parent is not tied to any particular session, so even if the creator goes away, the ephemeral parent will remain as long as there are children. when an ephemeral parent is created it will have an initial child, so that it doesn't get immediately removed. i think this child should be an ephemeral znode with a predefined name, firstChild. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049831#comment-13049831 ] Benjamin Reed commented on ZOOKEEPER-723: - yeah i don't like the floating out there indefinitely with no children part either. one use case for allowing different sessions is the barrier-like case in which you want to find out when everyone is done using a resource: you create a parent znode, /myresource, with a child called available. processes that use the resource will create children under /myresource. when the resource manager wants to stop providing the resource, it removes /myresource/available and then watches for /myresource to disappear. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
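To make that barrier use case concrete, here is a minimal sketch against today's client API; the /myresource paths come from the comment above, the method names are made up, and error/retry handling is omitted. The proposed ephemeral-parent flag would only automate the final removal of /myresource, which this sketch has to do by hand.
{code}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class ResourceBarrier {
    // Manager: publish the resource with its availability marker.
    static void publish(ZooKeeper zk) throws Exception {
        zk.create("/myresource", new byte[0], OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/myresource/available", new byte[0], OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // User: register under the parent for the lifetime of this session.
    static String acquire(ZooKeeper zk) throws Exception {
        return zk.create("/myresource/user-", new byte[0], OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Manager: retire the resource, then wait for the users to drain.
    // With an ephemeral parent, /myresource would vanish on its own once
    // the last child is gone; without it, we must watch the child list
    // (re-running this check from the watcher) and delete it ourselves.
    static void retire(ZooKeeper zk) throws Exception {
        zk.delete("/myresource/available", -1);
        List<String> children = zk.getChildren("/myresource", true); // leaves a watch
        if (children.isEmpty()) {
            zk.delete("/myresource", -1);
        }
    }
}
{code}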
[jira] [Created] (ZOOKEEPER-1095) Simple leader election recipe
Simple leader election recipe - Key: ZOOKEEPER-1095 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1095 Project: ZooKeeper Issue Type: Improvement Reporter: Henry Robinson Leader election recipe originally contributed to ZOOKEEPER-1080. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1080) Provide a Leader Election framework based on Zookeeper recipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049887#comment-13049887 ] Henry Robinson commented on ZOOKEEPER-1080: --- What we've got here are two different, but equally valid, approaches to building leader election. Since this isn't a core framework issue, we're not making a decision that everyone has to live with. Therefore there's no need for the committers to play kingmaker by only committing one of these patches. We've got room for both, just not on this JIRA. Here's what I suggest we do. * Eric - I've opened ZOOKEEPER-1095 for your contribution. Can you attach your recipe (as a diff, with copyright headers) to that ticket, and we'll work on getting it committed there? * Hari - leave your patch here, and one of the committers will do a code review shortly. Provide a Leader Election framework based on Zookeeper recipe -- Key: ZOOKEEPER-1080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080 Project: ZooKeeper Issue Type: New Feature Components: contrib Affects Versions: 3.3.2 Reporter: Hari A V Fix For: 3.3.2 Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz Currently Hadoop components such as NameNode and JobTracker are single points of failure. If the NameNode or JobTracker goes down, their service will not be available until they are up and running again. If there were a Standby NameNode or JobTracker available and ready to serve when the Active node goes down, we could reduce the service downtime. Hadoop already provides a Standby Namenode implementation which is not fully a hot Standby. The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in Leader Election can use this service. It should start the service with the required configuration. The service will notify the nodes whether they should start in Active or Standby mode, and will also notify them of any changes in mode at runtime. All other complexities can be handled internally by the LES. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
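For readers who want the shape of the recipe under discussion, here is a minimal sketch of the standard leader-election pattern from the recipes page linked above. The /election path and class name are illustrative; a real service would also handle session expiry and re-run the check from its watcher.
{code}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class ElectionSketch {
    // Returns true if this node should run as Active, false for Standby.
    static boolean elect(ZooKeeper zk) throws Exception {
        String me = zk.create("/election/n_", new byte[0], OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        String mySeq = me.substring("/election/".length());
        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);
        if (children.get(0).equals(mySeq)) {
            return true; // lowest sequence number leads
        }
        // Watch only the immediate predecessor to avoid a thundering herd;
        // its NodeDeleted event is the cue to re-run the election check.
        int i = children.indexOf(mySeq);
        zk.exists("/election/" + children.get(i - 1), true);
        return false;
    }
}
{code}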
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049895#comment-13049895 ] Camille Fournier commented on ZOOKEEPER-723: That would be ok. Now my next question is, would we ever want to have non-ephemeral/ephemeral-container children of an ephemeral container? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-723) ephemeral parent znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049931#comment-13049931 ] Patrick Hunt commented on ZOOKEEPER-723: Wow, my brain is in an infinite loop now. ;-) Yes, iirc that's one of the issues we had touched on way back when... and one of the reasons why we kept it simple: ephemeral nodes couldn't have children. I seem to also remember another related issue: once you start allowing arbitrarily large ephemeral trees to be built, there was a concern about cleanup and its effect on availability of the system as a whole. (still a concern I would have) note: If this znode really is ephemeral (strongly tied to the session lifetime) I don't have a problem calling it as such. However, if the znode can live beyond the session that created it, then Flavio's suggestion of solitary sounds good to me. (was that a typo or did you really mean solidary?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049953#comment-13049953 ] Ted Dunning commented on ZOOKEEPER-965: --- Ahhh... no. I didn't notice that. I will take a look. On Wed, Jun 15, 2011 at 4:45 PM, Marshall McMullen (JIRA) Need a multi-update command to allow multiple znodes to be updated safely - Key: ZOOKEEPER-965 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.3.3 Reporter: Ted Dunning Assignee: Ted Dunning Fix For: 3.4.0 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch The basic idea is to have a single method called multi that will accept a list of create, delete, update or check objects, each of which has a desired version (or initial state, in the case of create). If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above and the other style has a Transaction that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API, so the list-based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include: - update to ZK clients to allow the new call - additional wire level request - on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form. - on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown. To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely
[ https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049955#comment-13049955 ] Ted Dunning commented on ZOOKEEPER-965: --- Marshall, I just tried with and without your patch. It compiles either way. My feeling is that excessive throws declarations are bad juju anyway so the current state (with your change) is better than the previous state (with the extra throws in processTxn). I would leave it as is. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
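As a usage illustration of the list-based style described in this issue, here is a sketch using the Op/multi API that ultimately shipped in 3.4; the paths, data, and version guard below are made up for the example.
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooKeeper;
import static org.apache.zookeeper.ZooDefs.Ids.OPEN_ACL_UNSAFE;

public class MultiSketch {
    static List<OpResult> swap(ZooKeeper zk, byte[] data, int expectedVersion)
            throws Exception {
        // All four operations commit atomically, or none do: the check op
        // guards the whole batch on the version of /config.
        return zk.multi(Arrays.asList(
                Op.check("/config", expectedVersion),
                Op.setData("/config/primary", data, -1),   // -1 = any version
                Op.create("/config/updated", new byte[0], OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT.toFlag()),
                Op.delete("/config/stale", -1)));
    }
}
{code}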
[jira] [Assigned] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-1096: --- Assignee: Jared Cantwell Leader communication should listen on specified IP, not wildcard address Key: ZOOKEEPER-1096 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.3.3, 3.4.0 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Attachments: ZOOKEEPER-1096.patch Server should specify the local address that is used for leader communication (and not use the default of listening on all interfaces). This is similar to the clientPortAddress parameter that was added a year ago. After reviewing the code, we can't think of a reason why only the port would be used with the wildcard interface, when servers are already connecting specifically to that interface anyway. I have submitted a patch, but it does not account for all leader election algorithms. Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things. There is some more information about making it an option here: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
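The difference being proposed, shown at the socket level; the address and port below are examples only, not ZooKeeper configuration values.
{code}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindSketch {
    public static void main(String[] args) throws Exception {
        // Today: constructing with just a port binds the wildcard address,
        // so leader/election traffic is accepted on every interface.
        ServerSocket wildcard = new ServerSocket(3888);

        // Proposed: bind the address this server advertises to the ensemble,
        // so only that interface accepts connections.
        ServerSocket scoped = new ServerSocket();
        scoped.bind(new InetSocketAddress("10.0.0.5", 3888));

        wildcard.close();
        scoped.close();
    }
}
{code}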
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049980#comment-13049980 ] Patrick Hunt commented on ZOOKEEPER-1096: - with git use --no-prefix when creating the patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049982#comment-13049982 ] Patrick Hunt commented on ZOOKEEPER-1096: - bq. Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things. I agree on both counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jared Cantwell updated ZOOKEEPER-1096: -- Attachment: ZOOKEEPER-1096.patch Fixed some prefixes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Zookeeper code question
Hi Folks, I am looking at the code in CommitProcessor and I had a couple of questions. 1. When a request is ready to be processed, it goes into the toProcess list. Then subsequently, it is taken out of that list and we call nextProcessor.processRequest(toProcess.get(i)). Why does this intermediate toProcess list exist? Why couldn't we call nextProcessor.processRequest(r) directly wherever toProcess.add(r) is called? I gave it some thought and couldn't figure out a correctness issue either way. 2. There are a couple of data structures that are accessed by multiple threads but are not synchronized - the LinkedList<Request> queues queuedWriteRequests and committedRequests. That looks like a bug (or please let me know if I am missing something). Thanks! Vishal
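On question 1, one plausible answer (a sketch of the general pattern, not the actual CommitProcessor code): the intermediate list lets the thread hand requests to nextProcessor without holding the processor's monitor, so a slow downstream stage never blocks the threads that enqueue commits.
{code}
import java.util.ArrayList;
import java.util.LinkedList;

public class DrainSketch {
    interface RequestProcessor { void processRequest(Object r); }

    final LinkedList<Object> committedRequests = new LinkedList<Object>();

    void run(RequestProcessor nextProcessor) {
        ArrayList<Object> toProcess = new ArrayList<Object>();
        synchronized (this) {
            // Drain the shared queue quickly while holding the lock...
            while (!committedRequests.isEmpty()) {
                toProcess.add(committedRequests.removeFirst());
            }
        }
        // ...then dispatch outside the lock, where processRequest may be slow.
        for (Object r : toProcess) {
            nextProcessor.processRequest(r);
        }
    }
}
{code}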
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050006#comment-13050006 ] Hadoop QA commented on ZOOKEEPER-1096: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482704/ZOOKEEPER-1096.patch against trunk revision 1135515. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/318//console This message is automatically generated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-522) zookeeper client should throttle if its not able to connect to any of the servers.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050007#comment-13050007 ] Vishal Kathuria commented on ZOOKEEPER-522: --- Thanks for opening this Jira Mahadev. I haven't looked at the code - does this happen for both C and Java clients? zookeeper client should throttle if its not able to connect to any of the servers. -- Key: ZOOKEEPER-522 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-522 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.2.0 Reporter: Mahadev konar Fix For: 3.5.0 Currently the zookeeper client library keeps trying to connect even if all of the servers are unreachable. It will go through the list time and again and try to connect. When something is wrong or slow on the server side, this can mean many clients retrying connections, giving up, and trying to reconnect to other servers. This causes a huge churn in client connections, sometimes leading to the zookeeper server running out of file handles. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
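A minimal sketch of the throttling being requested: exponential backoff with jitter between full sweeps of the server list. The tryConnect helper and the delay bounds are hypothetical placeholders, not the real client handshake.
{code}
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Random;

public class BackoffSketch {
    static void connectWithBackoff(List<InetSocketAddress> servers)
            throws InterruptedException {
        Random rnd = new Random();
        long delayMs = 50;
        final long maxDelayMs = 10000;
        while (true) {
            for (InetSocketAddress server : servers) {
                if (tryConnect(server)) {
                    return; // connected; normal session handling takes over
                }
            }
            // A full sweep failed: sleep with jitter, then double the delay,
            // so thousands of clients do not hammer a recovering ensemble.
            Thread.sleep(delayMs + rnd.nextInt((int) delayMs));
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
    }

    static boolean tryConnect(InetSocketAddress server) {
        return false; // hypothetical stand-in for the real connection attempt
    }
}
{code}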
[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050022#comment-13050022 ] Hadoop QA commented on ZOOKEEPER-1096: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482704/ZOOKEEPER-1096.patch against trunk revision 1135515. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/319//console This message is automatically generated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050078#comment-13050078 ] Benjamin Reed commented on ZOOKEEPER-1046: -- two clarifying points: * this is not for 3.3. this would be a 3.4 change. we will stick with camille's fix for 3.3 * we never get the cversion from the user. you can't do conditional ops with it or pass it in any of the calls. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050090#comment-13050090 ] Patrick Hunt commented on ZOOKEEPER-1046: - Ok, thanks for the clarification. In that case, what do you think about this for 3.4+? Is it going to be possible to do this right but also without too much overhead (i.e. simply), versus the gains of changing the API? For 3.4 I'm less worried about the semantic change, but I'd still like to avoid it if reasonably possible... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-5) Issue with Netty in BookKeeper
[ https://issues.apache.org/jira/browse/BOOKKEEPER-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated BOOKKEEPER-5: -- Attachment: BOOKKEEPER-5.patch Preliminary patch to fix this problem. It does not include a test yet. Issue with Netty in BookKeeper -- Key: BOOKKEEPER-5 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-5 Project: Bookkeeper Issue Type: Bug Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: BOOKKEEPER-5.patch, ZOOKEEPER-998.patch In one of my experiments, I found that a BookKeeper object was locked after I tried to halt it. By searching the Web, I found that the issue is described here: http://www.jboss.org/netty/community.html#nabble-td5492010 I'll upload a patch to fix it. For now, I'm marking it for 3.4.0, but if there is any chance we can get it in 3.3.3, it would be nice. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira