[jira] [Commented] (ZOOKEEPER-2469) infinite loop in ZK re-login
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367064#comment-15367064 ] Mahadev konar commented on ZOOKEEPER-2469: -- [~sershe] done. > infinite loop in ZK re-login > > > Key: ZOOKEEPER-2469 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469 > Project: ZooKeeper > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > {noformat} > int retry = 1; > while (retry >= 0) { > try { > reLogin(); > break; > } catch (LoginException le) { > if (retry > 0) { > --retry; > // sleep for 10 seconds. > try { > Thread.sleep(10 * 1000); > } catch (InterruptedException e) { > LOG.error("Interrupted during login > retry after LoginException:", le); > throw le; > } > } else { > LOG.error("Could not refresh TGT for > principal: " + principal + ".", le); > } > } > } > {noformat} > will retry forever. Should return like the one above -- This message was sent by Atlassian JIRA (v6.3.4#6332)
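The loop quoted above spins forever because once `retry` reaches 0 the `else` branch only logs and the `while (retry >= 0)` condition stays true. A self-contained sketch of one possible repair (an assumption about the intended fix, not the committed patch; names like `retryLogin` are illustrative): make the exhausted-retries branch propagate the failure instead of looping again.

```java
// Sketch of a bounded re-login retry. The key change from the snippet above:
// the exhausted branch rethrows, so the loop cannot spin forever at retry == 0.
// reLogin() here is a stand-in that always fails so the loop's end is visible.
public class ReloginRetry {
    static int attempts = 0;

    static void reLogin() throws Exception {
        attempts++;
        throw new Exception("simulated LoginException");
    }

    static void retryLogin() throws Exception {
        int retry = 1;
        while (retry >= 0) {
            try {
                reLogin();
                break;
            } catch (Exception le) {
                if (retry > 0) {
                    --retry;
                    Thread.sleep(10); // back-off, shortened for the sketch
                } else {
                    // The bug: the original only logged here and looped again.
                    // Rethrowing (or breaking) bounds the retries.
                    throw le;
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            retryLogin();
        } catch (Exception e) {
            System.out.println("gave up after " + attempts + " attempts");
        }
    }
}
```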
[jira] [Updated] (ZOOKEEPER-2469) infinite loop in ZK re-login
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-2469: - Assignee: Sergey Shelukhin > infinite loop in ZK re-login > > > Key: ZOOKEEPER-2469 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469 > Project: ZooKeeper > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > {noformat} > int retry = 1; > while (retry >= 0) { > try { > reLogin(); > break; > } catch (LoginException le) { > if (retry > 0) { > --retry; > // sleep for 10 seconds. > try { > Thread.sleep(10 * 1000); > } catch (InterruptedException e) { > LOG.error("Interrupted during login > retry after LoginException:", le); > throw le; > } > } else { > LOG.error("Could not refresh TGT for > principal: " + principal + ".", le); > } > } > } > {noformat} > will retry forever. Should return like the one above -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache ZooKeeper release 3.5.0-alpha candidate 0
+1 - downloaded the bits and ran some tests. Looks good to go. thanks mahadev Mahadev Konar Hortonworks Inc. http://hortonworks.com/ On Mon, Aug 4, 2014 at 5:48 PM, Camille Fournier wrote: > +1 started up a server from the jar, ran some basic tests. > > > On Mon, Aug 4, 2014 at 6:21 PM, Jian Huang wrote: > > > On Mon, Aug 4, 2014 at 3:17 PM, Flavio Junqueira < > > fpjunque...@yahoo.com.invalid> wrote: > > > > > +1, ran tests, checked files and signatures, ran some quorum tests > > > including reconfigurations. lgtm! > > > > > > -Flavio > > > > > > On 02 Aug 2014, at 00:08, Patrick Hunt wrote: > > > > > > > This is a release candidate for 3.5.0-alpha. > > > > > > > > The full release notes is available at: > > > > > > > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12316644&projectId=12310801 > > > > > > > > *** Please download, test and vote, I expect the vote to run for a > > > > minimum of 72 hours from the time this email was sent *** > > > > > > > > Source files: > > > > http://people.apache.org/~phunt/zookeeper-3.5.0-alpha-candidate-0/ > > > > > > > > Maven staging repo: > > > > > > > > > > https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.0-alpha/ > > > > > > > > The tag to be voted upon: > > > > https://svn.apache.org/repos/asf/zookeeper/tags/release-3.5.0-rc0 > > > > > > > > ZooKeeper's KEYS file containing PGP keys we use to sign the release: > > > > > > > > http://www.apache.org/dist/zookeeper/KEYS > > > > > > > > Should we release this candidate? > > > > > > > > Patrick > > > > > > > > > -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (ZOOKEEPER-1575) adding .gitattributes to prevent CRLF and LF mismatches for source and text files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1575: - Fix Version/s: 3.5.0 > adding .gitattributes to prevent CRLF and LF mismatches for source and text > files > - > > Key: ZOOKEEPER-1575 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1575 > Project: ZooKeeper > Issue Type: Bug >Reporter: Raja Aluri >Assignee: Raja Aluri > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1575.trunk.patch > > > adding .gitattributes to prevent CRLF and LF mismatches for source and text > files -- This message was sent by Atlassian JIRA (v6.2#6252)
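For context on the mechanism: a `.gitattributes` file at the repository root pins per-path line-ending handling, which is what this patch adds. A minimal example in that spirit (the committed file's exact rules may differ):

```
# Normalize line endings for source and text files; leave binaries untouched.
*       text=auto
*.java  text
*.c     text
*.h     text
*.sh    text eol=lf
*.bat   text eol=crlf
*.jar   binary
*.png   binary
```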
[jira] [Commented] (ZOOKEEPER-1848) [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954961#comment-13954961 ] Mahadev konar commented on ZOOKEEPER-1848: -- +1 for the patch. Rerunning it through jenkins again. > [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6 > -- > > Key: ZOOKEEPER-1848 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1848 > Project: ZooKeeper > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 3.5.0 > > Attachments: zookeeper-1848_v1.patch, zookeeper-1848_v2.patch > > > ZK uses Java NIO to create ServerSockets from ServerSocketChannels. Under > Windows, ipv4 and ipv6 are implemented independently, and it seems Java > cannot reuse the same socket channel for both ipv4 and ipv6 sockets. We > are getting "java.net.SocketException: Address family not supported by > protocol family" exceptions. When the ZK client resolves "localhost", it gets both the v4 > 127.0.0.1 and v6 ::1 addresses, but the socket channel cannot bind to both v4 > and v6. > The problem is reported as: > http://bugs.sun.com/view_bug.do?bug_id=6230761 > http://stackoverflow.com/questions/1357091/binding-an-ipv6-server-socket-on-windows > Although the JDK bug is reported as resolved, I have tested with jdk1.6.0_33 > without any success; JDK7, however, seems to have fixed this problem. > See HBASE-6825 for reference. -- This message was sent by Atlassian JIRA (v6.2#6252)
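The dual resolution described above is easy to observe from plain Java (the class name is illustrative): `InetAddress.getAllByName` returns every address the resolver has for a name, and on a dual-stack host "localhost" typically yields both 127.0.0.1 and ::1, the pair a single JDK6/Windows socket channel could not serve.

```java
import java.net.InetAddress;

// Print every address the resolver returns for "localhost"; on a dual-stack
// machine this is usually both the v4 (127.0.0.1) and v6 (::1) loopback,
// the combination at the root of the bind failure described above.
public class ResolveLocalhost {
    public static void main(String[] args) throws Exception {
        for (InetAddress a : InetAddress.getAllByName("localhost")) {
            System.out.println(a.getHostAddress());
        }
    }
}
```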
Re: [VOTE] Apache ZooKeeper release 3.4.6 candidate 0
+1 Verified the signatures and the artifacts. thanks mahadev Mahadev Konar Hortonworks Inc. http://hortonworks.com/ On Mon, Feb 24, 2014 at 12:20 PM, Michi Mutsuzaki wrote: > +1 > > ant test passed on ubuntu 12.04. > > On Sun, Feb 23, 2014 at 12:23 PM, Ted Yu wrote: >> I pointed HBase 0.98 at 3.4.6 RC0 in the staging repo. >> I ran through test suite and it passed: >> >> [INFO] BUILD SUCCESS >> [INFO] >> >> [INFO] Total time: 1:09:42.116s >> [INFO] Finished at: Sun Feb 23 19:21:04 UTC 2014 >> [INFO] Final Memory: 48M/503M >> >> Cheers >> >> >> On Sun, Feb 23, 2014 at 11:39 AM, Flavio Junqueira >> wrote: >> >>> This is a bugfix release candidate for 3.4.5. It fixes 117 issues, >>> including issues that affect >>> leader election, Zab, and SASL authentication. >>> >>> The full release notes is available at: >>> >>> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12323310 >>> >>> *** Please download, test and vote by March 9th 2014, 23:59 UTC+0. *** >>> >>> Source files: >>> http://people.apache.org/~fpj/zookeeper-3.4.6-candidate-0/ >>> >>> Maven staging repo: >>> >>> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.4.6/ >>> >>> The tag to be voted upon: >>> https://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.6-rc0 >>> >>> ZooKeeper's KEYS file containing PGP keys we use to sign the release: >>> >>> http://www.apache.org/dist/zookeeper/KEYS >>> >>> Should we release this candidate? >>> >>> -Flavio
[jira] [Commented] (ZOOKEEPER-1667) Watch event isn't handled correctly when a client reestablish to a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801581#comment-13801581 ] Mahadev konar commented on ZOOKEEPER-1667: -- +1 - the patch looks good to me. > Watch event isn't handled correctly when a client reestablish to a server > - > > Key: ZOOKEEPER-1667 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1667 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.3.6, 3.4.5 >Reporter: Jacky007 >Assignee: Flavio Junqueira >Priority: Blocker > Fix For: 3.4.6, 3.5.0 > > Attachments: ZOOKEEPER-1667-b3.4.patch, ZOOKEEPER-1667-b3.4.patch, > ZOOKEEPER-1667.patch, ZOOKEEPER-1667-r34.patch, ZOOKEEPER-1667-trunk.patch > > > When a client reestablish to a server, it will send the watches which have > not been triggered. But the code in DataTree does not handle it correctly. > It is obvious, we just do not notice it :) > scenario: > 1) Client a set a data watch on /d, then disconnect, client b delete /d and > create it again. When client a reestablish to zk, it will receive a > NodeCreated rather than a NodeDataChanged. > 2) Client a set a exists watch on /e(not exist), then disconnect, client b > create /e. When client a reestablish to zk, it will receive a NodeDataChanged > rather than a NodeCreated. -- This message was sent by Atlassian JIRA (v6.1#6144)
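The two scenarios above reduce to a small decision table: on reconnect the server should compare the node's state now against what the client last saw. A toy model of that comparison (illustrative only, not ZooKeeper's actual DataTree code):

```java
// Toy model of the event a re-registered watch should fire on reconnect,
// derived from the two scenarios in the report above.
public class ReconnectWatchEvent {
    enum Event { NodeCreated, NodeDeleted, NodeDataChanged, None }

    static Event eventFor(boolean existedBefore, boolean existsNow, boolean dataChanged) {
        if (!existedBefore && existsNow) return Event.NodeCreated;
        if (existedBefore && !existsNow) return Event.NodeDeleted;
        if (existedBefore && existsNow && dataChanged) return Event.NodeDataChanged;
        return Event.None;
    }

    public static void main(String[] args) {
        // Scenario 1: /d existed, was deleted and recreated while disconnected.
        // The node still exists with different data, so NodeDataChanged is right.
        System.out.println(eventFor(true, true, true));
        // Scenario 2: /e did not exist and was created while disconnected,
        // so NodeCreated is right.
        System.out.println(eventFor(false, true, true));
    }
}
```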
[jira] [Commented] (ZOOKEEPER-1646) mt c client tests fail on Ubuntu Raring
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798082#comment-13798082 ] Mahadev konar commented on ZOOKEEPER-1646: -- +1 for the patch. Nice catch Pat! > mt c client tests fail on Ubuntu Raring > --- > > Key: ZOOKEEPER-1646 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1646 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.5, 3.5.0 > Environment: Ubuntu 13.04 (raring), glibc 2.17 >Reporter: James Page >Assignee: Patrick Hunt >Priority: Blocker > Fix For: 3.4.6, 3.5.0 > > Attachments: ZOOKEEPER-1646.patch > > > Misc tests fail in the c client binding under the current Ubuntu development > release: > ./zktest-mt > ZooKeeper server startedRunning > Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server > started : elapsed 9315 : OK > Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054 > Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055 > Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : > elapsed 1066 > Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : > OK > Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055 > Zookeeper_init::testBasic : elapsed 1 : OK > Zookeeper_init::testAddressResolution : elapsed 0 : OK > Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK > Zookeeper_init::testNullAddressString : elapsed 0 : OK > Zookeeper_init::testEmptyAddressString : elapsed 0 : OK > Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK > Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK > Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK > Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK > Zookeeper_init::testNonexistentHost : elapsed 92 : OK > Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK > Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK 
> Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK > Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK > Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056 > Zookeeper_close::testCloseUnconnected : elapsed 0 : OK > Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK > Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056 > Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076 > Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : > elapsed 12155 : OK > Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK > Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK > Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK > Zookeeper_simpleSystem::testPath : elapsed 1024 : OK > Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK > Zookeeper_simpleSystem::testPing : elapsed 17287 : OK > Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK > Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK > Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010 > Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK > Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server > started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK > Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server > started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK > Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK > Zookeeper_multi::testCreate : elapsed 1017 : OK > Zookeeper_multi::testCreateDelete : elapsed 1007 : OK > Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK > Zookeeper_multi::testNestedCreate : elapsed 1009 : OK > Zookeeper_multi::testSetData : elapsed 6019 : OK > Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK > Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK > Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK > Zookeeper_multi::testMultiFail : elapsed 1006 
: OK > Zookeeper_multi::testCheck : elapsed 1020 : OK > Zookeeper_multi::testWatch : elapsed 2013 : OK > Zookeeper_watchers::testDefaultSessionWatcher1zktest-mt: > tests/ZKMocks.cc:271: SyncedBoolCondition > DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed. > Aborted (core dumped) > It would appear that the zookeeper connection does not transition to > connected within the required time; I increased the time allowed but no > change. > Ubuntu raring has glibc 2.17; the
[jira] [Created] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.
Mahadev konar created ZOOKEEPER-1791: Summary: ZooKeeper package includes unnecessary jars that are part of the package. Key: ZOOKEEPER-1791 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791 Project: ZooKeeper Issue Type: Bug Components: build Affects Versions: 3.5.0 Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.5.0 Attachments: ZOOKEEPER-1791.patch ZooKeeper package includes unnecessary jars that are part of the package. Packages like fatjar and {code} maven-ant-tasks-2.1.3.jar maven-artifact-2.2.1.jar maven-artifact-manager-2.2.1.jar maven-error-diagnostics-2.2.1.jar maven-model-2.2.1.jar maven-plugin-registry-2.2.1.jar maven-profile-2.2.1.jar maven-project-2.2.1.jar maven-repository-metadata-2.2.1.jar {code} are part of the zookeeper package and rpm (via bigtop). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1791: - Attachment: ZOOKEEPER-1791.patch > ZooKeeper package includes unnecessary jars that are part of the package. > - > > Key: ZOOKEEPER-1791 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791 > Project: ZooKeeper > Issue Type: Bug > Components: build >Affects Versions: 3.5.0 > Reporter: Mahadev konar > Assignee: Mahadev konar > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1791.patch > > > ZooKeeper package includes unnecessary jars that are part of the package. > Packages like fatjar and > {code} > maven-ant-tasks-2.1.3.jar > maven-artifact-2.2.1.jar > maven-artifact-manager-2.2.1.jar > maven-error-diagnostics-2.2.1.jar > maven-model-2.2.1.jar > maven-plugin-registry-2.2.1.jar > maven-profile-2.2.1.jar > maven-project-2.2.1.jar > maven-repository-metadata-2.2.1.jar > {code} > are part of the zookeeper package and rpm (via bigtop). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791557#comment-13791557 ] Mahadev konar commented on ZOOKEEPER-442: - Thanks Rakesh. Good to see the initiative. Ill read through the doc and get back to you. > need a way to remove watches that are no longer of interest > --- > > Key: ZOOKEEPER-442 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442 > Project: ZooKeeper > Issue Type: New Feature >Reporter: Benjamin Reed >Assignee: Daniel Gómez Ferro >Priority: Critical > Fix For: 3.5.0 > > Attachments: Remove Watch API.pdf, ZOOKEEPER-442.patch, > ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, > ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch > > > currently the only way a watch cleared is to trigger it. we need a way to > enumerate the outstanding watch objects, find watch events the objects are > watching for, and remove interests in an event. -- This message was sent by Atlassian JIRA (v6.1#6144)
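The requested operations — enumerate the outstanding watches on a path and remove an interest without triggering it — can be pictured with a toy registry (illustrative only; the feature as eventually shipped is a client-side removeWatches call):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy watch registry: watches are cleared explicitly rather than by firing.
public class WatchRegistry {
    private final Map<String, Set<String>> watches = new HashMap<>(); // path -> watcher ids

    void add(String path, String watcherId) {
        watches.computeIfAbsent(path, p -> new HashSet<>()).add(watcherId);
    }

    boolean remove(String path, String watcherId) {
        Set<String> s = watches.get(path);
        if (s == null || !s.remove(watcherId)) return false;
        if (s.isEmpty()) watches.remove(path); // drop empty path entries
        return true;
    }

    Set<String> outstanding(String path) {
        return watches.getOrDefault(path, Collections.emptySet());
    }

    public static void main(String[] args) {
        WatchRegistry r = new WatchRegistry();
        r.add("/a", "w1");
        r.add("/a", "w2");
        r.remove("/a", "w1"); // cleared without ever being triggered
        System.out.println(r.outstanding("/a"));
    }
}
```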
[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790067#comment-13790067 ] Mahadev konar commented on ZOOKEEPER-900: - [~phunt] I think we can close this one in favor of another jira. > FLE implementation should be improved to use non-blocking sockets > - > > Key: ZOOKEEPER-900 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 > Project: ZooKeeper > Issue Type: Bug >Reporter: Vishal Kher >Assignee: Vishal Kher >Priority: Critical > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, > ZOOKEEPER-900.patch2 > > > From earlier email exchanges: > 1. Blocking connects and accepts: > a) The first problem is in manager.toSend(). This invokes connectOne(), which > does a blocking connect. While testing, I changed the code so that > connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() > does a socketChannel.connect(). After starting AsyncConnect, connectOne > starts a timer. connectOne continues with normal operations if the connection > is established before the timer expires; otherwise, when the timer expires it > interrupts the AsyncConnect() thread and returns. In this way, I can have an > upper bound on the amount of time we need to wait for connect to succeed. Of > course, this was a quick fix for my testing. Ideally, we should use Selector > to do non-blocking connects/accepts. I am planning to do that later once we > at least have a quick fix for the problem and consensus from others for the > real fix (this problem is a big blocker for us). Note that it is OK to do > blocking IO in SenderWorker and RecvWorker threads since they block IO to the > respective peer. > b) The blocking IO problem is not just restricted to connectOne(), but also > in receiveConnection(). The Listener thread calls receiveConnection() for > each incoming connection request. receiveConnection does blocking IO to get > peer's info (s.read(msgBuffer)).
Worse, it invokes connectOne() back to the > peer that had sent the connection request. All of this is happening from the > Listener. In short, if a peer fails after initiating a connection, the > Listener thread won't be able to accept connections from other peers, because > it would be stuck in read() or connectOne(). Also the code has an inherent > cycle. initiateConnection() and receiveConnection() will have to be very > carefully synchronized; otherwise, we could run into deadlocks. This code is > going to be difficult to maintain/modify. > Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message was sent by Atlassian JIRA (v6.1#6144)
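The bounded-time connect described in (a) can be done without a helper thread at all by using a Selector, which is the "real fix" the comment proposes. A standalone sketch (class and method names are mine, not taken from QuorumCnxManager):

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Selector-based connect with an upper bound on the wait: the non-blocking
// alternative to interrupting an AsyncConnect thread on a timer.
public class BoundedConnect {
    static boolean connectWithin(InetSocketAddress addr, long timeoutMs) throws Exception {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        try (Selector sel = Selector.open()) {
            if (ch.connect(addr)) return true;   // connected immediately
            ch.register(sel, SelectionKey.OP_CONNECT);
            if (sel.select(timeoutMs) == 0) {    // nothing became ready: timed out
                ch.close();
                return false;
            }
            return ch.finishConnect();           // complete the pending connect
        }
    }

    public static void main(String[] args) throws Exception {
        // Connect to a local listener so the sketch is self-contained.
        try (ServerSocketChannel srv = ServerSocketChannel.open()) {
            srv.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = ((InetSocketAddress) srv.getLocalAddress()).getPort();
            System.out.println(
                connectWithin(new InetSocketAddress("127.0.0.1", port), 2000)
                    ? "connected" : "timed out");
        }
    }
}
```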
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788998#comment-13788998 ] Mahadev konar commented on ZOOKEEPER-1147: -- [~fpj] looks like the patch is ready to get in. You want to look through before we commit? > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4. When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message was sent by Atlassian JIRA (v6.1#6144)
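The promotion rule debated above (option 1's connect-time flag versus option 2's lazy upgrade) can be modeled in a few lines. This is a toy illustration of the proposal's semantics, not the eventual server code:

```java
// Toy model of the proposal: a session starts local, and an operation that
// needs globally replicated state (here, an ephemeral create) promotes it.
// In a real ensemble the promotion would cost a quorum round-trip.
public class LocalSession {
    boolean global;

    // Option-1 style: the client chooses the session kind at connect time.
    LocalSession(boolean global) { this.global = global; }

    // Option-2 style lazy promotion on the first global-only operation.
    void createEphemeral() {
        if (!global) {
            global = true;
        }
    }

    public static void main(String[] args) {
        LocalSession s = new LocalSession(false); // connected as a local session
        System.out.println(s.global);
        s.createEphemeral();                      // forces promotion
        System.out.println(s.global);
    }
}
```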
[jira] [Updated] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1147: - Attachment: ZOOKEEPER-1147.patch The current patch has a minor conflict and fails to apply against QuorumPeerMain.java; attaching a new one that fixes the conflict. > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4.
When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. > The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788503#comment-13788503 ] Mahadev konar commented on ZOOKEEPER-442: - [~eribeiro] if you are interested, feel free to take it up. I'd be happy to provide guidance/other help on this. Thanks > need a way to remove watches that are no longer of interest > --- > > Key: ZOOKEEPER-442 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442 > Project: ZooKeeper > Issue Type: New Feature >Reporter: Benjamin Reed >Assignee: Daniel Gómez Ferro >Priority: Critical > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, > ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, > ZOOKEEPER-442.patch, ZOOKEEPER-442.patch > > > currently the only way a watch cleared is to trigger it. we need a way to > enumerate the outstanding watch objects, find watch events the objects are > watching for, and remove interests in an event. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar resolved ZOOKEEPER-1696. -- Resolution: Fixed Committed the right patch. Thanks Jeffrey! > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch, zookeeper-1696-v1.patch, > zookeeper-1696-v2.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
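The failure in the stack trace is Weblogic's JAAS Configuration throwing IllegalArgumentException when asked for a section it does not know, where the stock JDK returns null. A defensive lookup in that spirit (an assumption about the shape of the fix, not the committed patch; the class name is mine):

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Defensive lookup of the JAAS "Client" section. The stock JDK returns null
// when the section is absent; Weblogic's JAASConfiguration throws
// IllegalArgumentException instead (and some providers throw SecurityException
// when no login configuration exists at all), so both are treated as "absent".
public class JaasClientCheck {
    static boolean hasClientSection() {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry("Client");
            return entries != null;
        } catch (RuntimeException e) { // IllegalArgumentException, SecurityException, ...
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasClientSection()
                ? "SASL configured" : "no Client section; skip SASL");
    }
}
```

The point of the check is that a missing or container-specific JAAS setup should downgrade to an unauthenticated connection rather than abort the client.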
[jira] [Reopened] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reopened ZOOKEEPER-1696: -- Reopening; it looks like I committed the wrong patch. > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch, zookeeper-1696-v1.patch, > zookeeper-1696-v2.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770315#comment-13770315 ] Mahadev konar commented on ZOOKEEPER-1696: -- +1 for the patch. Given it ran through jenkins committing this to 3.4 and trunk. > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
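The stack trace above arises because WebLogic installs its own JAAS `Configuration` that throws `IllegalArgumentException` for unknown section names instead of returning null. A minimal sketch of the defensive probe the fix implies (illustrative class and method names, not the committed patch):

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class SaslConfigCheck {
    /** Returns true only if a JAAS section with the given name actually exists. */
    static boolean hasLoginContext(String loginContext) {
        try {
            AppConfigurationEntry[] entries =
                Configuration.getConfiguration().getAppConfigurationEntry(loginContext);
            return entries != null && entries.length > 0;
        } catch (SecurityException | IllegalArgumentException e) {
            // The JDK throws SecurityException when no login configuration can be
            // located; WebLogic's JAASConfiguration throws IllegalArgumentException
            // for section names it does not know. Either way: no SASL config.
            return false;
        }
    }

    public static void main(String[] args) {
        // With no "Client" section configured, SASL setup should be skipped
        // rather than crashing the client's send thread.
        System.out.println(hasLoginContext("Client"));
    }
}
```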
[jira] [Comment Edited] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770315#comment-13770315 ] Mahadev konar edited comment on ZOOKEEPER-1696 at 9/18/13 2:10 AM: --- +1 for the patch. Given it ran through jenkins we can commit this to 3.4 and trunk. was (Author: mahadev): +1 for the patch. Given it ran through jenkins committing this to 3.4 and trunk. > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1751) ClientCnxn#run could miss the second ping or connection get dropped before a ping
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770310#comment-13770310 ] Mahadev konar commented on ZOOKEEPER-1751: -- +1 for the patch. This is good to have since it can cause some race conditions during the client pings. > ClientCnxn#run could miss the second ping or connection get dropped before a > ping > - > > Key: ZOOKEEPER-1751 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1751 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.5 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 3.4.6 > > Attachments: zookeeper-1751.patch > > > We could throw a SessionTimeoutException even when timeToNextPing may > also be negative, depending on when the following line is executed by > the thread, because we check the timeout before sending a ping. > {code} > to = readTimeout - clientCnxnSocket.getIdleRecv(); > {code} > In addition, we only ping twice no matter how long the session timeout > is. For example, if we set the session timeout to 60 minutes, we only try to ping twice > in a 40-minute window. Therefore, the connection could be dropped by the OS after its idle > timeout. > The issue causes random "connection loss" or "session expired" issues > on the client side, which is bad for applications like HBase.
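To make the timing concrete, here is a toy model of the two quantities involved (illustrative method names; the real fields live in ClientCnxn and ClientCnxnSocket, and the numbers below are hypothetical). It shows how a late scheduler wakeup can make the read-timeout check fire before the overdue ping is ever sent:

```java
public class PingMath {
    /** Millis until the next ping is due: half the read timeout minus send-idle time. */
    static int timeToNextPing(int readTimeout, int idleSend) {
        return readTimeout / 2 - idleSend;
    }

    /** Millis remaining before the client declares the session dead on the recv side. */
    static int timeToReadTimeout(int readTimeout, int idleRecv) {
        return readTimeout - idleRecv;
    }

    public static void main(String[] args) {
        int readTimeout = 40_000; // e.g. 2/3 of a 60s negotiated session timeout

        // After a late wakeup, both values can be negative at the same instant.
        // Checking the read timeout before sending the overdue ping turns the
        // late wakeup into a spurious session-timeout error.
        System.out.println(timeToNextPing(readTimeout, 25_000));    // -5000: ping is overdue
        System.out.println(timeToReadTimeout(readTimeout, 41_000)); // -1000: "timed out" already
    }
}
```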
[jira] [Updated] (ZOOKEEPER-1751) ClientCnxn#run could miss the second ping or connection get dropped before a ping
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1751: - Fix Version/s: 3.4.6 > ClientCnxn#run could miss the second ping or connection get dropped before a > ping > - > > Key: ZOOKEEPER-1751 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1751 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.5 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 3.4.6 > > Attachments: zookeeper-1751.patch > > > We could throw SessionTimeoutException exception even when timeToNextPing may > also be negative depending on the time when the following line is executed by > the thread because we check time out before sending a ping. > {code} > to = readTimeout - clientCnxnSocket.getIdleRecv(); > {code} > In addition, we only ping twice no matter how long the session time out value > is. For example, we set session time out = 60mins then we only try ping twice > in 40mins window. Therefore, the connection could be dropped by OS after idle > time out. > The issue is causing randomly "connection loss" or "session expired" issues > in client side which is bad for applications like HBase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770309#comment-13770309 ] Mahadev konar commented on ZOOKEEPER-1733: -- Running this through jenkins. > FLETest#testLE is flaky on windows boxes > > > Key: ZOOKEEPER-1733 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.5 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 3.5.0 > > Attachments: zookeeper-1733.patch > > > FLETest#testLE fails intermittently on windows boxes. The reason is that in > LEThread#run() we have: > {code} > if(leader == i){ > synchronized(finalObj){ > successCount++; > if(successCount > (count/2)) > finalObj.notify(); > } > break; > } > {code} > Basically, once we have a confirmed leader, the leader thread dies due to the > "break" out of the while loop. > Meanwhile, in the verification step, we check whether the leader thread is alive as > follows: > {code} >if(threads.get((int) leader).isAlive()){ >Assert.fail("Leader hasn't joined: " + leader); >} > {code} > On windows boxes, the above verification step fails frequently because the leader > thread has most likely already exited. > Do we know why we have the leader-alive verification step, given that only the leader thread > can bump successCount past count/2?
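The flakiness is inherent: the leader thread calls notify() and then immediately exits, so by the time the main thread inspects isAlive() the answer depends on scheduling. A minimal standalone reproduction of the race (a hypothetical demo class, not the test itself):

```java
public class IsAliveRace {
    static boolean leaderAliveAfterSignal() throws InterruptedException {
        final Object finalObj = new Object();
        Thread leader = new Thread(() -> {
            synchronized (finalObj) {
                finalObj.notify(); // signal success, then fall off run() and die
            }
        });
        synchronized (finalObj) {
            leader.start();   // leader blocks on finalObj until wait() releases it
            finalObj.wait();  // woken by the leader's notify()
        }
        leader.join(); // after join(), isAlive() is deterministically false
        return leader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // Asserting isAlive() right after the signal, as the test does, races
        // with the thread's exit; with an explicit join() it is always false.
        System.out.println(leaderAliveAfterSignal());
    }
}
```

Without the join() the same check is a coin flip, which matches the "fails on Windows boxes" behavior reported above.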
[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1733: - Fix Version/s: (was: 3.4.6) 3.5.0 > FLETest#testLE is flaky on windows boxes > > > Key: ZOOKEEPER-1733 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.5 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 3.5.0 > > Attachments: zookeeper-1733.patch > > > FLETest#testLE fail intermittently on windows boxes. The reason is that in > LEThread#run() we have: > {code} > if(leader == i){ > synchronized(finalObj){ > successCount++; > if(successCount > (count/2)) > finalObj.notify(); > } > break; > } > {code} > Basically once we have a confirmed leader, the leader thread dies due to the > "break" of while loop. > While in the verification step, we check if the leader thread alive or not as > following: > {code} >if(threads.get((int) leader).isAlive()){ >Assert.fail("Leader hasn't joined: " + leader); >} > {code} > On windows boxes, the above verification step fails frequently because leader > thread most likely already exits. > Do we know why we have the leader alive verification step only lead thread > can bump up successCount >= count/2? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1733: - Fix Version/s: 3.4.6 > FLETest#testLE is flaky on windows boxes > > > Key: ZOOKEEPER-1733 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.5 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 3.4.6 > > Attachments: zookeeper-1733.patch > > > FLETest#testLE fail intermittently on windows boxes. The reason is that in > LEThread#run() we have: > {code} > if(leader == i){ > synchronized(finalObj){ > successCount++; > if(successCount > (count/2)) > finalObj.notify(); > } > break; > } > {code} > Basically once we have a confirmed leader, the leader thread dies due to the > "break" of while loop. > While in the verification step, we check if the leader thread alive or not as > following: > {code} >if(threads.get((int) leader).isAlive()){ >Assert.fail("Leader hasn't joined: " + leader); >} > {code} > On windows boxes, the above verification step fails frequently because leader > thread most likely already exits. > Do we know why we have the leader alive verification step only lead thread > can bump up successCount >= count/2? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770307#comment-13770307 ] Mahadev konar commented on ZOOKEEPER-1696: -- The same patch applies to 3.4 and trunk. > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1696: - Fix Version/s: 3.4.6 > Fail to run zookeeper client on Weblogic application server > --- > > Key: ZOOKEEPER-1696 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 > Environment: Java version: jdk170_06 > WebLogic Server Version: 10.3.6.0 >Reporter: Dmitry Konstantinov >Assignee: Jeffrey Zhong >Priority: Critical > Fix For: 3.4.6 > > Attachments: zookeeper-1696.patch > > > The problem in details is described here: > http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 > The provided link also contains a reference to fix implementation. > {noformat} > > <[ACTIVE] ExecuteThread: '2' for queue: > 'weblogic.kernel.Default (devapp090:2182)> <> <> <1366794208810> > null, unexpected error, closing socket connection and attempting reconnect > java.lang.IllegalArgumentException: No Configuration was registered that can > handle the configuration named Client > at > com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) > at > org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > > > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761595#comment-13761595 ] Mahadev konar commented on ZOOKEEPER-1657: -- +1 for the patch. Looks good. Thanks Eugene/Flavio. > Increased CPU usage by unnecessary SASL checks > -- > > Key: ZOOKEEPER-1657 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 >Reporter: Gunnar Wagenknecht >Assignee: Philip K. Warren > Labels: performance > Fix For: 3.5.0, 3.4.6 > > Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, > ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, > zookeeper-hotspot-gone.png, zookeeper-hotspot.png > > > I did some profiling in one of our Java environments and found an interesting > footprint in ZooKeeper. The SASL support seems to trigger a lot of the time on the > client although it's not even in use. > Is there a switch to disable SASL completely? > The attached screenshot shows a 10-minute profiling session on one of our > production Jetty servers. The Jetty server handles ~1k web requests per > minute. The average response time per web request is a few milliseconds. The > profiling was performed on a machine running for >24h. > We noticed a significant CPU increase on our servers when deploying an update > from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The > screenshot shows that only 32% of CPU time is spent in Jetty. In contrast, 65% > is spent in ZooKeeper. > A few notes/thoughts: > * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to > be the culprit > * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be > called very often?
> * There is quite a bit of reflection involved in > {{java.security.AccessController.doPrivileged}} > * No security manager is active in the JVM: I tend to place an if-check in > the code before calling {{AccessController.doPrivileged}}. When no SM is > installed, the runnable can be called directly, which saves cycles.
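One way to remove this hotspot, sketched under the assumption of a purely file-based JAAS setup (illustrative class name; the committed patch may differ), is to answer "is SASL configured at all?" once instead of on every packet:

```java
import javax.security.auth.login.Configuration;

public class SaslState {
    // Computed once at class load instead of per packet; the repeated
    // Configuration.getConfiguration() calls were the profiled hotspot.
    private static final boolean SASL_ENABLED = computeSaslEnabled();

    private static boolean computeSaslEnabled() {
        // Assumption for this sketch: only file-based JAAS configuration
        // (the java.security.auth.login.config system property) is in play.
        if (System.getProperty("java.security.auth.login.config") == null) {
            return false; // no JAAS file configured: nothing to negotiate
        }
        try {
            return Configuration.getConfiguration()
                    .getAppConfigurationEntry("Client") != null;
        } catch (SecurityException e) {
            return false; // config declared but not loadable: treat as disabled
        }
    }

    static boolean isSaslEnabled() { return SASL_ENABLED; }

    public static void main(String[] args) {
        System.out.println(isSaslEnabled());
    }
}
```

With the answer cached, the per-packet path degenerates to a single boolean read, which is consistent with the "hotspot gone" screenshot attached to the issue.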
Re: [Release 3.5.0] Any news yet?
It would be good if Flavio wants to try doing the RM. Flavio? thanks mahadev On Wed, Jul 10, 2013 at 10:20 AM, Patrick Hunt wrote: > Mahadev do you want to RM 3.4.6 or should Flavio try his hand at doing > a release? > > Patrick > > On Wed, Jul 10, 2013 at 9:50 AM, Mahadev Konar > wrote: >> 1147 is pretty close. I am working on getting this into trunk. >> >> Hopefully today/tomm. >> >> thanks >> mahadev >> >> On Wed, Jul 10, 2013 at 7:01 AM, Flavio Junqueira >> wrote: >>> I've also been wondering about 3.4.6. I don't mind being the RM for 3.5.0 >>> if you want to do 3.4.6. >>> >>> -Flavio >>> >>> On Jul 9, 2013, at 5:49 PM, Patrick Hunt wrote: >>> >>>> I'd like to see a 3.5.0-alpha soon. I agree re 1147 and iirc it was >>>> pretty close (Mahadev?). ZOOKEEPER-1346 (jetty support for monitoring) >>>> should also go in. It's pretty much ready afair. >>>> >>>> I'm happy to RM 3.5 if we can get past these open issues. >>>> >>>> Patrick >>>> >>>> >>>> On Tue, Jul 9, 2013 at 8:32 AM, Raúl Gutiérrez Segalés >>>> wrote: >>>>> Hi Stefan, >>>>> >>>>> On 9 July 2013 08:13, Stefan Egli wrote: >>>>>> Hi, >>>>>> >>>>>> We're evaluating using ZooKeeper, and esp the embedded mode >>>>>> (ZOOKEEPER-107 - [0]), for an implementation of the Sling Discovery API >>>>>> ([1]). Since ZOOKEEPER-107 is planned for 3.5.0 I was wondering what the >>>>>> release schedule of 3.5.0 is, or any plan thereof? (I saw a discussion >>>>>> about releasing it from Dec 2012 [1]). >>>>>> >>>>> >>>>> I think as of now the biggest blocker is: >>>>> >>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1147 >>>>> >>>>> Besides needing a final review it needs better documentation and an >>>>> extra small patch (I proposed one) to support rolling updates when >>>>> enabling local sessions. >>>>> >>>>> Cheers, >>>>> -rgs >>>
Re: [Release 3.5.0] Any news yet?
1147 is pretty close. I am working on getting this into trunk. Hopefully today/tomm. thanks mahadev On Wed, Jul 10, 2013 at 7:01 AM, Flavio Junqueira wrote: > I've also been wondering about 3.4.6. I don't mind being the RM for 3.5.0 if > you want to do 3.4.6. > > -Flavio > > On Jul 9, 2013, at 5:49 PM, Patrick Hunt wrote: > >> I'd like to see a 3.5.0-alpha soon. I agree re 1147 and iirc it was >> pretty close (Mahadev?). ZOOKEEPER-1346 (jetty support for monitoring) >> should also go in. It's pretty much ready afair. >> >> I'm happy to RM 3.5 if we can get past these open issues. >> >> Patrick >> >> >> On Tue, Jul 9, 2013 at 8:32 AM, Raúl Gutiérrez Segalés >> wrote: >>> Hi Stefan, >>> >>> On 9 July 2013 08:13, Stefan Egli wrote: Hi, We're evaluating using ZooKeeper, and esp the embedded mode (ZOOKEEPER-107 - [0]), for an implementation of the Sling Discovery API ([1]). Since ZOOKEEPER-107 is planned for 3.5.0 I was wondering what the release schedule of 3.5.0 is, or any plan thereof? (I saw a discussion about releasing it from Dec 2012 [1]). >>> >>> I think as of now the biggest blocker is: >>> >>> https://issues.apache.org/jira/browse/ZOOKEEPER-1147 >>> >>> Besides needing a final review it needs better documentation and an >>> extra small patch (I proposed one) to support rolling updates when >>> enabling local sessions. >>> >>> Cheers, >>> -rgs >
[jira] [Commented] (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658530#comment-13658530 ] Mahadev konar commented on ZOOKEEPER-767: - Flavio, Agreed, I think its definitely a better match for Curator. > Submitting Demo/Recipe Shared / Exclusive Lock Code > --- > > Key: ZOOKEEPER-767 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767 > Project: ZooKeeper > Issue Type: Improvement > Components: recipes >Affects Versions: 3.3.0 >Reporter: Sam Baskinger >Assignee: Sam Baskinger >Priority: Minor > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, > ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, > ZOOKEEPER-767.patch > > Time Spent: 8h > > Networked Insights would like to share-back some code for shared/exclusive > locking that we are using in our labs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1686) Publish ZK 3.4.5 test jar
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1686: - Assignee: Mahadev konar > Publish ZK 3.4.5 test jar > - > > Key: ZOOKEEPER-1686 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1686 > Project: ZooKeeper > Issue Type: Bug > Components: build, tests >Affects Versions: 3.4.5 >Reporter: Todd Lipcon > Assignee: Mahadev konar > > ZooKeeper 3.4.2 used to publish a jar with the tests classifier for use by > downstream project tests. It seems this didn't get published for 3.4.4 or > 3.4.5 (see > https://repository.apache.org/index.html#nexus-search;quick~org.apache.zookeeper). > Would someone mind please publishing these artifacts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1657: - Fix Version/s: 3.4.6 3.5.0 > Increased CPU usage by unnecessary SASL checks > -- > > Key: ZOOKEEPER-1657 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.5 >Reporter: Gunnar Wagenknecht > Labels: performance > Fix For: 3.5.0, 3.4.6 > > Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, > ZOOKEEPER-1657.patch, zookeeper-hotspot.png > > > I did some profiling in one of our Java environments and found an interesting > footprint in ZooKeeper. The SASL support seems to trigger a lot times on the > client although it's not even in use. > Is there a switch to disable SASL completely? > The attached screenshot shows a 10-minute profiling session on one of our > production Jetty servers. The Jetty server handles ~1k web requests per > minute. The average response time per web request is a few milli seconds. The > profiling was performed on a machine running for >24h. > We noticed a significant CPU increase on our servers when deploying an update > from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The > screenshot shows that only 32% CPU time are spent in Jetty. In contrast, 65% > are spend in ZooKeeper. > A few notes/thoughts: > * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to > be the culprit > * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be > called very often? > * There is quite a bit reflection involved in > {{java.security.AccessController.doPrivileged}} > * No security manager is active in the JVM: I tend to place an if-check in > the code before calling {{AccessController.doPrivileged}}. When no SM is > installed, the runnable can be called directly which safes cycles. -- This message is automatically generated by JIRA. 
[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592043#comment-13592043 ] Mahadev konar commented on ZOOKEEPER-1551: -- [~fpj] would you be able to review the latest patch? > Observer ignore txns that comes after snapshot and UPTODATE > > > Key: ZOOKEEPER-1551 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Affects Versions: 3.4.3 >Reporter: Thawan Kooburat >Assignee: Thawan Kooburat >Priority: Blocker > Fix For: 3.5.0, 3.4.6 > > Attachments: ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch > > > In Learner.java, txns which comes after the learner has taken the snapshot > (after NEWLEADER packet) are stored in packetsNotCommitted. The follower has > special logic to apply these txns at the end of syncWithLeader() method. > However, the observer will ignore these txns completely, causing data > inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
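The sync-phase bookkeeping described above can be sketched with a toy buffer (illustrative names modeled on the packetsNotCommitted field the report mentions, not the actual Learner code): txns arriving between the snapshot and UPTODATE must be buffered and replayed by observers too, not just followers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SyncBuffer {
    // Txns received after the snapshot (NEWLEADER) but before UPTODATE.
    private final Deque<String> packetsNotCommitted = new ArrayDeque<>();
    private final List<String> applied = new ArrayList<>();

    void onProposalDuringSync(String txn) {
        packetsNotCommitted.add(txn);
    }

    /** Must run for followers AND observers at the end of syncWithLeader();
     *  the bug was that observers skipped this replay, losing the txns. */
    void onUpToDate() {
        while (!packetsNotCommitted.isEmpty()) {
            applied.add(packetsNotCommitted.poll());
        }
    }

    List<String> applied() { return applied; }
}
```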
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592041#comment-13592041 ] Mahadev konar commented on ZOOKEEPER-1382: -- Michael, Would you be able to upload a patch for trunk as well? > Zookeeper server holds onto dead/expired session ids in the watch data > structures > - > > Key: ZOOKEEPER-1382 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.5 >Reporter: Neha Narkhede >Assignee: Neha Narkhede >Priority: Critical > Fix For: 3.4.6 > > Attachments: ZOOKEEPER-1382_3.3.4.patch, > ZOOKEEPER-1382-branch-3.4.patch > > > I've observed that zookeeper server holds onto expired session ids in the > watcher data structures. The result is the wchp command reports session ids > that cannot be found through cons/dump and those expired session ids sit > there maybe until the server is restarted. Here are snippets from the client > and the server logs that lead to this state, for one particular session id > 0x134485fd7bcb26f - > There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 > and I'm using ZkClient to connect to the cluster > From the application log - > application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] > [main-SendThread(223.prod:12913)] [application Session establishment complete > on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, > negotiated timeout = 6000 > application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] > [main-SendThread(223.prod:12913)] [application] Client session timed out, > have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing > socket connection and attempting reconnect > application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] > [main-SendThread(226.prod:12913)] [application] Unable to reconnect to > ZooKeeper service, session 
0x134485fd7bcb26f has expired, closing socket > connection > On the leader zk, 225 - > zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO > [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, > timeout of 6000ms exceeded > zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO > [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination > for sessionid: 0x134485fd7bcb26f > On the server, the client was initially connected to, 223 - > zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO > [CommitProcessor:1:NIOServerCnxn@1580] - Established session > 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 > zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO > [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client > /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f > Here are the log snippets from 226, which is the server, the client > reconnected to, before getting session expired event - > 2012-01-27 09:52:38,190 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client > attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 > 2012-01-27 09:52:38,191 - INFO > [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session > 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired > 2012-01-27 09:52:38,191 - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed > socket connection for client /172.17.136.82:49367 which had sessionid > 0x134485fd7bcb26f > wchp output from 226, taken on 01/30 - > nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f > *226.*wchp* | wc -l > 3 > wchp output from 223, taken on 01/30 - > nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f > *223.*wchp* | wc -l > 0 > cons output from 223 and 226, taken on 01/30 - > nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f > 
*226.*cons* | wc -l > 0 > nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f > *223.*cons* | wc -l > 0 > So, what seems to have happened is that the client was able to re-register > the watches on the new server (226), after it got disconnected from 223, in spite > of having an expired session id. > In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes t
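The leaked state reported above is a classic two-sided index scrubbed from only one side. A hedged sketch of the cleanup that must run on session close (simplified, hypothetical structure; the real server's WatchManager keeps analogous path-to-watcher and watcher-to-path maps):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WatchTable {
    private final Map<String, Set<Long>> watchesByPath = new HashMap<>();
    private final Map<Long, Set<String>> pathsBySession = new HashMap<>();

    void addWatch(String path, long sessionId) {
        watchesByPath.computeIfAbsent(path, p -> new HashSet<>()).add(sessionId);
        pathsBySession.computeIfAbsent(sessionId, s -> new HashSet<>()).add(path);
    }

    /** Called on session close/expiry; BOTH maps must be scrubbed together,
     *  otherwise wchp keeps reporting the dead session id indefinitely. */
    void removeSession(long sessionId) {
        Set<String> paths = pathsBySession.remove(sessionId);
        if (paths == null) return;
        for (String path : paths) {
            Set<Long> watchers = watchesByPath.get(path);
            if (watchers != null) {
                watchers.remove(sessionId);
                if (watchers.isEmpty()) watchesByPath.remove(path);
            }
        }
    }

    int watchCount(long sessionId) {
        Set<String> paths = pathsBySession.get(sessionId);
        return paths == null ? 0 : paths.size();
    }
}
```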
[jira] [Updated] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1624: - Fix Version/s: 3.5.0 > PrepRequestProcessor abort multi-operation incorrectly > -- > > Key: ZOOKEEPER-1624 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Thawan Kooburat >Assignee: Thawan Kooburat >Priority: Critical > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1624.patch > > > We found this issue when trying to issue multiple instances of the following > multi-op concurrently > multi { > 1. create sequential node /a- > 2. create node /b > } > The expected result is that only the first multi-op request should success > and the rest of request should fail because /b is already exist > However, the reported result is that the subsequence multi-op failed because > of sequential node creation failed which is not possible. > Below is the return code for each sub-op when issuing 3 instances of the > above multi-op asynchronously > 1. ZOK, ZOK > 2. ZOK, ZNODEEXISTS, > 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY, > When I added more debug log. The cause is that PrepRequestProcessor rollback > outstandingChanges of the second multi-op incorrectly causing sequential node > name generation to be incorrect. Below is the sequential node name generated > by PrepRequestProcessor > 1. create /a-0001 > 2. create /a-0003 > 3. create /a-0001 > The bug is getPendingChanges() method. In failed to copied ChangeRecord for > the parent node ("/"). So rollbackPendingChanges() cannot restore the right > previous change record of the parent node when aborting the second multi-op > The impact of this bug is that sequential node creation on the same parent > node may fail until the previous one is committed. I am not sure if there is > other implication or not. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
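The getPendingChanges()/rollbackPendingChanges() interplay described in the report can be illustrated with a standalone sketch. This is a simplification, not ZooKeeper's actual PrepRequestProcessor: ChangeRecord is reduced to the one field that matters here (the parent's child count, from which sequential node names are derived), and duplicate() stands in for the copy step the report says was missing for the parent node ("/").

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for ZooKeeper's ChangeRecord: a path plus the parent's
// child-sequence counter that sequential node names are derived from.
class ChangeRecord {
    final String path;
    int childCount; // mutable state that a later multi-op may advance

    ChangeRecord(String path, int childCount) {
        this.path = path;
        this.childCount = childCount;
    }

    // Deep copy, so an abort can restore the value as it was when the
    // multi-op started -- the step the report says was skipped for "/".
    ChangeRecord duplicate() {
        return new ChangeRecord(path, childCount);
    }
}

public class MultiOpRollbackSketch {
    private final Map<String, ChangeRecord> outstandingChanges = new HashMap<>();

    public void put(ChangeRecord r) { outstandingChanges.put(r.path, r); }

    public ChangeRecord get(String path) { return outstandingChanges.get(path); }

    // Snapshot taken when a multi-op starts: copies (not aliases) every
    // touched record, including the parent's.
    public Map<String, ChangeRecord> getPendingChanges(String... paths) {
        Map<String, ChangeRecord> pending = new HashMap<>();
        for (String p : paths) {
            ChangeRecord r = outstandingChanges.get(p);
            if (r != null) pending.put(p, r.duplicate());
        }
        return pending;
    }

    // On abort, restore the copied records so the parent's sequence counter
    // is exactly what it was before the failed multi-op ran.
    public void rollbackPendingChanges(Map<String, ChangeRecord> pending) {
        outstandingChanges.putAll(pending);
    }
}
```

Because the snapshot holds copies, a concurrent multi-op that advances "/"'s counter in place cannot leak its value into the rollback, which is the aliasing bug the /a-0001, /a-0003, /a-0001 sequence exposed.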
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1621: - Assignee: Mahadev konar > ZooKeeper does not recover from crash when disk was full > > > Key: ZOOKEEPER-1621 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.3 > Environment: Ubuntu 12.04, Amazon EC2 instance >Reporter: David Arthur >Assignee: Mahadev konar > Fix For: 3.5.0 > > Attachments: zookeeper.log.gz > > > The disk that ZooKeeper was using filled up. During a snapshot write, I got > the following exception > 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - > Severe unrecoverable error, exiting > java.io.IOException: No space left on device > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:282) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) > Then many subsequent exceptions like: > 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was > partial. 
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected > exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > It 
seems to me that writing the transaction log should be fully atomic to > avoid such situations. Is this not the case?
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022 ] Mahadev konar commented on ZOOKEEPER-1621: -- Looks like the header was incomplete. Unfortunately we do not handle a corrupt header, but we do handle corrupt txns later. I am surprised that this happened twice in a row for 2 users. I'll upload a patch and test case. > ZooKeeper does not recover from crash when disk was full > > > Key: ZOOKEEPER-1621 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.3 > Environment: Ubuntu 12.04, Amazon EC2 instance >Reporter: David Arthur > Fix For: 3.5.0 > > Attachments: zookeeper.log.gz > > > The disk that ZooKeeper was using filled up. During a snapshot write, I got > the following exception > 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - > Severe unrecoverable error, exiting > java.io.IOException: No space left on device > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:282) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) > Then many subsequent exceptions like: > 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was > partial. 
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected > exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > It 
seems to me that writing the transaction log should be fully atomic to > avoid such situations. Is this not the case?
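One way to read the failure above: the crash left a zero-length or partially written log file, and FileHeader.deserialize() hit end-of-stream while reading the very first int, aborting recovery. A hedged sketch of a more tolerant header read (an illustration of the idea, not the committed patch) treats a short read as "no header, end of log" instead of a fatal error:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.OptionalInt;

public class HeaderReadSketch {
    // Read the 4-byte log-file magic, but report a short read as "no header"
    // instead of letting EOFException propagate: a zero-length or partially
    // written file left behind by a full disk is then skipped, not fatal.
    public static OptionalInt readMagic(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        try {
            return OptionalInt.of(din.readInt());
        } catch (EOFException truncated) {
            return OptionalInt.empty(); // incomplete header: treat as end of log
        }
    }
}
```

DataInputStream.readInt throws EOFException whenever fewer than four bytes remain, so both the empty-file and the partial-header case collapse into the same recoverable signal.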
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557012#comment-13557012 ] Mahadev konar commented on ZOOKEEPER-1147: -- [~thawan] I think the above scenario is ok. The only issue I think we have is the sensitive local sessions. Since we have had too many issues with disconnects and session expiry, I think this might cause more issues than we already have. Is there something we can do here? I can't seem to find a way around it without doing client side changes. > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4. 
When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. > The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
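Option 1 in the proposal (the client picks the session type at connect time, via an IsLocalSession flag) can be sketched as follows. All names here are illustrative assumptions, not ZooKeeper's real classes; the point is that local sessions live only in the connected server's in-memory map (nothing goes to the leader or to disk), and rules 2 and 3 of the proposal fall out of a simple type check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-server tracker for the local-session proposal.
public class SessionTrackerSketch {
    enum SessionType { LOCAL, GLOBAL }

    // Local sessions are tracked only on the server (e.g. an observer) that
    // owns this map; the leader never learns about them.
    private final Map<Long, SessionType> sessions = new ConcurrentHashMap<>();
    private long nextId = 1;

    public synchronized long connect(boolean isLocalSession) {
        long id = nextId++;
        sessions.put(id, isLocalSession ? SessionType.LOCAL : SessionType.GLOBAL);
        return id;
    }

    // Rule 2: creating an ephemeral node requires a global session.
    public boolean canCreateEphemeral(long sessionId) {
        return sessions.get(sessionId) == SessionType.GLOBAL;
    }

    // Rule 3: a lost local session cannot be re-established -- drop all state.
    public void expireLocal(long sessionId) {
        sessions.remove(sessionId, SessionType.LOCAL);
    }
}
```

Option 2 (lazy promotion) would add a step that swaps a LOCAL entry to GLOBAL on the first operation that needs quorum state, which is where the sketch's single map would start interacting with the leader.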
[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556983#comment-13556983 ] Mahadev konar commented on ZOOKEEPER-1572: -- The patch looks good to me. Will go ahead and commit after running through hudson. > Add an async interface for multi request > > > Key: ZOOKEEPER-1572 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Reporter: Sijie Guo >Assignee: Sijie Guo > Labels: review > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff > > > Currently there is no async interface for multi request in ZooKeeper java > client.
[jira] [Comment Edited] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316 ] Mahadev konar edited comment on ZOOKEEPER-1147 at 1/17/13 3:42 PM: --- bq. Yes, a session retains the same ID when it is upgraded from local session to global session. I think this is desirable. Can you elaborate why this may cause problem? Yes its desirable. Before I comment on what I think might be wrong, when does the server who has the local sessionid remove it from its data structures? Is it when it gets a create session in final request processor? Until then the session is a local session? was (Author: mahadev): bq. Yes, a session retains the same ID when it is upgraded from local session to global session. I think this is desirable. Can you elaborate why this may cause problem? Yes its desirable. Before I comment on what I think might be wrong, when does the server who has the local sessionid remove it from its data structures? Is it when it gets a response from in final request processor about the session creation? Until then the session is in a local session? > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. 
In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4. When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. > The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message is automatically generated by JIRA. 
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316 ] Mahadev konar commented on ZOOKEEPER-1147: -- bq. Yes, a session retains the same ID when it is upgraded from local session to global session. I think this is desirable. Can you elaborate why this may cause problem? Yes its desirable. Before I comment on what I think might be wrong, when does the server who has the local sessionid remove it from its data structures? Is it when it gets a response from in final request processor about the session creation? Until then the session is in a local session? > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. 
The session and its watches are gone for good. > 4. When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. > The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1622) session ids will be negative in the year 2022
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555698#comment-13555698 ] Mahadev konar commented on ZOOKEEPER-1622: -- Nice catch Eric! I think we do document that the id must be between 0 and 255, but maybe we should error out if that is not the case. > session ids will be negative in the year 2022 > - > > Key: ZOOKEEPER-1622 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1622 > Project: ZooKeeper > Issue Type: Bug >Reporter: Eric Newton >Priority: Trivial > > Someone decided to use a large number for their myid file. This caused > session ids to go negative, and our software (Apache Accumulo) did not handle > this very well. While diagnosing the problem, I noticed this in SessionImpl: > {noformat} >public static long initializeNextSession(long id) { > long nextSid = 0; > nextSid = (System.currentTimeMillis() << 24) >> 8; > nextSid = nextSid | (id <<56); > return nextSid; > } > {noformat} > When the 40th bit in System.currentTimeMillis() is a one, sign extension will > fill the upper 8 bits of nextSid, and id will not make the session id > unique. I recommend changing the right shift to a logical shift: > {noformat} >public static long initializeNextSession(long id) { > long nextSid = 0; > nextSid = (System.currentTimeMillis() << 24) >>> 8; > nextSid = nextSid | (id <<56); > return nextSid; > } > {noformat} > But, we have until the year 2022 before we have to worry about it.
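The two {noformat} snippets above differ by a single character, `>>` versus `>>>`. A runnable comparison (standalone, with the timestamp passed in as a parameter instead of read from the clock) shows how the arithmetic shift sign-extends once the relevant high bit of the millisecond clock is set, around the year 2022: the session id goes negative and the top byte reserved for the server id is clobbered.

```java
public class SessionIdShift {
    // Proposed fix from the report: the logical shift keeps the top byte
    // free for the server id even when (millis << 24) has its sign bit set.
    public static long initializeNextSession(long millis, long id) {
        long nextSid = (millis << 24) >>> 8; // >>> : zero-fills, no sign extension
        return nextSid | (id << 56);
    }

    // The buggy variant, for comparison.
    public static long initializeNextSessionBuggy(long millis, long id) {
        long nextSid = (millis << 24) >> 8;  // >> : copies the sign bit downward
        return nextSid | (id << 56);
    }
}
```

With `>>`, the sign extension fills the upper 8 bits with ones, so OR-ing in `id << 56` can no longer distinguish servers; with `>>>`, those bits are zero and the id lands cleanly in the top byte.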
[jira] [Resolved] (ZOOKEEPER-1612) Zookeeper unable to recover and start once datadir disk is full and disk space cleared
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar resolved ZOOKEEPER-1612. -- Resolution: Duplicate Duplicate of ZOOKEEPER-1621. > Zookeeper unable to recover and start once datadir disk is full and disk > space cleared > -- > > Key: ZOOKEEPER-1612 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1612 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.3 >Reporter: suja s > > Once zookeeper data dir disk becomes full, the process gets shut down. > {noformat} > 2012-12-14 13:22:26,959 [myid:2] - ERROR > [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@276] - Severe > unrecoverable error, exiting > java.io.IOException: No space left on device > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:282) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) > at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:56) > at java.io.DataOutputStream.write(DataOutputStream.java:90) > at java.io.FilterOutputStream.write(FilterOutputStream.java:80) > at > org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119) > at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:168) > at > org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123) > at > org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115) > at > org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130) > at > org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130) > at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1179) > at > org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:138) > at > org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:213) > at > 
org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:230) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:242) > at > org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:274) > at > org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:407) > at > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:759) > {noformat} > Later disk space is cleared and zk started again. Startup of zk fails as it > is not able to read snapshot properly. (Since load from disk failed it is not > able to join peers in the quorum and get a snapshot diff) > {noformat} > 2012-12-14 16:20:31,489 [myid:2] - INFO [main:FileSnap@83] - Reading > snapshot ../dataDir/version-2/snapshot.100042 > 2012-12-14 16:20:31,564 [myid:2] - ERROR [main:QuorumPeer@472] - Unable to > load database on disk > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) > at > 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:436) > at > org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428) >
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555318#comment-13555318 ] Mahadev konar commented on ZOOKEEPER-1621: -- I'll mark 1612 as a dup. Thanks for pointing that out, Edward. > ZooKeeper does not recover from crash when disk was full > > > Key: ZOOKEEPER-1621 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.3 > Environment: Ubuntu 12.04, Amazon EC2 instance >Reporter: David Arthur > Fix For: 3.5.0 > > Attachments: zookeeper.log.gz > > > The disk that ZooKeeper was using filled up. During a snapshot write, I got > the following exception > 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - > Severe unrecoverable error, exiting > java.io.IOException: No space left on device > at java.io.FileOutputStream.writeBytes(Native Method) > at java.io.FileOutputStream.write(FileOutputStream.java:282) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) > Then many subsequent exceptions like: > 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was > partial. 
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected > exception, exiting abnormally > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) > at > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) > at > org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) > at > org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > It 
seems to me that writing the transaction log should be fully atomic to > avoid such situations. Is this not the case?
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555192#comment-13555192 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

David, I thought you said it does not recover when the disk was full, but it looks like the disk is still full? No?

> ZooKeeper does not recover from crash when disk was full
>
>                 Key: ZOOKEEPER-1621
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>         Environment: Ubuntu 12.04, Amazon EC2 instance
>            Reporter: David Arthur
>             Fix For: 3.5.0
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:
> {noformat}
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:282)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
>         at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> {noformat}
> Then many subsequent exceptions like:
> {noformat}
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
>         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {noformat}
> It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
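The recovery failure above comes from treating a partially written trailing record as a fatal error. As an illustration only (this is not ZooKeeper's actual FileTxnLog format, and the class and method names `TruncatedLogReader`/`readRecords` are hypothetical), a length-prefixed log reader can treat an EOF inside the final record as the end of the log rather than crashing:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TruncatedLogReader {
    // Reads length-prefixed records. A partial trailing record (e.g. left by a
    // disk-full crash mid-write) is treated as the end of the log, so recovery
    // keeps every record that was written completely.
    public static List<byte[]> readRecords(byte[] log) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        while (true) {
            int len;
            try {
                len = in.readInt();
            } catch (EOFException e) {
                break; // clean end of log
            }
            byte[] body = new byte[len];
            try {
                in.readFully(body);
            } catch (EOFException e) {
                break; // truncated trailing record: stop here instead of failing
            }
            records.add(body);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(3); out.write(new byte[]{1, 2, 3}); // complete record
        out.writeInt(10); out.write(new byte[]{9, 9});   // truncated: claims 10 bytes, has 2
        List<byte[]> recs = readRecords(buf.toByteArray());
        System.out.println("recovered records: " + recs.size()); // prints 1
    }
}
```

This sketches the general pattern of tolerating a torn tail; a real implementation would also need per-record checksums, since truncation is not the only way a partial write can corrupt the last entry.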
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555169#comment-13555169 ]

Mahadev konar commented on ZOOKEEPER-1621:
------------------------------------------

David, so these exceptions are thrown while ZooKeeper is running? I am not sure why it is exiting so many times. Do you restart the ZK server if it dies?
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1621:
-------------------------------------
    Fix Version/s:     (was: 3.4.6)
                   3.5.0
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1621:
-------------------------------------
    Priority: Major  (was: Critical)
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1621:
-------------------------------------
    Fix Version/s: 3.4.6
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1621:
-------------------------------------
    Priority: Critical  (was: Major)
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554819#comment-13554819 ]

Mahadev konar commented on ZOOKEEPER-1147:
------------------------------------------

[~thawan] this helps. Thanks for the information. I still have a couple more questions:
- Will a read-only client always get a session expiration when a disconnect happens, even though it has not tried all the other servers?
- Is the local session id the same as the global session id when it is created (I mean as the long value)? If it is the same, I think we have a problem with the shifting of clients between servers.

bq. When a client reconnects to B, its sessionId won’t exist in B’s local session tracker. So B will send a validation packet. If the CreateSession issued by A is committed before the validation packet arrives, the client will be able to connect. Otherwise, the client will get session expired because the quorum hasn’t learned about this session yet. If the client also tries to connect back to A again, the session is already removed from the local session tracker. So A will need to send a validation packet to the leader. The outcome should be the same as with B, depending on the timing of the request.

> Add support for local sessions
>
>                 Key: ZOOKEEPER-1147
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Vishal Kathuria
>            Assignee: Thawan Kooburat
>              Labels: api-change, scaling
>             Fix For: 3.5.0
>         Attachments: ZOOKEEPER-1147.patch
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about a million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. The majority of these clients are read-only, i.e. they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble. The following is a proposal for a "local session", to support a larger number of connections:
> 1. The idea is to introduce a new type of session - a "local" session. A "local" session doesn't have the full functionality of a normal session.
> 2. Local sessions cannot create ephemeral nodes.
> 3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good.
> 4. When a local session connects, the session info is only maintained on the ZooKeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk.
> 5. The pings and expiration are handled by the server that the session is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options being considered:
> 1. Let the client specify at connect time which kind of session they want.
> 2. All sessions connect as local sessions and are automatically promoted to global sessions when they perform an operation that requires a global session (e.g. creating an ephemeral node).
> Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions which never create ephemeral nodes always local. Option 2 would make it more broadly usable, but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag), that would be used to determine whether to create a local session or a global session.
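Point 5 of the proposal (pings and expiration handled entirely on the server the session is connected to, with no quorum traffic) can be sketched as below. This is an illustrative toy, not ZooKeeper's actual session tracker; the class and method names (`LocalSessionTracker`, `touch`, `expire`) and the caller-supplied millisecond clock are assumptions made for the example:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalSessionTracker {
    // sessionId -> absolute expiry time in ms; purely server-local state,
    // nothing here is written to disk or proposed to the quorum.
    private final Map<Long, Long> expiries = new ConcurrentHashMap<>();
    private final long timeoutMs;
    private long nextId = 1;

    public LocalSessionTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Create a local session: just an in-memory entry, no CreateSession txn.
    public long createSession(long nowMs) {
        long id = nextId++;
        expiries.put(id, nowMs + timeoutMs);
        return id;
    }

    // Called on each client ping; returns false for unknown or expired sessions.
    public boolean touch(long id, long nowMs) {
        Long exp = expiries.get(id);
        if (exp == null || exp <= nowMs) {
            return false;
        }
        expiries.put(id, nowMs + timeoutMs);
        return true;
    }

    // Periodic sweep. A lost local session cannot be re-established (point 3),
    // so expired entries are simply dropped. Returns how many were removed.
    public int expire(long nowMs) {
        int removed = 0;
        for (Iterator<Map.Entry<Long, Long>> it = expiries.entrySet().iterator(); it.hasNext();) {
            if (it.next().getValue() <= nowMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The key property the proposal relies on is visible here: session churn touches only one server's memory, which is why a million mostly read-only clients stop being a write load on the ensemble.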
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553582#comment-13553582 ]

Mahadev konar commented on ZOOKEEPER-1147:
------------------------------------------

I started reviewing the patch, but I think we will need to add a little more detail on the design to make further progress on this. There are quite a few cases that come up when we think about this, so a little more detail on the design will go a long way. [~thawan], can we add some comments on the design (I don't want to make this too laborious an effort), something which explains the whole end-to-end design, e.g.:
- When is the session created?
- Does the create of an ephemeral node wait on the return of the create-session (at the follower)?
- What happens if the create for a session is sent at server A, and the client disconnects to some other server B, which ends up sending it again, and then disconnects and connects back to server A?
- What happens to the local session once the global session is created?

Would you be able to write a short design for this (a couple of paragraphs as a comment on the jira should suffice)?
[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553574#comment-13553574 ] Mahadev konar commented on ZOOKEEPER-1549: -- Thanks [~thawan]! > Data inconsistency when follower is receiving a DIFF with a dirty snapshot > -- > > Key: ZOOKEEPER-1549 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549 > Project: ZooKeeper > Issue Type: Bug > Components: quorum >Affects Versions: 3.4.3 >Reporter: Jacky007 >Assignee: Thawan Kooburat >Priority: Blocker > Fix For: 3.4.6 > > Attachments: case.patch, ZOOKEEPER-1549-learner.patch > > > the trunc code (from ZOOKEEPER-1154?) cannot work correct if the snapshot is > not correct. > here is scenario(similar to 1154): > Initial Condition > 1.Lets say there are three nodes in the ensemble A,B,C with A being the > leader > 2.The current epoch is 7. > 3.For simplicity of the example, lets say zxid is a two digit number, > with epoch being the first digit. > 4.The zxid is 73 > 5.All the nodes have seen the change 73 and have persistently logged it. > Step 1 > Request with zxid 74 is issued. The leader A writes it to the log but there > is a crash of the entire ensemble and B,C never write the change 74 to their > log. > Step 2 > A,B restart, A is elected as the new leader, and A will load data and take a > clean snapshot(change 74 is in it), then send diff to B, but B died before > sync with A. A died later. > Step 3 > B,C restart, A is still down > B,C form the quorum > B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73 > epoch is now 8, zxid is 80 > Request with zxid 81 is successful. On B, minCommitLog is now 71, > maxCommitLog is 81 > Step 4 > A starts up. It applies the change in request with zxid 74 to its in-memory > data tree > A contacts B to registerAsFollower and provides 74 as its ZxId > Since 71<=74<=81, B decides to send A the diff. 
> Problem: > The problem with the above sequence is that after truncating the log, A will > load the snapshot again, which is not correct. > In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874), so > the leader will send a snapshot to the follower, and it will not be a problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
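Jacky007's Step 4 hinges on the leader's sync decision. A minimal sketch of that decision, modeled on the ranges given in the scenario (the real LearnerHandler logic has more cases; the method and names here are illustrative, not the actual ZooKeeper code):

```java
// Sketch of how a leader picks DIFF vs TRUNC vs SNAP from the follower's
// last zxid and the leader's committed-log range. Simplified model of the
// scenario above, not the actual ZooKeeper implementation.
public class SyncDecision {
    enum Sync { DIFF, TRUNC, SNAP }

    static Sync decide(long peerZxid, long minCommitLog, long maxCommitLog) {
        if (peerZxid > maxCommitLog) return Sync.TRUNC; // peer ahead of leader: truncate
        if (peerZxid >= minCommitLog) return Sync.DIFF; // within the in-memory log: send diff
        return Sync.SNAP;                               // too far behind: full snapshot
    }

    public static void main(String[] args) {
        // Step 4: A reports zxid 74 and B's log spans 71..81, so B picks DIFF,
        // even though A's snapshot already contains the uncommitted change 74.
        System.out.println(decide(74, 71, 81));
    }
}
```

The bug is that the DIFF/TRUNC path assumes the follower's snapshot is clean; here A's snapshot is "dirty" (it contains 74), so truncating the log alone cannot undo the change.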
[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1549: - Assignee: Thawan Kooburat
[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1549: - Fix Version/s: 3.4.6
[jira] [Commented] (ZOOKEEPER-1603) StaticHostProviderTest testUpdateClientMigrateOrNot hangs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536229#comment-13536229 ] Mahadev konar commented on ZOOKEEPER-1603: -- Pat, Not sure why we had this. Seems like an oversight. > StaticHostProviderTest testUpdateClientMigrateOrNot hangs > - > > Key: ZOOKEEPER-1603 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1603 > Project: ZooKeeper > Issue Type: Bug > Components: tests >Affects Versions: 3.5.0 >Reporter: Patrick Hunt >Assignee: Alexander Shraer >Priority: Blocker > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1603-ver1.patch, ZOOKEEPER-1603-ver2.patch > > > StaticHostProviderTest method testUpdateClientMigrateOrNot hangs forever. > On my laptop getHostName for 10.10.10.* takes 5+ seconds per call. As a > result this method effectively runs forever. > Every time I run this test it hangs. Consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
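The 5+ seconds per call that Pat observed is the typical cost of a reverse-DNS timeout: `InetSocketAddress.getHostName()` performs a reverse lookup, while `getHostString()` just returns the literal. A minimal illustration:

```java
import java.net.InetSocketAddress;

public class HostNameLookup {
    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("10.10.10.1", 2181);
        // getHostString() returns the literal with no DNS round-trip.
        System.out.println(addr.getHostString());
        // getHostName() may block for seconds per call doing a reverse lookup
        // on hosts where 10.10.10.* has no PTR record:
        // System.out.println(addr.getHostName());
    }
}
```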
[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534246#comment-13534246 ] Mahadev konar commented on ZOOKEEPER-1504: -- Pat, Makes sense. We can do it in a separate jira. > Multi-thread NIOServerCnxn > -- > > Key: ZOOKEEPER-1504 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.3, 3.4.4, 3.5.0 >Reporter: Jay Shrauner >Assignee: Jay Shrauner > Labels: performance, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, > ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch > > > NIOServerCnxnFactory is single threaded, which doesn't scale well to large > numbers of clients. This is particularly noticeable when thousands of clients > connect. I propose multi-threading this code as follows: > - 1 acceptor thread, for accepting new connections > - 1-N selector threads > - 0-M I/O worker threads > Numbers of threads are configurable, with defaults scaling according to > number of cores. Communication with the selector threads is handled via > LinkedBlockingQueues, and connections are permanently assigned to a > particular selector thread so that all potentially blocking SelectionKey > operations can be performed solely by the selector thread. An ExecutorService > is used for the worker threads. > On a 32 core machine running Linux 2.6.38, achieved best performance with 4 > selector threads and 64 worker threads for a 70% +/- 5% improvement in > throughput. 
> This patch incorporates and supersedes the patches for > https://issues.apache.org/jira/browse/ZOOKEEPER-517 > https://issues.apache.org/jira/browse/ZOOKEEPER-1444 > New classes introduced in this patch are: > - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from > SessionTrackerImpl used to expire sessions so that the same logic can be used > to expire connections > - RateLogger (from ZOOKEEPER-517): rate limit error message logging, > currently only used to throttle rate of logging "out of file descriptors" > errors > - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that > makes worker threads daemon threads and names them in an easily debuggable > manner. Supports assignable threads (as used by CommitProcessor) and > non-assignable threads (as used here). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
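The core-scaled defaults can be sketched as follows. The formulas below are illustrative, chosen to reproduce the 32-core benchmark figures quoted above (4 selectors, 64 workers), not copied from the patch:

```java
public class ThreadDefaults {
    // Illustrative scaling rules: a handful of selector threads relative to
    // core count, and roughly two I/O worker threads per core.
    static int selectorThreads(int cores) {
        return Math.max((int) Math.sqrt(cores / 2.0), 1);
    }

    static int workerThreads(int cores) {
        return 2 * cores;
    }

    public static void main(String[] args) {
        // The 32-core Linux benchmark in the description used 4 selector
        // threads and 64 worker threads.
        System.out.println(selectorThreads(32) + " selectors, " + workerThreads(32) + " workers");
    }
}
```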
[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533712#comment-13533712 ] Mahadev konar commented on ZOOKEEPER-1572: -- Flavio/Sijie, I am taking a look at this. Might need a day or 2 (maximum until Tuesday) to review this. > Add an async interface for multi request > > > Key: ZOOKEEPER-1572 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Reporter: Sijie Guo >Assignee: Sijie Guo > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff > > > Currently there is no async interface for multi request in the ZooKeeper java > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
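For reviewers skimming the diff, an async multi would follow ZooKeeper's existing callback-plus-context convention. A hypothetical sketch of that shape (the types and names below are stand-ins, not the actual patch API):

```java
import java.util.Arrays;
import java.util.List;

public class AsyncMultiSketch {
    // Stand-in callback: ZooKeeper's async APIs deliver a result code, the
    // path, the caller's context object, and (here) the per-op results.
    interface MultiCallback {
        void processResult(int rc, String path, Object ctx, List<String> opResults);
    }

    // Models a ZooKeeper.multi(ops, cb, ctx) overload: here we invoke the
    // callback inline as if every op succeeded (rc 0 meaning OK).
    static void multiAsync(List<String> ops, MultiCallback cb, Object ctx) {
        cb.processResult(0, "/", ctx, ops);
    }

    public static void main(String[] args) {
        multiAsync(Arrays.asList("create /a", "setData /b"),
                (rc, path, ctx, results) ->
                        System.out.println("rc=" + rc + " ops=" + results.size()),
                null);
    }
}
```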
[jira] [Updated] (ZOOKEEPER-1572) Add an async interface for multi request
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1572: - Fix Version/s: (was: 3.4.6) > Add an async interface for multi request > > > Key: ZOOKEEPER-1572 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Reporter: Sijie Guo >Assignee: Sijie Guo > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff > > > Currently there is no async interface for multi request in ZooKeeper java > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533710#comment-13533710 ] Mahadev konar commented on ZOOKEEPER-1572: -- Removing it from the 3.4 branch. We shouldn't commit new features to the 3.4 branch. > Add an async interface for multi request > > > Key: ZOOKEEPER-1572 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Reporter: Sijie Guo >Assignee: Sijie Guo > Fix For: 3.5.0, 3.4.6 > > Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff > > > Currently there is no async interface for multi request in the ZooKeeper java > client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1574) mismatched CR/LF endings in text files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533708#comment-13533708 ] Mahadev konar commented on ZOOKEEPER-1574: -- Nikita/Raja, So we can just do a prop set and commit then? I tried this: find * | grep "java$" | xargs svn propset -R svn:eol-style native and it's only changing the properties. Is this all we need to do on 3.4 and trunk? This is definitely better than committing the diff. > mismatched CR/LF endings in text files > -- > > Key: ZOOKEEPER-1574 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1574 > Project: ZooKeeper > Issue Type: Bug >Reporter: Raja Aluri >Assignee: Raja Aluri > Attachments: ZOOKEEPER-1574.branch-3.4.patch, > ZOOKEEPER-1574.trunk.patch > > > Source code in the zookeeper repo has a bunch of files that have CRLF endings. > With more development happening on Windows there is a higher chance of more > CRLF files getting into the source tree. > I would like to avoid that by creating a .gitattributes file which prevents > sources from having CRLF entries in text files. > But before adding the .gitattributes file we need to normalize the existing > tree, so that people, when they sync after the .gitattributes change, won't end up > with a bunch of modified files in their workspace. > I am adding a couple of links here to give more of a primer on what exactly the > issue is and how we are trying to fix it. > [http://git-scm.com/docs/gitattributes#_checking_out_and_checking_in] > [http://stackoverflow.com/questions/170961/whats-the-best-crlf-handling-strategy-with-git] > I will submit a separate bug and patch for .gitattributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
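Independent of the svn property fix, normalization can be verified mechanically. A small check for CRLF sequences, the condition the .gitattributes change is meant to prevent:

```java
public class CrlfCheck {
    // True if the byte stream contains at least one CRLF line ending.
    static boolean hasCrlf(byte[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i] == '\n' && data[i - 1] == '\r') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasCrlf("a\r\nb".getBytes())); // true
        System.out.println(hasCrlf("a\nb".getBytes()));   // false
    }
}
```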
[jira] [Commented] (ZOOKEEPER-1578) org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533695#comment-13533695 ] Mahadev konar commented on ZOOKEEPER-1578: -- +1 the patch looks good. > org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with > 33556 port > - > > Key: ZOOKEEPER-1578 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1578 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.3 >Reporter: Li Ping Zhang >Assignee: Li Ping Zhang > Labels: patch > Attachments: ZOOKEEPER-1578-branch-3.4.patch, > ZOOKEEPER-1578-trunk.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > org.apache.zookeeper.server.quorum.Zab1_0Test was failed both with SUN JDK > and open JDK. > [junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test > [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 18.334 sec > [junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED > Zab1_0Test log: > Zab1_0Test log: > 2012-07-11 23:17:15,579 [myid:] - INFO [main:Leader@427] - Shutdown called > java.lang.Exception: shutdown Leader! 
reason: end of test > at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:427) > at > org.apache.zookeeper.server.quorum.Zab1_0Test.testLastAcceptedEpoch(Zab1_0Test.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48) > 2012-07-11 23:17:15,584 [myid:] - ERROR [main:Leader@139] - Couldn't bind to > port 33556 > java.net.BindException: Address already in use > at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402) > at java.net.ServerSocket.bind(ServerSocket.java:328) > at java.net.ServerSocket.bind(ServerSocket.java:286) > at org.apache.zookeeper.server.quorum.Leader.(Leader.java:137) > at > org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:810) > at > org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:224) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2012-07-11 23:17:20,202 [myid:] - ERROR > [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@559] - > Unex > pected exception causing shutdown while sock still open > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > at java.io.DataInputStream.readInt(DataInputStream.java:370) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > at > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:291) > 2012-07-11 23:17:20,203 [myid:] - WARN > [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@569] - > > *** GOODBYE bdvm039.svl.ibm.com/9.30.122.48:40153 > 2012-07-11 
23:17:20,204 [myid:] - INFO [Thread-20:Leader@421] - Shutting down > 2012-07-11 23:17:20,204 [myid:] - INFO [Thread-20:Leader@427] - Shutdown > called > java.lang.Exception: shutdown Leader! reason: lead ended > This failure suggests port 33556 is already in use, but a command-line check > shows that it is not. The port is hardcoded in the unit test; we can improve > it with a patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
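The standard fix for this class of flake is to stop hardcoding the port: bind to port 0, let the OS assign a free ephemeral port, and read it back. Sketch:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPort {
    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free ephemeral port, so the test can
        // never collide with a leftover listener on a fixed port like 33556.
        try (ServerSocket ss = new ServerSocket(0)) {
            int port = ss.getLocalPort();
            System.out.println(port > 0); // true
        }
    }
}
```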
[jira] [Commented] (ZOOKEEPER-1569) support upsert: setData if the node exists, otherwise, create a new node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533692#comment-13533692 ] Mahadev konar commented on ZOOKEEPER-1569: -- Jimmy, Can you please explain the semantics of such an operation? What would a return value be? When would this operation fail? When would it succeed? > support upsert: setData if the node exists, otherwise, create a new node > > > Key: ZOOKEEPER-1569 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1569 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Attachments: zk-1569.patch, zk-1569_v1.1.patch, zk-1569_v2.patch > > > Currently, ZooKeeper supports setData and create. If it can support upsert > like in SQL, it will be great. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
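One possible answer to Mahadev's questions: treat upsert as create-then-set and return whether the node was created. A sketch of those semantics with a ConcurrentMap standing in for the ZK namespace (this illustrates one possible design, not the actual patch's API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UpsertSketch {
    static final ConcurrentMap<String, byte[]> znodes = new ConcurrentHashMap<>();

    // Returns true if the node was created, false if existing data was replaced.
    static boolean upsert(String path, byte[] data) {
        // putIfAbsent models create(): a null result means the node did not
        // exist and was created; otherwise fall through to the setData() case.
        if (znodes.putIfAbsent(path, data) == null) {
            return true;
        }
        znodes.put(path, data);
        return false;
    }

    public static void main(String[] args) {
        System.out.println(upsert("/a", "v1".getBytes())); // true  (created)
        System.out.println(upsert("/a", "v2".getBytes())); // false (updated)
    }
}
```

Under these semantics the operation only fails for the reasons create/setData can fail apart from NoNode/NodeExists (e.g. a bad path or ACL), which is exactly the kind of contract the review is asking to see spelled out.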
[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533687#comment-13533687 ] Mahadev konar commented on ZOOKEEPER-1504: -- Thawan, I was looking at the patch and it looks like you always have one acceptor thread. Is one acceptor thread enough when we have 1000's of immediate connections to the ZK servers in case of bootstrap or network glitches? Did you never see an issue with this? Read through the patch as well. Looks good to me otherwise. > Multi-thread NIOServerCnxn > -- > > Key: ZOOKEEPER-1504 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.3, 3.4.4, 3.5.0 >Reporter: Jay Shrauner >Assignee: Jay Shrauner > Labels: performance, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, > ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch > > > NIOServerCnxnFactory is single threaded, which doesn't scale well to large > numbers of clients. This is particularly noticeable when thousands of clients > connect. I propose multi-threading this code as follows: > - 1 acceptor thread, for accepting new connections > - 1-N selector threads > - 0-M I/O worker threads > Numbers of threads are configurable, with defaults scaling according to > number of cores. Communication with the selector threads is handled via > LinkedBlockingQueues, and connections are permanently assigned to a > particular selector thread so that all potentially blocking SelectionKey > operations can be performed solely by the selector thread. An ExecutorService > is used for the worker threads. > On a 32 core machine running Linux 2.6.38, achieved best performance with 4 > selector threads and 64 worker threads for a 70% +/- 5% improvement in > throughput. 
> This patch incorporates and supersedes the patches for > https://issues.apache.org/jira/browse/ZOOKEEPER-517 > https://issues.apache.org/jira/browse/ZOOKEEPER-1444 > New classes introduced in this patch are: > - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from > SessionTrackerImpl used to expire sessions so that the same logic can be used > to expire connections > - RateLogger (from ZOOKEEPER-517): rate limit error message logging, > currently only used to throttle rate of logging "out of file descriptors" > errors > - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that > makes worker threads daemon threads and names them in an easily debuggable > manner. Supports assignable threads (as used by CommitProcessor) and > non-assignable threads (as used here). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
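On the permanent-assignment point raised above: once a connection is handed to a selector it never migrates, so every SelectionKey operation for that connection runs on one thread. A toy model of the single acceptor routing accepted connections to per-selector queues (connection ids stand in for real channels):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class SelectorAssignment {
    // Permanent assignment: a connection always maps to the same selector.
    static int selectorFor(int connId, int numSelectors) {
        return connId % numSelectors;
    }

    public static void main(String[] args) throws InterruptedException {
        int numSelectors = 4;
        List<LinkedBlockingQueue<Integer>> accepted = new ArrayList<>();
        for (int i = 0; i < numSelectors; i++) {
            accepted.add(new LinkedBlockingQueue<>());
        }
        // The single acceptor thread hands each new connection to the queue
        // of its permanently assigned selector.
        for (int connId = 0; connId < 8; connId++) {
            accepted.get(selectorFor(connId, numSelectors)).put(connId);
        }
        System.out.println(accepted.get(0)); // [0, 4]
    }
}
```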
[jira] [Commented] (ZOOKEEPER-1480) ClientCnxn(1161) can't get the current zk server add, so that - Session 0x for server null, unexpected error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533678#comment-13533678 ] Mahadev konar commented on ZOOKEEPER-1480: -- Hey Leader, There are quite a few Chinese characters in the patch. Can you please remove those? Also, can you please create a patch against trunk? Thanks > ClientCnxn(1161) can't get the current zk server add, so that - Session 0x > for server null, unexpected error > > > Key: ZOOKEEPER-1480 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1480 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.3 >Reporter: Leader Ni >Assignee: Leader Ni > Labels: client, getCurrentZooKeeperAddr > Fix For: 3.5.0 > > Attachments: getCurrentZooKeeperAddr_for_3.4.3.patch, > getCurrentZooKeeperAddr_for_branch3.4.patch > > > When zookeeper hits an unexpected error (not SessionExpiredException, > SessionTimeoutException, or EndOfStreamException), ClientCnxn(1161) will log > in the format "Session 0x for server null, unexpected error, closing > socket connection and attempting reconnect". The log is at line 1161 in > zookeeper-3.3.3 > We found that zookeeper uses > "((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress()" to get > the zookeeper addr. But sometimes it logs "Session 0x for server null"; if it > logs null, the developer can't determine the current zookeeper addr that the > client is connected or connecting to. > I added a method in class SendThread: InetSocketAddress > org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr(). > Here: > /** > * Returns the address to which the socket is connected. 
> * > * @return ip address of the remote side of the connection or null if not > * connected > */ > @Override > SocketAddress getRemoteSocketAddress() { >// a lot could go wrong here, so rather than put in a bunch of code >// to check for nulls all down the chain let's do it the simple >// yet bulletproof way > . -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
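The "server null" in the log appears because `getRemoteSocketAddress()` legitimately returns null once the channel is not (or no longer) connected. A null-safe sketch in the spirit of the helper the issue describes (the surrounding class and method name here are illustrative):

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

public class RemoteAddr {
    // Every link in the chain can be null before connect or after close,
    // so check each step rather than chaining the calls blindly.
    static SocketAddress remoteAddress(SocketChannel ch) {
        if (ch == null || ch.socket() == null) {
            return null;
        }
        return ch.socket().getRemoteSocketAddress();
    }

    public static void main(String[] args) throws IOException {
        try (SocketChannel ch = SocketChannel.open()) {
            // Not yet connected: the JDK returns null here, which is exactly
            // what produced "for server null" in the log message.
            System.out.println(remoteAddress(ch));
        }
    }
}
```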
[jira] [Comment Edited] (ZOOKEEPER-1552) Enable sync request processor in Observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676 ] Mahadev konar edited comment on ZOOKEEPER-1552 at 12/17/12 6:33 AM: Thawan, This is a good idea. As for the patch, I think we have too many system properties spread around in the source code. It's best if we can use the ZooKeeper config file for this. What do others think? Other than that, the patch looks good. was (Author: mahadev): Thawan, This is a good idea. As for the patch, I think we have too many system properties spread around in the source code. It's best if we can use the ZooKeeper config file for this. What do others think? > Enable sync request processor in Observer > - > > Key: ZOOKEEPER-1552 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.3 >Reporter: Thawan Kooburat >Assignee: Thawan Kooburat > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch > > > Observer doesn't forward its txns to SyncRequestProcessor. So it never > persists the txns onto disk or periodically creates snapshots. This increases > the start-up time since it will get the entire snapshot if the observer has > been running for a long time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1552) Enable sync request processor in Observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676 ] Mahadev konar commented on ZOOKEEPER-1552: -- Thawan, This is a good idea. As for the patch, I think we have too many system properties spread around in the source code. It's best if we can use the ZooKeeper config file for this. What do others think? > Enable sync request processor in Observer > - > > Key: ZOOKEEPER-1552 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.3 >Reporter: Thawan Kooburat >Assignee: Thawan Kooburat > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch > > > Observer doesn't forward its txns to SyncRequestProcessor. So it never > persists the txns onto disk or periodically creates snapshots. This increases > the start-up time since it will get the entire snapshot if the observer has > been running for a long time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
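For context, the pattern the review pushes back on looks like this; the property name below is hypothetical, not an actual ZooKeeper flag. The suggestion is that a key in zoo.cfg would replace the `System.getProperty` call:

```java
public class ObserverSyncFlag {
    // Hypothetical system-property toggle of the kind the review discourages
    // in favor of a zoo.cfg setting parsed at startup.
    static boolean syncEnabled() {
        return Boolean.parseBoolean(
                System.getProperty("observer.syncSketch.enabled", "true"));
    }

    public static void main(String[] args) {
        System.out.println(syncEnabled());
    }
}
```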
[jira] [Commented] (ZOOKEEPER-1488) Some links are not working in the Zookeeper Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533674#comment-13533674 ] Mahadev konar commented on ZOOKEEPER-1488: -- bq. By the way, I have just seen that the PDF generated in the in the docs section still has a 2008 copyright notice ("Copyright © 2008 The Apache Software Foundation. All rights reserved"). Should I open a ticket to update this? Or may I try to include in this patch? Thanks for pointing that out Edward. Please open a jira for that. > Some links are not working in the Zookeeper Documentation > - > > Key: ZOOKEEPER-1488 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1488 > Project: ZooKeeper > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.3 >Reporter: Kiran BC >Assignee: Edward Ribeiro >Priority: Minor > Attachments: ZOOKEEPER-1488.patch, ZOOKEEPER-1488.patch > > > There are some internal link errors in the Zookeeper documentation. The list > is as follows: > docs\zookeeperAdmin.html -> tickTime and datadir > docs\zookeeperOver.html -> fg_zkComponents, fg_zkPerfReliability and > fg_zkPerfRW > docs\zookeeperStarted.html -> Logging -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1593) Add Debian style /etc/default/zookeeper support to init script
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533673#comment-13533673 ] Mahadev konar commented on ZOOKEEPER-1593: -- Michi/Dirkjan, Unfortunately these package files are mostly unused and we probably should be getting rid of them given BigTop is doing all the packaging work. Dirkjan are you using the packaging in production? Do you think BigTop packaging might be of help to you? > Add Debian style /etc/default/zookeeper support to init script > -- > > Key: ZOOKEEPER-1593 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1593 > Project: ZooKeeper > Issue Type: Improvement > Components: scripts >Affects Versions: 3.4.5 > Environment: Debian Linux 6.0 >Reporter: Dirkjan Bussink >Priority: Minor > Attachments: zookeeper_debian_default.patch > > > In our configuration we use a different data directory for Zookeeper. The > problem is that the current Debian init.d script has the default location > hardcoded: > ZOOPIDDIR=/var/lib/zookeeper/data > ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid > By using the standard Debian practice of allowing for a > /etc/default/zookeeper we can redefine these variables to point to the > correct location: > ZOOPIDDIR=/var/lib/zookeeper/data > ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid > [ -r /etc/default/zookeeper ] && . /etc/default/zookeeper > This currently can't be done through /usr/libexec/zkEnv.sh, since that is > loaded before ZOOPIDDIR and ZOOPIDFILE are set. Any change there would > therefore undo the setup made in for example /etc/zookeeper/zookeeper-env.sh. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1335) Add support for --config to zkEnv.sh to specify a config directory different than what is expected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533666#comment-13533666 ] Mahadev konar commented on ZOOKEEPER-1335: -- +1 for the patch. Looks good to me. Pat, it doesn't look like we have much documentation in forrest for zkServer.sh, so I don't think we need any forrest docs update. > Add support for --config to zkEnv.sh to specify a config directory different > than what is expected > -- > > Key: ZOOKEEPER-1335 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1335 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1335.patch, ZOOKEEPER-1335.patch > > > zkEnv.sh expects the ZOOCFGDIR env variable to be set. If not, it looks for the conf dir > in the ZOOKEEPER_PREFIX dir or in /etc/zookeeper. It would be great if we could > support a --config option where at run time you could specify a different > config directory. We do the same thing in hadoop. > With this you should be able to do > /usr/sbin/zkServer.sh --config /some/conf/dir start|stop -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-575) remove System.exit calls to make the server more container friendly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-575: Attachment: ZOOKEEPER-575_4.patch Updated the patch for trunk. This would really be nice to get in and would make it cleaner to embed ZK. > remove System.exit calls to make the server more container friendly > --- > > Key: ZOOKEEPER-575 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-575 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.0 >Reporter: Patrick Hunt >Assignee: Andrew Finnell > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-575-2.patch, ZOOKEEPER-575-3.patch, > ZOOKEEPER-575_4.patch, ZOOKEEPER-575.patch > > > There are a handful of places left in the code that still use System.exit, we > should remove these to make the server > more container friendly. > There are some legitimate places for the exits - in *Main.java for example > should be fine - these are the command > line main routines. Containers should be embedding code that runs just below > this layer (or we should refactor > so that it would). > The tricky bit is ensuring the server shuts down in case of an unrecoverable > error occurring, afaik these are the > locations where we still have sys exit calls.
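The embedding-friendly pattern the ticket argues for — keep System.exit only in the *Main.java command-line entry points and have the layers below signal fatal errors instead — can be sketched roughly as follows. Class and method names here are illustrative, not from the actual patch:

```java
// Illustrative sketch: the embeddable layer throws instead of exiting.
public class ServerMain {

    static class FatalServerException extends RuntimeException {
        FatalServerException(String msg) { super(msg); }
    }

    static class EmbeddableServer {
        void handleUnrecoverableError(String msg) {
            // Previously: System.exit(1) deep inside the server, which
            // would take down any container embedding it.
            throw new FatalServerException(msg);
        }
    }

    // Only the command-line entry point decides to exit the JVM;
    // containers embedding EmbeddableServer catch the exception instead.
    static String runAndReport() {
        try {
            new EmbeddableServer().handleUnrecoverableError("unrecoverable error");
            return "ok";
        } catch (FatalServerException e) {
            // In the real *Main.java this is where System.exit(1) belongs.
            return "fatal: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndReport());
    }
}
```

A container would simply catch FatalServerException (or register its own handler) and restart or tear down the server without the whole JVM disappearing.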
Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5
Hi Jordan, Looks like I forgot to release from the nexus repo. Just did it. Please check again. thanks mahadev On Mon, Nov 19, 2012 at 10:56 AM, Jordan Zimmerman wrote: > I still don't see the artifacts on Maven Central. It usually doesn't take > this long. > > -JZ > > On Nov 18, 2012, at 5:15 PM, Mahadev Konar wrote: > >> Also, >> I have published the artifacts to maven. Do let me know if you see >> any issues with that. >> >> thanks >> mahadev >> >> On Sun, Nov 18, 2012 at 5:09 PM, Mahadev Konar >> wrote: >>> Please ignore the "subject" in the subject. Too much copy paste :). >>> >>> thanks >>> mahadev >>> >>> >>> On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar >>> wrote: >>>> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version >>>> 3.4.5 >>>> >>>> ZooKeeper is a high-performance coordination service for distributed >>>> applications. It exposes common services - such as naming, >>>> configuration management, synchronization, and group services - in a >>>> simple interface so you don't have to write them from scratch. You can >>>> use it off-the-shelf to implement consensus, group management, leader >>>> election, and presence protocols. And you can build on it for your >>>> own, specific needs. >>>> >>>> For ZooKeeper release details and downloads, visit: >>>> http://zookeeper.apache.org/releases.html >>>> >>>> ZooKeeper 3.4.5 Release Notes are at: >>>> http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html >>>> >>>> >>>> thanks >>>> mahadev >>>> >>>> We would like to thank the contributors that made the release possible. >>>> >>>> Regards, >>>> >>>> The ZooKeeper Team >
Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5
Also, I have published the artifacts to maven. Do let me know if you see any issues with that. thanks mahadev On Sun, Nov 18, 2012 at 5:09 PM, Mahadev Konar wrote: > Please ignore the "subject" in the subject. Too much copy paste :). > > thanks > mahadev > > > On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar > wrote: >> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5 >> >> ZooKeeper is a high-performance coordination service for distributed >> applications. It exposes common services - such as naming, >> configuration management, synchronization, and group services - in a >> simple interface so you don't have to write them from scratch. You can >> use it off-the-shelf to implement consensus, group management, leader >> election, and presence protocols. And you can build on it for your >> own, specific needs. >> >> For ZooKeeper release details and downloads, visit: >> http://zookeeper.apache.org/releases.html >> >> ZooKeeper 3.4.5 Release Notes are at: >> http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html >> >> >> thanks >> mahadev >> >> We would like to thank the contributors that made the release possible. >> >> Regards, >> >> The ZooKeeper Team
Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5
Please ignore the "subject" in the subject. Too much copy paste :). thanks mahadev On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar wrote: > The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5 > > ZooKeeper is a high-performance coordination service for distributed > applications. It exposes common services - such as naming, > configuration management, synchronization, and group services - in a > simple interface so you don't have to write them from scratch. You can > use it off-the-shelf to implement consensus, group management, leader > election, and presence protocols. And you can build on it for your > own, specific needs. > > For ZooKeeper release details and downloads, visit: > http://zookeeper.apache.org/releases.html > > ZooKeeper 3.4.5 Release Notes are at: > http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html > > > thanks > mahadev > > We would like to thank the contributors that made the release possible. > > Regards, > > The ZooKeeper Team
Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5 ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs. For ZooKeeper release details and downloads, visit: http://zookeeper.apache.org/releases.html ZooKeeper 3.4.5 Release Notes are at: http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html thanks mahadev We would like to thank the contributors that made the release possible. Regards, The ZooKeeper Team
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)
Nothing as such. With 6 +1's and 4 binding the vote passes. I will be updating the release artifacts tonight or, in case I get tired and fall asleep, it'll be tomorrow. thanks mahadev
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)
Thanks Pat and Jimmy! mahadev On Wed, Nov 14, 2012 at 11:35 AM, Jimmy Xiang wrote: > Of course, with ZK 3.4.5 RC 1. I verified there is only this version > of zk jar in the classpath for both HBase and HDFS. > > On Wed, Nov 14, 2012 at 11:34 AM, Jimmy Xiang wrote: >> I tested it with JDK 1.7_9 on a live HBase cluster (trunk version, 1 >> master and 4 region servers) and it went very well. The cluster >> started up ok. I created a table, loaded around 90k records, regions >> split/assigned properly. >> >> Thanks, >> Jimmy >> >> >> >> On Wed, Nov 14, 2012 at 11:22 AM, Patrick Hunt wrote: >>> Jimmy mentioned that he might have some time to try it out with hbase >>> - Jimmy how did your testing go? >>> >>> Patrick >>> >>> On Wed, Nov 14, 2012 at 10:31 AM, Mahadev Konar >>> wrote: >>>> Thanks Ted! >>>> >>>> mahadev >>>> >>>> >>>> On Tue, Nov 13, 2012 at 5:02 PM, Ted Yu wrote: >>>>> Using jdk 1.7 u9, I saw the following test failures: >>>>> >>>>> Failed tests: >>>>> testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster) >>>>> >>>>> testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): >>>>> expected:<0> but was:<1> >>>>> queueFailover(org.apache.hadoop.hbase.replication.TestReplication): >>>>> Waited too much time for queueFailover replication. Waited 74466ms. >>>>> >>>>> Tests in error: >>>>> Broken_testSync(org.apache.hadoop.hbase.regionserver.wal.TestHLog): >>>>> Error >>>>> Recovery for block blk_-3290996327764601512_1015 failed because recovery >>>>> from primary datanode 127.0.0.1:53866 failed 6 times. Pipeline was >>>>> 127.0.0.1:53866. Aborting... 
>>>>> testSplit(org.apache.hadoop.hbase.regionserver.wal.TestHLog): 3 >>>>> exceptions [org.apache.hadoop.ipc.RemoteException: >>>>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on >>>>> /user/hduser/hbase/TestHLog/3d02052e6bcac5f74e57d2a75e6bf583/recovered.edits/004.temp >>>>> File is not open for writing. Holder DFSClient_1365323924 does not have >>>>> any >>>>> open files.(..) >>>>> >>>>> They passed when I ran them standalone. queueFailover has been a flaky >>>>> test. >>>>> >>>>> FYI >>>>> >>>>> On Tue, Nov 13, 2012 at 4:15 PM, Ted Yu wrote: >>>>> >>>>>> I have run HBase trunk test suite with jdk 1.6 using zookeeper 3.4.5 RC1 >>>>>> in local maven repo. >>>>>> Tests passed. >>>>>> >>>>>> Cheers >>>>>> >>>>>> >>>>>> On Tue, Nov 13, 2012 at 3:16 PM, Mahadev Konar >>>>>> wrote: >>>>>> >>>>>>> Anyone from hbase team wants to try it out before we close the vote? >>>>>>> Looks like Roman did some basic testing with HBase, so thats helpful. >>>>>>> >>>>>>> thanks >>>>>>> mahadev >>>>>>> >>>>>>> >>>>>>> On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik >>>>>>> wrote: >>>>>>> > On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar >>>>>>> > >>>>>>> wrote: >>>>>>> >> Hi all, >>>>>>> >> >>>>>>> >> I have created a candidate build for ZooKeeper 3.4.5. This includes >>>>>>> >> the fix for ZOOKEEPER-1560. >>>>>>> >> Please take a look at the release notes for the jira list. >>>>>>> >> >>>>>>> >> *** Please download, test and VOTE before the >>>>>>> >> *** vote closes 12:00 midnight PT on Friday, Nov 9th.*** >>>>>>> >> >>>>>>> >> http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/ >>>>>>> >> >>>>>>> >> Should we release this? >>>>>>> > >>>>>>> > +1 (non-binding) >>>>>>> > >>>>>>> > based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph >>>>>>> > 0.2-SNAPSHOT, Solr 4.0.0) >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Roman. >>>>>>> >>>>>> >>>>>>
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)
Thanks Ted! mahadev On Tue, Nov 13, 2012 at 5:02 PM, Ted Yu wrote: > Using jdk 1.7 u9, I saw the following test failures: > > Failed tests: > testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster) > > testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): > expected:<0> but was:<1> > queueFailover(org.apache.hadoop.hbase.replication.TestReplication): > Waited too much time for queueFailover replication. Waited 74466ms. > > Tests in error: > Broken_testSync(org.apache.hadoop.hbase.regionserver.wal.TestHLog): Error > Recovery for block blk_-3290996327764601512_1015 failed because recovery > from primary datanode 127.0.0.1:53866 failed 6 times. Pipeline was > 127.0.0.1:53866. Aborting... > testSplit(org.apache.hadoop.hbase.regionserver.wal.TestHLog): 3 > exceptions [org.apache.hadoop.ipc.RemoteException: > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on > /user/hduser/hbase/TestHLog/3d02052e6bcac5f74e57d2a75e6bf583/recovered.edits/004.temp > File is not open for writing. Holder DFSClient_1365323924 does not have any > open files.(..) > > They passed when I ran them standalone. queueFailover has been a flaky test. > > FYI > > On Tue, Nov 13, 2012 at 4:15 PM, Ted Yu wrote: > >> I have run HBase trunk test suite with jdk 1.6 using zookeeper 3.4.5 RC1 >> in local maven repo. >> Tests passed. >> >> Cheers >> >> >> On Tue, Nov 13, 2012 at 3:16 PM, Mahadev Konar >> wrote: >> >>> Anyone from hbase team wants to try it out before we close the vote? >>> Looks like Roman did some basic testing with HBase, so thats helpful. >>> >>> thanks >>> mahadev >>> >>> >>> On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik wrote: >>> > On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar >>> wrote: >>> >> Hi all, >>> >> >>> >> I have created a candidate build for ZooKeeper 3.4.5. This includes >>> >> the fix for ZOOKEEPER-1560. 
>>> >> Please take a look at the release notes for the jira list. >>> >> >>> >> *** Please download, test and VOTE before the >>> >> *** vote closes 12:00 midnight PT on Friday, Nov 9th.*** >>> >> >>> >> http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/ >>> >> >>> >> Should we release this? >>> > >>> > +1 (non-binding) >>> > >>> > based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph >>> > 0.2-SNAPSHOT, Solr 4.0.0) >>> > >>> > Thanks, >>> > Roman. >>> >> >>
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)
Anyone from hbase team wants to try it out before we close the vote? Looks like Roman did some basic testing with HBase, so thats helpful. thanks mahadev On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik wrote: > On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar > wrote: >> Hi all, >> >> I have created a candidate build for ZooKeeper 3.4.5. This includes >> the fix for ZOOKEEPER-1560. >> Please take a look at the release notes for the jira list. >> >> *** Please download, test and VOTE before the >> *** vote closes 12:00 midnight PT on Friday, Nov 9th.*** >> >> http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/ >> >> Should we release this? > > +1 (non-binding) > > based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph > 0.2-SNAPSHOT, Solr 4.0.0) > > Thanks, > Roman.
[VOTE] Release ZooKeeper 3.4.5 (candidate 1)
Hi all, I have created a candidate build for ZooKeeper 3.4.5. This includes the fix for ZOOKEEPER-1560. Please take a look at the release notes for the jira list. *** Please download, test and VOTE before the *** vote closes 12:00 midnight PT on Friday, Nov 9th.*** http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/ Should we release this? thanks mahadev
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)
Thanks Ted. Will review the changes over the weekend. Thanks again mahadev On Fri, Oct 12, 2012 at 1:12 PM, Ted Yu wrote: > Patch v7 for ZOOKEEPER-1560 passes test suite. > > Please take a look. > > On Thu, Oct 11, 2012 at 2:45 PM, Mahadev Konar wrote: > >> Thanks Alex for bringing it up. Ill hold the release for now. I see a >> patch on 1560. Ill take a look and we'll see how to roll this into >> 3.4.5. >> >> thanks >> mahadev >> >> On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer >> wrote: >> > Hi Mahadev, >> > >> > ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue, >> > introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the >> > 3.4.5 release. >> > >> > Best Regards, >> > Alex >> > >> > On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar >> wrote: >> >> I think we have waited enough. Closing the vote now. >> >> >> >> With 5 +1's (3 binding) the vote passes. I will do the needful for >> >> getting the release out. >> >> >> >> Thanks for voting folks. >> >> >> >> mahadev >> >> >> >> On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira >> wrote: >> >>> +1 >> >>> >> >>> -Flavio >> >>> >> >>> On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote: >> >>> >> >>>> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue >> >>>> rolling the current RC out. Others please vote on the thread if you >> >>>> see any issues with that. Folks who have already voted, please re vote >> >>>> in case you have a change of opinion. >> >>>> >> >>>> As for myself, I ran a couple of tests with the RC using open jdk 7 >> >>>> and things seem to work. >> >>>> >> >>>> +1 from my side. Pat/Ben/Flavio/others what do you guys think? >> >>>> >> >>>> thanks >> >>>> mahadev >> >>>> >> >>>> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu wrote: >> >>>>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 >> are using >> >>>>> lock ZooKeeper-solaris. 
>> >>>>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 >> should use >> >>>>> a separate lock since they wouldn't run on a Solaris machine. >> >>>>> I didn't seem to find how a new lock name can be added. >> >>>>> >> >>>>> Recent builds for ZooKeeper_branch34_openjdk7 and >> ZooKeeper_branch34_jdk7 >> >>>>> have been green. >> >>>>> >> >>>>> Cheers >> >>>>> >> >>>>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt >> wrote: >> >>>>> >> >>>>>> I've seen that before, it's a flakey test that's unrelated to the >> sasl >> >>>>>> stuff. >> >>>>>> >> >>>>>> Patrick >> >>>>>> >> >>>>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu wrote: >> >>>>>>> I saw one test failure: >> >>>>>>> >> >>>>>>> >> >>>>>> >> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/ >> >>>>>>> >> >>>>>>> FYI >> >>>>>>> >> >>>>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu >> wrote: >> >>>>>>> >> >>>>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test >> failure >> >>>>>> seems >> >>>>>>>> to be gone. >> >>>>>>>> >> >>>>>>>> For ZooKeeper_branch34_jdk7, the two failed builds: >> >>>>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7, >> >>>>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5 >> >>>>>>>> where tests failed due to BindException >> >>>>>>>> >> >>>>>>>>
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)
Thanks Alex for bringing it up. Ill hold the release for now. I see a patch on 1560. Ill take a look and we'll see how to roll this into 3.4.5. thanks mahadev On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer wrote: > Hi Mahadev, > > ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue, > introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the > 3.4.5 release. > > Best Regards, > Alex > > On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar > wrote: >> I think we have waited enough. Closing the vote now. >> >> With 5 +1's (3 binding) the vote passes. I will do the needful for >> getting the release out. >> >> Thanks for voting folks. >> >> mahadev >> >> On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira wrote: >>> +1 >>> >>> -Flavio >>> >>> On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote: >>> >>>> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue >>>> rolling the current RC out. Others please vote on the thread if you >>>> see any issues with that. Folks who have already voted, please re vote >>>> in case you have a change of opinion. >>>> >>>> As for myself, I ran a couple of tests with the RC using open jdk 7 >>>> and things seem to work. >>>> >>>> +1 from my side. Pat/Ben/Flavio/others what do you guys think? >>>> >>>> thanks >>>> mahadev >>>> >>>> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu wrote: >>>>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 are >>>>> using >>>>> lock ZooKeeper-solaris. >>>>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 should use >>>>> a separate lock since they wouldn't run on a Solaris machine. >>>>> I didn't seem to find how a new lock name can be added. >>>>> >>>>> Recent builds for ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 >>>>> have been green. >>>>> >>>>> Cheers >>>>> >>>>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt wrote: >>>>> >>>>>> I've seen that before, it's a flakey test that's unrelated to the sasl >>>>>> stuff. 
>>>>>> >>>>>> Patrick >>>>>> >>>>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu wrote: >>>>>>> I saw one test failure: >>>>>>> >>>>>>> >>>>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/ >>>>>>> >>>>>>> FYI >>>>>>> >>>>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu wrote: >>>>>>> >>>>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test failure >>>>>> seems >>>>>>>> to be gone. >>>>>>>> >>>>>>>> For ZooKeeper_branch34_jdk7, the two failed builds: >>>>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7, >>>>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5 >>>>>>>> where tests failed due to BindException >>>>>>>> >>>>>>>> Cheers >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt wrote: >>>>>>>> >>>>>>>>> Yes. Those ubuntu machines have two slots each. If both tests run at >>>>>>>>> the same time... bam. >>>>>>>>> >>>>>>>>> I just added exclusion locks to the configuration of these two jobs, >>>>>>>>> that should help. >>>>>>>>> >>>>>>>>> Patrick >>>>>>>>> >>>>>>>>> On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu wrote: >>>>>>>>>> I think that was due to the following running on the same machine at >>>>>> the >>>>>>>>>> same time: >>>>>>>>>> >>>>>>>>>> Building remotely on ubuntu4 >>>>>>>>>> <https://builds.apache
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)
I think we have waited enough. Closing the vote now. With 5 +1's (3 binding) the vote passes. I will do the needful for getting the release out. Thanks for voting folks. mahadev On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira wrote: > +1 > > -Flavio > > On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote: > >> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue >> rolling the current RC out. Others please vote on the thread if you >> see any issues with that. Folks who have already voted, please re vote >> in case you have a change of opinion. >> >> As for myself, I ran a couple of tests with the RC using open jdk 7 >> and things seem to work. >> >> +1 from my side. Pat/Ben/Flavio/others what do you guys think? >> >> thanks >> mahadev >> >> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu wrote: >>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 are using >>> lock ZooKeeper-solaris. >>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 should use >>> a separate lock since they wouldn't run on a Solaris machine. >>> I didn't seem to find how a new lock name can be added. >>> >>> Recent builds for ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 >>> have been green. >>> >>> Cheers >>> >>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt wrote: >>> >>>> I've seen that before, it's a flakey test that's unrelated to the sasl >>>> stuff. >>>> >>>> Patrick >>>> >>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu wrote: >>>>> I saw one test failure: >>>>> >>>>> >>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/ >>>>> >>>>> FYI >>>>> >>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu wrote: >>>>> >>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test failure >>>> seems >>>>>> to be gone. 
>>>>>> >>>>>> For ZooKeeper_branch34_jdk7, the two failed builds: >>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7, >>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5 >>>>>> where tests failed due to BindException >>>>>> >>>>>> Cheers >>>>>> >>>>>> >>>>>> On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt wrote: >>>>>> >>>>>>> Yes. Those ubuntu machines have two slots each. If both tests run at >>>>>>> the same time... bam. >>>>>>> >>>>>>> I just added exclusion locks to the configuration of these two jobs, >>>>>>> that should help. >>>>>>> >>>>>>> Patrick >>>>>>> >>>>>>> On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu wrote: >>>>>>>> I think that was due to the following running on the same machine at >>>> the >>>>>>>> same time: >>>>>>>> >>>>>>>> Building remotely on ubuntu4 >>>>>>>> <https://builds.apache.org/computer/ubuntu4> in workspace >>>>>>>> /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7 >>>>>>>> >>>>>>>> We should introduce randomized port so that test suite can execute in >>>>>>>> parallel. >>>>>>>> >>>>>>>> Cheers >>>>>>>> >>>>>>>> On Fri, Oct 5, 2012 at 8:55 PM, Ted Yu wrote: >>>>>>>> >>>>>>>>> Some tests failed in build 8 due to (See >>>>>>>>> >>>>>>>>> >>>>>>> >>>> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/8/testReport/org.apache.zookeeper.server/ZxidRolloverTest/testRolloverThenRestart/ >>>>>>> ): >>>>>>>>> >>>>>>>>> java.lang.RuntimeException: java.net.BindException: Address already >>>> in >>>>>>> use >>>>>>>>> at >>>>>>> org.apache.zookeeper.test.QuorumUtil.
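The randomized-port idea from the quoted thread — so that parallel test suites sharing one Jenkins slave cannot collide on a hardcoded port — usually boils down to letting the OS pick a free ephemeral port by binding port 0. A minimal sketch (class and method names are illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: ask the OS for a free ephemeral port by binding port 0,
// then reuse that port number when configuring the test server.
public class EphemeralPort {
    static int pickFreePort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("picked free port " + pickFreePort());
    }
}
```

Note the inherent race: the port can be taken by another process between closing the probe socket and re-binding it, so binding the actual server socket on port 0 directly is preferable when the test framework allows it.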
[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471676#comment-13471676 ] Mahadev konar commented on ZOOKEEPER-1557: -- Thanks Eugene .. Interesting > jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch > - > > Key: ZOOKEEPER-1557 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.0, 3.4.5 >Reporter: Patrick Hunt >Assignee: Eugene Koontz > Fix For: 3.5.0, 3.4.6 > > Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch > > > Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job: > https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/ > haven't seen this before.
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)
erCnxnFactory.configure(NIOServerCnxnFactory.java:95) >> >>> >> at >> >>> >> org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:125) >> >>> >> at >> >>> >> org.apache.zookeeper.server.quorum.QuorumPeer.(QuorumPeer.java:517) >> >>> >> at >> >>> org.apache.zookeeper.test.QuorumUtil.(QuorumUtil.java:113) >> >>> >> >> >>> >> >> >>> >> >> >>> >> On Fri, Oct 5, 2012 at 9:56 AM, Patrick Hunt >> wrote: >> >>> >> >> >>> >>> fwiw: I setup jdk7 and openjdk7 jobs last night for branch34 on >> >>> >>> jenkins and they are looking good so far: >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/ >> >>> >>> >> >>> >>> >> >>> >> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/ >> >>> >>> >> >>> >>> Patrick >> >>> >>> >> >>> >>> On Thu, Oct 4, 2012 at 11:17 PM, Patrick Hunt >> >>> wrote: >> >>> >>> > Doesn't look good, failed a second time: >> >>> >>> > >> >>> >>> > >> >>> >>> >> >>> >> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/408/ >> >>> >>> > >> >>> >>> > java.util.concurrent.TimeoutException: Did not connect >> >>> >>> > at >> >>> >>> >> >>> >> org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:129) >> >>> >>> > at >> >>> >>> >> >>> >> org.apache.zookeeper.test.WatcherTest.testWatchAutoResetWithPending(WatcherTest.java:199) >> >>> >>> > >> >>> >>> > >> >>> >>> > Patrick >> >>> >>> > >> >>> >>> > On Thu, Oct 4, 2012 at 4:15 PM, Mahadev Konar < >> >>> maha...@hortonworks.com> >> >>> >>> wrote: >> >>> >>> >> Good point Ted. >> >>> >>> >> Eugene, >> >>> >>> >> Would you be able to take a quick look and point out the threat >> >>> >>> level? :) >> >>> >>> >> >> >>> >>> >> I have kicked off new build to see if its reproducible or not. 
>> >>> >>> >> >> >>> >>> >> thanks >> >>> >>> >> mahadev >> >>> >>> >> >> >>> >>> >> On Thu, Oct 4, 2012 at 4:10 PM, Ted Yu >> >>> wrote: >> >>> >>> >>> Should ZOOKEEPER-1557 be given some time so that we track down >> >>> root >> >>> >>> cause ? >> >>> >>> >>> >> >>> >>> >>> Thanks >> >>> >>> >>> >> >>> >>> >>> On Wed, Oct 3, 2012 at 11:34 PM, Patrick Hunt < >> ph...@apache.org> >> >>> >>> wrote: >> >>> >>> >>> >> >>> >>> >>>> +1, sig/xsum are correct, ran rat an that looked good. All the >> >>> unit >> >>> >>> >>>> tests pass for me on jdk6 and openjdk7 (ubuntu 12.04). Also >> ran >> >>> >>> >>>> 1/3/5/13 server clusters using openjdk7, everything seems to >> be >> >>> >>> >>>> working. >> >>> >>> >>>> >> >>> >>> >>>> Patrick >> >>> >>> >>>> >> >>> >>> >>>> On Sun, Sep 30, 2012 at 11:15 AM, Mahadev Konar < >> >>> >>> maha...@hortonworks.com> >> >>> >>> >>>> wrote: >> >>> >>> >>>> > Hi all, >> >>> >>> >>>> > >> >>> >>> >>>> > I have created a candidate build for ZooKeeper 3.4.5. 2 >> >>> JIRAs are >> >>> >>> >>>> > addressed in this release. This includes the critical >> bugfix >> >>> >>> >>>> ZOOKEEPER-1550 >> >>> >>> >>>> > which address the client connection issue. >> >>> >>> >>>> > >> >>> >>> >>>> > *** Please download, test and VOTE before the >> >>> >>> >>>> > *** vote closes 12:00 midnight PT on Friday, Oct 5th.*** >> >>> >>> >>>> > >> >>> >>> >>>> > Note that I am extending the vote period for a little >> longer so >> >>> >>> that >> >>> >>> >>>> > folks get time to test this out. >> >>> >>> >>>> > >> >>> >>> >>>> > >> >>> http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/ >> >>> >>> >>>> > >> >>> >>> >>>> > Should we release this? >> >>> >>> >>>> > >> >>> >>> >>>> > thanks >> >>> >>> >>>> > mahadev >> >>> >>> >>>> >> >>> >>> >> >>> >> >> >>> >> >> >>> >> >> >> >> >>
[jira] [Comment Edited] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471403#comment-13471403 ] Mahadev konar edited comment on ZOOKEEPER-1557 at 10/8/12 5:04 AM: --- Thanks Eugene for taking a look at it. Given your analysis above it doesnt look like we have a full knowledge of whats causing the issue. Given that this is not SASL related and could be related to how our test framework runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with what we have now. What do you think? was (Author: mahadev): Thanks Eugene for taking a look at it. Given your any analysis above it doesnt look like we have a full knowledge of whats causing the issue. Given that this is not SASL related and could be related to how our test framework runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with what we have now. What do you think? > jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch > - > > Key: ZOOKEEPER-1557 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.0, 3.4.5 >Reporter: Patrick Hunt >Assignee: Eugene Koontz > Fix For: 3.5.0, 3.4.6 > > Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch > > > Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job: > https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/ > haven't seen this before.
[jira] [Updated] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1557: - Fix Version/s: (was: 3.4.5) 3.4.6 > jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch > - > > Key: ZOOKEEPER-1557 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.0, 3.4.5 >Reporter: Patrick Hunt >Assignee: Eugene Koontz > Fix For: 3.5.0, 3.4.6 > > Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch > > > Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job: > https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/ > haven't seen this before.
[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471403#comment-13471403 ] Mahadev konar commented on ZOOKEEPER-1557: -- Thanks Eugene for taking a look at it. Given your analysis above it doesn't look like we have full knowledge of what's causing the issue. Given that this is not SASL related and could be related to how our test framework runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with what we have now. What do you think?
Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)
Good point Ted. Eugene, Would you be able to take a quick look and point out the threat level? :) I have kicked off a new build to see if it's reproducible or not. thanks mahadev On Thu, Oct 4, 2012 at 4:10 PM, Ted Yu wrote: > Should ZOOKEEPER-1557 be given some time so that we track down the root cause? > > Thanks > > On Wed, Oct 3, 2012 at 11:34 PM, Patrick Hunt wrote: >> +1, sig/xsum are correct, ran rat and that looked good. All the unit >> tests pass for me on jdk6 and openjdk7 (ubuntu 12.04). Also ran >> 1/3/5/13 server clusters using openjdk7, everything seems to be >> working. >> >> Patrick >> >> On Sun, Sep 30, 2012 at 11:15 AM, Mahadev Konar >> wrote: >> > Hi all, >> > >> > I have created a candidate build for ZooKeeper 3.4.5. 2 JIRAs are >> > addressed in this release. This includes the critical bugfix >> ZOOKEEPER-1550 >> > which addresses the client connection issue. >> > >> > *** Please download, test and VOTE before the >> > *** vote closes 12:00 midnight PT on Friday, Oct 5th.*** >> > >> > Note that I am extending the vote period for a little longer so that >> > folks get time to test this out. >> > >> > http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/ >> > >> > Should we release this? >> > >> > thanks >> > mahadev >>
[VOTE] Release ZooKeeper 3.4.5 (candidate 0)
Hi all, I have created a candidate build for ZooKeeper 3.4.5. 2 JIRAs are addressed in this release. This includes the critical bugfix ZOOKEEPER-1550 which addresses the client connection issue. *** Please download, test and VOTE before the *** vote closes 12:00 midnight PT on Friday, Oct 5th.*** Note that I am extending the vote period for a little longer so that folks get time to test this out. http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/ Should we release this? thanks mahadev
Re: SASL problem with 3.4.4 Java client
Thanks to Eugene we have all green on our builds (including jdk7). I'll spin up a new RC. Thanks again Eugene! mahadev On Wed, Sep 26, 2012 at 11:26 AM, Eugene Koontz wrote: > On 9/26/12 11:08 AM, Patrick Hunt wrote: >> >> >> I didn't notice any feedback to the list on these issues, perhaps I missed >> it? >> > Hi Pat, > > I should have mentioned the test failures that we noticed with > ZOOKEEPER-1497; I regret not bringing these to your and the community's > attention. I will look into it more today. > > -Eugene
[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1477: - Priority: Major (was: Blocker) Downgrading to Major given the recent updates on this jira. > Test failures with Java 7 on Mac OS X > - > > Key: ZOOKEEPER-1477 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477 > Project: ZooKeeper > Issue Type: Bug > Components: server, tests >Affects Versions: 3.4.3 > Environment: Mac OS X Lion (10.7.4) > Java version: > java version "1.7.0_04" > Java(TM) SE Runtime Environment (build 1.7.0_04-b21) > Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) >Reporter: Diwaker Gupta > Fix For: 3.4.6 > > Attachments: with-ZK-1550.txt > > > I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, > including ZooKeeperTest. A common symptom was spurious > {{ConnectionLossException}}: > {code} > 2012-06-01 12:01:23,420 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED > testDeleteRecursiveAsync > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at > org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ... (snipped) > {code} > As background, I was actually investigating some non-deterministic failures > when using Netflix's Curator with Java 7 (see > https://github.com/Netflix/curator/issues/79). After a while, I figured I > should establish a clean ZK baseline first and realized it is actually a ZK > issue, not a Curator issue. > We are trying to migrate to Java 7 but this is a blocking issue for us right > now. 
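For context on the spurious {{ConnectionLossException}}s reported above: clients typically treat connection loss as transient and retry the operation a bounded number of times, which is what libraries such as Curator wrap around ZooKeeper calls. The following is a minimal, self-contained sketch of that bounded-retry idiom; the exception class, retry count, and sleep interval here are illustrative stand-ins, not ZooKeeper's or Curator's actual API.

```java
import java.util.concurrent.Callable;

// Hypothetical retry wrapper illustrating the bounded-retry idiom for
// transient connection-loss errors: retry with a pause, then rethrow.
public final class RetryLoop {
    // Stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException.
    static class ConnectionLossException extends Exception {}

    static <T> T withRetries(Callable<T> op, int maxRetries, long sleepMs)
            throws Exception {
        ConnectionLossException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                last = e;                       // transient: remember and retry
                if (attempt < maxRetries) Thread.sleep(sleepMs);
            }
        }
        throw last;                             // retries exhausted: surface the error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice with connection loss, then succeeds.
        String result = withRetries(() -> {
            if (calls[0]++ < 2) throw new ConnectionLossException();
            return "ok";
        }, 3, 1L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The point of bounding the retries is exactly the concern raised in these threads: an unbounded loop turns a persistent failure into a hang.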
[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465243#comment-13465243 ] Mahadev konar commented on ZOOKEEPER-1477: -- That's fine, Diwaker. I'll downgrade this JIRA to Major and mark it for the next release. We can just ship 3.4.5 with the fix for ZOOKEEPER-1550. It'll be good to upload the logs for the failing tests, but it's not urgent; we can do it later for 3.4.6. Thanks.
[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465108#comment-13465108 ] Mahadev konar commented on ZOOKEEPER-1477: -- Diwaker, The usual time on a Linux box is around 40 minutes or so.
[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465097#comment-13465097 ] Mahadev konar commented on ZOOKEEPER-1477: -- Thanks Diwaker. Could you please upload a summary of the failing tests and the logs as well?
[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465077#comment-13465077 ] Mahadev konar commented on ZOOKEEPER-1477: -- Diwaker, Would you be able to run the tests along with Eugene's patch on ZOOKEEPER-1550? If not, please let me know. I can go ahead and run it.
[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464387#comment-13464387 ] Mahadev konar commented on ZOOKEEPER-1550: -- Eugene, Still failing :)... > ZooKeeperSaslClient does not finish anonymous login on OpenJDK > -- > > Key: ZOOKEEPER-1550 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.4 >Reporter: Robert Macomber >Assignee: Eugene Koontz >Priority: Blocker > Fix For: 3.4.5 > > Attachments: ZOOKEEPER-1550.patch, ZOOKEEPER-1550.patch > > > On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does > not throw an exception. > {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an > exception from that method as a proxy for "this client is not configured to > use SASL" and as a result no commands can be sent, since it is still waiting > for auth to complete. > [Link to mailing list > discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667] > The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do > getChildren("/")': > {code:title=OpenJDK} > INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection... > DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider > Waiting for connected-state... > INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 > org.apache.zookeeper.ClientCnxn Opening socket connection to server > mike.local/10.0.2.106:2181. 
Will not attempt to authenticate using SASL > (unknown error) > INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 > org.apache.zookeeper.ClientCnxn Socket connection established to > mike.local/10.0.2.106:2181, initiating session > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 > org.apache.zookeeper.ClientCnxn Session establishment request sent on > mike.local/10.0.2.106:2181 > INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 > org.apache.zookeeper.ClientCnxn Session establishment complete on server > mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout > = 4 > DEBUG [main-EventThread] 2012-09-25 14:02:24,614 > com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected) > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 > request:: '/,F response:: v{} until SASL authentication completes. > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 > request:: '/,F response:: v{} until SASL authentication completes. > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 > request:: '/,F response:: v{} until SASL authentication completes. > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: > null request:: null response:: nulluntil SASL authentication completes. 
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 > request:: '/,F response:: v{} until SASL authentication completes. > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: > null request:: null response:: nulluntil SASL authentication completes. > DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 > org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: > clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 > request:: '/,F response:: v{} until SASL authentication completes. > DEBUG [main-SendThread(mike.loc
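The failure mode described above can be sketched in isolation. This is a hypothetical helper, not the actual ZOOKEEPER-1550 patch: rather than treating an exception from {{javax.security.auth.login.Configuration.getConfiguration()}} as proof that no SASL login is configured (a test that never fires on OpenJDK, where the call can return normally), it looks up the client's JAAS login section explicitly and treats SASL as in play only when an entry exists. The helper class and the section name "Client" are illustrative assumptions.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical check: decide "is this client configured for SASL?" from the
// presence of a JAAS login section, instead of from whether
// Configuration.getConfiguration() throws (which is JDK-dependent).
public final class SaslConfigCheck {
    public static boolean clientIsSaslConfigured(String loginContextName) {
        try {
            Configuration conf = Configuration.getConfiguration();
            AppConfigurationEntry[] entries =
                conf.getAppConfigurationEntry(loginContextName);
            // SASL is only in play if an entry for the named section exists.
            return entries != null && entries.length > 0;
        } catch (SecurityException e) {
            // Some JDKs throw here when no JAAS configuration can be located;
            // either way, the answer is "not configured".
            return false;
        }
    }

    public static void main(String[] args) {
        // With no JAAS configuration supplied, no "Client" section exists,
        // so this prints false on both code paths.
        System.out.println(clientIsSaslConfigured("Client"));
    }
}
```

With a check like this, a client with no SASL configuration would proceed with its queued requests instead of deferring packets "until SASL authentication completes" forever, as in the log above.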
[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464370#comment-13464370 ] Mahadev konar commented on ZOOKEEPER-1550: -- Eugene, Looks like the SASL test failed. Can you please take a look?
[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464357#comment-13464357 ] Mahadev konar commented on ZOOKEEPER-1550: -- Awesome. I'll check this in and kick off the builds on JDK 7 and see if it all works.
[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464339#comment-13464339 ] Mahadev konar commented on ZOOKEEPER-1550: -- Thanks Eugene. Robert, can you verify this patch as well? Thanks