[jira] [Commented] (ZOOKEEPER-2469) infinite loop in ZK re-login

2016-07-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367064#comment-15367064
 ] 

Mahadev konar commented on ZOOKEEPER-2469:
--

[~sershe] done.

> infinite loop in ZK re-login
> 
>
> Key: ZOOKEEPER-2469
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> {noformat}
> int retry = 1;
> while (retry >= 0) {
>     try {
>         reLogin();
>         break;
>     } catch (LoginException le) {
>         if (retry > 0) {
>             --retry;
>             // sleep for 10 seconds.
>             try {
>                 Thread.sleep(10 * 1000);
>             } catch (InterruptedException e) {
>                 LOG.error("Interrupted during login retry after LoginException:", le);
>                 throw le;
>             }
>         } else {
>             LOG.error("Could not refresh TGT for principal: " + principal + ".", le);
>         }
>     }
> }
> {noformat}
> will retry forever once retry reaches 0, because the else branch only logs and never exits the loop. It should return after the final failure, like the one above.
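
For comparison, here is a minimal sketch (not the committed fix) of a bounded version of the same loop; reLogin(), LOG and principal are the members already referenced in the snippet above. The only change is that the final failure returns instead of leaving retry stuck at 0:

{code}
int retry = 1;
while (retry >= 0) {
    try {
        reLogin();
        break;
    } catch (LoginException le) {
        if (retry > 0) {
            --retry;
            try {
                Thread.sleep(10 * 1000); // back off for 10 seconds before the last attempt
            } catch (InterruptedException e) {
                LOG.error("Interrupted during login retry after LoginException:", le);
                throw le;
            }
        } else {
            LOG.error("Could not refresh TGT for principal: " + principal + ".", le);
            return; // give up after the final failure instead of spinning forever
        }
    }
}
{code}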





[jira] [Updated] (ZOOKEEPER-2469) infinite loop in ZK re-login

2016-07-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-2469:
-
Assignee: Sergey Shelukhin

> infinite loop in ZK re-login
> 
>
> Key: ZOOKEEPER-2469
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> {noformat}
> int retry = 1;
> while (retry >= 0) {
>     try {
>         reLogin();
>         break;
>     } catch (LoginException le) {
>         if (retry > 0) {
>             --retry;
>             // sleep for 10 seconds.
>             try {
>                 Thread.sleep(10 * 1000);
>             } catch (InterruptedException e) {
>                 LOG.error("Interrupted during login retry after LoginException:", le);
>                 throw le;
>             }
>         } else {
>             LOG.error("Could not refresh TGT for principal: " + principal + ".", le);
>         }
>     }
> }
> {noformat}
> will retry forever once retry reaches 0, because the else branch only logs and never exits the loop. It should return after the final failure, like the one above.





Re: [VOTE] Apache ZooKeeper release 3.5.0-alpha candidate 0

2014-08-04 Thread Mahadev Konar
+1 - downloaded the bits and ran some tests. Looks good to go.

thanks
mahadev

Mahadev Konar
Hortonworks Inc.
http://hortonworks.com/


On Mon, Aug 4, 2014 at 5:48 PM, Camille Fournier  wrote:

> +1 started up a server from the jar, ran some basic tests.
>
>
> On Mon, Aug 4, 2014 at 6:21 PM, Jian Huang  wrote:
>
> > On Mon, Aug 4, 2014 at 3:17 PM, Flavio Junqueira <
> > fpjunque...@yahoo.com.invalid> wrote:
> >
> > > +1, ran tests, checked files and signatures, ran some quorum tests
> > > including reconfigurations. lgtm!
> > >
> > > -Flavio
> > >
> > > On 02 Aug 2014, at 00:08, Patrick Hunt  wrote:
> > >
> > > > This is a release candidate for 3.5.0-alpha.
> > > >
> > > > The full release notes are available at:
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12316644&projectId=12310801
> > > >
> > > > *** Please download, test and vote, I expect the vote to run for a
> > > > minimum of 72 hours from the time this email was sent ***
> > > >
> > > > Source files:
> > > > http://people.apache.org/~phunt/zookeeper-3.5.0-alpha-candidate-0/
> > > >
> > > > Maven staging repo:
> > > >
> > >
> >
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.0-alpha/
> > > >
> > > > The tag to be voted upon:
> > > > https://svn.apache.org/repos/asf/zookeeper/tags/release-3.5.0-rc0
> > > >
> > > > ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> > > >
> > > > http://www.apache.org/dist/zookeeper/KEYS
> > > >
> > > > Should we release this candidate?
> > > >
> > > > Patrick
> > >
> > >
> >
>



[jira] [Updated] (ZOOKEEPER-1575) adding .gitattributes to prevent CRLF and LF mismatches for source and text files

2014-03-30 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1575:
-

Fix Version/s: 3.5.0

> adding .gitattributes to prevent CRLF and LF mismatches for source and text 
> files
> -
>
> Key: ZOOKEEPER-1575
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1575
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Raja Aluri
>Assignee: Raja Aluri
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1575.trunk.patch
>
>
> adding .gitattributes to prevent CRLF and LF mismatches for source and text 
> files





[jira] [Commented] (ZOOKEEPER-1848) [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6

2014-03-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954961#comment-13954961
 ] 

Mahadev konar commented on ZOOKEEPER-1848:
--

+1 for the patch. Rerunning it through Jenkins.

> [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6
> --
>
> Key: ZOOKEEPER-1848
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1848
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 3.5.0
>
> Attachments: zookeeper-1848_v1.patch, zookeeper-1848_v2.patch
>
>
> ZK uses Java NIO to create ServerSockets from ServerSocketChannels. Under 
> Windows, IPv4 and IPv6 are implemented independently, and Java apparently 
> cannot reuse the same socket channel for both IPv4 and IPv6 sockets. We are 
> getting "java.net.SocketException: Address family not supported by protocol 
> family" exceptions. When the ZK client resolves "localhost", it gets both the 
> v4 address 127.0.0.1 and the v6 address ::1, but the socket channel cannot 
> bind to both v4 and v6.
> The problem is reported as:
> http://bugs.sun.com/view_bug.do?bug_id=6230761
> http://stackoverflow.com/questions/1357091/binding-an-ipv6-server-socket-on-windows
> Although the JDK bug is reported as resolved, I have tested with jdk1.6.0_33 
> without success; JDK7 does appear to fix the problem. 
> See HBASE-6825 for reference. 
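
A minimal, self-contained sketch (not the zookeeper-1848 patch) of one generic workaround for the JDK6/Windows dual-stack limitation: bind the NIO channel to an explicit IPv4 address, or alternatively force the IPv4 stack with -Djava.net.preferIPv4Stack=true on the JVM command line. The address and port are assumptions for illustration.

{code}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class Ipv4BindSketch {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        // Binding to an explicit IPv4 address sidesteps the v4/v6 wildcard
        // problem described above; 2181 is just the usual client port.
        InetAddress v4Loopback = InetAddress.getByName("127.0.0.1");
        server.socket().bind(new InetSocketAddress(v4Loopback, 2181));
        System.out.println("bound to " + server.socket().getLocalSocketAddress());
        server.close();
    }
}
{code}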





Re: [VOTE] Apache ZooKeeper release 3.4.6 candidate 0

2014-02-25 Thread Mahadev Konar
+1

Verified the signatures and the artifacts.

thanks
mahadev
Mahadev Konar
Hortonworks Inc.
http://hortonworks.com/


On Mon, Feb 24, 2014 at 12:20 PM, Michi Mutsuzaki  wrote:
> +1
>
> ant test passed on ubuntu 12.04.
>
> On Sun, Feb 23, 2014 at 12:23 PM, Ted Yu  wrote:
>> I pointed HBase 0.98 at 3.4.6 RC0 in the staging repo.
>> I ran through test suite and it passed:
>>
>> [INFO] BUILD SUCCESS
>> [INFO]
>> 
>> [INFO] Total time: 1:09:42.116s
>> [INFO] Finished at: Sun Feb 23 19:21:04 UTC 2014
>> [INFO] Final Memory: 48M/503M
>>
>> Cheers
>>
>>
>> On Sun, Feb 23, 2014 at 11:39 AM, Flavio Junqueira 
>> wrote:
>>
>>> This is a bugfix release candidate for 3.4.5. It fixes 117 issues,
>>> including issues that affect
>>> leader election, Zab, and SASL authentication.
>>>
>>> The full release notes are available at:
>>>
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12323310
>>>
>>> *** Please download, test and vote by March 9th 2014, 23:59 UTC+0. ***
>>>
>>> Source files:
>>> http://people.apache.org/~fpj/zookeeper-3.4.6-candidate-0/
>>>
>>> Maven staging repo:
>>>
>>> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.4.6/
>>>
>>> The tag to be voted upon:
>>> https://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.6-rc0
>>>
>>> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
>>>
>>> http://www.apache.org/dist/zookeeper/KEYS
>>>
>>> Should we release this candidate?
>>>
>>> -Flavio



[jira] [Commented] (ZOOKEEPER-1667) Watch event isn't handled correctly when a client reestablish to a server

2013-10-22 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801581#comment-13801581
 ] 

Mahadev konar commented on ZOOKEEPER-1667:
--

+1 - the patch looks good to me.

> Watch event isn't handled correctly when a client reestablish to a server
> -
>
> Key: ZOOKEEPER-1667
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1667
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.6, 3.4.5
>Reporter: Jacky007
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1667-b3.4.patch, ZOOKEEPER-1667-b3.4.patch, 
> ZOOKEEPER-1667.patch, ZOOKEEPER-1667-r34.patch, ZOOKEEPER-1667-trunk.patch
>
>
> When a client reestablishes its connection to a server, it sends the watches 
> that have not yet been triggered. But the code in DataTree does not handle 
> this correctly.
> It is obvious in hindsight; we just had not noticed it :)
> Scenarios:
> 1) Client A sets a data watch on /d, then disconnects; client B deletes /d and 
> creates it again. When client A reestablishes to ZK, it receives a 
> NodeCreated event rather than a NodeDataChanged.
> 2) Client A sets an exists watch on /e (which does not exist), then 
> disconnects; client B creates /e. When client A reestablishes to ZK, it 
> receives a NodeDataChanged event rather than a NodeCreated.
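
A minimal sketch of scenario 1 using the plain client API (assumptions: a server at 127.0.0.1:2181 and an existing znode /d; the delete/recreate is performed by a second client while this one is disconnected):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchReplaySketch {
    public static void main(String[] args) throws Exception {
        Watcher logWatcher = new Watcher() {
            public void process(WatchedEvent event) {
                // With this bug, after the reconnect the data watch fires as
                // NodeCreated instead of the expected NodeDataChanged.
                System.out.println("event: " + event.getType() + " on " + event.getPath());
            }
        };
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, logWatcher);
        zk.getData("/d", logWatcher, null); // register a data watch on /d
        // ... the connection drops, another client deletes and recreates /d,
        // then this client reconnects and its outstanding watches are resent.
        Thread.sleep(60000); // keep the process alive long enough to observe the event
        zk.close();
    }
}
{code}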





[jira] [Commented] (ZOOKEEPER-1646) mt c client tests fail on Ubuntu Raring

2013-10-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798082#comment-13798082
 ] 

Mahadev konar commented on ZOOKEEPER-1646:
--

+1 for the patch. Nice catch Pat!

> mt c client tests fail on Ubuntu Raring
> ---
>
> Key: ZOOKEEPER-1646
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1646
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.5, 3.5.0
> Environment: Ubuntu 13.04 (raring), glibc 2.17
>Reporter: James Page
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1646.patch
>
>
> Misc tests fail in the c client binding under the current Ubuntu development 
> release:
> ./zktest-mt 
>  ZooKeeper server startedRunning 
> Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server 
> started : elapsed 9315 : OK
> Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054
> Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055
> Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : 
> elapsed 1066
> Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : 
> OK
> Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055
> Zookeeper_init::testBasic : elapsed 1 : OK
> Zookeeper_init::testAddressResolution : elapsed 0 : OK
> Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
> Zookeeper_init::testNullAddressString : elapsed 0 : OK
> Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
> Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
> Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
> Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
> Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK
> Zookeeper_init::testNonexistentHost : elapsed 92 : OK
> Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
> Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
> Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK
> Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
> Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056
> Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
> Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK
> Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056
> Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076
> Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : 
> elapsed 12155 : OK
> Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
> Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK
> Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
> Zookeeper_simpleSystem::testPath : elapsed 1024 : OK
> Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK
> Zookeeper_simpleSystem::testPing : elapsed 17287 : OK
> Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK
> Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK
> Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010
> Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK
> Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server 
> started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK
> Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server 
> started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK
> Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK
> Zookeeper_multi::testCreate : elapsed 1017 : OK
> Zookeeper_multi::testCreateDelete : elapsed 1007 : OK
> Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK
> Zookeeper_multi::testNestedCreate : elapsed 1009 : OK
> Zookeeper_multi::testSetData : elapsed 6019 : OK
> Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK
> Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK
> Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK
> Zookeeper_multi::testMultiFail : elapsed 1006 : OK
> Zookeeper_multi::testCheck : elapsed 1020 : OK
> Zookeeper_multi::testWatch : elapsed 2013 : OK
> Zookeeper_watchers::testDefaultSessionWatcher1zktest-mt: 
> tests/ZKMocks.cc:271: SyncedBoolCondition 
> DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed.
> Aborted (core dumped)
> It would appear that the zookeeper connection does not transition to 
> connected within the required time; I increased the time allowed but no 
> change.
> Ubuntu raring has glibc 2.17; the

[jira] [Created] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.

2013-10-10 Thread Mahadev konar (JIRA)
Mahadev konar created ZOOKEEPER-1791:


 Summary: ZooKeeper package includes unnecessary jars that are part 
of the package.
 Key: ZOOKEEPER-1791
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791
 Project: ZooKeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.5.0
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 3.5.0
 Attachments: ZOOKEEPER-1791.patch

ZooKeeper package includes unnecessary jars that are part of the package.

Packages like fatjar and 

{code}
maven-ant-tasks-2.1.3.jar
maven-artifact-2.2.1.jar
maven-artifact-manager-2.2.1.jar
maven-error-diagnostics-2.2.1.jar
maven-model-2.2.1.jar
maven-plugin-registry-2.2.1.jar
maven-profile-2.2.1.jar
maven-project-2.2.1.jar
maven-repository-metadata-2.2.1.jar
{code}

are part of the zookeeper package and rpm (via bigtop). 





[jira] [Updated] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.

2013-10-10 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1791:
-

Attachment: ZOOKEEPER-1791.patch

> ZooKeeper package includes unnecessary jars that are part of the package.
> -
>
> Key: ZOOKEEPER-1791
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.5.0
>    Reporter: Mahadev konar
>    Assignee: Mahadev konar
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1791.patch
>
>
> ZooKeeper package includes unnecessary jars that are part of the package.
> Packages like fatjar and 
> {code}
> maven-ant-tasks-2.1.3.jar
> maven-artifact-2.2.1.jar
> maven-artifact-manager-2.2.1.jar
> maven-error-diagnostics-2.2.1.jar
> maven-model-2.2.1.jar
> maven-plugin-registry-2.2.1.jar
> maven-profile-2.2.1.jar
> maven-project-2.2.1.jar
> maven-repository-metadata-2.2.1.jar
> {code}
> are part of the zookeeper package and rpm (via bigtop). 





[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2013-10-10 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791557#comment-13791557
 ] 

Mahadev konar commented on ZOOKEEPER-442:
-

Thanks Rakesh. Good to see the initiative. I'll read through the doc and get 
back to you. 



> need a way to remove watches that are no longer of interest
> ---
>
> Key: ZOOKEEPER-442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Benjamin Reed
>Assignee: Daniel Gómez Ferro
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: Remove Watch API.pdf, ZOOKEEPER-442.patch, 
> ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
> ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch
>
>
> Currently the only way a watch is cleared is to trigger it. We need a way to 
> enumerate the outstanding watch objects, find the events the objects are 
> watching for, and remove interest in an event.
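
For reference, the client API that eventually landed for this in the 3.5 line looks roughly like the sketch below (signatures quoted from memory, so treat them as approximate; a server at 127.0.0.1:2181 and an existing znode /config are assumed):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.WatcherType;
import org.apache.zookeeper.ZooKeeper;

public class RemoveWatchSketch {
    public static void main(String[] args) throws Exception {
        Watcher watcher = new Watcher() {
            public void process(WatchedEvent event) {
                System.out.println("event: " + event);
            }
        };
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, watcher);
        zk.getData("/config", watcher, null);                           // register a data watch
        zk.removeWatches("/config", watcher, WatcherType.Data, false);  // remove that watch again
        zk.close();
    }
}
{code}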





[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2013-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790067#comment-13790067
 ] 

Mahadev konar commented on ZOOKEEPER-900:
-

[~phunt] I think we can close this one in favor of another jira.


> FLE implementation should be improved to use non-blocking sockets
> -
>
> Key: ZOOKEEPER-900
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Vishal Kher
>Assignee: Vishal Kher
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, 
> ZOOKEEPER-900.patch2
>
>
> From earlier email exchanges:
> 1. Blocking connects and accepts:
> a) The first problem is in manager.toSend(). This invokes connectOne(), which 
> does a blocking connect. While testing, I changed the code so that 
> connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() 
> does a socketChannel.connect(). After starting AsyncConnect, connectOne 
> starts a timer. connectOne continues with normal operations if the connection 
> is established before the timer expires, otherwise, when the timer expires it 
> interrupts AsyncConnect() thread and returns. In this way, I can have an 
> upper bound on the amount of time we need to wait for connect to succeed. Of 
> course, this was a quick fix for my testing. Ideally, we should use Selector 
> to do non-blocking connects/accepts. I am planning to do that later once we 
> at least have a quick fix for the problem and consensus from others for the 
> real fix (this problem is a big blocker for us). Note that it is OK to do 
> blocking IO in SenderWorker and RecvWorker threads since they block IO to the 
> respective peer.
> b) The blocking IO problem is not just restricted to connectOne(), but also 
> in receiveConnection(). The Listener thread calls receiveConnection() for 
> each incoming connection request. receiveConnection does blocking IO to get 
> peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the 
> peer that had sent the connection request. All of this is happening from the 
> Listener. In short, if a peer fails after initiating a connection, the 
> Listener thread won't be able to accept connections from other peers, because 
> it would be stuck in read() or connectOne(). Also the code has an inherent 
> cycle. initiateConnection() and receiveConnection() will have to be very 
> carefully synchronized otherwise, we could run into deadlocks. This code is 
> going to be difficult to maintain/modify.
> Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
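
A minimal, self-contained sketch (not the ZOOKEEPER-900 patch) of the Selector-based non-blocking connect the description argues for: the connect is initiated without blocking and the caller bounds how long it waits for OP_CONNECT. Host, port, and timeouts are assumptions for illustration.

{code}
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NonBlockingConnectSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        // connect() returns immediately; completion is reported via OP_CONNECT.
        channel.connect(new InetSocketAddress("127.0.0.1", 3888));
        channel.register(selector, SelectionKey.OP_CONNECT);

        long deadline = System.currentTimeMillis() + 5000;   // upper bound on the wait
        boolean connected = false;
        while (!connected && System.currentTimeMillis() < deadline) {
            selector.select(500);                            // never blocks indefinitely
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isConnectable() && channel.finishConnect()) {
                    connected = true;                        // handshake completed in time
                }
            }
        }
        System.out.println(connected ? "connected" : "gave up after the timeout");
        channel.close();
        selector.close();
    }
}
{code}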





[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788998#comment-13788998
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~fpj] looks like the patch is ready to get in. You want to look through before 
we commit? 


> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. The majority of these clients are 
> read-only, i.e. they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at connect time which kind of session they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.
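
To make option 1 concrete, a sketch using today's connect call (the observer host names are hypothetical), with a comment marking where the proposed isLocalSession flag would slot in next to the existing canBeReadOnly flag:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class LocalSessionOptionSketch {
    public static void main(String[] args) throws Exception {
        Watcher watcher = new Watcher() {
            public void process(WatchedEvent event) { }
        };
        // Current client API: connect string, session timeout, watcher, canBeReadOnly.
        // Option 1 would add one more boolean here, e.g. isLocalSession, requesting a
        // session that is never logged to disk or replicated to the leader
        // (a proposed flag, shown only as this comment).
        ZooKeeper zk = new ZooKeeper("observer1:2181,observer2:2181", 30000, watcher, true);
        System.out.println("state: " + zk.getState());
        zk.close();
    }
}
{code}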





[jira] [Updated] (ZOOKEEPER-1147) Add support for local sessions

2013-10-08 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1147:
-

Attachment: ZOOKEEPER-1147.patch

Minor conflict: the current patch fails to apply against 
QuorumPeerMain.java - attaching a new one which fixes the conflict.

> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. The majority of these clients are 
> read-only, i.e. they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at connect time which kind of session they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.





[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2013-10-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788503#comment-13788503
 ] 

Mahadev konar commented on ZOOKEEPER-442:
-

[~eribeiro] if you are interested, feel free to take it up. I'd be happy to 
provide guidance/other help on this.

Thanks

> need a way to remove watches that are no longer of interest
> ---
>
> Key: ZOOKEEPER-442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Benjamin Reed
>Assignee: Daniel Gómez Ferro
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
> ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
> ZOOKEEPER-442.patch, ZOOKEEPER-442.patch
>
>
> Currently the only way a watch is cleared is to trigger it. We need a way to 
> enumerate the outstanding watch objects, find the events the objects are 
> watching for, and remove interest in an event.





[jira] [Resolved] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-1696.
--

Resolution: Fixed

Committed the right patch. Thanks Jeffrey!

> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch, zookeeper-1696-v1.patch, 
> zookeeper-1696-v2.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}
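
A minimal sketch (an assumption about the shape of the fix, not the committed zookeeper-1696 patch) of the defensive pattern involved: probe the JAAS Configuration for a "Client" login section and fall back to unauthenticated connections when the container's Configuration implementation (here WebLogic's) throws instead of returning null.

{code}
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class SaslConfigProbe {
    static boolean clientSectionAvailable(String section) {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry(section);
            return entries != null && entries.length > 0;
        } catch (RuntimeException e) {
            // WebLogic's JAASConfiguration throws IllegalArgumentException for
            // unknown section names instead of returning null, as the trace shows.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("SASL 'Client' section present: " + clientSectionAvailable("Client"));
    }
}
{code}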



[jira] [Reopened] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-27 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar reopened ZOOKEEPER-1696:
--


Reopening; it looks like I committed the wrong patch.

> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch, zookeeper-1696-v1.patch, 
> zookeeper-1696-v2.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}



[jira] [Commented] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770315#comment-13770315
 ] 

Mahadev konar commented on ZOOKEEPER-1696:
--

+1 for the patch. Given it ran through jenkins committing this to 3.4 and trunk.

> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}



[jira] [Comment Edited] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770315#comment-13770315
 ] 

Mahadev konar edited comment on ZOOKEEPER-1696 at 9/18/13 2:10 AM:
---

+1 for the patch. Given it ran through jenkins we can commit this to 3.4 and 
trunk.

  was (Author: mahadev):
+1 for the patch. Given it ran through jenkins committing this to 3.4 and 
trunk.
  
> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}



[jira] [Commented] (ZOOKEEPER-1751) ClientCnxn#run could miss the second ping or connection get dropped before a ping

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770310#comment-13770310
 ] 

Mahadev konar commented on ZOOKEEPER-1751:
--

+1 for the patch. This is good to have, since the current behavior can cause 
race conditions around the client pings.

> ClientCnxn#run could miss the second ping or connection get dropped before a 
> ping
> -
>
> Key: ZOOKEEPER-1751
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1751
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.4.6
>
> Attachments: zookeeper-1751.patch
>
>
> We could throw a SessionTimeoutException even when timeToNextPing is also 
> negative, depending on when the following line is executed by the thread, 
> because we check for a timeout before sending a ping:
> {code}
>   to = readTimeout - clientCnxnSocket.getIdleRecv();
> {code}
> In addition, we only ping twice no matter how long the session timeout is. For 
> example, with a session timeout of 60 minutes we only attempt two pings in a 
> 40-minute window, so the connection can be dropped by the OS after its idle 
> timeout.
> This causes random "connection loss" or "session expired" errors on the client 
> side, which is bad for applications like HBase.
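
A simplified, self-contained model of the select-timeout and ping scheduling being discussed (constants and structure approximated from ClientCnxn.SendThread, with hypothetical idle counters; not the actual 3.4 code):

{code}
public class PingScheduleSketch {
    // Hypothetical idle counters standing in for ClientCnxnSocket state.
    static long idleRecvMs = 100;
    static long idleSendMs = 9000;

    public static void main(String[] args) {
        int sessionTimeout = 30000;
        int readTimeout = sessionTimeout * 2 / 3;           // read budget before declaring loss
        long to = readTimeout - idleRecvMs;                 // the line quoted above
        if (to <= 0) {
            // The timeout is checked before any ping is attempted -- the race the
            // report describes when both happen to be due in the same pass.
            throw new IllegalStateException("session timed out after " + readTimeout + "ms idle");
        }
        long timeToNextPing = readTimeout / 2 - idleSendMs; // ping halfway through the budget
        if (timeToNextPing <= 0) {
            System.out.println("would send a ping now");
        } else {
            to = Math.min(to, timeToNextPing);              // wake up in time for the next ping
            System.out.println("next wakeup in " + to + " ms");
        }
    }
}
{code}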



[jira] [Updated] (ZOOKEEPER-1751) ClientCnxn#run could miss the second ping or connection get dropped before a ping

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1751:
-

Fix Version/s: 3.4.6

> ClientCnxn#run could miss the second ping or connection get dropped before a 
> ping
> -
>
> Key: ZOOKEEPER-1751
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1751
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.4.6
>
> Attachments: zookeeper-1751.patch
>
>
> We could throw a SessionTimeoutException even when timeToNextPing is also 
> negative, depending on when the following line is executed by the thread, 
> because we check for a timeout before sending a ping:
> {code}
>   to = readTimeout - clientCnxnSocket.getIdleRecv();
> {code}
> In addition, we only ping twice no matter how long the session timeout is. For 
> example, with a session timeout of 60 minutes we only attempt two pings in a 
> 40-minute window, so the connection can be dropped by the OS after its idle 
> timeout.
> This causes random "connection loss" or "session expired" errors on the client 
> side, which is bad for applications like HBase.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770309#comment-13770309
 ] 

Mahadev konar commented on ZOOKEEPER-1733:
--

Running this through jenkins.

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch
>
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in 
> LEThread#run() we have:
> {code}
> if (leader == i) {
>     synchronized (finalObj) {
>         successCount++;
>         if (successCount > (count / 2))
>             finalObj.notify();
>     }
>     break;
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread dies due to the 
> "break" out of the while loop.
> In the verification step, we then check whether the leader thread is alive, as 
> follows:
> {code}
> if (threads.get((int) leader).isAlive()) {
>     Assert.fail("Leader hasn't joined: " + leader);
> }
> {code}
> On Windows boxes, this verification step fails frequently because the leader 
> thread has most likely already exited.
> Do we know why we have the leader-alive verification step, given that only the 
> lead thread can bump successCount up to >= count/2?
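
A minimal, self-contained sketch (hypothetical; not the attached zookeeper-1733.patch) of a less timing-sensitive check: join the leader thread with a timeout and fail only if it never finishes, instead of failing merely because it has already exited.

{code}
public class LeaderJoinSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread leaderThread = new Thread(new Runnable() {
            public void run() {
                // Stand-in for the LEThread that breaks out of its loop once elected.
            }
        });
        leaderThread.start();

        leaderThread.join(30000);            // wait up to 30s for a clean exit
        if (leaderThread.isAlive()) {
            throw new AssertionError("Leader thread did not finish in time");
        }
        System.out.println("leader thread exited cleanly");
    }
}
{code}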



[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1733:
-

Fix Version/s: (was: 3.4.6)
   3.5.0

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch
>
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in 
> LEThread#run() we have:
> {code}
> if (leader == i) {
>     synchronized (finalObj) {
>         successCount++;
>         if (successCount > (count / 2))
>             finalObj.notify();
>     }
>     break;
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread dies due to the 
> "break" out of the while loop.
> In the verification step, we then check whether the leader thread is alive, as 
> follows:
> {code}
> if (threads.get((int) leader).isAlive()) {
>     Assert.fail("Leader hasn't joined: " + leader);
> }
> {code}
> On Windows boxes, this verification step fails frequently because the leader 
> thread has most likely already exited.
> Do we know why we have the leader-alive verification step, given that only the 
> lead thread can bump successCount up to >= count/2?



[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1733:
-

Fix Version/s: 3.4.6

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.4.6
>
> Attachments: zookeeper-1733.patch
>
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in 
> LEThread#run() we have:
> {code}
> if (leader == i) {
>     synchronized (finalObj) {
>         successCount++;
>         if (successCount > (count / 2))
>             finalObj.notify();
>     }
>     break;
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread dies due to the 
> "break" out of the while loop.
> In the verification step, we then check whether the leader thread is alive, as 
> follows:
> {code}
> if (threads.get((int) leader).isAlive()) {
>     Assert.fail("Leader hasn't joined: " + leader);
> }
> {code}
> On Windows boxes, this verification step fails frequently because the leader 
> thread has most likely already exited.
> Do we know why we have the leader-alive verification step, given that only the 
> lead thread can bump successCount up to >= count/2?



[jira] [Commented] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770307#comment-13770307
 ] 

Mahadev konar commented on ZOOKEEPER-1696:
--

The same patch applies to 3.4 and trunk. 

> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}



[jira] [Updated] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1696:
-

Fix Version/s: 3.4.6

> Fail to run zookeeper client on Weblogic application server
> ---
>
> Key: ZOOKEEPER-1696
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
> Environment: Java version: jdk170_06
> WebLogic Server Version: 10.3.6.0 
>Reporter: Dmitry Konstantinov
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: zookeeper-1696.patch
>
>
> The problem is described in detail here: 
> http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
> The provided link also contains a reference to a fix implementation.
> {noformat}
>    
>   <[ACTIVE] ExecuteThread: '2' for queue: 
> 'weblogic.kernel.Default (devapp090:2182)>  <> <> <1366794208810> 
>   null, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: No Configuration was registered that can 
> handle the configuration named Client
> at 
> com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
> at 
> org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:97)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
> at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
> >
> {noformat}



[jira] [Commented] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks

2013-09-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761595#comment-13761595
 ] 

Mahadev konar commented on ZOOKEEPER-1657:
--

+1 for the patch. Looks good. Thanks Eugene/Flavio.

> Increased CPU usage by unnecessary SASL checks
> --
>
> Key: ZOOKEEPER-1657
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
>Reporter: Gunnar Wagenknecht
>Assignee: Philip K. Warren
>  Labels: performance
> Fix For: 3.5.0, 3.4.6
>
> Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
> ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
> zookeeper-hotspot-gone.png, zookeeper-hotspot.png
>
>
> I did some profiling in one of our Java environments and found an interesting 
> footprint in ZooKeeper. The SASL support seems to trigger a lot of work on the 
> client even though it's not in use.
> Is there a switch to disable SASL completely?
> The attached screenshot shows a 10-minute profiling session on one of our 
> production Jetty servers. The Jetty server handles ~1k web requests per 
> minute. The average response time per web request is a few milliseconds. The 
> profiling was performed on a machine running for >24h. 
> We noticed a significant CPU increase on our servers when deploying an update 
> from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The 
> screenshot shows that only 32% of CPU time is spent in Jetty; in contrast, 65% 
> is spent in ZooKeeper. 
> A few notes/thoughts:
> * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to 
> be the culprit
> * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be 
> called very often
> * There is quite a bit of reflection involved in 
> {{java.security.AccessController.doPrivileged}}
> * No security manager is active in the JVM: I tend to place an if-check in 
> the code before calling {{AccessController.doPrivileged}}. When no SM is 
> installed, the runnable can be called directly, which saves cycles.
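
A minimal sketch of the if-check suggested in the last bullet: skip AccessController.doPrivileged when no SecurityManager is installed and invoke the action directly (a generic illustration, not the committed ZOOKEEPER-1657 patch).

{code}
import java.security.AccessController;
import java.security.PrivilegedAction;

public class PrivilegedShortcut {
    static <T> T doPrivilegedIfNeeded(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run();                      // no SM: call directly, saving cycles
        }
        return AccessController.doPrivileged(action); // SM present: keep the privileged block
    }

    public static void main(String[] args) {
        String home = doPrivilegedIfNeeded(new PrivilegedAction<String>() {
            public String run() {
                return System.getProperty("user.home");
            }
        });
        System.out.println(home);
    }
}
{code}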



Re: [Release 3.5.0] Any news yet?

2013-07-10 Thread Mahadev Konar
It would be good if Flavio wants to try doing the RM. Flavio?

thanks
mahadev

On Wed, Jul 10, 2013 at 10:20 AM, Patrick Hunt  wrote:
> Mahadev do you want to RM 3.4.6 or should Flavio try his hand at doing
> a release?
>
> Patrick
>
> On Wed, Jul 10, 2013 at 9:50 AM, Mahadev Konar  
> wrote:
>> 1147 is pretty close. I am working on getting this into trunk.
>>
>> Hopefully today/tomm.
>>
>> thanks
>> mahadev
>>
>> On Wed, Jul 10, 2013 at 7:01 AM, Flavio Junqueira  
>> wrote:
>>> I've also been wondering about 3.4.6. I don't mind being the RM for 3.5.0 
>>> if you want to do 3.4.6.
>>>
>>> -Flavio
>>>
>>> On Jul 9, 2013, at 5:49 PM, Patrick Hunt  wrote:
>>>
>>>> I'd like to see a 3.5.0-alpha soon. I agree re 1147 and iirc it was
>>>> pretty close (Mahadev?). ZOOKEEPER-1346 (jetty support for monitoring)
>>>> should also go in. It's pretty much ready afair.
>>>>
>>>> I'm happy to RM 3.5 if we can get past these open issues.
>>>>
>>>> Patrick
>>>>
>>>>
>>>> On Tue, Jul 9, 2013 at 8:32 AM, Raúl Gutiérrez Segalés
>>>>  wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> On 9 July 2013 08:13, Stefan Egli  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We're evaluating using ZooKeeper, and esp the embedded mode 
>>>>>> (ZOOKEEPER-107 - [0]), for an implementation of the Sling Discovery API 
>>>>>> ([1]). Since ZOOKEEPER-107 is planned for 3.5.0 I was wondering what the 
>>>>>> release schedule of 3.5.0 is, or any plan thereof? (I saw a discussion 
>>>>>> about releasing it from Dec 2012 [1]).
>>>>>>
>>>>>
>>>>> I think as of now the biggest blocker is:
>>>>>
>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1147
>>>>>
>>>>> Besides needing a final review it needs better documentation and an
>>>>> extra small patch (I proposed one) to support rolling updates when
>>>>> enabling local sessions.
>>>>>
>>>>> Cheers,
>>>>> -rgs
>>>


Re: [Release 3.5.0] Any news yet?

2013-07-10 Thread Mahadev Konar
1147 is pretty close. I am working on getting this into trunk.

Hopefully today/tomm.

thanks
mahadev

On Wed, Jul 10, 2013 at 7:01 AM, Flavio Junqueira  wrote:
> I've also been wondering about 3.4.6. I don't mind being the RM for 3.5.0 if 
> you want to do 3.4.6.
>
> -Flavio
>
> On Jul 9, 2013, at 5:49 PM, Patrick Hunt  wrote:
>
>> I'd like to see a 3.5.0-alpha soon. I agree re 1147 and iirc it was
>> pretty close (Mahadev?). ZOOKEEPER-1346 (jetty support for monitoring)
>> should also go in. It's pretty much ready afair.
>>
>> I'm happy to RM 3.5 if we can get past these open issues.
>>
>> Patrick
>>
>>
>> On Tue, Jul 9, 2013 at 8:32 AM, Raúl Gutiérrez Segalés
>>  wrote:
>>> Hi Stefan,
>>>
>>> On 9 July 2013 08:13, Stefan Egli  wrote:
 Hi,

 We're evaluating using ZooKeeper, and esp the embedded mode (ZOOKEEPER-107 
 - [0]), for an implementation of the Sling Discovery API ([1]). Since 
 ZOOKEEPER-107 is planned for 3.5.0 I was wondering what the release 
 schedule of 3.5.0 is, or any plan thereof? (I saw a discussion about 
 releasing it from Dec 2012 [1]).

>>>
>>> I think as of now the biggest blocker is:
>>>
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1147
>>>
>>> Besides needing a final review it needs better documentation and an
>>> extra small patch (I proposed one) to support rolling updates when
>>> enabling local sessions.
>>>
>>> Cheers,
>>> -rgs
>


[jira] [Commented] (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2013-05-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658530#comment-13658530
 ] 

Mahadev konar commented on ZOOKEEPER-767:
-

Flavio,
 Agreed, I think it's definitely a better match for Curator. 

> Submitting Demo/Recipe Shared / Exclusive Lock Code
> ---
>
> Key: ZOOKEEPER-767
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: recipes
>Affects Versions: 3.3.0
>Reporter: Sam Baskinger
>Assignee: Sam Baskinger
>Priority: Minor
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, 
> ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, 
> ZOOKEEPER-767.patch
>
>  Time Spent: 8h
>
> Networked Insights would like to share-back some code for shared/exclusive 
> locking that we are using in our labs.



[jira] [Updated] (ZOOKEEPER-1686) Publish ZK 3.4.5 test jar

2013-04-06 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1686:
-

Assignee: Mahadev konar

> Publish ZK 3.4.5 test jar
> -
>
> Key: ZOOKEEPER-1686
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1686
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build, tests
>Affects Versions: 3.4.5
>Reporter: Todd Lipcon
>    Assignee: Mahadev konar
>
> ZooKeeper 3.4.2 used to publish a jar with the tests classifier for use by 
> downstream project tests. It seems this didn't get published for 3.4.4 or 
> 3.4.5 (see 
> https://repository.apache.org/index.html#nexus-search;quick~org.apache.zookeeper).
>  Would someone mind please publishing these artifacts?
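
For reference, downstream builds typically consume such an artifact through a Maven classifier. A sketch of the dependency they would declare, assuming the classifier stays "tests" as it was for 3.4.2:

{noformat}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
{noformat}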

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks

2013-03-03 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1657:
-

Fix Version/s: 3.4.6
   3.5.0

> Increased CPU usage by unnecessary SASL checks
> --
>
> Key: ZOOKEEPER-1657
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
>Reporter: Gunnar Wagenknecht
>  Labels: performance
> Fix For: 3.5.0, 3.4.6
>
> Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
> ZOOKEEPER-1657.patch, zookeeper-hotspot.png
>
>
> I did some profiling in one of our Java environments and found an interesting 
> footprint in ZooKeeper. The SASL support seems to be triggered a lot of times on the 
> client although it's not even in use.
> Is there a switch to disable SASL completely?
> The attached screenshot shows a 10-minute profiling session on one of our 
> production Jetty servers. The Jetty server handles ~1k web requests per 
> minute. The average response time per web request is a few milliseconds. The 
> profiling was performed on a machine running for >24h. 
> We noticed a significant CPU increase on our servers when deploying an update 
> from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The 
> screenshot shows that only 32% of the CPU time is spent in Jetty. In contrast, 65% 
> is spent in ZooKeeper. 
> A few notes/thoughts:
> * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to 
> be the culprit
> * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be 
> called very often?
> * There is quite a bit of reflection involved in 
> {{java.security.AccessController.doPrivileged}}
> * No security manager is active in the JVM: I tend to place an if-check in 
> the code before calling {{AccessController.doPrivileged}}. When no SM is 
> installed, the runnable can be called directly, which saves cycles.
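
To illustrate the if-check suggested above (a sketch only, not the attached patch; the helper class name is made up): when no SecurityManager is installed, the action can run directly and AccessController.doPrivileged is skipped.

{noformat}
import java.security.AccessController;
import java.security.PrivilegedAction;

public final class PrivilegedGuard {
    // Runs the action directly when no SecurityManager is installed,
    // avoiding the overhead of AccessController.doPrivileged.
    public static <T> T runPrivileged(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run();
        }
        return AccessController.doPrivileged(action);
    }
}
{noformat}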

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

2013-03-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592043#comment-13592043
 ] 

Mahadev konar commented on ZOOKEEPER-1551:
--

[~fpj] would you be able to review the latest patch?

> Observer ignore txns that comes after snapshot and UPTODATE 
> 
>
> Key: ZOOKEEPER-1551
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
>Priority: Blocker
> Fix For: 3.5.0, 3.4.6
>
> Attachments: ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch
>
>
> In Learner.java, txns which come after the learner has taken the snapshot 
> (after NEWLEADER packet) are stored in packetsNotCommitted. The follower has 
> special logic to apply these txns at the end of the syncWithLeader() method. 
> However, the observer will ignore these txns completely, causing data 
> inconsistency. 
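
A simplified sketch of the intended behaviour, using placeholder types rather than the real Learner/Observer classes, just to show that txns buffered between the snapshot and UPTODATE have to be replayed rather than dropped:

{noformat}
import java.util.ArrayList;
import java.util.List;

/** Placeholder types only -- this is not the real Learner/Observer code. */
class SyncSketch {
    static class Txn {
        final long zxid;
        Txn(long zxid) { this.zxid = zxid; }
    }

    interface DataTree {
        void apply(Txn txn);
    }

    private final List<Txn> packetsNotCommitted = new ArrayList<Txn>();

    // Txns that arrive after the snapshot (post-NEWLEADER) get buffered.
    void onTxnDuringSync(Txn txn) {
        packetsNotCommitted.add(txn);
    }

    // On UPTODATE, both followers and observers must replay the buffer;
    // dropping it (the observer behaviour reported here) silently loses txns.
    void onUpToDate(DataTree tree) {
        for (Txn txn : packetsNotCommitted) {
            tree.apply(txn);
        }
        packetsNotCommitted.clear();
    }
}
{noformat}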

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592041#comment-13592041
 ] 

Mahadev konar commented on ZOOKEEPER-1382:
--

Michael,
 Would you be able to upload a patch for trunk as well?

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-1382
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>Priority: Critical
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1382_3.3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch
>
>
> I've observed that zookeeper server holds onto expired session ids in the 
> watcher data structures. The result is the wchp command reports session ids 
> that cannot be found through cons/dump and those expired session ids sit 
> there maybe until the server is restarted. Here are snippets from the client 
> and the server logs that lead to this state, for one particular session id 
> 0x134485fd7bcb26f -
> There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
> and I'm using ZkClient to connect to the cluster
> From the application log -
> application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application Session establishment complete 
> on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
> negotiated timeout = 6000
> application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application] Client session timed out, 
> have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
> socket connection and attempting reconnect
> application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
> [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
> ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
> connection
> On the leader zk, 225 -
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
> timeout of 6000ms exceeded
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
> for sessionid: 0x134485fd7bcb26f
> On the server, the client was initially connected to, 223 -
> zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
> 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
> zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
> /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
> Here are the log snippets from 226, which is the server, the client 
> reconnected to, before getting session expired event -
> 2012-01-27 09:52:38,190 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
> attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
> 2012-01-27 09:52:38,191 - INFO  
> [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
> 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
> 2012-01-27 09:52:38,191 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.17.136.82:49367 which had sessionid 
> 0x134485fd7bcb26f
> wchp output from 226, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *226.*wchp* | wc -l
> 3
> wchp output from 223, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *223.*wchp* | wc -l
> 0
> cons output from 223 and 226, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *226.*cons* | wc -l
> 0
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *223.*cons* | wc -l
> 0
> So, what seems to have happened is that the client was able to re-register 
> the watches on the new server (226), after it got disconnected from 223, 
> in spite of having an expired session id. 
> In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
> server removes t

[jira] [Updated] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly

2013-01-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1624:
-

Fix Version/s: 3.5.0

> PrepRequestProcessor abort multi-operation incorrectly
> --
>
> Key: ZOOKEEPER-1624
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1624.patch
>
>
> We found this issue when trying to issue multiple instances of the following 
> multi-op concurrently
> multi {
> 1. create sequential node /a- 
> 2. create node /b
> }
> The expected result is that only the first multi-op request should succeed 
> and the rest of the requests should fail because /b already exists.
> However, the reported result is that the subsequent multi-ops failed because 
> sequential node creation failed, which should not be possible.
> Below is the return code for each sub-op when issuing 3 instances of the 
> above multi-op asynchronously
> 1. ZOK, ZOK
> 2. ZOK, ZNODEEXISTS,
> 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,
> When I added more debug logging, the cause turned out to be that PrepRequestProcessor 
> rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential 
> node name generation to be wrong. Below are the sequential node names generated 
> by PrepRequestProcessor:
> 1. create /a-0001
> 2. create /a-0003
> 3. create /a-0001
> The bug is in the getPendingChanges() method. It failed to copy the ChangeRecord for 
> the parent node ("/"), so rollbackPendingChanges() cannot restore the right 
> previous change record of the parent node when aborting the second multi-op.
> The impact of this bug is that sequential node creation on the same parent 
> node may fail until the previous one is committed. I am not sure if there are 
> other implications or not.
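
For concreteness, a sketch of the kind of multi described above, using the standard synchronous multi API; the paths, data, and ACLs are placeholders. When several of these run concurrently, only one should succeed and the others should fail on /b with NODEEXISTS.

{noformat}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MultiSketch {
    // Issues the two-op transaction from the report: a sequential create
    // under the same parent plus a plain create of /b.
    static void runOnce(ZooKeeper zk) throws Exception {
        zk.multi(Arrays.asList(
                Op.create("/a-", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL),
                Op.create("/b", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
    }
}
{noformat}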

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Assignee: Mahadev konar

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Assignee: Mahadev konar
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

Looks like the header was incomplete. Unfortunately we do not handle a corrupt 
header, but we do handle corrupt txns later. I am surprised that this happened twice 
in a row for 2 users. I'll upload a patch and a test case.
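
As a rough illustration of the direction such a fix could take (a standalone sketch, not the actual FileTxnLog patch), a truncated header would be detected and the log file skipped instead of aborting recovery:

{noformat}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class HeaderCheckSketch {
    /**
     * Returns true if the file starts with a complete, plausible header
     * (magic + version + dbid, 16 bytes), false if the header is truncated.
     * A truncated header would then be skipped rather than failing startup.
     */
    static boolean hasCompleteHeader(String path, int expectedMagic) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            int magic = in.readInt();
            int version = in.readInt();   // read only to confirm the bytes exist
            long dbid = in.readLong();
            return magic == expectedMagic;
        } catch (EOFException partial) {
            // Disk filled up before the header was fully written.
            return false;
        }
    }
}
{noformat}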

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557012#comment-13557012
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~thawan] I think the above scenario is ok. The only issue I think we have is 
the sensitive local sessions. Since we have had too many issues with 
disconnects and session expiry, I think this might cause more issues than we 
already have. Is there something we can do here? I can't seem to find a way 
around it without doing client-side changes.
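
For readers following this thread, enabling the feature is expected to be a server-side switch in zoo.cfg. A sketch of such a fragment, assuming the property names proposed with this patch (localSessionsEnabled, localSessionsUpgradingEnabled) survive review:

{noformat}
# zoo.cfg fragment (illustrative; property names assumed from this patch)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

# Let sessions start as local (tracked only on the server they connect to).
localSessionsEnabled=true
# Allow a local session to be upgraded to a global one when it needs
# global semantics, e.g. creating an ephemeral node.
localSessionsUpgradingEnabled=true
{noformat}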


> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about a 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. Majority of these clients are 
> read only - ie they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration is handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at the connect time which kind of session do they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556983#comment-13556983
 ] 

Mahadev konar commented on ZOOKEEPER-1572:
--

The patch looks good to me. I will go ahead and commit it after running it through 
Hudson.
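
For context, a sketch of how client code would call such an async multi, modelled on the existing AsyncCallback pattern; the MultiCallback name and its processResult signature follow the proposed patch rather than a settled API, and the paths are placeholders:

{noformat}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class AsyncMultiSketch {
    static void submit(ZooKeeper zk) {
        List<Op> ops = Arrays.asList(
                Op.create("/app/config", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
                Op.delete("/app/stale", -1));

        // The callback fires on the client's event thread once the whole
        // multi commits or fails atomically.
        zk.multi(ops, new AsyncCallback.MultiCallback() {
            public void processResult(int rc, String path, Object ctx,
                                      List<OpResult> results) {
                System.out.println("multi finished with rc=" + rc);
            }
        }, null);
    }
}
{noformat}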


> Add an async interface for multi request
> 
>
> Key: ZOOKEEPER-1572
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Sijie Guo
>Assignee: Sijie Guo
>  Labels: review
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff
>
>
> Currently there is no async interface for multi request in ZooKeeper java 
> client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316
 ] 

Mahadev konar edited comment on ZOOKEEPER-1147 at 1/17/13 3:42 PM:
---

bq. Yes, a session retains the same ID when it is upgraded from local session 
to global session. I think this is desirable. Can you elaborate why this may 
cause problem?

Yes, it's desirable. Before I comment on what I think might be wrong: when does 
the server that has the local session id remove it from its data structures? Is 
it when it gets a create session in the final request processor? Until then the 
session is a local session? 


  was (Author: mahadev):
bq. Yes, a session retains the same ID when it is upgraded from local 
session to global session. I think this is desirable. Can you elaborate why 
this may cause problem?

Yes its desirable. Before I comment on what I think might be wrong, when does 
the server who has the local sessionid remove it from its data structures? Is 
it when it gets a response from in final request processor about the session 
creation? Until then the session is in a local session? 

  
> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about a 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. Majority of these clients are 
> read only - ie they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration is handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at the connect time which kind of session do they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

bq. Yes, a session retains the same ID when it is upgraded from local session 
to global session. I think this is desirable. Can you elaborate why this may 
cause problem?

Yes, it's desirable. Before I comment on what I think might be wrong: when does 
the server that has the local session id remove it from its data structures? Is 
it when it gets a response from the final request processor about the session 
creation? Until then the session is still a local session? 


> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about a 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. Majority of these clients are 
> read only - ie they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration is handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at the connect time which kind of session do they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1622) session ids will be negative in the year 2022

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555698#comment-13555698
 ] 

Mahadev konar commented on ZOOKEEPER-1622:
--

Nice catch, Eric! I think we do document that the id should be between 0 and 255, but 
maybe we should error out if that is not the case.
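
To make the sign-extension point concrete, a small standalone sketch; the shift expressions mirror the snippet in the description, and the hard-coded timestamp is roughly April 2022, by which point the arithmetic shift turns the session id negative:

{noformat}
public class SessionIdSketch {
    // Mirrors the snippet from the description (arithmetic shift).
    static long nextSessionIdSigned(long nowMillis, long id) {
        return ((nowMillis << 24) >> 8) | (id << 56);
    }

    // Proposed fix: logical shift keeps the upper byte clear for the id.
    static long nextSessionIdUnsigned(long nowMillis, long id) {
        return ((nowMillis << 24) >>> 8) | (id << 56);
    }

    public static void main(String[] args) {
        long id = 1;                       // myid of the server
        long apr2022 = 1_650_000_000_000L; // ~April 2022
        System.out.printf("signed:   %016x%n", nextSessionIdSigned(apr2022, id));
        System.out.printf("unsigned: %016x%n", nextSessionIdUnsigned(apr2022, id));
        // The signed variant comes out negative (0xff.. prefix) and the
        // server id in the top byte is overwritten; the unsigned variant
        // keeps the 0x01.. prefix.
    }
}
{noformat}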


> session ids will be negative in the year 2022
> -
>
> Key: ZOOKEEPER-1622
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1622
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Eric Newton
>Priority: Trivial
>
> Someone decided to use a large number for their myid file.  This cause 
> session ids to go negative, and our software (Apache Accumulo) did not handle 
> this very well.  While diagnosing the problem, I noticed this in SessionImpl:
> {noformat}
>public static long initializeNextSession(long id) {
> long nextSid = 0;
> nextSid = (System.currentTimeMillis() << 24) >> 8;
> nextSid =  nextSid | (id <<56);
> return nextSid;
> }
> {noformat}
> When the 40th bit in System.currentTimeMillis() is a one, sign extension will 
> fill the upper 8 bytes of nextSid, and id will not make the session id 
> unique.  I recommend changing the right shift to the logical shift:
> {noformat}
>public static long initializeNextSession(long id) {
> long nextSid = 0;
> nextSid = (System.currentTimeMillis() << 24) >>> 8;
> nextSid =  nextSid | (id <<56);
> return nextSid;
> }
> {noformat}
> But, we have until the year 2022 before we have to worry about it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (ZOOKEEPER-1612) Zookeeper unable to recover and start once datadir disk is full and disk space cleared

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-1612.
--

Resolution: Duplicate

Duplicate of ZOOKEEPER-1621.

> Zookeeper unable to recover and start once datadir disk is full and disk 
> space cleared
> --
>
> Key: ZOOKEEPER-1612
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1612
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.3
>Reporter: suja s
>
> Once zookeeper data dir disk becomes full, the process gets shut down.
> {noformat}
> 2012-12-14 13:22:26,959 [myid:2] - ERROR 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@276] - Severe 
> unrecoverable error, exiting
> java.io.IOException: No space left on device
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:282)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:56)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>   at 
> org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
>   at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:168)
>   at 
> org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>   at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115)
>   at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
>   at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
>   at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1179)
>   at 
> org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:138)
>   at 
> org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:213)
>   at 
> org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:230)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:242)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:274)
>   at 
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:407)
>   at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:759)
> {noformat}
> Later the disk space is cleared and ZK is started again. Startup of ZK fails as it 
> is not able to read the snapshot properly. (Since the load from disk failed, it is 
> not able to join the peers in the quorum and get a snapshot diff.)
> {noformat}
> 2012-12-14 16:20:31,489 [myid:2] - INFO  [main:FileSnap@83] - Reading 
> snapshot ../dataDir/version-2/snapshot.100042
> 2012-12-14 16:20:31,564 [myid:2] - ERROR [main:QuorumPeer@472] - Unable to 
> load database on disk
> java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>   at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
>   at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:436)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428)
>   

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555318#comment-13555318
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

I'll mark 1612 as a dup. Thanks for pointing that out, Edward.



> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
> Attachments: zookeeper.log.gz
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555192#comment-13555192
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 I thought you said it does not recover when the disk was full, but it looks like the 
disk is still full. No?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555169#comment-13555169
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 So these exceptions are thrown while ZooKeeper is running? I am not sure why it's 
exiting so many times. Do you guys restart the ZK server if it dies?

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Fix Version/s: (was: 3.4.6)
   3.5.0

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.5.0
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Priority: Major  (was: Critical)

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.4.6
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Fix Version/s: 3.4.6

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
> Fix For: 3.4.6
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Priority: Critical  (was: Major)

> ZooKeeper does not recover from crash when disk was full
> 
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
>Reporter: David Arthur
>Priority: Critical
> Fix For: 3.4.6
>
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got 
> the following exception
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
> Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
> partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
> exception, exiting abnormally
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(FileTxnLog.java:504)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> It seems to me that writing the transaction log should be fully atomic to 
> avoid such situations. Is this not the case?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554819#comment-13554819
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~thawan] this helps. Thanks for the information. I still have a couple more 
questions:

- Will a read-only client always get a session expiration if a disconnect 
happens even though it hasn't tried all the other servers?
- Is the local session id the same as the global session id when it's created (I 
mean as the long value)? If it's the same, I think we have a problem with 
clients shifting between servers.

bq. When a client reconnects to B, its sessionId won’t exist in B’s local 
session tracker. So B will send validation packet. If CreateSession issued by A 
is committed before validation packet arrive the client will be able to 
connect. Otherwise, the client will get session expired because the quorum 
hasn't heard about this session yet. If the client also tries to connect back to 
A again, the session is already removed from local session tracker. So A will 
need to send a validation packet to the leader. The outcome should be the same 
as B depending on the timing of the request.

> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about a 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. Majority of these clients are 
> read only - ie they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at the connect time which kind of session do they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.
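
To make option 1 concrete, a hypothetical client-side sketch follows; the SessionType enum and the ephemeral-node guard are illustrative only and not part of the released API:
{noformat}
// Hypothetical sketch of "option 1": the caller picks the session type at connect time,
// analogous to the existing canBeReadOnly flag. SessionType and the guard below are
// illustrative; they are not part of the released ZooKeeper client API.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SessionTypeSketch {
    enum SessionType { LOCAL, GLOBAL }

    private final ZooKeeper zk;
    private final SessionType type;

    SessionTypeSketch(String hosts, int timeoutMs, Watcher w, SessionType type) throws Exception {
        // A real implementation would carry the session type in the connect request so the
        // server can skip the quorum createSession transaction for LOCAL sessions.
        this.zk = new ZooKeeper(hosts, timeoutMs, w);
        this.type = type;
    }

    public String create(String path, byte[] data, CreateMode mode) throws Exception {
        if (type == SessionType.LOCAL && mode.isEphemeral()) {
            // Rule 2 above: a local session cannot own ephemeral nodes, because no other
            // server (and not the leader) knows that the session exists.
            throw new IllegalStateException("ephemeral nodes require a global session");
        }
        return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, mode);
    }
}
{noformat}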

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553582#comment-13553582
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

I started reviewing the patch, but I think we will need a few more details on 
the design to make further progress on this. There are quite a few cases that 
come up when we think about this, so a little more detail on the design will go 
a long way.

[~thawan] can we add some comments on the design (don't want to make this too 
laborious an effort), something which explains the whole end-to-end design - 
things like:

- when is the session created?
- does the creation of an ephemeral node wait for the create-session to return 
(at the follower)?
- what happens if the create-session is sent at server A, the client disconnects 
and reconnects to some other server B (which ends up sending it again), and then 
disconnects and connects back to server A?
- what happens to the local session once the global session is created?

Would you be able to write a short design for this (a couple of paragraphs as a 
comment on the jira should suffice)?


> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about a 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. Majority of these clients are 
> read only - ie they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - "local" session. A 
> "local" session doesn't have a full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   The pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options that are being considered
> 1. Let the client specify at the connect time which kind of session do they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553574#comment-13553574
 ] 

Mahadev konar commented on ZOOKEEPER-1549:
--

Thanks [~thawan]!

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --
>
> Key: ZOOKEEPER-1549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.3
>Reporter: Jacky007
>Assignee: Thawan Kooburat
>Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: case.patch, ZOOKEEPER-1549-learner.patch
>
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
> not correct.
> here is the scenario (similar to 1154):
> Initial Condition
> 1.Lets say there are three nodes in the ensemble A,B,C with A being the 
> leader
> 2.The current epoch is 7. 
> 3.For simplicity of the example, lets say zxid is a two digit number, 
> with epoch being the first digit.
> 4.The zxid is 73
> 5.All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there 
> is a crash of the entire ensemble and B,C never write the change 74 to their 
> log.
> Step 2
> A,B restart, A is elected as the new leader,  and A will load data and take a 
> clean snapshot(change 74 is in it), then send diff to B, but B died before 
> sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, 
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will 
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it 
> will not be a problem.
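
The two-digit zxids in the scenario mirror the real encoding, where the epoch lives in the high 32 bits and a per-epoch counter in the low 32 bits; a small helper makes the numbers concrete:
{noformat}
// Helper mirroring how a zxid packs the leader epoch into the high 32 bits and a
// per-epoch counter into the low 32 bits ("74" in the scenario = epoch 7, counter 4).
public final class ZxidSketch {
    static long makeZxid(long epoch, long counter) { return (epoch << 32) | (counter & 0xFFFFFFFFL); }
    static long epochOf(long zxid)                 { return zxid >>> 32; }
    static long counterOf(long zxid)               { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long z74 = makeZxid(7, 4);   // the change only A logged before the crash
        long z81 = makeZxid(8, 1);   // a change committed by the new quorum (B, C)
        // A's epoch is older, yet its "74" still falls inside B's commit-log range 71..81,
        // which is why B offers a DIFF instead of a snapshot.
        System.out.println(epochOf(z74) + " < " + epochOf(z81));
    }
}
{noformat}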

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1549:
-

Assignee: Thawan Kooburat

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --
>
> Key: ZOOKEEPER-1549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.3
>Reporter: Jacky007
>Assignee: Thawan Kooburat
>Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: case.patch, ZOOKEEPER-1549-learner.patch
>
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
> not correct.
> here is the scenario (similar to 1154):
> Initial Condition
> 1.Lets say there are three nodes in the ensemble A,B,C with A being the 
> leader
> 2.The current epoch is 7. 
> 3.For simplicity of the example, lets say zxid is a two digit number, 
> with epoch being the first digit.
> 4.The zxid is 73
> 5.All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there 
> is a crash of the entire ensemble and B,C never write the change 74 to their 
> log.
> Step 2
> A,B restart, A is elected as the new leader,  and A will load data and take a 
> clean snapshot(change 74 is in it), then send diff to B, but B died before 
> sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, 
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will 
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it 
> will not be a problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1549:
-

Fix Version/s: 3.4.6

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --
>
> Key: ZOOKEEPER-1549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.3
>Reporter: Jacky007
>Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: case.patch, ZOOKEEPER-1549-learner.patch
>
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
> not correct.
> here is the scenario (similar to 1154):
> Initial Condition
> 1.Lets say there are three nodes in the ensemble A,B,C with A being the 
> leader
> 2.The current epoch is 7. 
> 3.For simplicity of the example, lets say zxid is a two digit number, 
> with epoch being the first digit.
> 4.The zxid is 73
> 5.All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there 
> is a crash of the entire ensemble and B,C never write the change 74 to their 
> log.
> Step 2
> A,B restart, A is elected as the new leader,  and A will load data and take a 
> clean snapshot(change 74 is in it), then send diff to B, but B died before 
> sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, 
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will 
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it 
> will not be a problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1603) StaticHostProviderTest testUpdateClientMigrateOrNot hangs

2012-12-19 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536229#comment-13536229
 ] 

Mahadev konar commented on ZOOKEEPER-1603:
--

Pat,
 Not sure why we had this. Seems like an oversight.

> StaticHostProviderTest testUpdateClientMigrateOrNot hangs
> -
>
> Key: ZOOKEEPER-1603
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1603
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.0
>Reporter: Patrick Hunt
>Assignee: Alexander Shraer
>Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1603-ver1.patch, ZOOKEEPER-1603-ver2.patch
>
>
> StaticHostProviderTest method testUpdateClientMigrateOrNot hangs forever.
> On my laptop getHostName for 10.10.10.* takes 5+ seconds per call. As a 
> result this method effectively runs forever.
> Every time I run this test it hangs. Consistent.
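
The 5+ seconds per call is consistent with getHostName() doing a reverse DNS lookup that has to time out for an unroutable 10.10.10.* address; a quick way to see (and avoid) the difference, using only the JDK:
{noformat}
// InetSocketAddress.getHostName() triggers a reverse DNS lookup, which can block for
// seconds on unroutable test addresses; getHostString() (Java 7+) returns the literal
// address without any lookup.
import java.net.InetSocketAddress;

public class HostNameLookupDemo {
    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("10.10.10.1", 1234);
        long t0 = System.nanoTime();
        String fast = addr.getHostString();    // no DNS involved
        long t1 = System.nanoTime();
        String slow = addr.getHostName();      // may block until reverse DNS times out
        long t2 = System.nanoTime();
        System.out.printf("getHostString=%s in %d us, getHostName=%s in %d ms%n",
                fast, (t1 - t0) / 1_000, slow, (t2 - t1) / 1_000_000);
    }
}
{noformat}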

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn

2012-12-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534246#comment-13534246
 ] 

Mahadev konar commented on ZOOKEEPER-1504:
--

Pat,
 Makes sense. We can do it in a separate jira.

> Multi-thread NIOServerCnxn
> --
>
> Key: ZOOKEEPER-1504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.3, 3.4.4, 3.5.0
>Reporter: Jay Shrauner
>Assignee: Jay Shrauner
>  Labels: performance, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, 
> ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch
>
>
> NIOServerCnxnFactory is single threaded, which doesn't scale well to large 
> numbers of clients. This is particularly noticeable when thousands of clients 
> connect. I propose multi-threading this code as follows:
> - 1   acceptor thread, for accepting new connections
> - 1-N selector threads
> - 0-M I/O worker threads
> Numbers of threads are configurable, with defaults scaling according to 
> number of cores. Communication with the selector threads is handled via 
> LinkedBlockingQueues, and connections are permanently assigned to a 
> particular selector thread so that all potentially blocking SelectionKey 
> operations can be performed solely by the selector thread. An ExecutorService 
> is used for the worker threads.
> On a 32-core machine running Linux 2.6.38, we achieved the best performance 
> with 4 selector threads and 64 worker threads, for a 70% +/- 5% improvement in 
> throughput.
> This patch incorporates and supersedes the patches for
> https://issues.apache.org/jira/browse/ZOOKEEPER-517
> https://issues.apache.org/jira/browse/ZOOKEEPER-1444
> New classes introduced in this patch are:
>   - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from 
> SessionTrackerImpl used to expire sessions so that the same logic can be used 
> to expire connections
>   - RateLogger (from ZOOKEEPER-517): rate limit error message logging, 
> currently only used to throttle rate of logging "out of file descriptors" 
> errors
>   - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that 
> makes worker threads daemon threads and names them in an easily debuggable 
> manner. Supports assignable threads (as used by CommitProcessor) and 
> non-assignable threads (as used here).
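
A heavily simplified, hypothetical sketch of the 1-acceptor / N-selector / worker-pool arrangement described above (thread counts and the port are arbitrary; this is not the patch's NIOServerCnxnFactory code):
{noformat}
// Simplified, hypothetical sketch of the threading model described above; NOT the
// patch's actual NIOServerCnxnFactory. One acceptor thread hands each new connection
// to a permanently assigned selector thread, which delegates payload work to a pool.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NioThreadingSketch {

    static final class SelectorThread extends Thread {
        private final Selector selector;
        private final Queue<SocketChannel> accepted = new ConcurrentLinkedQueue<>();
        private final ExecutorService workers;

        SelectorThread(ExecutorService workers) throws IOException {
            this.selector = Selector.open();
            this.workers = workers;
        }

        /** Called by the acceptor thread; the connection stays with this selector for life. */
        void addConnection(SocketChannel sc) {
            accepted.add(sc);
            selector.wakeup();                             // leave select() so we register it promptly
        }

        @Override public void run() {
            try {
                while (true) {
                    selector.select();
                    SocketChannel sc;
                    while ((sc = accepted.poll()) != null) {   // register hand-offs from the acceptor
                        sc.configureBlocking(false);
                        sc.register(selector, SelectionKey.OP_READ);
                    }
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isValid() && key.isReadable()) {
                            key.interestOps(0);                    // only this thread mutates keys
                            workers.submit(() -> handleIo(key));   // payload work goes to the pool
                        }
                    }
                    selector.selectedKeys().clear();
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        private void handleIo(SelectionKey key) {
            // A real server would read/decode a request from key.channel() here, then queue
            // an interest-ops update so the selector thread re-enables OP_READ.
        }
    }

    public static void main(String[] args) throws IOException {
        int nSelectors = Math.max(1, Runtime.getRuntime().availableProcessors() / 8); // sizing is arbitrary
        ExecutorService workers = Executors.newFixedThreadPool(64);
        SelectorThread[] selectors = new SelectorThread[nSelectors];
        for (int i = 0; i < nSelectors; i++) {
            selectors[i] = new SelectorThread(workers);
            selectors[i].start();
        }
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(2181));
        int next = 0;
        while (true) {                                        // the single acceptor thread
            SocketChannel sc = server.accept();               // blocking accept
            selectors[next].addConnection(sc);                // permanent selector assignment
            next = (next + 1) % nSelectors;
        }
    }
}
{noformat}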

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533712#comment-13533712
 ] 

Mahadev konar commented on ZOOKEEPER-1572:
--

Flavio/Sijie,
 I am taking a look at this. Might need a day or two (Tuesday at the latest) to 
review this.

> Add an async interface for multi request
> 
>
> Key: ZOOKEEPER-1572
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Sijie Guo
>Assignee: Sijie Guo
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff
>
>
> Currently there is no async interface for multi request in ZooKeeper java 
> client.
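
For context, the synchronous multi() that an async variant would mirror looks roughly like this (connection details and paths are placeholders); the asynchronous form would take the same Ops plus a completion callback:
{noformat}
// Sketch of the existing synchronous multi(); the requested async variant would take the
// same list of Ops but deliver the result through a callback instead of a blocking return.
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MultiSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        List<OpResult> results = zk.multi(Arrays.asList(
                Op.create("/app", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
                Op.create("/app/members", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
                Op.setData("/app", "v1".getBytes(), -1)));   // -1 = any version; all ops commit or none do
        System.out.println("multi committed " + results.size() + " ops atomically");
        zk.close();
    }
}
{noformat}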

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1572:
-

Fix Version/s: (was: 3.4.6)

> Add an async interface for multi request
> 
>
> Key: ZOOKEEPER-1572
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Sijie Guo
>Assignee: Sijie Guo
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff
>
>
> Currently there is no async interface for multi request in ZooKeeper java 
> client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533710#comment-13533710
 ] 

Mahadev konar commented on ZOOKEEPER-1572:
--

Removing it from the 3.4 branch. We shouldn't commit new features in the 3.4 branch.

> Add an async interface for multi request
> 
>
> Key: ZOOKEEPER-1572
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Sijie Guo
>Assignee: Sijie Guo
> Fix For: 3.5.0, 3.4.6
>
> Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff
>
>
> Currently there is no async interface for multi request in ZooKeeper java 
> client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1574) mismatched CR/LF endings in text files

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533708#comment-13533708
 ] 

Mahadev konar commented on ZOOKEEPER-1574:
--

Nikita/Raja,
 So we can just do a prop set and commit then? I tried this:

find * | grep "java$" | xargs  svn propset -R svn:eol-style native

and it's only changing the properties. Is this all we need to do on 3.4 and 
trunk? This is definitely better than committing the diff.

> mismatched CR/LF endings in text files
> --
>
> Key: ZOOKEEPER-1574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1574
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Raja Aluri
>Assignee: Raja Aluri
> Attachments: ZOOKEEPER-1574.branch-3.4.patch, 
> ZOOKEEPER-1574.trunk.patch
>
>
> Source code in zookeeper repo has a bunch of files that have CRLF endings.
> With more development happening on windows there is a higher chance of more 
> CRLF files getting into the source tree.
> I would like to avoid that by creating .gitattributes file which prevents 
> sources from having CRLF entries in text files.
> But before adding the .gitattributes file we need to normalize the existing 
> tree, so that people when they sync after .giattributes change wont end up 
> with a bunch of modified files in their workspace.
> I am adding a couple of links here to give more primer on what exactly is the 
> issue and how we are trying to fix it.
> [http://git-scm.com/docs/gitattributes#_checking_out_and_checking_in]
> [http://stackoverflow.com/questions/170961/whats-the-best-crlf-handling-strategy-with-git]
> I will submit a separate bug and patch for .gitattributes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1578) org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533695#comment-13533695
 ] 

Mahadev konar commented on ZOOKEEPER-1578:
--

+1 the patch looks good.

> org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 
> 33556 port
> -
>
> Key: ZOOKEEPER-1578
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1578
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.3
>Reporter: Li Ping Zhang
>Assignee: Li Ping Zhang
>  Labels: patch
> Attachments: ZOOKEEPER-1578-branch-3.4.patch, 
> ZOOKEEPER-1578-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.zookeeper.server.quorum.Zab1_0Test failed with both the Sun JDK 
> and OpenJDK.
> [junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test
> [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 18.334 sec
> [junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED 
> Zab1_0Test log:
> Zab1_0Test log:
> 2012-07-11 23:17:15,579 [myid:] - INFO  [main:Leader@427] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: end of test
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:427)
> at 
> org.apache.zookeeper.server.quorum.Zab1_0Test.testLastAcceptedEpoch(Zab1_0Test.java:211)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
> 2012-07-11 23:17:15,584 [myid:] - ERROR [main:Leader@139] - Couldn't bind to 
> port 33556
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402)
> at java.net.ServerSocket.bind(ServerSocket.java:328)
> at java.net.ServerSocket.bind(ServerSocket.java:286)
> at org.apache.zookeeper.server.quorum.Leader.(Leader.java:137)
> at 
> org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:810)
> at 
> org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:224)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2012-07-11 23:17:20,202 [myid:] - ERROR 
> [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@559] - 
> Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at java.io.DataInputStream.readInt(DataInputStream.java:370)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:291)
> 2012-07-11 23:17:20,203 [myid:] - WARN  
> [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@569] - 
> 
> *** GOODBYE bdvm039.svl.ibm.com/9.30.122.48:40153 
> 2012-07-11 23:17:20,204 [myid:] - INFO  [Thread-20:Leader@421] - Shutting down
> 2012-07-11 23:17:20,204 [myid:] - INFO  [Thread-20:Leader@427] - Shutdown 
> called
> java.lang.Exception: shutdown Leader! reason: lead ended
> This failure suggests that port 33556 is already in use, but checking with a 
> command shows it is actually free. The port is hard-coded in the unit test; we 
> can improve this with a code patch.
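
Independent of the patch details, the usual way to avoid a hard-coded port like 33556 in a test is to bind to port 0 and let the OS pick a free ephemeral port:
{noformat}
// Common test pattern for avoiding hard-coded ports: bind to port 0 so the OS assigns a
// free ephemeral port, then hand that port number to the code under test. (There is a
// small window in which another process could grab the port, but it is rare in practice.)
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {
    static int pickFreePort() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {   // 0 means "any free port"
            ss.setReuseAddress(true);
            return ss.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("test can bind to port " + pickFreePort());
    }
}
{noformat}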

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1569) support upsert: setData if the node exists, otherwise, create a new node

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533692#comment-13533692
 ] 

Mahadev konar commented on ZOOKEEPER-1569:
--

Jimmy,
 Can you please explain the semantics of such an operation? What would the 
return value be? When would this operation fail? When would it succeed?

> support upsert: setData if the node exists, otherwise, create a new node
> 
>
> Key: ZOOKEEPER-1569
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1569
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: zk-1569.patch, zk-1569_v1.1.patch, zk-1569_v2.patch
>
>
> Currently, ZooKeeper supports setData and create. If it could support upsert 
> as in SQL, that would be great.
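
One way to frame the semantics question is to look at how clients emulate upsert with the existing API today; a server-side upsert would fold the retry loop below into a single atomic operation:
{noformat}
// How clients emulate "upsert" with the existing API; it also shows the races a
// server-side upsert would remove (setData can race with a delete, create with a create).
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class UpsertSketch {
    /** Returns true if the node was created, false if an existing node was updated. */
    static boolean upsert(ZooKeeper zk, String path, byte[] data)
            throws KeeperException, InterruptedException {
        while (true) {
            try {
                zk.setData(path, data, -1);                   // -1 = any version
                return false;
            } catch (KeeperException.NoNodeException e) {
                // node is missing: fall through and try to create it
            }
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // someone created it between our setData and create; loop and update instead
            }
        }
    }
}
{noformat}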

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533687#comment-13533687
 ] 

Mahadev konar commented on ZOOKEEPER-1504:
--

Thawan,
 I was looking at the patch and it looks like you always have one acceptor 
thread. Is one acceptor thread enough when we have thousands of simultaneous 
connections to the ZK servers during bootstrap or after network glitches? Have 
you ever seen an issue with this?

 I read through the patch as well. Looks good to me otherwise.

> Multi-thread NIOServerCnxn
> --
>
> Key: ZOOKEEPER-1504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.3, 3.4.4, 3.5.0
>Reporter: Jay Shrauner
>Assignee: Jay Shrauner
>  Labels: performance, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, 
> ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch
>
>
> NIOServerCnxnFactory is single threaded, which doesn't scale well to large 
> numbers of clients. This is particularly noticeable when thousands of clients 
> connect. I propose multi-threading this code as follows:
> - 1   acceptor thread, for accepting new connections
> - 1-N selector threads
> - 0-M I/O worker threads
> Numbers of threads are configurable, with defaults scaling according to 
> number of cores. Communication with the selector threads is handled via 
> LinkedBlockingQueues, and connections are permanently assigned to a 
> particular selector thread so that all potentially blocking SelectionKey 
> operations can be performed solely by the selector thread. An ExecutorService 
> is used for the worker threads.
> On a 32-core machine running Linux 2.6.38, we achieved the best performance 
> with 4 selector threads and 64 worker threads, for a 70% +/- 5% improvement in 
> throughput.
> This patch incorporates and supersedes the patches for
> https://issues.apache.org/jira/browse/ZOOKEEPER-517
> https://issues.apache.org/jira/browse/ZOOKEEPER-1444
> New classes introduced in this patch are:
>   - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from 
> SessionTrackerImpl used to expire sessions so that the same logic can be used 
> to expire connections
>   - RateLogger (from ZOOKEEPER-517): rate limit error message logging, 
> currently only used to throttle rate of logging "out of file descriptors" 
> errors
>   - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that 
> makes worker threads daemon threads and names them in an easily debuggable 
> manner. Supports assignable threads (as used by CommitProcessor) and 
> non-assignable threads (as used here).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1480) ClientCnxn(1161) can't get the current zk server add, so that - Session 0x for server null, unexpected error

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533678#comment-13533678
 ] 

Mahadev konar commented on ZOOKEEPER-1480:
--

Hey Leader,
 There are quite a few Chinese characters in the patch. Can you please remove 
those? Also, can you please create a patch against trunk? 

Thanks

> ClientCnxn(1161) can't get the current zk server add, so that - Session 0x 
> for server null, unexpected error
> 
>
> Key: ZOOKEEPER-1480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.3
>Reporter: Leader Ni
>Assignee: Leader Ni
>  Labels: client, getCurrentZooKeeperAddr
> Fix For: 3.5.0
>
> Attachments: getCurrentZooKeeperAddr_for_3.4.3.patch, 
> getCurrentZooKeeperAddr_for_branch3.4.patch
>
>
>   When ZooKeeper hits an unexpected error (not SessionExpiredException, 
> SessionTimeoutException, or EndOfStreamException), ClientCnxn(1161) logs a 
> message of the form "Session 0x for server null, unexpected error, closing 
> socket connection and attempting reconnect". The log is at line 1161 in 
> zookeeper-3.3.3.
>   We found that ZooKeeper uses 
> "((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress()" to get 
> the ZooKeeper address. But sometimes it logs "Session 0x for server null"; when 
> it logs null, the developer can't determine the current ZooKeeper address that 
> the client is connected (or connecting) to.
>   I added a method in class SendThread: InetSocketAddress 
> org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr().
>   Here:
> /**
> * Returns the address to which the socket is connected.
> * 
> * @return ip address of the remote side of the connection or null if not
> * connected
> */
> @Override
> SocketAddress getRemoteSocketAddress() {
>// a lot could go wrong here, so rather than put in a bunch of code
>// to check for nulls all down the chain let's do it the simple
>// yet bulletproof way 
> .

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (ZOOKEEPER-1552) Enable sync request processor in Observer

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676
 ] 

Mahadev konar edited comment on ZOOKEEPER-1552 at 12/17/12 6:33 AM:


Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. Its best if we can use the 
ZooKeeper config file for this. What do others think? Other than that, the 
patch looks good.


  was (Author: mahadev):
Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. Its best if we can use the 
ZooKeeper config file for this. What do others think? 

  
> Enable sync request processor in Observer
> -
>
> Key: ZOOKEEPER-1552
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch
>
>
> Observer doesn't forward its txns to SyncRequestProcessor. So it never 
> persists the txns onto disk or periodically creates snapshots. This increases 
> the start-up time, since it will get the entire snapshot if the observer has 
> been running for a long time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1552) Enable sync request processor in Observer

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676
 ] 

Mahadev konar commented on ZOOKEEPER-1552:
--

Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. It's best if we can use the 
ZooKeeper config file for this. What do others think? 
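
A minimal sketch of the config-file direction, assuming a hypothetical zoo.cfg key named observer.syncEnabled (not necessarily the name the patch would use), with the system property kept only as a fallback:
{noformat}
// Hypothetical sketch: read the flag from zoo.cfg (java.util.Properties format) instead
// of adding another system property. The key name "observer.syncEnabled" is an
// assumption for illustration, not the committed configuration name.
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class ObserverSyncConfigSketch {
    static boolean observerSyncEnabled(String zooCfgPath) throws IOException {
        Properties props = new Properties();
        try (FileReader r = new FileReader(zooCfgPath)) {
            props.load(r);
        }
        String fromCfg = props.getProperty("observer.syncEnabled");
        if (fromCfg != null) {
            return Boolean.parseBoolean(fromCfg);                  // config file wins
        }
        // fallback for operators still passing -Dzookeeper.observer.syncEnabled=true
        return Boolean.getBoolean("zookeeper.observer.syncEnabled");
    }
}
{noformat}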


> Enable sync request processor in Observer
> -
>
> Key: ZOOKEEPER-1552
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Thawan Kooburat
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch
>
>
> Observer doesn't forward its txns to SyncRequestProcessor. So it never 
> persists the txns onto disk or periodically creates snapshots. This increases 
> the start-up time, since it will get the entire snapshot if the observer has 
> been running for a long time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1488) Some links are not working in the Zookeeper Documentation

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533674#comment-13533674
 ] 

Mahadev konar commented on ZOOKEEPER-1488:
--

bq. By the way, I have just seen that the PDF generated in the docs 
section still has a 2008 copyright notice ("Copyright © 2008 The Apache 
Software Foundation. All rights reserved"). Should I open a ticket to update 
this? Or may I try to include in this patch?


Thanks for pointing that out Edward. Please open a jira for that.

> Some links are not working in the Zookeeper Documentation
> -
>
> Key: ZOOKEEPER-1488
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1488
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.3
>Reporter: Kiran BC
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-1488.patch, ZOOKEEPER-1488.patch
>
>
> There are some internal link errors in the Zookeeper documentation. The list 
> is as follows:
> docs\zookeeperAdmin.html -> tickTime and datadir
> docs\zookeeperOver.html -> fg_zkComponents, fg_zkPerfReliability and 
> fg_zkPerfRW
> docs\zookeeperStarted.html -> Logging

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1593) Add Debian style /etc/default/zookeeper support to init script

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533673#comment-13533673
 ] 

Mahadev konar commented on ZOOKEEPER-1593:
--

Michi/Dirkjan,
 Unfortunately, these package files are mostly unused, and we should probably 
get rid of them given that BigTop is doing all the packaging work. Dirkjan, are 
you using the packaging in production? Do you think the BigTop packaging might 
be of help to you?

> Add Debian style /etc/default/zookeeper support to init script
> --
>
> Key: ZOOKEEPER-1593
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1593
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.4.5
> Environment: Debian Linux 6.0
>Reporter: Dirkjan Bussink
>Priority: Minor
> Attachments: zookeeper_debian_default.patch
>
>
> In our configuration we use a different data directory for Zookeeper. The 
> problem is that the current Debian init.d script has the default location 
> hardcoded:
> ZOOPIDDIR=/var/lib/zookeeper/data
> ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid
> By using the standard Debian practice of allowing for a 
> /etc/default/zookeeper we can redefine these variables to point to the 
> correct location:
> ZOOPIDDIR=/var/lib/zookeeper/data
> ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid
> [ -r /etc/default/zookeeper ] && . /etc/default/zookeeper
> This currently can't be done through /usr/libexec/zkEnv.sh, since that is 
> loaded before ZOOPIDDIR and ZOOPIDFILE are set. Any change there would 
> therefore undo the setup made in for example /etc/zookeeper/zookeeper-env.sh.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1335) Add support for --config to zkEnv.sh to specify a config directory different than what is expected

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533666#comment-13533666
 ] 

Mahadev konar commented on ZOOKEEPER-1335:
--

+1 for the patch. Looks good to me. Pat, it doesn't look like we have much 
documentation in Forrest for zkServer.sh, so I don't think we need any Forrest 
docs update.

> Add support for --config to zkEnv.sh to specify a config directory different 
> than what is expected
> --
>
> Key: ZOOKEEPER-1335
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1335
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Arpit Gupta
>Assignee: Arpit Gupta
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1335.patch, ZOOKEEPER-1335.patch
>
>
> zkEnv.sh expects the ZOOCFGDIR env variable to be set. If it isn't, it looks 
> for the conf dir in the ZOOKEEPER_PREFIX dir or in /etc/zookeeper. It would be 
> great if we could support a --config option so that at run time you could 
> specify a different config directory. We do the same thing in Hadoop.
> With this you should be able to do
> /usr/sbin/zkServer.sh --config /some/conf/dir start|stop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-575) remove System.exit calls to make the server more container friendly

2012-12-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-575:


Attachment: ZOOKEEPER-575_4.patch

Updated the patch for trunk. This would really be nice to get in; it would make 
it cleaner to embed ZK.

> remove System.exit calls to make the server more container friendly
> ---
>
> Key: ZOOKEEPER-575
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-575
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Patrick Hunt
>Assignee: Andrew Finnell
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-575-2.patch, ZOOKEEPER-575-3.patch, 
> ZOOKEEPER-575_4.patch, ZOOKEEPER-575.patch
>
>
> There are a handful of places left in the code that still use System.exit; we 
> should remove these to make the server more container friendly.
> There are some legitimate places for the exits - in *Main.java, for example, 
> they should be fine - these are the command-line main routines. Containers 
> should embed code that runs just below this layer (or we should refactor so 
> that it would).
> The tricky bit is ensuring the server shuts down in case of an unrecoverable 
> error occurring; afaik these are the locations where we still have sys exit 
> calls.
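
A hypothetical sketch of the container-friendly pattern: fatal errors go through a pluggable handler, and only the standalone command-line layer installs a handler that actually calls System.exit (names here are illustrative, not ZooKeeper's):
{noformat}
// Hypothetical pattern sketch; the names are illustrative, not ZooKeeper's. Server code
// reports fatal errors through a handler, and only the standalone main() installs the
// handler that really calls System.exit, so containers can embed the layer below it.
public class ExitHandlerSketch {
    interface FatalErrorHandler {
        void fatal(String reason, int exitCode);
    }

    /** For the command-line entry point: actually exit the JVM. */
    static final FatalErrorHandler EXIT_JVM = (reason, code) -> {
        System.err.println("Fatal: " + reason);
        System.exit(code);
    };

    /** For embedded/container use: surface the failure without killing the host JVM. */
    static final FatalErrorHandler THROW = (reason, code) -> {
        throw new IllegalStateException("Fatal (" + code + "): " + reason);
    };

    private final FatalErrorHandler handler;

    ExitHandlerSketch(FatalErrorHandler handler) { this.handler = handler; }

    void onUnrecoverableError(Throwable t) {
        handler.fatal(String.valueOf(t.getMessage()), 1);   // e.g. the "Severe unrecoverable error" path
    }
}
{noformat}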

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

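As a minimal sketch of the embedding scenario ZOOKEEPER-575 describes (not the attached patch), the example below runs ZooKeeper inside a host process so that fatal errors surface as exceptions rather than System.exit calls. It assumes the ZooKeeperServerMain and ServerConfig classes from org.apache.zookeeper.server; exact signatures may vary between releases.

{code}
// Illustrative sketch only -- not the ZOOKEEPER-575 patch. Assumes the
// org.apache.zookeeper.server embedding classes; signatures may differ
// between releases.
import org.apache.zookeeper.server.ServerConfig;
import org.apache.zookeeper.server.ZooKeeperServerMain;

public class EmbeddedZooKeeper {
    public static void main(String[] args) {
        ZooKeeperServerMain server = new ZooKeeperServerMain();
        try {
            ServerConfig config = new ServerConfig();
            config.parse(args[0]);          // path to a zoo.cfg-style file
            // Blocks until shutdown; a container would typically run this on
            // its own thread and manage the lifecycle itself.
            server.runFromConfig(config);
        } catch (Exception e) {
            // Container-friendly behaviour: surface the failure to the host
            // process instead of exiting the JVM from library code.
            throw new RuntimeException("Embedded ZooKeeper server failed", e);
        }
    }
}
{code}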

Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5

2012-11-19 Thread Mahadev Konar
Hi Jordan,
 Looks like I forgot to release from the nexus repo. Just did it.
Please check again.

thanks
mahadev


On Mon, Nov 19, 2012 at 10:56 AM, Jordan Zimmerman
 wrote:
> I still don't see the artifacts on Maven Central. It usually doesn't take 
> this long.
>
> -JZ
>
> On Nov 18, 2012, at 5:15 PM, Mahadev Konar  wrote:
>
>> Also,
>> I have published the artifacts to maven. Do let me know if you see
>> any issues with that.
>>
>> thanks
>> mahadev
>>
>> On Sun, Nov 18, 2012 at 5:09 PM, Mahadev Konar  
>> wrote:
>>> Please ignore the "subject" in the subject. Too much copy paste :).
>>>
>>> thanks
>>> mahadev
>>>
>>>
>>> On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar  
>>> wrote:
>>>> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 
>>>> 3.4.5
>>>>
>>>> ZooKeeper is a high-performance coordination service for distributed
>>>> applications. It exposes common services - such as naming,
>>>> configuration management, synchronization, and group services - in a
>>>> simple interface so you don't have to write them from scratch. You can
>>>> use it off-the-shelf to implement consensus, group management, leader
>>>> election, and presence protocols. And you can build on it for your
>>>> own, specific needs.
>>>>
>>>> For ZooKeeper release details and downloads, visit:
>>>> http://zookeeper.apache.org/releases.html
>>>>
>>>> ZooKeeper 3.4.5 Release Notes are at:
>>>> http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html
>>>>
>>>>
>>>> thanks
>>>> mahadev
>>>>
>>>> We would like to thank the contributors that made the release possible.
>>>>
>>>> Regards,
>>>>
>>>> The ZooKeeper Team
>


Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5

2012-11-18 Thread Mahadev Konar
Also,
 I have published the artifacts to maven. Do let me know if you see
any issues with that.

thanks
mahadev

On Sun, Nov 18, 2012 at 5:09 PM, Mahadev Konar  wrote:
> Please ignore the "subject" in the subject. Too much copy paste :).
>
> thanks
> mahadev
>
>
> On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar  
> wrote:
>> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5
>>
>> ZooKeeper is a high-performance coordination service for distributed
>> applications. It exposes common services - such as naming,
>> configuration management, synchronization, and group services - in a
>> simple interface so you don't have to write them from scratch. You can
>> use it off-the-shelf to implement consensus, group management, leader
>> election, and presence protocols. And you can build on it for your
>> own, specific needs.
>>
>> For ZooKeeper release details and downloads, visit:
>> http://zookeeper.apache.org/releases.html
>>
>> ZooKeeper 3.4.5 Release Notes are at:
>> http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html
>>
>>
>> thanks
>> mahadev
>>
>> We would like to thank the contributors that made the release possible.
>>
>> Regards,
>>
>> The ZooKeeper Team


Re: Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5

2012-11-18 Thread Mahadev Konar
Please ignore the "subject" in the subject. Too much copy paste :).

thanks
mahadev


On Sun, Nov 18, 2012 at 5:06 PM, Mahadev Konar  wrote:
> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> http://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.4.5 Release Notes are at:
> http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html
>
>
> thanks
> mahadev
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The ZooKeeper Team


Subject: [ANNOUNCE] Apache ZooKeeper 3.4.5

2012-11-18 Thread Mahadev Konar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.5

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
http://zookeeper.apache.org/releases.html

ZooKeeper 3.4.5 Release Notes are at:
http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html


thanks
mahadev

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)

2012-11-17 Thread Mahadev Konar
Nothing as such.

With 6 +1's (4 binding), the vote passes. I will be updating the
release artifacts tonight, or, if I get tired and fall asleep, it'll
be tomorrow.

thanks
mahadev


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)

2012-11-15 Thread Mahadev Konar
Thanks Pat and Jimmy!


mahadev

On Wed, Nov 14, 2012 at 11:35 AM, Jimmy Xiang  wrote:
> Of course, with ZK 3.4.5 RC 1.  I verified there is only this version
> of zk jar in the classpath for both HBase and HDFS.
>
> On Wed, Nov 14, 2012 at 11:34 AM, Jimmy Xiang  wrote:
>> I tested it with JDK 1.7_9 on a live HBase cluster (trunk version, 1
>> master and 4 region servers) and it went very well. The cluster
>> started up ok. I created a table, loaded around 90k records, regions
>> split/assigned properly.
>>
>> Thanks,
>> Jimmy
>>
>>
>>
>> On Wed, Nov 14, 2012 at 11:22 AM, Patrick Hunt  wrote:
>>> Jimmy mentioned that he might have some time to try it out with hbase
>>> - Jimmy how did your testing go?
>>>
>>> Patrick
>>>
>>> On Wed, Nov 14, 2012 at 10:31 AM, Mahadev Konar  
>>> wrote:
>>>> Thanks Ted!
>>>>
>>>> mahadev
>>>>
>>>>
>>>> On Tue, Nov 13, 2012 at 5:02 PM, Ted Yu  wrote:
>>>>> Using jdk 1.7 u9, I saw the following test failures:
>>>>>
>>>>> Failed tests:
>>>>> testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster)
>>>>>
>>>>> testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation):
>>>>> expected:<0> but was:<1>
>>>>>   queueFailover(org.apache.hadoop.hbase.replication.TestReplication):
>>>>> Waited too much time for queueFailover replication. Waited 74466ms.
>>>>>
>>>>> Tests in error:
>>>>>   Broken_testSync(org.apache.hadoop.hbase.regionserver.wal.TestHLog): 
>>>>> Error
>>>>> Recovery for block blk_-3290996327764601512_1015 failed  because recovery
>>>>> from primary datanode 127.0.0.1:53866 failed 6 times.  Pipeline was
>>>>> 127.0.0.1:53866. Aborting...
>>>>>   testSplit(org.apache.hadoop.hbase.regionserver.wal.TestHLog): 3
>>>>> exceptions [org.apache.hadoop.ipc.RemoteException:
>>>>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
>>>>> /user/hduser/hbase/TestHLog/3d02052e6bcac5f74e57d2a75e6bf583/recovered.edits/004.temp
>>>>> File is not open for writing. Holder DFSClient_1365323924 does not have 
>>>>> any
>>>>> open files.(..)
>>>>>
>>>>> They passed when I ran them standalone. queueFailover has been a flaky 
>>>>> test.
>>>>>
>>>>> FYI
>>>>>
>>>>> On Tue, Nov 13, 2012 at 4:15 PM, Ted Yu  wrote:
>>>>>
>>>>>> I have run HBase trunk test suite with jdk 1.6 using zookeeper 3.4.5 RC1
>>>>>> in local maven repo.
>>>>>> Tests passed.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 13, 2012 at 3:16 PM, Mahadev Konar 
>>>>>> wrote:
>>>>>>
>>>>>>> Anyone from hbase team wants to try it out before we close the vote?
>>>>>>> Looks like Roman did some basic testing with HBase, so thats helpful.
>>>>>>>
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik  
>>>>>>> wrote:
>>>>>>> > On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar 
>>>>>>> > 
>>>>>>> wrote:
>>>>>>> >> Hi all,
>>>>>>> >>
>>>>>>> >>   I have created a candidate build for ZooKeeper 3.4.5. This includes
>>>>>>> >> the fix for ZOOKEEPER-1560.
>>>>>>> >>  Please take a look at the release notes for the jira list.
>>>>>>> >>
>>>>>>> >>  *** Please download, test and VOTE before the
>>>>>>> >>  *** vote closes  12:00  midnight PT on Friday, Nov 9th.***
>>>>>>> >>
>>>>>>> >>  http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/
>>>>>>> >>
>>>>>>> >>  Should we release this?
>>>>>>> >
>>>>>>> > +1 (non-binding)
>>>>>>> >
>>>>>>> > based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph
>>>>>>> > 0.2-SNAPSHOT, Solr 4.0.0)
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Roman.
>>>>>>>
>>>>>>
>>>>>>


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)

2012-11-14 Thread Mahadev Konar
Thanks Ted!

mahadev


On Tue, Nov 13, 2012 at 5:02 PM, Ted Yu  wrote:
> Using jdk 1.7 u9, I saw the following test failures:
>
> Failed tests:
> testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster)
>
> testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation):
> expected:<0> but was:<1>
>   queueFailover(org.apache.hadoop.hbase.replication.TestReplication):
> Waited too much time for queueFailover replication. Waited 74466ms.
>
> Tests in error:
>   Broken_testSync(org.apache.hadoop.hbase.regionserver.wal.TestHLog): Error
> Recovery for block blk_-3290996327764601512_1015 failed  because recovery
> from primary datanode 127.0.0.1:53866 failed 6 times.  Pipeline was
> 127.0.0.1:53866. Aborting...
>   testSplit(org.apache.hadoop.hbase.regionserver.wal.TestHLog): 3
> exceptions [org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /user/hduser/hbase/TestHLog/3d02052e6bcac5f74e57d2a75e6bf583/recovered.edits/004.temp
> File is not open for writing. Holder DFSClient_1365323924 does not have any
> open files.(..)
>
> They passed when I ran them standalone. queueFailover has been a flaky test.
>
> FYI
>
> On Tue, Nov 13, 2012 at 4:15 PM, Ted Yu  wrote:
>
>> I have run HBase trunk test suite with jdk 1.6 using zookeeper 3.4.5 RC1
>> in local maven repo.
>> Tests passed.
>>
>> Cheers
>>
>>
>> On Tue, Nov 13, 2012 at 3:16 PM, Mahadev Konar 
>> wrote:
>>
>>> Anyone from hbase team wants to try it out before we close the vote?
>>> Looks like Roman did some basic testing with HBase, so thats helpful.
>>>
>>> thanks
>>> mahadev
>>>
>>>
>>> On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik  wrote:
>>> > On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar 
>>> wrote:
>>> >> Hi all,
>>> >>
>>> >>   I have created a candidate build for ZooKeeper 3.4.5. This includes
>>> >> the fix for ZOOKEEPER-1560.
>>> >>  Please take a look at the release notes for the jira list.
>>> >>
>>> >>  *** Please download, test and VOTE before the
>>> >>  *** vote closes  12:00  midnight PT on Friday, Nov 9th.***
>>> >>
>>> >>  http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/
>>> >>
>>> >>  Should we release this?
>>> >
>>> > +1 (non-binding)
>>> >
>>> > based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph
>>> > 0.2-SNAPSHOT, Solr 4.0.0)
>>> >
>>> > Thanks,
>>> > Roman.
>>>
>>
>>


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 1)

2012-11-13 Thread Mahadev Konar
Does anyone from the HBase team want to try it out before we close the vote?
Looks like Roman did some basic testing with HBase, so that's helpful.

thanks
mahadev


On Mon, Nov 12, 2012 at 8:54 AM, Roman Shaposhnik  wrote:
> On Mon, Nov 5, 2012 at 12:20 AM, Mahadev Konar  
> wrote:
>> Hi all,
>>
>>   I have created a candidate build for ZooKeeper 3.4.5. This includes
>> the fix for ZOOKEEPER-1560.
>>  Please take a look at the release notes for the jira list.
>>
>>  *** Please download, test and VOTE before the
>>  *** vote closes  12:00  midnight PT on Friday, Nov 9th.***
>>
>>  http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/
>>
>>  Should we release this?
>
> +1 (non-binding)
>
> based on Bigtop testing (HBase 0.94.2, Hadoop 2.0.2-alpha, Giraph
> 0.2-SNAPSHOT, Solr 4.0.0)
>
> Thanks,
> Roman.


[VOTE] Release ZooKeeper 3.4.5 (candidate 1)

2012-11-05 Thread Mahadev Konar
Hi all,

  I have created a candidate build for ZooKeeper 3.4.5. This includes
the fix for ZOOKEEPER-1560.
 Please take a look at the release notes for the jira list.

 *** Please download, test and VOTE before the
 *** vote closes  12:00  midnight PT on Friday, Nov 9th.***

 http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-1/

 Should we release this?

 thanks
 mahadev


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-12 Thread Mahadev Konar
Thanks Ted. Will review the changes over the weekend.

Thanks again
mahadev

On Fri, Oct 12, 2012 at 1:12 PM, Ted Yu  wrote:
> Patch v7 for ZOOKEEPER-1560 passes test suite.
>
> Please take a look.
>
> On Thu, Oct 11, 2012 at 2:45 PM, Mahadev Konar wrote:
>
>> Thanks Alex for bringing it up. Ill hold the release for now. I see a
>> patch on 1560. Ill take a look and we'll see how to roll this into
>> 3.4.5.
>>
>> thanks
>> mahadev
>>
>> On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer 
>> wrote:
>> > Hi Mahadev,
>> >
>> > ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue,
>> > introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the
>> > 3.4.5 release.
>> >
>> > Best Regards,
>> > Alex
>> >
>> > On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar 
>> wrote:
>> >> I think we have waited enough. Closing the vote now.
>> >>
>> >> With 5 +1's (3 binding) the vote passes. I will do the needful for
>> >> getting the release out.
>> >>
>> >> Thanks for voting folks.
>> >>
>> >> mahadev
>> >>
>> >> On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira 
>> wrote:
>> >>> +1
>> >>>
>> >>> -Flavio
>> >>>
>> >>> On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote:
>> >>>
>> >>>> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue
>> >>>> rolling the current RC out. Others please vote on the thread if you
>> >>>> see any issues with that. Folks who have already voted, please re vote
>> >>>> in case you have a change of opinion.
>> >>>>
>> >>>> As for myself, I ran a couple of tests with the RC using open jdk 7
>> >>>> and things seem to work.
>> >>>>
>> >>>> +1 from my side. Pat/Ben/Flavio/others what do you guys think?
>> >>>>
>> >>>> thanks
>> >>>> mahadev
>> >>>>
>> >>>> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu  wrote:
>> >>>>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
>> are using
>> >>>>> lock ZooKeeper-solaris.
>> >>>>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
>> should use
>> >>>>> a separate lock since they wouldn't run on a Solaris machine.
>> >>>>> I didn't seem to find how a new lock name can be added.
>> >>>>>
>> >>>>> Recent builds for ZooKeeper_branch34_openjdk7 and
>> ZooKeeper_branch34_jdk7
>> >>>>> have been green.
>> >>>>>
>> >>>>> Cheers
>> >>>>>
>> >>>>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt 
>> wrote:
>> >>>>>
>> >>>>>> I've seen that before, it's a flakey test that's unrelated to the
>> sasl
>> >>>>>> stuff.
>> >>>>>>
>> >>>>>> Patrick
>> >>>>>>
>> >>>>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu  wrote:
>> >>>>>>> I saw one test failure:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/
>> >>>>>>>
>> >>>>>>> FYI
>> >>>>>>>
>> >>>>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu 
>> wrote:
>> >>>>>>>
>> >>>>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test
>> failure
>> >>>>>> seems
>> >>>>>>>> to be gone.
>> >>>>>>>>
>> >>>>>>>> For ZooKeeper_branch34_jdk7, the two failed builds:
>> >>>>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7,
>> >>>>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5
>> >>>>>>>> where tests failed due to BindException
>> >>>>>>>>
>> >>>>>>>>

Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-11 Thread Mahadev Konar
Thanks, Alex, for bringing it up. I'll hold the release for now. I see a
patch on ZOOKEEPER-1560; I'll take a look and we'll see how to roll this into
3.4.5.

thanks
mahadev

On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer  wrote:
> Hi Mahadev,
>
> ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue,
> introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the
> 3.4.5 release.
>
> Best Regards,
> Alex
>
> On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar  
> wrote:
>> I think we have waited enough. Closing the vote now.
>>
>> With 5 +1's (3 binding) the vote passes. I will do the needful for
>> getting the release out.
>>
>> Thanks for voting folks.
>>
>> mahadev
>>
>> On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira  wrote:
>>> +1
>>>
>>> -Flavio
>>>
>>> On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote:
>>>
>>>> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue
>>>> rolling the current RC out. Others please vote on the thread if you
>>>> see any issues with that. Folks who have already voted, please re vote
>>>> in case you have a change of opinion.
>>>>
>>>> As for myself, I ran a couple of tests with the RC using open jdk 7
>>>> and things seem to work.
>>>>
>>>> +1 from my side. Pat/Ben/Flavio/others what do you guys think?
>>>>
>>>> thanks
>>>> mahadev
>>>>
>>>> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu  wrote:
>>>>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 are 
>>>>> using
>>>>> lock ZooKeeper-solaris.
>>>>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 should use
>>>>> a separate lock since they wouldn't run on a Solaris machine.
>>>>> I didn't seem to find how a new lock name can be added.
>>>>>
>>>>> Recent builds for ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
>>>>> have been green.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt  wrote:
>>>>>
>>>>>> I've seen that before, it's a flakey test that's unrelated to the sasl
>>>>>> stuff.
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu  wrote:
>>>>>>> I saw one test failure:
>>>>>>>
>>>>>>>
>>>>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/
>>>>>>>
>>>>>>> FYI
>>>>>>>
>>>>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu  wrote:
>>>>>>>
>>>>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test failure
>>>>>> seems
>>>>>>>> to be gone.
>>>>>>>>
>>>>>>>> For ZooKeeper_branch34_jdk7, the two failed builds:
>>>>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7,
>>>>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5
>>>>>>>> where tests failed due to BindException
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt  wrote:
>>>>>>>>
>>>>>>>>> Yes. Those ubuntu machines have two slots each. If both tests run at
>>>>>>>>> the same time... bam.
>>>>>>>>>
>>>>>>>>> I just added exclusion locks to the configuration of these two jobs,
>>>>>>>>> that should help.
>>>>>>>>>
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu  wrote:
>>>>>>>>>> I think that was due to the following running on the same machine at
>>>>>> the
>>>>>>>>>> same time:
>>>>>>>>>>
>>>>>>>>>> Building remotely on ubuntu4
>>>>>>>>>> <https://builds.apache

Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-10 Thread Mahadev Konar
I think we have waited enough. Closing the vote now.

With 5 +1's (3 binding) the vote passes. I will do the needful for
getting the release out.

Thanks for voting folks.

mahadev

On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira  wrote:
> +1
>
> -Flavio
>
> On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote:
>
>> Given Eugene's findings on ZOOKEEPER-1557, I think we can continue
>> rolling the current RC out. Others please vote on the thread if you
>> see any issues with that. Folks who have already voted, please re vote
>> in case you have a change of opinion.
>>
>> As for myself, I ran a couple of tests with the RC using open jdk 7
>> and things seem to work.
>>
>> +1 from my side. Pat/Ben/Flavio/others what do you guys think?
>>
>> thanks
>> mahadev
>>
>> On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu  wrote:
>>> Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 are using
>>> lock ZooKeeper-solaris.
>>> I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7 should use
>>> a separate lock since they wouldn't run on a Solaris machine.
>>> I didn't seem to find how a new lock name can be added.
>>>
>>> Recent builds for ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
>>> have been green.
>>>
>>> Cheers
>>>
>>> On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt  wrote:
>>>
>>>> I've seen that before, it's a flakey test that's unrelated to the sasl
>>>> stuff.
>>>>
>>>> Patrick
>>>>
>>>> On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu  wrote:
>>>>> I saw one test failure:
>>>>>
>>>>>
>>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/
>>>>>
>>>>> FYI
>>>>>
>>>>> On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu  wrote:
>>>>>
>>>>>> Up in ZOOKEEPER-1557, Eugene separated one test out and test failure
>>>> seems
>>>>>> to be gone.
>>>>>>
>>>>>> For ZooKeeper_branch34_jdk7, the two failed builds:
>>>>>> #10 corresponded to ZooKeeper_branch34_openjdk7 build #7,
>>>>>> #8 corresponded to ZooKeeper_branch34_openjdk7 build #5
>>>>>> where tests failed due to BindException
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt  wrote:
>>>>>>
>>>>>>> Yes. Those ubuntu machines have two slots each. If both tests run at
>>>>>>> the same time... bam.
>>>>>>>
>>>>>>> I just added exclusion locks to the configuration of these two jobs,
>>>>>>> that should help.
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu  wrote:
>>>>>>>> I think that was due to the following running on the same machine at
>>>> the
>>>>>>>> same time:
>>>>>>>>
>>>>>>>> Building remotely on ubuntu4
>>>>>>>> <https://builds.apache.org/computer/ubuntu4> in workspace
>>>>>>>> /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
>>>>>>>>
>>>>>>>> We should introduce randomized port so that test suite can execute in
>>>>>>>> parallel.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Fri, Oct 5, 2012 at 8:55 PM, Ted Yu  wrote:
>>>>>>>>
>>>>>>>>> Some tests failed in build 8 due to (See
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/8/testReport/org.apache.zookeeper.server/ZxidRolloverTest/testRolloverThenRestart/
>>>>>>> ):
>>>>>>>>>
>>>>>>>>> java.lang.RuntimeException: java.net.BindException: Address already
>>>> in
>>>>>>> use
>>>>>>>>>  at
>>>>>>> org.apache.zookeeper.test.QuorumUtil.

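As a generic sketch of the randomized-port suggestion in the thread above (not code from the ZooKeeper test suite), a test can ask the OS for a free ephemeral port by binding a ServerSocket to port 0, which avoids hard-coded ports colliding when two builds share a machine.

{code}
// Generic example of picking a free ephemeral port for tests; not taken
// from the ZooKeeper test framework.
import java.io.IOException;
import java.net.ServerSocket;

public final class FreePort {
    private FreePort() {}

    // Bind to port 0 so the OS picks an unused ephemeral port. Usual caveat:
    // another process may grab the port between close() and the moment the
    // test server binds to it.
    public static int pick() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("free port: " + pick());
    }
}
{code}
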
[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471676#comment-13471676
 ] 

Mahadev konar commented on ZOOKEEPER-1557:
--

Thanks, Eugene. Interesting.

> jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
> -
>
> Key: ZOOKEEPER-1557
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.4.5
>Reporter: Patrick Hunt
>Assignee: Eugene Koontz
> Fix For: 3.5.0, 3.4.6
>
> Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch
>
>
> Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
> haven't seen this before.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-07 Thread Mahadev Konar
erCnxnFactory.configure(NIOServerCnxnFactory.java:95)
>> >>> >>   at
>> >>>
>> org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:125)
>> >>> >>   at
>> >>>
>> org.apache.zookeeper.server.quorum.QuorumPeer.(QuorumPeer.java:517)
>> >>> >>   at
>> >>> org.apache.zookeeper.test.QuorumUtil.(QuorumUtil.java:113)
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Fri, Oct 5, 2012 at 9:56 AM, Patrick Hunt 
>> wrote:
>> >>> >>
>> >>> >>> fwiw: I setup jdk7 and openjdk7 jobs last night for branch34 on
>> >>> >>> jenkins and they are looking good so far:
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>>
>> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/
>> >>> >>>
>> >>> >>>
>> >>>
>> https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/
>> >>> >>>
>> >>> >>> Patrick
>> >>> >>>
>> >>> >>> On Thu, Oct 4, 2012 at 11:17 PM, Patrick Hunt 
>> >>> wrote:
>> >>> >>> > Doesn't look good, failed a second time:
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>>
>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/408/
>> >>> >>> >
>> >>> >>> > java.util.concurrent.TimeoutException: Did not connect
>> >>> >>> > at
>> >>> >>>
>> >>>
>> org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:129)
>> >>> >>> > at
>> >>> >>>
>> >>>
>> org.apache.zookeeper.test.WatcherTest.testWatchAutoResetWithPending(WatcherTest.java:199)
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > Patrick
>> >>> >>> >
>> >>> >>> > On Thu, Oct 4, 2012 at 4:15 PM, Mahadev Konar <
>> >>> maha...@hortonworks.com>
>> >>> >>> wrote:
>> >>> >>> >> Good point Ted.
>> >>> >>> >> Eugene,
>> >>> >>> >>  Would you be able to take a quick look and point out the threat
>> >>> >>> level? :)
>> >>> >>> >>
>> >>> >>> >> I have kicked off new build to see if its reproducible or not.
>> >>> >>> >>
>> >>> >>> >> thanks
>> >>> >>> >> mahadev
>> >>> >>> >>
>> >>> >>> >> On Thu, Oct 4, 2012 at 4:10 PM, Ted Yu 
>> >>> wrote:
>> >>> >>> >>> Should ZOOKEEPER-1557 be given some time so that we track down
>> >>> root
>> >>> >>> cause ?
>> >>> >>> >>>
>> >>> >>> >>> Thanks
>> >>> >>> >>>
>> >>> >>> >>> On Wed, Oct 3, 2012 at 11:34 PM, Patrick Hunt <
>> ph...@apache.org>
>> >>> >>> wrote:
>> >>> >>> >>>
>> >>> >>> >>>> +1, sig/xsum are correct, ran rat an that looked good. All the
>> >>> unit
>> >>> >>> >>>> tests pass for me on jdk6 and openjdk7 (ubuntu 12.04). Also
>> ran
>> >>> >>> >>>> 1/3/5/13 server clusters using openjdk7, everything seems to
>> be
>> >>> >>> >>>> working.
>> >>> >>> >>>>
>> >>> >>> >>>> Patrick
>> >>> >>> >>>>
>> >>> >>> >>>> On Sun, Sep 30, 2012 at 11:15 AM, Mahadev Konar <
>> >>> >>> maha...@hortonworks.com>
>> >>> >>> >>>> wrote:
>> >>> >>> >>>> > Hi all,
>> >>> >>> >>>> >
>> >>> >>> >>>> >   I have created a candidate build for ZooKeeper 3.4.5. 2
>> >>> JIRAs are
>> >>> >>> >>>> >  addressed in this release. This includes the critical
>> bugfix
>> >>> >>> >>>> ZOOKEEPER-1550
>> >>> >>> >>>> >  which address the client connection issue.
>> >>> >>> >>>> >
>> >>> >>> >>>> >  *** Please download, test and VOTE before the
>> >>> >>> >>>> >  *** vote closes  12:00  midnight PT on Friday, Oct 5th.***
>> >>> >>> >>>> >
>> >>> >>> >>>> > Note that I am extending the vote period for a little
>> longer so
>> >>> >>> that
>> >>> >>> >>>> > folks get time to test this out.
>> >>> >>> >>>> >
>> >>> >>> >>>> >
>> >>> http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/
>> >>> >>> >>>> >
>> >>> >>> >>>> >  Should we release this?
>> >>> >>> >>>> >
>> >>> >>> >>>> >  thanks
>> >>> >>> >>>> >  mahadev
>> >>> >>> >>>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>>
>> >>
>> >>
>>


[jira] [Comment Edited] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471403#comment-13471403
 ] 

Mahadev konar edited comment on ZOOKEEPER-1557 at 10/8/12 5:04 AM:
---

Thanks, Eugene, for taking a look at it. Given your analysis above, it doesn't look 
like we have full knowledge of what's causing the issue. Given that this is 
not SASL-related and could be related to how our test framework runs, I think 
we can move this out to 3.4.6 and get 3.4.5 out the door with what we have now. 
What do you think?

  was (Author: mahadev):
Thanks Eugene for taking a look at it. Given your any analysis above it 
doesnt look like we have a full knowledge of whats causing the issue. Given 
that this is not SASL related and could be related to how our test framework 
runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with 
what we have now. What do you think?
  
> jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
> -
>
> Key: ZOOKEEPER-1557
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.4.5
>Reporter: Patrick Hunt
>Assignee: Eugene Koontz
> Fix For: 3.5.0, 3.4.6
>
> Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch
>
>
> Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
> haven't seen this before.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1557:
-

Fix Version/s: (was: 3.4.5)
   3.4.6

> jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
> -
>
> Key: ZOOKEEPER-1557
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.4.5
>Reporter: Patrick Hunt
>Assignee: Eugene Koontz
> Fix For: 3.5.0, 3.4.6
>
> Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch
>
>
> Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
> haven't seen this before.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471403#comment-13471403
 ] 

Mahadev konar commented on ZOOKEEPER-1557:
--

Thanks, Eugene, for taking a look at it. Given your analysis above, it doesn't 
look like we have full knowledge of what's causing the issue. Given that this 
is not SASL-related and could be related to how our test framework runs, I 
think we can move this out to 3.4.6 and get 3.4.5 out the door with what we 
have now. What do you think?

> jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
> -
>
> Key: ZOOKEEPER-1557
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.4.5
>Reporter: Patrick Hunt
>Assignee: Eugene Koontz
> Fix For: 3.5.0, 3.4.5
>
> Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch
>
>
> Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
> haven't seen this before.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-04 Thread Mahadev Konar
Good point Ted.
Eugene,
 Would you be able to take a quick look and point out the threat level? :)

I have kicked off a new build to see if it's reproducible or not.

thanks
mahadev

On Thu, Oct 4, 2012 at 4:10 PM, Ted Yu  wrote:
> Should ZOOKEEPER-1557 be given some time so that we track down root cause ?
>
> Thanks
>
> On Wed, Oct 3, 2012 at 11:34 PM, Patrick Hunt  wrote:
>
>> +1, sig/xsum are correct, ran rat an that looked good. All the unit
>> tests pass for me on jdk6 and openjdk7 (ubuntu 12.04). Also ran
>> 1/3/5/13 server clusters using openjdk7, everything seems to be
>> working.
>>
>> Patrick
>>
>> On Sun, Sep 30, 2012 at 11:15 AM, Mahadev Konar 
>> wrote:
>> > Hi all,
>> >
>> >   I have created a candidate build for ZooKeeper 3.4.5. 2 JIRAs are
>> >  addressed in this release. This includes the critical bugfix
>> ZOOKEEPER-1550
>> >  which address the client connection issue.
>> >
>> >  *** Please download, test and VOTE before the
>> >  *** vote closes  12:00  midnight PT on Friday, Oct 5th.***
>> >
>> > Note that I am extending the vote period for a little longer so that
>> > folks get time to test this out.
>> >
>> >  http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/
>> >
>> >  Should we release this?
>> >
>> >  thanks
>> >  mahadev
>>


[VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-09-30 Thread Mahadev Konar
Hi all,

  I have created a candidate build for ZooKeeper 3.4.5. Two JIRAs are
 addressed in this release. This includes the critical bugfix ZOOKEEPER-1550,
 which addresses the client connection issue.

 *** Please download, test and VOTE before the
 *** vote closes  12:00  midnight PT on Friday, Oct 5th.***

Note that I am extending the vote period for a little longer so that
folks get time to test this out.

 http://people.apache.org/~mahadev/zookeeper-3.4.5-candidate-0/

 Should we release this?

 thanks
 mahadev


Re: SASL problem with 3.4.4 Java client

2012-09-28 Thread Mahadev Konar
Thanks to Eugene, we have all green on our builds (including JDK 7). I'll
spin up a new RC.

Thanks again Eugene!
mahadev

On Wed, Sep 26, 2012 at 11:26 AM, Eugene Koontz  wrote:
> On 9/26/12 11:08 AM, Patrick Hunt wrote:
>>
>>
>> I didn't notice any feedback to the list on these issues, perhaps I missed
>> it?
>>
> Hi Pat,
>
> I should have mentioned the test failures that we noticed with
> ZOOKEEPER-1497; I regret not bringing these to yours and the community's
> attention. I will look into it more today.
>
> -Eugene


[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-28 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1477:
-

Priority: Major  (was: Blocker)

Downgrading to Major given the recent updates on this JIRA.

> Test failures with Java 7 on Mac OS X
> -
>
> Key: ZOOKEEPER-1477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.4.3
> Environment: Mac OS X Lion (10.7.4)
> Java version:
> java version "1.7.0_04"
> Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
>Reporter: Diwaker Gupta
> Fix For: 3.4.6
>
> Attachments: with-ZK-1550.txt
>
>
> I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
> including ZooKeeperTest. A common symptom was spurious 
> {{ConnectionLossException}}:
> {code}
> 2012-06-01 12:01:23,420 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
> testDeleteRecursiveAsync
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... (snipped)
> {code}
> As background, I was actually investigating some non-deterministic failures 
> when using Netflix's Curator with Java 7 (see 
> https://github.com/Netflix/curator/issues/79). After a while, I figured I 
> should establish a clean ZK baseline first and realized it is actually a ZK 
> issue, not a Curator issue.
> We are trying to migrate to Java 7 but this is a blocking issue for us right 
> now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465243#comment-13465243
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

That's fine, Diwaker. I'll downgrade this JIRA to Major and mark it for the next 
release. We can just ship 3.4.5 with the fix for ZOOKEEPER-1550.
 
It'll be good to upload the test logs for those that fail, but it's not urgent. 
We can do it later for 3.4.6.

Thanks.

> Test failures with Java 7 on Mac OS X
> -
>
> Key: ZOOKEEPER-1477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.4.3
> Environment: Mac OS X Lion (10.7.4)
> Java version:
> java version "1.7.0_04"
> Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
>Reporter: Diwaker Gupta
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: with-ZK-1550.txt
>
>
> I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
> including ZooKeeperTest. A common symptom was spurious 
> {{ConnectionLossException}}:
> {code}
> 2012-06-01 12:01:23,420 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
> testDeleteRecursiveAsync
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... (snipped)
> {code}
> As background, I was actually investigating some non-deterministic failures 
> when using Netflix's Curator with Java 7 (see 
> https://github.com/Netflix/curator/issues/79). After a while, I figured I 
> should establish a clean ZK baseline first and realized it is actually a ZK 
> issue, not a Curator issue.
> We are trying to migrate to Java 7 but this is a blocking issue for us right 
> now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465108#comment-13465108
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Diwaker,
 The usual time on a Linux box is around 40 minutes or so.



> Test failures with Java 7 on Mac OS X
> -
>
> Key: ZOOKEEPER-1477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.4.3
> Environment: Mac OS X Lion (10.7.4)
> Java version:
> java version "1.7.0_04"
> Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
>Reporter: Diwaker Gupta
>Priority: Blocker
> Fix For: 3.4.5
>
>
> I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
> including ZooKeeperTest. A common symptom was spurious 
> {{ConnectionLossException}}:
> {code}
> 2012-06-01 12:01:23,420 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
> testDeleteRecursiveAsync
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... (snipped)
> {code}
> As background, I was actually investigating some non-deterministic failures 
> when using Netflix's Curator with Java 7 (see 
> https://github.com/Netflix/curator/issues/79). After a while, I figured I 
> should establish a clean ZK baseline first and realized it is actually a ZK 
> issue, not a Curator issue.
> We are trying to migrate to Java 7 but this is a blocking issue for us right 
> now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465097#comment-13465097
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Thanks Diwaker. Could you please upload a summary of the tests failing and the 
logs as well?



> Test failures with Java 7 on Mac OS X
> -
>
> Key: ZOOKEEPER-1477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.4.3
> Environment: Mac OS X Lion (10.7.4)
> Java version:
> java version "1.7.0_04"
> Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
>Reporter: Diwaker Gupta
>Priority: Blocker
> Fix For: 3.4.5
>
>
> I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
> including ZooKeeperTest. A common symptom was spurious 
> {{ConnectionLossException}}:
> {code}
> 2012-06-01 12:01:23,420 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
> testDeleteRecursiveAsync
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... (snipped)
> {code}
> As background, I was actually investigating some non-deterministic failures 
> when using Netflix's Curator with Java 7 (see 
> https://github.com/Netflix/curator/issues/79). After a while, I figured I 
> should establish a clean ZK baseline first and realized it is actually a ZK 
> issue, not a Curator issue.
> We are trying to migrate to Java 7 but this is a blocking issue for us right 
> now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465077#comment-13465077
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Diwaker, 
 Would you be able to run the tests along with Eugene's patch on ZOOKEEPER-1550?
 If not, please let me know. I can go ahead and run it.



> Test failures with Java 7 on Mac OS X
> -
>
> Key: ZOOKEEPER-1477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.4.3
> Environment: Mac OS X Lion (10.7.4)
> Java version:
> java version "1.7.0_04"
> Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
>Reporter: Diwaker Gupta
>Priority: Blocker
> Fix For: 3.4.5
>
>
> I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
> including ZooKeeperTest. A common symptom was spurious 
> {{ConnectionLossException}}:
> {code}
> 2012-06-01 12:01:23,420 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
> testDeleteRecursiveAsync
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... (snipped)
> {code}
> As background, I was actually investigating some non-deterministic failures 
> when using Netflix's Curator with Java 7 (see 
> https://github.com/Netflix/curator/issues/79). After a while, I figured I 
> should establish a clean ZK baseline first and realized it is actually a ZK 
> issue, not a Curator issue.
> We are trying to migrate to Java 7 but this is a blocking issue for us right 
> now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464387#comment-13464387
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Eugene,
 Still failing :)...

> ZooKeeperSaslClient does not finish anonymous login on OpenJDK
> --
>
> Key: ZOOKEEPER-1550
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.4
>Reporter: Robert Macomber
>Assignee: Eugene Koontz
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: ZOOKEEPER-1550.patch, ZOOKEEPER-1550.patch
>
>
> On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does 
> not throw an exception.  
> {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an 
> exception from that method as a proxy for "this client is not configured to 
> use SASL" and as a result no commands can be sent, since it is still waiting 
> for auth to complete.
> [Link to mailing list 
> discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]
> The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do 
> getChildren("/")':
> {code:title=OpenJDK}
> INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
> DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider 
> Waiting for connected-state...
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 
> org.apache.zookeeper.ClientCnxn Opening socket connection to server 
> mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 
> org.apache.zookeeper.ClientCnxn Socket connection established to 
> mike.local/10.0.2.106:2181, initiating session
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 
> org.apache.zookeeper.ClientCnxn Session establishment request sent on 
> mike.local/10.0.2.106:2181
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 
> org.apache.zookeeper.ClientCnxn Session establishment complete on server 
> mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout 
> = 4
> DEBUG [main-EventThread] 2012-09-25 14:02:24,614 
> com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.loc

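A minimal sketch of the check ZOOKEEPER-1550 describes, contrasting the exception-as-proxy test with an explicit lookup of the client JAAS section. This is illustrative only and not the committed fix; it uses only standard javax.security.auth.login APIs and assumes the default "Client" section name used by the ZooKeeper client.

{code}
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class SaslConfigCheck {
    // Fragile pattern described in the report: treat an exception from
    // getConfiguration() as "no SASL config". Oracle's JDK throws
    // SecurityException when no JAAS config can be located, but OpenJDK may
    // return a default Configuration instead, so this can report "configured"
    // when nothing is configured and the client then waits forever for auth.
    static boolean saslConfiguredViaExceptionProxy() {
        try {
            Configuration.getConfiguration();
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    // More explicit alternative: look up the client login section by name
    // ("Client" is the ZooKeeper client's default section name).
    static boolean saslConfiguredExplicitly() {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry("Client");
            return entries != null && entries.length > 0;
        } catch (SecurityException e) {
            return false; // no JAAS configuration could be located at all
        }
    }

    public static void main(String[] args) {
        System.out.println("exception proxy says: " + saslConfiguredViaExceptionProxy());
        System.out.println("explicit lookup says: " + saslConfiguredExplicitly());
    }
}
{code}
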
[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464370#comment-13464370
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Eugene,
 Looks like the SASL test failed. Can you please take a look?

> ZooKeeperSaslClient does not finish anonymous login on OpenJDK
> --
>
> Key: ZOOKEEPER-1550
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.4
>Reporter: Robert Macomber
>Assignee: Eugene Koontz
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: ZOOKEEPER-1550.patch
>
>
> On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does 
> not throw an exception.  
> {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an 
> exception from that method as a proxy for "this client is not configured to 
> use SASL" and as a result no commands can be sent, since it is still waiting 
> for auth to complete.
> [Link to mailing list 
> discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]
> The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do 
> getChildren("/")':
> {code:title=OpenJDK}
> INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
> DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider 
> Waiting for connected-state...
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 
> org.apache.zookeeper.ClientCnxn Opening socket connection to server 
> mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 
> org.apache.zookeeper.ClientCnxn Socket connection established to 
> mike.local/10.0.2.106:2181, initiating session
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 
> org.apache.zookeeper.ClientCnxn Session establishment request sent on 
> mike.local/10.0.2.106:2181
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 
> org.apache.zookeeper.ClientCnxn Session establishment complete on server 
> mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout 
> = 4
> DEBUG [main-EventThread] 2012-09-25 14:02:24,614 
> com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThre

[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464357#comment-13464357
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Awesome. I'll check this in, kick off the builds on JDK 7, and see if it all 
works.


> ZooKeeperSaslClient does not finish anonymous login on OpenJDK
> --
>
> Key: ZOOKEEPER-1550
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.4
>Reporter: Robert Macomber
>Assignee: Eugene Koontz
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: ZOOKEEPER-1550.patch
>
>
> On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does 
> not throw an exception.  
> {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an 
> exception from that method as a proxy for "this client is not configured to 
> use SASL" and as a result no commands can be sent, since it is still waiting 
> for auth to complete.
> [Link to mailing list 
> discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]
> The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do 
> getChildren("/")':
> {code:title=OpenJDK}
> INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
> DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider 
> Waiting for connected-state...
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 
> org.apache.zookeeper.ClientCnxn Opening socket connection to server 
> mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 
> org.apache.zookeeper.ClientCnxn Socket connection established to 
> mike.local/10.0.2.106:2181, initiating session
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 
> org.apache.zookeeper.ClientCnxn Session establishment request sent on 
> mike.local/10.0.2.106:2181
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 
> org.apache.zookeeper.ClientCnxn Session establishment complete on server 
> mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout 
> = 4
> DEBUG [main-EventThread] 2012-09-25 14:02:24,614 
> com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG
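
A minimal sketch, for readers of the description above, of the exception-as-proxy 
pattern it reports. The class and method names 
(javax.security.auth.login.Configuration.getConfiguration, 
ZooKeeperSaslClient.clientTunneledAuthenticationInProgress) come from the report; 
the enclosing class, the exact exception type, and the control flow are assumptions 
for illustration only, not the actual ZooKeeper client source.

{code:java}
import javax.security.auth.login.Configuration;

// Sketch only: simplified from the bug description, not the real
// ZooKeeperSaslClient code. It shows why treating an exception from
// Configuration.getConfiguration() as "SASL is not configured" breaks
// on OpenJDK.
public class SaslCheckSketch {
    public boolean clientTunneledAuthenticationInProgress() {
        try {
            // Oracle JDK throws a SecurityException here when no JAAS
            // login configuration can be located; OpenJDK instead
            // returns a default Configuration object without throwing.
            Configuration.getConfiguration();
        } catch (SecurityException e) {
            // Exception taken as "client is not configured for SASL",
            // so pending packets can be sent immediately.
            return false;
        }
        // On OpenJDK this branch is always reached, so the client keeps
        // reporting that SASL authentication is in progress and
        // ClientCnxnSocketNIO keeps deferring non-priming packets,
        // matching the log output quoted above.
        return true;
    }
}
{code}

On Oracle JDK the catch branch fires and the client falls back to sending plain 
packets; on OpenJDK it never does, which is why the log keeps printing "deferring 
non-priming packet ... until SASL authentication completes."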

[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464339#comment-13464339
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Thanks Eugene.

Robert, can you verify this patch as well? 

Thanks

> ZooKeeperSaslClient does not finish anonymous login on OpenJDK
> --
>
> Key: ZOOKEEPER-1550
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.4
>Reporter: Robert Macomber
>Assignee: Eugene Koontz
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: ZOOKEEPER-1550.patch
>
>
> On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does 
> not throw an exception.  
> {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an 
> exception from that method as a proxy for "this client is not configured to 
> use SASL" and as a result no commands can be sent, since it is still waiting 
> for auth to complete.
> [Link to mailing list 
> discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]
> The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do 
> getChildren("/")':
> {code:title=OpenJDK}
> INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
> DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider 
> Waiting for connected-state...
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 
> org.apache.zookeeper.ClientCnxn Opening socket connection to server 
> mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 
> org.apache.zookeeper.ClientCnxn Socket connection established to 
> mike.local/10.0.2.106:2181, initiating session
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 
> org.apache.zookeeper.ClientCnxn Session establishment request sent on 
> mike.local/10.0.2.106:2181
> INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 
> org.apache.zookeeper.ClientCnxn Session establishment complete on server 
> mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout 
> = 4
> DEBUG [main-EventThread] 2012-09-25 14:02:24,614 
> com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
> null request:: null response:: nulluntil SASL authentication completes.
> DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
> org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
> clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
> request:: '/,F  response:: v{} until SASL authentication completes.
> DEBUG [main-SendThre
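
One possible way to make the check explicit, instead of relying on an 
implementation-specific exception, is to ask the JAAS Configuration for the client 
login section directly. This is only a sketch under that assumption; it is not 
claimed to be what ZOOKEEPER-1550.patch actually does. The section name "Client" is 
ZooKeeper's default client-side JAAS login context.

{code:java}
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Sketch of an alternative probe that does not depend on
// getConfiguration() throwing on a particular JDK.
public class SaslConfigProbe {
    public static boolean hasClientLoginEntry() {
        try {
            // Returns null when no "Client" section is defined, on both
            // Oracle JDK and OpenJDK, so the result does not depend on
            // which runtime is in use.
            AppConfigurationEntry[] entries =
                Configuration.getConfiguration()
                             .getAppConfigurationEntry("Client");
            return entries != null && entries.length > 0;
        } catch (SecurityException e) {
            // Some JDKs still throw if no login configuration can be
            // located at all; treat that the same as "no SASL config".
            return false;
        }
    }
}
{code}

A caller could consult such a probe before deciding whether to defer non-priming 
packets, rather than inferring the answer from whether getConfiguration() threw.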
