[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2018-09-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629504#comment-16629504
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Can you outline how you plan to fix this?

thanks

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if not. When both do this at 
> the same time, one mkdir call fails and the server exits the JVM.
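The failure mode described above is a classic check-then-act race on java.io.File.mkdirs(): both the PurgeTask and server startup see the directory missing, both call mkdirs(), and the thread that loses the race gets false back and throws. A minimal sketch of a race-tolerant variant (illustrative only, not the committed patch; the helper name is made up):

{code}
import java.io.File;
import java.io.IOException;

class DataDirHelper {
    // Race-tolerant "create if missing": mkdirs() returning false is benign
    // as long as the directory exists afterwards, which covers the case
    // where a concurrent creator won the race.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
{code}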



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-07-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd
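The leak is in the telnet-close branch quoted above: the PrintWriter built at the top of checkFourLetterWord() is never handed to cleanupWriterSocket(), the helper that flushes and closes the writer it receives. A sketch of the kind of one-line fix the report asks for (mirroring the excerpt; not a verified patch):

{code}
} else if (len == telnetCloseCmd) {
    // Pass the writer created above so cleanupWriterSocket() can flush and
    // close it, instead of leaking it by passing null.
    cleanupWriterSocket(pwriter);
    return true;
}
{code}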



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-20 Thread Ted Yu
The original email said:

bq. vote by March 31st 2018,

IMHO Apr 30th is not far ahead :-)

If you think RC1 should receive more votes, please extend the voting
deadline.

On Wed, Apr 18, 2018 at 1:29 PM, Abraham Fine  wrote:

> I'm waiting for one more additional vote on the release. When that is done
> it will be available.
>
> On Wed, Apr 18, 2018, at 12:38, Ted Yu wrote:
> > I don't see 3.4.12 artifact under
> > https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper
> >
> > Abraham:
> > Can you clarify ?
> >
> > Thanks
> >
> > On Mon, Apr 16, 2018 at 9:35 AM, Ted Yu  wrote:
> >
> > > Hi,
> > > If I understand correctly, zookeeper users can expect maven artifacts
> for
> > > 3.4.12 to be posted soon.
> > >
> > > Thanks
> > >
>


RE: [SUGGESTION] Migrate project structure to Maven build

2018-04-19 Thread Ted Yu
+1 for migrating to maven build.

 Original message 
From: Mohammad arshad
Date: 4/19/18 5:23 AM (GMT-08:00)
To: dev@zookeeper.apache.org
Subject: RE: [SUGGESTION] Migrate project structure to Maven build
Thanks Norbert for the good initiative. I am +1 on migrating to Maven.
I think it would be good to start with the master branch. After changing and 
stabilizing it, we can backport the changes to other branches.
Maybe we can create an umbrella JIRA and create independent tasks under it. 
There will be many things that can be handled independently.

Thanks & Regards
Arshad
-Original Message-
From: Enrico Olivelli [mailto:eolive...@gmail.com] 
Sent: Thursday, April 19, 2018 8:03 PM
To: DevZooKeeper 
Subject: Re: [SUGGESTION] Migrate project structure to Maven build

Hi Norbert,
thank you for your suggestion

there is a long standing patch for migration to Maven
https://issues.apache.org/jira/browse/ZOOKEEPER-1078

personally I am using that pom.xml in order to speed up work

I really would like this change, but we need support from some committer.
It is an important change and it cannot be done without full consensus in the 
community.

Cheers
Enrico


2018-04-19 13:28 GMT+02:00 Norbert Kalmar :

> Hi ZooKeeper community,
>
> As the vast majority of the components in the Hadoop ecosystem are 
> built with Maven, what do you think of moving ZooKeeper to a Maven 
> structure as well?
>
> This would bring the benefit of a more consistent project structure, 
> better dependency management and more possibilities for future changes 
> (i.e.: we could separate java client code so that projects like HDFS 
> that only need the client don't have to import the whole ZooKeeper).
>
> This could be done as a multi-step change.
>
> The change would also include the separation of unit tests from 
> integration and/or functional tests.
>
> In the first iteration, the project structure could be separated into 
> something like:
>
> zookeeper
> |-bin
> |-conf
> |-zk-client-c
> |-zk-contrib
> | |-zk-contrib-fatjar
> | |-zk-contrib-huebrowser
> | |-zk-contrib-loggraph
> | |-zk-contrib-monitoring
> | |-zk-contrib-rest
> | |-zk-contrib-zkfuse
> | |-zk-contrib-zkperl
> | |-zk-contrib-zkpython
> | |-zk-contrib-zktreeutil
> | \-zk-contrib-zooinspector
> |-zk-docs
> |-zk-it (integration tests)
> |-zk-server
> |-zk-recipes
> | |-zk-recipes-election
> | |-zk-recipes-lock
> \ \-zk-recipes-queue
>
>
> With this kind of structure, the code change could be kept to a bare 
> minimum, if any at all.
> Just change the ant script to conform to the new structure.
>
> In a second iteration, we could start the changes that require code 
> changes as well:
>
> zookeeper
> |-bin
> |-conf
> |-jute
> |-zk-client
> | |-zk-client-c
> | |-zk-client-java
> | \-zk-client-go (or any other language)
> |-zk-common
> |-zk-contrib
> | |-zk-contrib-fatjar
> | |-zk-contrib-huebrowser
> | |-zk-contrib-loggraph
> | |-zk-contrib-monitoring
> | |-zk-contrib-rest
> | |-zk-contrib-zkfuse
> | |-zk-contrib-zkperl
> | |-zk-contrib-zkpython
> | |-zk-contrib-zktreeutil
> | \-zk-contrib-zooinspector
> |-zk-docs
> |-zk-it (integration tests)
> |-zk-server
> |-zk-recipes
> | |-zk-recipes-election
> | |-zk-recipes-lock
> \ \-zk-recipes-queue
>
>
> Here, java client code is separated from the server code (and any 
> other supported language's client code).
>
> The final iteration would be something like:
>
> zk-something
> |-src
> | |-main
> | |  |-java
> | |  | \org...
> | |   \resources
> | \test (unit tests only?)
> |  |-java
> |  |  \org...
> |  \resources
> \pom.xml
>
>
> But this is just to give a high level example/vision.
>
> Of course, through all the iterations, even at the end when possibly 
> moving to a full Maven build, it is important that the final jar 
> structure remains the same.
>
> What do you think?
>
> Kind regards,
> Norbert
>


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-18 Thread Ted Yu
I don't see 3.4.12 artifact under
https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper

Abraham:
Can you clarify ?

Thanks

On Mon, Apr 16, 2018 at 9:35 AM, Ted Yu  wrote:

> Hi,
> If I understand correctly, zookeeper users can expect maven artifacts for
> 3.4.12 to be posted soon.
>
> Thanks
>


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-16 Thread Ted Yu
Hi,
If I understand correctly, zookeeper users can expect maven artifacts for
3.4.12 to be posted soon.

Thanks


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-11 Thread Ted Yu
Hi,
The PR for ZK-2959 already has +1.

Can the PR be merged ?

Thanks


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-05 Thread Ted Yu
Can the vote be closed ?

It seems we have enough +1's

Thanks


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-03-27 Thread Ted Yu
+1

checked signatures
unit tests passed on Linux


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 0

2018-03-23 Thread Ted Yu
I just ran the test suite again - there was no test failure.

So +1 from my side.

On Fri, Mar 23, 2018 at 9:54 AM, Abraham Fine  wrote:

> Do they always fail when run with the rest of the test suite or is it
> inconsistent?
>
> The reason I ask is that the failure you are reporting is a ConnectionLoss
> and testSessionTimeout has a history of being flaky (generally on ZooKeeper
> 3.5 though).
>
>
> On Fri, Mar 23, 2018, at 09:45, Ted Yu wrote:
> > Here is OS:
> >
> > Linux h.com 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC
> 2016
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> > java version "1.8.0_161"
> > Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
> >
> > The tests don't fail when run alone.
> >
> > FYI
> >
> > On Fri, Mar 23, 2018 at 9:41 AM, Abraham Fine  wrote:
> >
> > > Hi Ted-
> > >
> > > Thanks for running the test cases on the RC. I am not able to reproduce
> > > the failures. Would you mind telling us a little bit more about the
> > > environment you are running the tests in (operating system, jvm)? In
> > > addition, do the failures occur every time you run the tests or just
> > > occasionally?
> > >
> > > Thanks,
> > > Abe
> > >
> > > On Thu, Mar 22, 2018, at 17:16, Ted Yu wrote:
> > > > Hi,
> > > > I ran test suite for the RC.
> > > >
> > > > Testcase: testSessionTimeout took 22.686 sec
> > > >   Caused an ERROR
> > > > KeeperErrorCode = ConnectionLoss for /stest
> > > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > KeeperErrorCode = ConnectionLoss for /stest
> > > >   at org.apache.zookeeper.KeeperException.create(
> > > KeeperException.java:102)
> > > >   at org.apache.zookeeper.KeeperException.create(
> > > KeeperException.java:54)
> > > >   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1105)
> > > >   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1133)
> > > >   at
> > > > org.apache.zookeeper.test.SessionTest.testSessionTimeout(
> > > SessionTest.java:300)
> > > >
> > > > 
> > > >
> > > > Testcase: testWatcherAutoResetDisabledWithLocal took 8.545 sec
> > > >   Caused an ERROR
> > > > KeeperErrorCode = ConnectionLoss for /watchtest/child2
> > > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > KeeperErrorCode = ConnectionLoss for /watchtest/child2
> > > >   at
> > > > org.apache.zookeeper.KeeperException.create(
> KeeperException.java:102)
> > > >   at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> > > >   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:876)
> > > >   at
> > > > org.apache.zookeeper.test.WatcherTest.testWatcherAutoReset(
> > > WatcherTest.java:369)
> > > >   at
> > > > org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetWithLocal(
> > > WatcherTest.java:255)
> > > >   at
> > > > org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetDisabledWi
> > > thLocal(WatcherTest.java:268)
> > > >
> > > > Has anyone else seen the above test failures ?
> > > >
> > > > Cheers
> > >
>


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 0

2018-03-23 Thread Ted Yu
Here is OS:

Linux h.com 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

The tests don't fail when run alone.

FYI

On Fri, Mar 23, 2018 at 9:41 AM, Abraham Fine  wrote:

> Hi Ted-
>
> Thanks for running the test cases on the RC. I am not able to reproduce
> the failures. Would you mind telling us a little bit more about the
> environment you are running the tests in (operating system, jvm)? In
> addition, do the failures occur every time you run the tests or just
> occasionally?
>
> Thanks,
> Abe
>
> On Thu, Mar 22, 2018, at 17:16, Ted Yu wrote:
> > Hi,
> > I ran test suite for the RC.
> >
> > Testcase: testSessionTimeout took 22.686 sec
> >   Caused an ERROR
> > KeeperErrorCode = ConnectionLoss for /stest
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss for /stest
> >   at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:102)
> >   at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:54)
> >   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1105)
> >   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1133)
> >   at
> > org.apache.zookeeper.test.SessionTest.testSessionTimeout(
> SessionTest.java:300)
> >
> > 
> >
> > Testcase: testWatcherAutoResetDisabledWithLocal took 8.545 sec
> >   Caused an ERROR
> > KeeperErrorCode = ConnectionLoss for /watchtest/child2
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss for /watchtest/child2
> >   at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >   at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> >   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:876)
> >   at
> > org.apache.zookeeper.test.WatcherTest.testWatcherAutoReset(
> WatcherTest.java:369)
> >   at
> > org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetWithLocal(
> WatcherTest.java:255)
> >   at
> > org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetDisabledWi
> thLocal(WatcherTest.java:268)
> >
> > Has anyone else seen the above test failures ?
> >
> > Cheers
>


Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 0

2018-03-22 Thread Ted Yu
Hi,
I ran test suite for the RC.

Testcase: testSessionTimeout took 22.686 sec
  Caused an ERROR
KeeperErrorCode = ConnectionLoss for /stest
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /stest
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1105)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1133)
  at
org.apache.zookeeper.test.SessionTest.testSessionTimeout(SessionTest.java:300)



Testcase: testWatcherAutoResetDisabledWithLocal took 8.545 sec
  Caused an ERROR
KeeperErrorCode = ConnectionLoss for /watchtest/child2
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /watchtest/child2
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:876)
  at
org.apache.zookeeper.test.WatcherTest.testWatcherAutoReset(WatcherTest.java:369)
  at
org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetWithLocal(WatcherTest.java:255)
  at
org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetDisabledWithLocal(WatcherTest.java:268)

Has anyone else seen the above test failures ?

Cheers


Re: [VOTE] Upgrade 3.5 and trunk to Java8

2018-03-22 Thread Ted Yu
+1


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-02-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-02-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-11-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-10-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-07-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-04-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822029#comment-15822029
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

[~praste]:
Looks like you mistakenly entered ZOOKEEPER-2664, which is not for log4j.

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.
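The idea behind the proposed fix: an interface can disappear between enumeration and the isLoopback() probe (e.g. a transient virtual device), so a SocketException from isLoopback() should mean "skip this interface" rather than an errored test. A standalone sketch under those assumptions (names are made up; this is not the attached patch):

{code}
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

class BindAddressPicker {
    // Walk the interfaces, ignoring any whose loopback probe fails because
    // the underlying device vanished between enumeration and the check.
    static InetAddress pickLoopback() throws SocketException {
        for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            boolean loopback;
            try {
                loopback = ni.isLoopback();
            } catch (SocketException e) {
                continue; // "No such device": skip this interface
            }
            if (loopback) {
                for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                    return addr;
                }
            }
        }
        return null; // caller can fall back to InetAddress.getLoopbackAddress()
    }
}
{code}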



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822028#comment-15822028
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

https://github.com/apache/zookeeper/pull/149

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2664:
--
Comment: was deleted

(was: https://github.com/apache/zookeeper/pull/149)

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821881#comment-15821881
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

Since ZOOKEEPER-2395 didn't propose a patch, I think we can proceed with patch 
review here.

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned ZOOKEEPER-2664:
-

Assignee: Ted Yu

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2664:
--
Attachment: ZOOKEEPER-2664.v1.txt

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2664:
-

 Summary: ClientPortBindTest#testBindByAddress may fail due to "No 
such device" exception
 Key: ZOOKEEPER-2664
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
 Project: ZooKeeper
  Issue Type: Test
Affects Versions: 3.4.6
Reporter: Ted Yu


Saw the following in a recent run:
{code}
Stacktrace

java.net.SocketException: No such device
at java.net.NetworkInterface.isLoopback0(Native Method)
at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
at 
org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Standard Output

2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
testBindByAddress
2017-01-12 23:20:43,795 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
testBindByAddress
2017-01-12 23:20:43,799 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
testBindByAddress
java.net.SocketException: No such device
at java.net.NetworkInterface.isLoopback0(Native Method)
at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
at 
org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
{code}
Proposed fix is to catch exception from isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-11-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved ZOOKEEPER-2384.
---
Resolution: Later

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ted Yu
>  Labels: atomic
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to provide support for atomic increment / decrement of the 
> znode value.
> Suggestion from Flavio:
> {quote}
> you can read the znode, keep the version of the znode, update the value, 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same that it read
> {quote}
> While the above is feasible, the developer has to implement retry logic 
> him/herself. It is not easy to combine increment / decrement with other 
> operations using multi.
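For reference, a sketch of the retry loop the description says developers currently have to hand-roll, using the public ZooKeeper client API; storing the counter as a 4-byte big-endian int is an assumption made for the example:

{code}
import java.nio.ByteBuffer;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class ZnodeCounter {
    // Optimistic increment: read the value and its version, then write back
    // conditioned on that version. BadVersionException means another client
    // updated the znode in between, so re-read and retry.
    static int addAndGet(ZooKeeper zk, String path, int delta)
            throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] data = zk.getData(path, false, stat);
            int next = ByteBuffer.wrap(data).getInt() + delta;
            try {
                zk.setData(path, ByteBuffer.allocate(4).putInt(next).array(),
                        stat.getVersion());
                return next;
            } catch (KeeperException.BadVersionException e) {
                // Lost the race; loop and retry with a fresh read.
            }
        }
    }
}
{code}

As the description notes, a loop like this does not compose with other operations in a multi() transaction, which is what built-in increment support would buy.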



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-11-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638611#comment-15638611
 ] 

Ted Yu commented on ZOOKEEPER-2384:
---

Thanks for the suggestion, Nick.

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ted Yu
>  Labels: atomic
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to provide support for atomic increment / decrement of the 
> znode value.
> Suggestion from Flavio:
> {quote}
> you can read the znode, keep the version of the znode, update the value, 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same that it read
> {quote}
> While the above is feasible, the developer has to implement retry logic 
> him/herself. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-10-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613237#comment-15613237
 ] 

Ted Yu commented on ZOOKEEPER-2080:
---

Thanks for the effort, Michael.

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, 
> repro-20150816.log, threaddump.log
>
>
> I got the following test failure on a MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Labels: security  (was: )

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Minor
>  Labels: security
> Attachments: ZOOKEEPER-2606.v1.patch
>
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication 
> rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contains information about the IOException.
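
The fix is presumably a one-line change along these lines, passing the
exception to the logger so its message and stack trace are recorded (a sketch
of the quoted catch block, not necessarily the exact v1 patch):

{code}
} catch (IOException e) {
    // Include the IOException so the log explains why the Kerberos name
    // rules could not be applied, not just that they failed.
    LOG.error("Failed to set name based on Kerberos authentication rules.", e);
}
{code}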



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-10-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550505#comment-15550505
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Is there anything I can do to move this forward?

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.
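
A common race-tolerant pattern for this (a sketch of the general technique,
not necessarily what the attached patches do) is to attempt mkdirs()
unconditionally and only fail if the directory still does not exist
afterwards, so losing the race to a concurrent creator is not an error:

{code}
import java.io.File;
import java.io.IOException;

public final class DataDirUtil {
    // Creates dir if needed; tolerates another thread or process creating
    // it between any existence check and the mkdirs() call. mkdirs()
    // returns false both on failure and when the directory already exists,
    // so the isDirectory() check distinguishes the two cases.
    public static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
{code}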



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Attachment: ZOOKEEPER-2606.v1.patch

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Minor
> Attachments: ZOOKEEPER-2606.v1.patch
>
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication 
> rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contains information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Priority: Minor  (was: Major)

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Minor
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication 
> rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contains information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2606:
-

 Summary: SaslServerCallbackHandler#handleAuthorizeCallback() 
should log the exception
 Key: ZOOKEEPER-2606
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


{code}
LOG.info("Setting authorizedID: " + userNameBuilder);
ac.setAuthorizedID(userNameBuilder.toString());
} catch (IOException e) {
LOG.error("Failed to set name based on Kerberos authentication 
rules.");
}
{code}
On one cluster, we saw the following:
{code}
2016-10-04 02:18:16,484 - ERROR 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
Failed to set name based on Kerberos authentication rules.
{code}
It would be helpful if the log contains information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v5.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.branch-3.4.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-05-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2384:
--
Description: 
Use case is to store reference count (integer type) in znode.

It is desirable to provide support for atomic increment / decrement of the 
znode value.

Suggestion from Flavio:
{quote}
you can read the znode, keep the version of the znode, update the value, write 
back conditionally. The condition for the setData operation to succeed is that 
the version is the same that it read
{quote}
While the above is feasible, developer has to implement retry logic 
him/herself. It is not easy to combine increment / decrement with other 
operations using multi.

  was:
Use case is to store reference count (integer type) in znode.

It is desirable to provide support for atomic increment / decrement of the 
znode value.

Suggestion from Flavio:

you can read the znode, keep the version of the znode, update the value, write 
back conditionally. The condition for the setData operation to succeed is that 
the version is the same that it read

While the above is feasible, developer has to implement retry logic 
him/herself. It is not easy to combine increment / decrement with other 
operations using multi.


> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>    Reporter: Ted Yu
>  Labels: atomic
>
> Use case is to store reference count (integer type) in znode.
> It is desirable to provide support for atomic increment / decrement of the 
> znode value.
> Suggestion from Flavio:
> {quote}
> you can read the znode, keep the version of the znode, update the value, 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same that it read
> {quote}
> While the above is feasible, developer has to implement retry logic 
> him/herself. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-03-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2384:
--
Labels: atomic  (was: )

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>    Reporter: Ted Yu
>  Labels: atomic
>
> Use case is to store reference count (integer type) in znode.
> It is desirable to provide support for atomic increment / decrement of the 
> znode value.
> Suggestion from Flavio:
> you can read the znode, keep the version of the znode, update the value, 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same that it read
> While the above is feasible, developer has to implement retry logic 
> him/herself. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-03-09 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2384:
-

 Summary: Support atomic increment / decrement of znode value
 Key: ZOOKEEPER-2384
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Ted Yu


Use case is to store reference count (integer type) in znode.

It is desirable to provide support for atomic increment / decrement of the 
znode value.

Suggestion from Flavio:

you can read the znode, keep the version of the znode, update the value, write 
back conditionally. The condition for the setData operation to succeed is that 
the version is the same that it read

While the above is feasible, developer has to implement retry logic 
him/herself. It is not easy to combine increment / decrement with other 
operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 3.4.8 release schedule

2016-02-04 Thread Ted Yu
This is good news.

Non-blocking issues can be addressed in 3.4.9.

On Thu, Feb 4, 2016 at 2:41 PM, Raúl Gutiérrez Segalés 
wrote:

> Hi all,
>
> I'll be doing the release management for 3.4.8. Initially, we wanted to
> just have the fix for the shutdown synchronization issue that affected
> 3.4.7. But then a few more - potentially important - issues came up.
>
> To avoid blocking people who were already waiting on the fixes in 3.4.7,
> I'll cut an RC for 3.4.8 tonight and we'll move on from there.
>
> Hopefully, we can have 3.4.9 with the issues currently in-flight not much
> later this year.
>
> Sounds reasonable?
>
> -rgs
>


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: (was: ZOOKEEPER-1936.v4.patch)

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v4.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v4.patch

Patch v4 addresses Chris' comment above

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123934#comment-15123934
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

I haven't had a chance to reproduce the bug.

After a QE fix, the hbase unsecured deployment works reliably.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 3.4.8 release

2016-01-29 Thread Ted Yu
ZOOKEEPER-2355 is in Open state as of now.

My two cents:

3.4.7 has been retracted.
It would be nice to get 3.4.8 out the door soon so that ZooKeeper users can
pick up the bug fixes that have landed on the 3.4 branch since 3.4.6.

On Thu, Jan 28, 2016 at 8:57 AM, Raúl Gutiérrez Segalés  wrote:

> Hi,
>
> On 28 January 2016 at 07:07, Talluri, Chandra <
> chandra.tall...@fmr.com.invalid> wrote:
>
> > Thanks for the updates.
> >
> > When can we expect 3.4.8?
> >
>
> I think we need to decide if we want to include these patches:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-2355
> https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>
> They seem to be almost ready, though there might be some subtleties left to
> be addressed.
>
>
> > What are the plans for 3.5.*  stable version release?
> >
>
> The general plan for making 3.5 stable is here:
>
> http://markmail.org/message/ymxliy2rrwjc2pmo
>
> I believe Chris Nauroth will be the RM for the upcoming 3.5.2 release.
>
>
> -rgs
>


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113278#comment-15113278
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Anything I need to do here? [~rgs]

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108940#comment-15108940
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

The test failure at
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//testReport/org.apache.zookeeper.test/AsyncHammerTest/testHammer/
doesn't seem to be related to the patch.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108068#comment-15108068
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

The previous patch was generated for branch-3.4.

Attached is a patch for trunk.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v3.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102276#comment-15102276
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Patch v3 addresses comments from Chris and Rakesh.

The same patch applies cleanly on branch-3.4.

Let me know if a separate patch for branch-3.4 should be attached.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v3.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if it does not. When both tasks 
> do this concurrently, one mkdir call fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102058#comment-15102058
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

After logging onto the node where the error happened, we found that dataDir 
didn't exist. So my patch doesn't suffice.

Manual start on that node didn't reproduce the error, though.

Comments are welcome.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in 
> the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if it does not. When both tasks 
> do this concurrently, one mkdir call fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v2.patch

Alternate patch for consideration.

Only throw an exception if dataDir doesn't exist and the mkdirs() call fails.
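
For illustration, a minimal sketch of that idea (sketch only, not the 
attached patch; ensureDirExists is a hypothetical helper):

{code}
// Tolerate a concurrent mkdirs() from the purge task: mkdirs() returning
// false is an error only if the directory is still missing afterwards
// (i.e. the other thread did not create it in the meantime).
private static void ensureDirExists(File dir) throws IOException {
    if (!dir.exists() && !dir.mkdirs() && !dir.exists()) {
        throw new IOException("Unable to create data directory " + dir);
    }
}
{code}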

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in 
> the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if it does not. When both tasks 
> do this concurrently, one mkdir call fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095073#comment-15095073
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Assuming there have been only test changes since I performed validation last 
year, this should be good to go.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper

[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089636#comment-15089636
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

[~fpj]:
Can you take a look?

Thanks

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in 
> the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if it does not. When both tasks 
> do this concurrently, one mkdir call fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089595#comment-15089595
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

We encountered this issue during testing, though intermittently.

Can the fix be committed?
[~shralex] [~phunt]

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in 
> the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task may run at the same time as server startup. FileTxnSnapLog() checks 
> whether the directory exists and creates it if it does not. When both tasks 
> do this concurrently, one mkdir call fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081602#comment-15081602
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

[~fpj]:
Can you review the patch?

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk, which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075556#comment-15075556
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Rakesh:
Thanks for updating the test case.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk, which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ZooKeeperServer#shutdown hangs

2015-12-17 Thread Ted Yu
Jason:
See the following test which revealed the deadlock scenario:

https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

On Jenkins, the hbase build has been flaky: sometimes the above test hung
and sometimes it passed.

I tend to think this bug should be fixed for production systems.

Cheers

On Thu, Dec 17, 2015 at 3:33 PM, Jason Rosenberg  wrote:

> Curious if there are specific scenarios which trigger this issue.  So far
> we have not seen it where we've upgraded.  We have many tests in continuous
> integration that embed zookeeper servers, and so far haven't seen any
> issues.
>
> Jason
>
> On Wed, Dec 16, 2015 at 6:01 PM, Ted Yu  wrote:
>
> > Thanks, Flavio.
> >
> > When 3.4.8 RC comes out, I will give it a spin.
> >
> > Cheers
> >
> > On Wed, Dec 16, 2015 at 2:59 PM, Flavio Junqueira 
> wrote:
> >
> > > This is bad, we should fix it and release 3.4.8 soon. With the holidays
> > > and such, we won't be able to produce an RC and vote, so I suggest we
> > > target early Jan. In the meanwhile, I'd suggest users to not move to
> > 3.4.7.
> > >
> > > I've reopened ZK-1907 and suggested a fix to this problem.
> > >
> > > -Flavio
> > >
> > >
> > > > On 16 Dec 2015, at 21:01, Ted Yu  wrote:
> > > >
> > > > Logged ZOOKEEPER-2347
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier <
> cami...@apache.org
> > >
> > > > wrote:
> > > >
> > > >> Blergh. We made shutdown synchronized. But decrementing the requests
> > is
> > > >> also synchronized and called from a different thread. So yeah,
> > deadlock.
> > > >>
> > > >> Can you open a ticket for this? This came in with ZOOKEEPER-1907
> > > >>
> > > >> C
> > > >>
> > > >> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu 
> wrote:
> > > >>
> > > >>> Hi,
> > > >>> HBase recently upgraded to zookeeper 3.4.7
> > > >>>
> > > >>> In one of the tests, TestSplitLogManager, there is reproducible
> hang
> > at
> > > >> the
> > > >>> end of the test.
> > > >>> Below is snippet from stack trace related to zookeeper:
> > > >>>
> > > >>> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f
> > > >> waiting
> > > >>> on condition [0x00011834b000]
> > > >>>   java.lang.Thread.State: WAITING (parking)
> > > >>>  at sun.misc.Unsafe.park(Native Method)
> > > >>>  - parking to wait for  <0x0007c5b8d3a0> (a
> > > >>>
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > >>>  at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >>>  at
> > > >>>
> > > >>>
> > > >>
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > > >>>  at
> > > >>>
> > > >>
> > >
> >
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > > >>>  at
> > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > > >>>
> > > >>> "main-SendThread(localhost:59510)" daemon prio=5
> > tid=0x7fd274eb4000
> > > >>> nid=0x9513 waiting on condition [0x000118042000]
> > > >>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
> > > >>>  at java.lang.Thread.sleep(Native Method)
> > > >>>  at
> > > >>>
> > > >>>
> > > >>
> > >
> >
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
> > > >>>  at
> > > >>>
> > > >>>
> > > >>
> > >
> >
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
> > > >>>  at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> > > >>>
> > > >>> "SyncThread:0" prio=5 tid=0x7fd274d

[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063086#comment-15063086
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Thanks for the pointer, Chris.

I ran TestSplitLogManager twice after modifying pom.xml, and it passed both 
times. Previously the test hung quite reliably on Mac.



> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.

[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062994#comment-15062994
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Not sure how I can test this with hbase unit test(s).

As far as I know, zookeeper still builds with ant, while the hbase dependency 
is expressed through maven.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKe

Re: ZooKeeperServer#shutdown hangs

2015-12-16 Thread Ted Yu
Thanks, Flavio.

When 3.4.8 RC comes out, I will give it a spin.

Cheers

On Wed, Dec 16, 2015 at 2:59 PM, Flavio Junqueira  wrote:

> This is bad, we should fix it and release 3.4.8 soon. With the holidays
> and such, we won't be able to produce an RC and vote, so I suggest we
> target early Jan. In the meanwhile, I'd suggest users to not move to 3.4.7.
>
> I've reopened ZK-1907 and suggested a fix to this problem.
>
> -Flavio
>
>
> > On 16 Dec 2015, at 21:01, Ted Yu  wrote:
> >
> > Logged ZOOKEEPER-2347
> >
> > Thanks
> >
> > On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier 
> > wrote:
> >
> >> Blergh. We made shutdown synchronized. But decrementing the requests is
> >> also synchronized and called from a different thread. So yeah, deadlock.
> >>
> >> Can you open a ticket for this? This came in with ZOOKEEPER-1907
> >>
> >> C
> >>
> >> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu  wrote:
> >>
> >>> Hi,
> >>> HBase recently upgraded to zookeeper 3.4.7
> >>>
> >>> In one of the tests, TestSplitLogManager, there is reproducible hang at
> >> the
> >>> end of the test.
> >>> Below is snippet from stack trace related to zookeeper:
> >>>
> >>> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f
> >> waiting
> >>> on condition [0x00011834b000]
> >>>   java.lang.Thread.State: WAITING (parking)
> >>>  at sun.misc.Unsafe.park(Native Method)
> >>>  - parking to wait for  <0x0007c5b8d3a0> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >>>  at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> >>>  at
> >>>
> >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >>>  at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> >>>
> >>> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000
> >>> nid=0x9513 waiting on condition [0x000118042000]
> >>>   java.lang.Thread.State: TIMED_WAITING (sleeping)
> >>>  at java.lang.Thread.sleep(Native Method)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
> >>>  at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> >>>
> >>> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for
> >> monitor
> >>> entry [0x0001170ac000]
> >>>   java.lang.Thread.State: BLOCKED (on object monitor)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
> >>>  - waiting to lock <0x0007c5b62128> (a
> >>> org.apache.zookeeper.server.ZooKeeperServer)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
> >>>  at
> >>>
> >>>
> >>
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> >>>
> >>> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b
> >> waiting
> >>> on condition [0x000117a3]
> >>>   java.lang.Thread.State: WAITING (parking)
> >>>  at sun.misc.Unsafe.park(Native Method)
> >>>  - parking to wait for  <0x0007c9b106b8> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >>>  at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> >>>  at
> >>>
> >>
> java.util.concurrent.LinkedBlockingQ

[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2347:
--
Attachment: testSplitLogManager.stack

Stack trace showing the issue

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>    Reporter: Ted Yu
>Priority: Critical
> Attachments: testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet of the stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk, which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2347:
-

 Summary: Deadlock shutting down zookeeper
 Key: ZOOKEEPER-2347
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.7
Reporter: Ted Yu
Priority: Critical


HBase recently upgraded to zookeeper 3.4.7.

In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
end of the test.
Below is a snippet of the stack trace related to zookeeper:
{code}
"main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
condition [0x00011834b000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0007c5b8d3a0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)

"main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
nid=0x9513 waiting on condition [0x000118042000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at 
org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
  at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)

"SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
entry [0x0001170ac000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at 
org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
  - waiting to lock <0x0007c5b62128> (a 
org.apache.zookeeper.server.ZooKeeperServer)
  at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)

"main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
condition [0x000117a3]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0007c9b106b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)

"main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
[0x000108aa1000]
   java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x0007c5b66400> (a 
org.apache.zookeeper.server.SyncRequestProcessor)
  at java.lang.Thread.join(Thread.java:1281)
  - locked <0x0007c5b66400> (a 
org.apache.zookeeper.server.SyncRequestProcessor)
  at java.lang.Thread.join(Thread.java:1355)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
  at 
org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
  at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
  - locked <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
  at 
org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
  at 
org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
{code}
Note the address (0x0007c5b66400) in the last hunk, which seems to indicate 
some form of deadlock.

According to Camille Fournier:

We made shutdown synchronized. But decrementing the requests is
also synchronized and called from a different thread. So yeah, deadlock.
This came in with ZOOKEEPER-1907
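
In outline, the cycle looks like this (a simplified sketch of the pattern 
described above, not the actual ZooKeeperServer code):

{code}
// Sketch only. "main" calls shutdown() while holding the Server monitor
// and joins the sync thread; the sync thread is blocked trying to enter
// decInProcess() on the same monitor, so neither thread can proceed.
class Server {
    private int requestsInProcess;
    private Thread syncThread;

    synchronized void decInProcess() {        // called from SyncThread
        requestsInProcess--;
    }

    synchronized void shutdown() throws InterruptedException {
        syncThread.join();                    // waits forever: deadlock
    }
}
{code}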



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuration in log4j.properties

2015-09-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935526#comment-14935526
 ] 

Ted Yu commented on ZOOKEEPER-2170:
---

If I am not mistaken, 3.4.6 has this issue as well.

When can I expect this to be fixed?

Thanks

> Zookeeper is not logging as per the configuration in log4j.properties
> -
>
> Key: ZOOKEEPER-2170
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2170-002.patch, ZOOKEEPER-2170-003.patch, 
> ZOOKEEPER-2170.001.patch
>
>
> In conf/log4j.properties default root logger is 
> {code}
> zookeeper.root.logger=INFO, CONSOLE
> {code}
> Changing the root logger to the value below, or any other value, does not 
> change the logging behavior
> {code}
> zookeeper.root.logger=DEBUG, ROLLINGFILE
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2015-03-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved ZOOKEEPER-2080.
---
Resolution: Cannot Reproduce

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>Priority: Minor
>
> I got the following test failure on a MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2015-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369593#comment-14369593
 ] 

Ted Yu commented on ZOOKEEPER-2080:
---

Looks like the test hasn't failed recently.

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Priority: Minor
>
> I got the following test failure on a MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2105:
--
Attachment: zookeeper-2105-v1.patch

> PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
> --
>
> Key: ZOOKEEPER-2105
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Minor
> Attachments: zookeeper-2105-v1.patch
>
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> {code}
> pwriter should be closed upon return from the method.
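
A minimal sketch of the suggested fix, assuming a JDK 7+ try-with-resources
is acceptable in the target branch (StringWriter stands in here for the
internal SendBufferWriter):

{code}
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

public class CloseWriterSketch {
    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stand-in for SendBufferWriter
        // try-with-resources guarantees close() on every exit path, which
        // also flushes the BufferedWriter before the method returns.
        try (PrintWriter pwriter = new PrintWriter(new BufferedWriter(sink))) {
            pwriter.print("imok"); // e.g. a four-letter-word response
        }
        System.out.println(sink);
    }
}
{code}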



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272192#comment-14272192
 ] 

Ted Yu commented on ZOOKEEPER-2105:
---

NettyServerCnxn#checkFourLetterWord() has a similar issue.

> PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
> --
>
> Key: ZOOKEEPER-2105
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> {code}
> pwriter should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-09 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2105:
-

 Summary: PrintWriter left unclosed in 
NIOServerCnxn#checkFourLetterWord
 Key: ZOOKEEPER-2105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
{code}
pwriter should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220142#comment-14220142
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

Thanks Flavio.

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: 2064-v1.txt, 2064-v2.txt, ZOOKEEPER-2064.patch
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.
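
The shape of the leak and of the usual fix, as a generic sketch (LogIterator
and RandomAccessFileReader are ZooKeeper-internal types, so BufferedReader
stands in):

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ResourceLeakSketch {
    // Leaky shape: the reader is never closed on the return path
    // (or when readLine() throws), which is what the JIRA describes.
    static String firstLineLeaky(String path) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader(path));
        return r.readLine();
    }

    // Fixed shape: try-with-resources closes the reader on every path.
    static String firstLine(String path) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            return r.readLine();
        }
    }
}
{code}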



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220082#comment-14220082
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

[~Mdanielle] [~phunt]:
Mind taking a look?

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Critical
> Attachments: 2064-v1.txt, 2064-v2.txt
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2014-11-12 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2080:
-

 Summary: ReconfigRecoveryTest fails intermittently
 Key: ZOOKEEPER-2080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
 Project: ZooKeeper
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I got the following test failure on a MacBook with trunk code:
{code}
Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
  FAILED
waiting for server 2 being up
junit.framework.AssertionFailedError: waiting for server 2 being up
  at 
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
  at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208931#comment-14208931
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

Correction:
ReconfigRecoveryTest#testCurrentServersAreObserversInNextConfig failed with the 
patch;
ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig failed without 
the patch.

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Critical
> Attachments: 2064-v1.txt, 2064-v2.txt
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208924#comment-14208924
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

I ran the failed tests locally.
ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig fails with or 
without my patch.

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Critical
> Attachments: 2064-v1.txt, 2064-v2.txt
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2064:
--
Attachment: 2064-v2.txt

Patch v2 is based on the latest trunk.

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Critical
> Attachments: 2064-v1.txt, 2064-v2.txt
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-10-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2064:
--
Attachment: 2064-v1.txt

Tentative patch.

> Prevent resource leak in various classes
> 
>
> Key: ZOOKEEPER-2064
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
> Attachments: 2064-v1.txt
>
>
> In various classes, there are potential resource leaks.
> e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
> method.
> The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-10-21 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2064:
-

 Summary: Prevent resource leak in various classes
 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu


In various classes, there are potential resource leaks.
e.g. LogIterator / RandomAccessFileReader are not closed upon return from the
method.

The corresponding close() should be called to prevent resource leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New ZooKeeper committer: Rakesh R

2014-05-18 Thread Ted Yu
Congratulations, Rakesh.


[jira] [Resolved] (ZOOKEEPER-1665) Support recursive deletion in multi

2014-04-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved ZOOKEEPER-1665.
---

Resolution: Won't Fix

> Support recursive deletion in multi
> ---
>
> Key: ZOOKEEPER-1665
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Ted Yu
>
> The use case in HBase is that we need to recursively delete multiple subtrees:
> {code}
> ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
> ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
> ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
> {code}
> To achieve high consistency, it is desirable to use multi for the above 
> operations.
> This JIRA adds support for recursive deletion in multi.
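
A rough client-side approximation of the requested semantics, using only the
public multi/Op API (a sketch: helper names are hypothetical, and since the
deletes are collected before the atomic multi() is submitted, concurrent
writers can make the whole operation fail and require a retry):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class MultiDeleteSketch {
    // Depth-first collection so every child is deleted before its parent.
    static void collectDeletes(ZooKeeper zk, String path, List<Op> ops)
            throws KeeperException, InterruptedException {
        for (String child : zk.getChildren(path, false)) {
            collectDeletes(zk, path + "/" + child, ops);
        }
        ops.add(Op.delete(path, -1)); // -1 matches any version
    }

    // Mirrors deleteChildrenRecursively: the root znodes themselves are kept.
    static void deleteChildrenRecursively(ZooKeeper zk, String... roots)
            throws KeeperException, InterruptedException {
        List<Op> ops = new ArrayList<>();
        for (String root : roots) {
            for (String child : zk.getChildren(root, false)) {
                collectDeletes(zk, root + "/" + child, ops);
            }
        }
        zk.multi(ops); // all-or-nothing across all the subtrees
    }
}
{code}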



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1665) Support recursive deletion in multi

2014-03-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931718#comment-13931718
 ] 

Ted Yu commented on ZOOKEEPER-1665:
---

The snippet looks good.
A patch is welcome.

Thanks

> Support recursive deletion in multi
> ---
>
> Key: ZOOKEEPER-1665
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Ted Yu
>
> The use case in HBase is that we need to recursively delete multiple subtrees:
> {code}
> ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
> ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
> ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
> {code}
> To achieve high consistency, it is desirable to use multi for the above 
> operations.
> This JIRA adds support for recursive deletion in multi.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache ZooKeeper release 3.4.6 candidate 0

2014-03-09 Thread Ted Yu
You can use the following command to resume from the first skipped module:

mvn test -rf :hadoop-yarn-server-tests

But of course the tests may fail halfway :-)


On Sun, Mar 9, 2014 at 9:54 AM, Patrick Hunt  wrote:

> Doesn't help, now they just seem to be skipped instead. Regardless, I
> think we're ok now.
>
> [INFO] hadoop-yarn-server-common . SUCCESS [6.956s]
> [INFO] hadoop-yarn-server-nodemanager  SUCCESS
> [4:43.190s]
> [INFO] hadoop-yarn-server-web-proxy .. SUCCESS [7.180s]
> [INFO] hadoop-yarn-server-resourcemanager  FAILURE
> [15:23.465s]
> [INFO] hadoop-yarn-server-tests .. SKIPPED
> [INFO] hadoop-yarn-client  SKIPPED
> [INFO] hadoop-yarn-applications .. SUCCESS [0.057s]
> [INFO] hadoop-yarn-applications-distributedshell . SKIPPED
> [INFO] hadoop-yarn-applications-unmanaged-am-launcher  SKIPPED
> [INFO] hadoop-yarn-site .. SUCCESS [0.051s]
> [INFO] hadoop-yarn-project ... SKIPPED
> [INFO] hadoop-mapreduce-client ... SUCCESS [0.112s]
> [INFO] hadoop-mapreduce-client-core .. SUCCESS
> [1:01.780s]
> [INFO] hadoop-mapreduce-client-common  SKIPPED
> [INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [2.730s]
> [INFO] hadoop-mapreduce-client-app ... SKIPPED
> [INFO] hadoop-mapreduce-client-hs  SKIPPED
> [INFO] hadoop-mapreduce-client-jobclient . SKIPPED
> [INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
> [INFO] Apache Hadoop MapReduce Examples .. SKIPPED
> [INFO] hadoop-mapreduce .. SUCCESS [1.549s]
> [INFO] Apache Hadoop MapReduce Streaming . SKIPPED
> [INFO] Apache Hadoop Distributed Copy  SKIPPED
> [INFO] Apache Hadoop Archives  SKIPPED
> [INFO] Apache Hadoop Rumen ... SKIPPED
> [INFO] Apache Hadoop Gridmix . SKIPPED
> [INFO] Apache Hadoop Data Join ... SKIPPED
> [INFO] Apache Hadoop Extras .. SKIPPED
> [INFO] Apache Hadoop Pipes ... SUCCESS [0.036s]
> [INFO] Apache Hadoop OpenStack support ... SUCCESS [0.982s]
> [INFO] Apache Hadoop Client .. SKIPPED
> [INFO] Apache Hadoop Mini-Cluster  SKIPPED
> [INFO] Apache Hadoop Scheduler Load Simulator  SKIPPED
> [INFO] Apache Hadoop Tools Dist .. SKIPPED
> [INFO] Apache Hadoop Tools ... SUCCESS [0.049s]
> [INFO] Apache Hadoop Distribution  SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 4:35:58.145s
>
> On Sat, Mar 8, 2014 at 9:09 AM, Patrick Hunt  wrote:
> > Didn't know about that one. Let me try that, thx
> >
> > On Mar 8, 2014 8:45 AM, "Ted Yu"  wrote:
> >>
> >> Have you tried using '--fail-at-end' on the command line ?
> >>
> >> It is supposed to run through all modules.
> >>
> >> Cheers
> >>
> >>
> >> On Sat, Mar 8, 2014 at 8:37 AM, Patrick Hunt  wrote:
> >>
> >> > Thanks Todd. My primary concern was actually that we got
> >> > repository.a.o setup properly. (staging) Re that issue you and I worked on
> >> > a few months ago - the zookeeper test jar was missing. I assume that
> >> > if any of the ZK tests are passing then that aspect is functioning
> >> > properly? I eyeballed the repo and it looked correct (the test jar is
> >> > there) but I wanted to verify by running Hadoop tests themselves.
> >> >
> >> > btw. the module kept running all the tests, but then stopped after all
> >> > the tests within that module were run. It didn't continue with addl
> >> > modules. But it did run all the tests w/in the failing module. I tried
> >> > setting failOnError (whatever it is) in mvn to false, but that didn't
> >> > help, which is odd
> >> >
> >> > Thanks!
> >> >
> >> > Patrick
> >> >
> >> > On Fri, Mar 7, 2014 at 6:30 PM, Todd Lipcon 
> wrote:
> >> > > On F

Re: [VOTE] Apache ZooKeeper release 3.4.6 candidate 0

2014-03-08 Thread Ted Yu
Have you tried using '--fail-at-end' on the command line ?

It is supposed to run through all modules.

Cheers


On Sat, Mar 8, 2014 at 8:37 AM, Patrick Hunt  wrote:

> Thanks Todd. My primary concern was actually that we got
> repository.a.o setup properly. (staging) Re that issue you and I worked on
> a few months ago - the zookeeper test jar was missing. I assume that
> if any of the ZK tests are passing then that aspect is functioning
> properly? I eyeballed the repo and it looked correct (the test jar is
> there) but I wanted to verify by running Hadoop tests themselves.
>
> btw. the module kept running all the tests, but then stopped after all
> the tests within that module were run. It didn't continue with addl
> modules. But it did run all the tests w/in the failing module. I tried
> setting failOnError (whatever it is) in mvn to false, but that didn't
> help, which is odd
>
> Thanks!
>
> Patrick
>
> On Fri, Mar 7, 2014 at 6:30 PM, Todd Lipcon  wrote:
> > On Fri, Mar 7, 2014 at 2:44 PM, Patrick Hunt  wrote:
> >>
> >> I still can't get through all the Hadoop tests in 2.3.0, I even tried
> >> another machine. It does get further on another machine, but there
> >> seem to be flakey tests (not zk related) that always keep the full
> >> suite from completing.
> >
> >
> > It ought to keep running even if some of the tests failed. Surprising
> that
> > it doesn't. Maybe run each module's tests sepately rather than mvn test
> from
> > toplevel?
> >
> >>
> >>
> >> I did notice that the zookeeper jar and test jar are pulled down
> >> successfully from the staging environment (when I update hadoop to use
> >> 3.4.6 staging).
> >>
> >> The following tests reference zk, and they pass, so at this point I'm
> >> going to give up and assume that we're good. :-) I'm ccing Todd,
> >> perhaps he can give us some insight into whether this set of tests
> >> covers most of what hadoop tests re zk use:
> >>
> >> Running org.apache.hadoop.ha.TestActiveStandbyElectorRealZK
> >> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.185
> >> sec - in org.apache.hadoop.ha.TestActiveStandbyElectorRealZK
> >> Running org.apache.hadoop.ha.TestZKFailoverController
> >> Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> >> 45.351 sec - in org.apache.hadoop.ha.TestZKFailoverController
> >> Running org.apache.hadoop.ha.TestZKFailoverControllerStress
> >> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 94.119
> >> sec - in org.apache.hadoop.ha.TestZKFailoverControllerStress
> >> Running org.apache.hadoop.util.TestZKUtil
> >> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.132
> >> sec - in org.apache.hadoop.util.TestZKUtil
> >> Running
> >> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController
> >> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.09
> >> sec - in
> >> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController
> >
> >
> > Those look like a good set from the common side. MR has a few tests like
> > TestZKRMStateStore which might be useful, but I don't think it's doing
> > anything too advanced. I'd be surprised if they caught any ZK regression
> > that wouldn't be well covered by the rest of your test infrastructure.
> >
> > -Todd
> >
> >>
> >>
> >> Patrick
> >>
> >> On Thu, Mar 6, 2014 at 7:48 AM, Flavio Junqueira  >
> >> wrote:
> >> > +1, I have:
> >> >
> >> > - Checked signature and hashes
> >> > - Ran tests on Mac OS X, Windows, Linux Ubuntu
> >> > - Tested with internal application
> >> >
> >> > -Flavio
> >> >
> >> > On 25 Feb 2014, at 22:23, Mahadev Konar 
> wrote:
> >> >
> >> >> +1
> >> >>
> >> >> Verified the signatures and the artifacts.
> >> >>
> >> >> thanks
> >> >> mahadev
> >> >> Mahadev Konar
> >> >> Hortonworks Inc.
> >> >> http://hortonworks.com/
> >> >>
> >> >>
> >> >> On Mon, Feb 24, 2014 at 12:20 PM, Michi Mutsuzaki
> >> >>  wrote:
> >> >>> +1
> >> >>>
> >> >>> ant test passed on ubuntu 12.04.
> >> >>>
> >> >>> On Sun, F

Re: [VOTE] Apache ZooKeeper release 3.4.6 candidate 0

2014-02-23 Thread Ted Yu
I pointed HBase 0.98 at 3.4.6 RC0 in the staging repo.
I ran through the test suite and it passed:

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 1:09:42.116s
[INFO] Finished at: Sun Feb 23 19:21:04 UTC 2014
[INFO] Final Memory: 48M/503M

Cheers


On Sun, Feb 23, 2014 at 11:39 AM, Flavio Junqueira wrote:

> This is a bugfix release candidate for 3.4.5. It fixes 117 issues,
> including issues that affect
> leader election, Zab, and SASL authentication.
>
> The full release notes are available at:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12323310
>
> *** Please download, test and vote by March 9th 2014, 23:59 UTC+0. ***
>
> Source files:
> http://people.apache.org/~fpj/zookeeper-3.4.6-candidate-0/
>
> Maven staging repo:
>
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.4.6/
>
> The tag to be voted upon:
> https://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.6-rc0
>
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
>
> http://www.apache.org/dist/zookeeper/KEYS
>
> Should we release this candidate?
>
> -Flavio


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898727#comment-13898727
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

bq. I prefer patch v2

I agree.

Patch v3 basically makes the map a HashMap.

bq. then create, and put if not absent

I guess you meant 'put if absent'.

The chance of extra allocation should be low.

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt, 
> zookeeper-1861-v3.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.
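
A minimal sketch of the putIfAbsent idiom being asked for (the field and
constant names mirror the snippet above but are illustrative):

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentSketch {
    static final int SEND_CAPACITY = 1;
    static final ConcurrentMap<Long, ArrayBlockingQueue<byte[]>> queueSendMap =
            new ConcurrentHashMap<>();

    // check-then-act (containsKey, then put) is racy: two threads can both
    // see the key absent and both put, silently dropping one queue.
    // putIfAbsent() makes the insertion atomic; at worst an extra queue is
    // allocated and discarded, but every caller sees the same winner.
    static ArrayBlockingQueue<byte[]> queueFor(long sid) {
        ArrayBlockingQueue<byte[]> q = queueSendMap.get(sid);
        if (q == null) {
            ArrayBlockingQueue<byte[]> fresh =
                    new ArrayBlockingQueue<>(SEND_CAPACITY);
            q = queueSendMap.putIfAbsent(sid, fresh);
            if (q == null) {
                q = fresh; // our queue won the race
            }
        }
        return q;
    }
}
{code}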



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v3.txt

How about patch v3?

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt, 
> zookeeper-1861-v3.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897335#comment-13897335
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

Further review on this would be appreciated.

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Where are we in ZOOKEEPER-1416

2014-01-18 Thread Ted Yu
Pardon me, Honghua. 
I should have given proper context. 

Will pay attention in the future. 

Cheers

On Jan 18, 2014, at 4:05 PM, Ted Dunning  wrote:

> On Sat, Jan 18, 2014 at 8:58 AM, Stack  wrote:
> 
>>> On Fri, Jan 17, 2014 at 2:14 PM, Ted Dunning 
>>> wrote:
>>> 
 
 That comment indicates a lack of understanding of ZK, not a bug in ZK.
>> 
>> 
>> Mighty Ted Dunning, I'd just like to say that the quote is badly out of
>> context and misrepresents the clueful gentleman quoted (The comment
>> actually leads off with the root problem being HBase's USE of ZK, NOT ZK).
> 
> Fair correction.  I don't think I was the one who removed the context, but
> I should have been more sensitive to its lack.
> 
> Corrections always welcome.  Corrections from Stack pretty much always
> accepted without question.
> 
> 
>> Having the ability to know exact deltas would help make HBase region
>> assignment more robust.
>> 
>> Nor am I sure what this is about.  Those actually working on making HBase
>> region assignment more 'robust' have not asked for the above; i.e. have ZK
>> make fundamental changes in the way it operates, though it is working as
>> advertised, just to 'fix' a downstream project's misuse.
> 
> See previous general principle.  This is much more compatible with what I
> remember of the strategies being used in the HBase Master.


Re: Where are we in ZOOKEEPER-1416

2014-01-17 Thread Ted Yu
Thanks for the feedback, Kishore and Ted.

Appreciate it.


On Fri, Jan 17, 2014 at 2:41 PM, Ted Dunning  wrote:

> My reference here is to the comments a ways up thread.  Kishore and I
> clearly agree completely that idempotency and dealing with the state as it
> is right now are the keys to correct design.
>
>
> On Fri, Jan 17, 2014 at 2:14 PM, Ted Dunning 
> wrote:
>
> >
> > That comment indicates a lack of understanding of ZK, not a bug in ZK.
> >
> > You don't lose state transitions if you read new state at the same time
> > you set the new watch.
> >
> > Likewise, it is simply a product of bad design to have a problem with
> > asynchronous notification.  Changes on other machines *are* asynchronous
> so
> > anybody who can't handle that is inherently denying reality.  If you want
> > to inject the notifications into a sequential view of an event stream,
> that
> > is trivial to do.
> >
> > Systems that depend on transition notification are generally not as
> robust
> > as systems that depend on current state.  Building a cluster manager
> works
> > better if the master is notified that a change has happened, but then
> > simply deals with the situation as it stands.
> >
> > As an analog, imagine that you have a system that shows a number x and a
> > second system that is supposed to show an echo of that number.
> >
> > Design A is notified of changes to x in the form of deltas.  If there is
> > ever an error in handling events, the echo will be off forever.  The
> error
> > that causes the delta to be dropped could be notification or a coding
> error
> > or a misunderstanding of how parallel systems work.  For instance, the
> > InterruptedException might not be handled right.
> >
> > Design B is notified of changes to x and whenever a change happens, the
> > second system simply goes and reads the new state.  Errors will be
> quickly
> > corrected.
> >
> > It sounds like the original poster is trying to build something like
> > Design A when they should be building Design B.
> >
> >
> >
> >
> >
> > On Fri, Jan 17, 2014 at 12:34 PM, Ted Yu  wrote:
> >
> >> HBASE-5487 is also related.
> >>
> >> The discussion there is very long. Below is an excerpt from Honghua:
> >>
> >> too many tricky scenarios/bugs because the ZK watch is one-time (which can
> >> result in missed state transitions) and the notification/processing is
> >> asynchronous (which can lead to delayed/non-up-to-date state in master
> >> memory).
> >>
> >> Cheers
> >>
> >>
> >> On Fri, Jan 17, 2014 at 11:25 AM, Ted Yu  wrote:
> >>
> >> > Hi, Flavio:
> >> > HBASE-8365 is one such case.
> >> >
> >> > Let me search around for other related discussion.
> >> >
> >> >
> >> > On Fri, Jan 17, 2014 at 11:17 AM, Flavio Junqueira <
> >> fpjunque...@yahoo.com>wrote:
> >> >
> >> >> Hi Ted,
> >> >>
> >> >> Can you provide more detail on how the precise deltas could make it
> >> more
> >> >> robust?
> >> >>
> >> >> -Flavio
> >> >>
> >> >> -Original Message-
> >> >> From: "Ted Yu" 
> >> >> Sent: 17/01/2014 17:25
> >> >> To: "dev@zookeeper.apache.org" 
> >> >> Subject: Re: Where are we in ZOOKEEPER-1416
> >> >>
> >> >> Having the ability to know exact deltas would help make HBase region
> >> >> assignment more robust.
> >> >>
> >> >> Cheers
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jan 17, 2014 at 9:13 AM, kishore g 
> >> wrote:
> >> >>
> >> >> > I agree with you, I like the side effect and in fact I would prefer
> >> to
> >> >> have
> >> >> > one notification for all changes under a parent node.
> >> >> >
> >> >> > However, Hao is probably asking for ability to know exact deltas.
> >> >> >
> >> >> >
> >> >> > On Fri, Jan 17, 2014 at 8:15 AM, FPJ 
> wrote:
> >> >> >
> >> >> > > We don't need to have a mapping between every change and a
> >> >> notification.
> >> >> > If
> >> >> > > there are 2+ changes betwee

Re: Where are we in ZOOKEEPER-1416

2014-01-17 Thread Ted Yu
HBASE-5487 is also related.

The discussion there is very long. Below is an excerpt from Honghua:

too many tricky scenarios/bugs because the ZK watch is one-time (which can result
in missed state transitions) and the notification/processing is
asynchronous (which can lead to delayed/non-up-to-date state in master
memory).

Cheers


On Fri, Jan 17, 2014 at 11:25 AM, Ted Yu  wrote:

> Hi, Flavio:
> HBASE-8365 is one such case.
>
> Let me search around for other related discussion.
>
>
> On Fri, Jan 17, 2014 at 11:17 AM, Flavio Junqueira 
> wrote:
>
>> Hi Ted,
>>
>> Can you provide more detail on how the precise deltas could make it more
>> robust?
>>
>> -Flavio
>>
>> -Original Message-
>> From: "Ted Yu" 
>> Sent: 17/01/2014 17:25
>> To: "dev@zookeeper.apache.org" 
>> Subject: Re: Where are we in ZOOKEEPER-1416
>>
>> Having the ability to know exact deltas would help make HBase region
>> assignment more robust.
>>
>> Cheers
>>
>>
>>
>> On Fri, Jan 17, 2014 at 9:13 AM, kishore g  wrote:
>>
>> > I agree with you, I like the side effect and in fact I would prefer to
>> have
>> > one notification for all changes under a parent node.
>> >
>> > However, Hao is probably asking for ability to know exact deltas.
>> >
>> >
>> > On Fri, Jan 17, 2014 at 8:15 AM, FPJ  wrote:
>> >
>> > > We don't need to have a mapping between every change and a
>> notification.
>> > If
>> > > there are 2+ changes between notifications, you'll be able to observe
>> it
>> > by
>> > > reading the ZK state. In fact, one nice side-effect is that we reduce
>> the
>> > > number of notifications when there are many concurrent changes.
>> > >
>> > > The only situation I can see it being necessary is the one in which we
>> > need
>> > > to know precisely the changes and we haven't cached a previous
>> version of
>> > > the state.
>> > >
>> > > -Flavio
>> > >
>> > > > -Original Message-
>> > > > From: kishore g [mailto:g.kish...@gmail.com]
>> > > > Sent: 17 January 2014 16:06
>> > > > To: dev@zookeeper.apache.org
>> > > > Subject: Re: Where are we in ZOOKEEPER-1416
>> > > >
>> > > > I think Hao is pointing out that there is no way to see every change
>> > > > (delta) that happened to a znode. Consider 2 changes A,B in quick
>> > > > succession. When client gets notified of A and before setting the
>> watch
>> > > the
>> > > > change B has occurred on the server side. This means the client
>> cannot
>> > > know
>> > > > the delta A. Client can only read the state after change B is
>> applied.
>> > > >
>> > > > Implementing the concept of Persistent watcher guarantees that
>> client
>> > is
>> > > > notified after every change.
>> > > >
>> > > > This is a nice-to-have feature but I don't understand the
>> requirement in
>> > > HBase
>> > > > where this is needed. Hao, can you shed more light on how this
>> would be
>> > > > useful for HBase (to act like state machine)
>> > > >
>> > > > thanks,
>> > > > Kishore G
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Jan 17, 2014 at 5:18 AM, FPJ  wrote:
>> > > >
>> > > > > But you don't really miss events, you'll see them when you read
>> the
>> > ZK
>> > > > > state. If you follow the pattern I described, you're supposed to
>> > > > > observe all changes. Perhaps I'm missing some concrete use case
>> you
>> > > > > have in mind.
>> > > > >
>> > > > > -Flavio
>> > > > >
>> > > > > > -Original Message-
>> > > > > > From: 陈迪豪 [mailto:chendi...@xiaomi.com]
>> > > > > > Sent: 17 January 2014 13:03
>> > > > > > To: dev@zookeeper.apache.org
>> > > > > > Subject: RE: Where are we in ZOOKEEPER-1416
>> > > > > >
>> > > > > > No, it's not complicated.

Re: Where are we in ZOOKEEPER-1416

2014-01-17 Thread Ted Yu
Hi, Flavio:
HBASE-8365 is one such case.

Let me search around for other related discussion.


On Fri, Jan 17, 2014 at 11:17 AM, Flavio Junqueira wrote:

> Hi Ted,
>
> Can you provide more detail on how the precise deltas could make it more
> robust?
>
> -Flavio
>
> -Original Message-
> From: "Ted Yu" 
> Sent: 17/01/2014 17:25
> To: "dev@zookeeper.apache.org" 
> Subject: Re: Where are we in ZOOKEEPER-1416
>
> Having the ability to know exact deltas would help make HBase region
> assignment more robust.
>
> Cheers
>
>
> On Fri, Jan 17, 2014 at 9:13 AM, kishore g  wrote:
>
> > I agree with you, I like the side effect and in fact I would prefer to
> have
> > one notification for all changes under a parent node.
> >
> > However, Hao is probably asking for ability to know exact deltas.
> >
> >
> > On Fri, Jan 17, 2014 at 8:15 AM, FPJ  wrote:
> >
> > > We don't need to have a mapping between every change and a
> notification.
> > If
> > > there are 2+ changes between notifications, you'll be able to observe
> it
> > by
> > > reading the ZK state. In fact, one nice side-effect is that we reduce
> the
> > > number of notifications when there are many concurrent changes.
> > >
> > > The only situation I can see it being necessary is the one in which we
> > need
> > > to know precisely the changes and we haven't cached a previous version
> of
> > > the state.
> > >
> > > -Flavio
> > >
> > > > -Original Message-
> > > > From: kishore g [mailto:g.kish...@gmail.com]
> > > > Sent: 17 January 2014 16:06
> > > > To: dev@zookeeper.apache.org
> > > > Subject: Re: Where are we in ZOOKEEPER-1416
> > > >
> > > > I think Hao is pointing out that there is no way to see every change
> > > > (delta) that happened to a znode. Consider 2 changes A,B in quick
> > > > succession. When client gets notified of A and before setting the
> watch
> > > the
> > > > change B has occurred on the server side. This means the client
> cannot
> > > know
> > > > the delta A. Client can only read the state after change B is
> applied.
> > > >
> > > > Implementing the concept of Persistent watcher guarantees that client
> > is
> > > > notified after every change.
> > > >
> > > > This is a nice-to-have feature but I don't understand the requirement
> in
> > > HBase
> > > > where this is needed. Hao, can you shed more light on how this would
> be
> > > > useful for HBase (to act like state machine)
> > > >
> > > > thanks,
> > > > Kishore G
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jan 17, 2014 at 5:18 AM, FPJ  wrote:
> > > >
> > > > > But you don't really miss events, you'll see them when you read the
> > ZK
> > > > > state. If you follow the pattern I described, you're supposed to
> > > > > observe all changes. Perhaps I'm missing some concrete use case you
> > > > > have in mind.
> > > > >
> > > > > -Flavio
> > > > >
> > > > > > -Original Message-
> > > > > > From: 陈迪豪 [mailto:chendi...@xiaomi.com]
> > > > > > Sent: 17 January 2014 13:03
> > > > > > To: dev@zookeeper.apache.org
> > > > > > Subject: RE: Where are we in ZOOKEEPER-1416
> > > > > >
> > > > > > No, it's not complicated. But for the people who don't understand
> > zk
> > > > > deeply,
> > > > > > they would easily ignore the fact that they would miss events in
> > > > > > some
> > > > > way.
> > > > > > Moreover, I think providing persistent watch is good for
> developers
> > > > > > to
> > > > > build
> > > > > > the "state-machine" application. Actually, HBase suffer from
> > missing
> > > > > > the intermediate state when using zk to store the data.
> > > > > >
> > > > > > If the feature is implemented, I would like to see the patch and
> > > > > > consider
> > > > > if it
> > > > > > can be used for us.
> > > >

Re: Where are we in ZOOKEEPER-1416

2014-01-17 Thread Ted Yu
Having the ability to know exact deltas would help make HBase region
assignment more robust.

Cheers


On Fri, Jan 17, 2014 at 9:13 AM, kishore g  wrote:

> I agree with you, I like the side effect and in fact I would prefer to have
> one notification for all changes under a parent node.
>
> However, Hao is probably asking for ability to know exact deltas.
>
>
> On Fri, Jan 17, 2014 at 8:15 AM, FPJ  wrote:
>
> > We don't need to have a mapping between every change and a notification.
> If
> > there are 2+ changes between notifications, you'll be able to observe it
> by
> > reading the ZK state. In fact, one nice side-effect is that we reduce the
> > number of notifications when there are many concurrent changes.
> >
> > The only situation I can see it being necessary is the one in which we
> need
> > to know precisely the changes and we haven't cached a previous version of
> > the state.
> >
> > -Flavio
> >
> > > -Original Message-
> > > From: kishore g [mailto:g.kish...@gmail.com]
> > > Sent: 17 January 2014 16:06
> > > To: dev@zookeeper.apache.org
> > > Subject: Re: Where are we in ZOOKEEPER-1416
> > >
> > > I think Hao is pointing out that there is no way to see every change
> > > (delta) that happened to a znode. Consider 2 changes A,B in quick
> > > succession. When client gets notified of A and before setting the watch
> > the
> > > change B has occurred on the server side. This means the client cannot
> > know
> > > the delta A. Client can only read the state after change B is applied.
> > >
> > > Implementing the concept of Persistent watcher guarantees that client
> is
> > > notified after every change.
> > >
> > > This is a nice-to-have feature but I don't understand the requirement in
> > HBase
> > > where this is needed. Hao, can you shed more light on how this would be
> > > useful for HBase (to act like state machine)
> > >
> > > thanks,
> > > Kishore G
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jan 17, 2014 at 5:18 AM, FPJ  wrote:
> > >
> > > > But you don't really miss events, you'll see them when you read the
> ZK
> > > > state. If you follow the pattern I described, you're supposed to
> > > > observe all changes. Perhaps I'm missing some concrete use case you
> > > > have in mind.
> > > >
> > > > -Flavio
> > > >
> > > > > -Original Message-
> > > > > From: 陈迪豪 [mailto:chendi...@xiaomi.com]
> > > > > Sent: 17 January 2014 13:03
> > > > > To: dev@zookeeper.apache.org
> > > > > Subject: RE: Where are we in ZOOKEEPER-1416
> > > > >
> > > > > No, it's not complicated. But for the people who don't understand
> zk
> > > > deeply,
> > > > > they would easily ignore the fact that they would miss events in
> > > > > some
> > > > way.
> > > > > Moreover, I think providing persistent watch is good for developers
> > > > > to
> > > > build
> > > > > the "state-machine" application. Actually, HBase suffer from
> missing
> > > > > the intermediate state when using zk to store the data.
> > > > >
> > > > > If the feature is implemented, I would like to see the patch and
> > > > > consider
> > > > if it
> > > > > can be used for us.
> > > > >
> > > > > 
> > > > > From: Flavio Junqueira [fpjunque...@yahoo.com]
> > > > > Sent: Friday, January 17, 2014 8:12 PM
> > > > > To: dev@zookeeper.apache.org
> > > > > Subject: RE: Where are we in ZOOKEEPER-1416
> > > > >
> > > > > My take is that persistent subscriptions add complexity and are not
> > > > strictly
> > > > > necessary. You can follow this pattern of setting a watch, reading
> > > > > the
> > > > state
> > > > > upon a notification and setting a new watch. Why do you feel that's
> > > > > complicated?
> > > > >
> > > > > -Flavio
> > > > >
> > > > > -Original Message-
> > > > > From: 陈迪豪 [mailto:chendi...@xiaomi.com]
> > > > > Sent: Friday, January 17, 2014 3:13 AM
> > > > > To: dev@zookeeper.apache.org
> > > > > Subject: Where are we in ZOOKEEPER-1416
> > > > >
> > > > >
> > > > >
> > > > > Persistent watch and implementing the feature to act like "state
> > > machine"
> > > > > which is mentioned in
> > > > > ZOOKEEPER-153 > > 153>,
> > > > > would be great for ZooKeeper user. I think HBase would like to know
> > > > > all
> > > > the
> > > > > change in zk rather than missing kind of events.
> > > > >
> > > > > So, would we continue developing these features? It's also a little
> > > > > complicated to develop with zk and I think there're lots of things
> > > > > to
> > > > improve.
> > > >
> > > >
> > > >
> >
> >
>
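
For reference, a minimal sketch of the read-and-rewatch pattern advocated in
this thread (Ted Dunning's Design B): on every notification, re-read the
current state and re-arm the watch in the same call. Session handling and
retries are elided, and the handler names are illustrative:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WatchAndReread implements Watcher {
    private final ZooKeeper zk;
    private final String path;

    WatchAndReread(ZooKeeper zk, String path) {
        this.zk = zk;
        this.path = path;
    }

    void readAndRewatch() throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData(path, this, stat); // re-arms the watch
        handle(data, stat.getVersion());
    }

    @Override
    public void process(WatchedEvent event) {
        try {
            readAndRewatch(); // watches are one-shot: set again after firing
        } catch (Exception e) {
            // reconnection/retry handling elided
        }
    }

    void handle(byte[] data, int version) {
        // act on the current state, not on inferred deltas; intermediate
        // changes may be coalesced, but the latest state is always seen
    }
}
{code}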


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872288#comment-13872288
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

The above suggestion would involve more complex logic.

Maybe the first two hunks in patch v2 can be integrated first?

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871657#comment-13871657
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

To avoid allocating an extra ArrayBlockingQueue, I am thinking of the following:
* create a singleton ArrayBlockingQueue which serves as a marker
* if queueSendMap.putIfAbsent(sid, singleton) returns null, create the real 
ArrayBlockingQueue, named bq, and call queueSendMap.replace(sid, bq)
* if queueSendMap.putIfAbsent(sid, singleton) returns a non-null value, check 
whether the returned value is the singleton; if so, wait until 
queueSendMap.get(sid) returns a value other than the singleton.
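
With hindsight, Java 8's ConcurrentMap.computeIfAbsent gives the marker-free
behavior sketched above without any spin-wait or throwaway allocation, though
Java 8 was not a given for the project at the time. A sketch, assuming a
Java 8+ runtime:

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ComputeIfAbsentSketch {
    static final int SEND_CAPACITY = 1;
    static final ConcurrentMap<Long, ArrayBlockingQueue<byte[]>> queueSendMap =
            new ConcurrentHashMap<>();

    static ArrayBlockingQueue<byte[]> queueFor(long sid) {
        // The mapping function runs at most once per key; other threads
        // block briefly instead of spinning on a marker value.
        return queueSendMap.computeIfAbsent(
                sid, k -> new ArrayBlockingQueue<>(SEND_CAPACITY));
    }
}
{code}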

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870966#comment-13870966
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

[~michim]:
Can you take a look at patch v2?

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v2.txt

Patch v2 addresses Michi's comments.

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v1.txt

Sure.

Here is the patch.

> ConcurrentHashMap isn't used properly in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-1861
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: zookeeper-1861-v1.txt
>
>
> queueSendMap is a ConcurrentHashMap.
> At line 210:
> {code}
> if (!queueSendMap.containsKey(sid)) {
> queueSendMap.put(sid, new ArrayBlockingQueue(
> SEND_CAPACITY));
> {code}
> By the time control enters the if block, another concurrent put with the
> same sid may have occurred in the ConcurrentHashMap.
> putIfAbsent() should be used.
> A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-11 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-1861:
-

 Summary: ConcurrentHashMap isn't used properly in QuorumCnxManager
 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


queueSendMap is a ConcurrentHashMap.
At line 210:
{code}
if (!queueSendMap.containsKey(sid)) {
queueSendMap.put(sid, new ArrayBlockingQueue(
SEND_CAPACITY));
{code}
By the time control enters the if block, another concurrent put with the
same sid may have occurred in the ConcurrentHashMap.
putIfAbsent() should be used.

A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2014-01-11 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-1859:
-

 Summary: pwriter should be closed in 
NIOServerCnxn#checkFourLetterWord()
 Key: ZOOKEEPER-1859
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in the telnetCloseCmd case as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

