[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101270#comment-15101270
 ] 

Rakesh R commented on ZOOKEEPER-1936:
-

Thanks [~yuzhih...@gmail.com] for the fix. +1 for the additional {{#exists()}} 
check.

I've few comments:
# Could you consider {{snapDir}} creation too.
{code}
if (!this.snapDir.mkdirs()) {
throw new DatadirException("Unable to create snap directory "
+ this.snapDir);
}
{code}
# Can we reduce the nested calls. How about using AND operator like,
{code}
if (!this.dataDir.mkdirs() && !this.dataDir.exists())
{code}

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-14 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099240#comment-15099240
 ] 

Akihiro Suda commented on ZOOKEEPER-2353:
-

OK, I got it. Thanks.


> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes length of host and port byte string there is no simple way to 
> append new fields to this hdr anymore. I.e the rx side has to consider all 
> bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
> throw new InitialMessageException(
> "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
> throw new InitialMessageException(
> "Read only %s bytes out of %s sent by server %s",
> num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's 
> HostAndPort
> //parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means the request 
> here is to design messages with hdr such that there is no need to bump 
> version number or hack certain fields (i.e figure out if its length of 
> host/port or length of different message etc, in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since total length of the message and length of each variable field is also 
> present it is quite easy to provide backward compatibility, w.r.t to parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revision(s), that wants to keep things compatible, will only append to 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t the introduction of protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode perhaps 
> it is possible to consider this change now?.
> Also I would like to propose to carefully consider the option of using 
> protobufs for the next protocol version bump. This will prevent issues like 
> this in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099184#comment-15099184
 ] 

Hadoop QA commented on ZOOKEEPER-1936:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12782377/ZOOKEEPER-1936.v2.patch
  against trunk revision 1720227.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3008//console

This message is automatically generated.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Failed: ZOOKEEPER-1936 PreCommit Build #3008

2016-01-14 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3008/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 104 lines...]
 [exec] PATCH APPLICATION FAILED
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12782377/ZOOKEEPER-1936.v2.patch
 [exec]   against trunk revision 1720227.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] -1 patch.  The patch command could not apply the patch.
 [exec] 
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3008//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 337cc67136223e3245ad0856681651f6e8ece7e2 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1816:
 exec returned: 1

Total time: 48 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Recording test results
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
ERROR: Publisher 'Publish JUnit test result report' failed: No test report 
files were found. Configuration error?
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
[description-setter] Description set: ZOOKEEPER-1936
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099001#comment-15099001
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1936:
---

cc: [~cnauroth], [~rakeshr]

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098999#comment-15098999
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1936:
---

Thanks [~te...@apache.org] - looking. Maybe we can get this in for 3.4.8 as 
well. 

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v2.patch

Alternate patch for consideration.

Only throw exception if dataDir doesn't exist and mkdirs() call fails.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-14 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098376#comment-15098376
 ] 

Ivan Kelly commented on ZOOKEEPER-2353:
---

[~fpj] 3.4.8 will be in the same situation as all releases < 3.4.7, so there's 
no urgency in changing it.

If we are willing to break 3.5-alpha, we could add a version number to the 
protocol in 3.4.x, and then reimplement the 3.5 changes with the new version 
number. 

In 3.4, the sequence on receiving a connection is.
1. Read a long from socket. This is the sid of the connecting server.
2. We check the sid, if we chose to continue with the connection, we start 
reading packets and handing them off to FLE.
3. FLE checks if the buffer length is less than 28 bytes, if so discards it. 
Otherwise it processes it.

This means we can create a ClientHandshake packet, as long as we keep it below 
28bytes. 

To handle cases where a client that doesn't send ClientHandshake connects to a 
server that does use it, the server can mark and reset the socket InputSteam 
(not sure if it's supported). If doesn't even need to do this though. It can 
just ignore the first packet and wait for the next round of election to start.

> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes length of host and port byte string there is no simple way to 
> append new fields to this hdr anymore. I.e the rx side has to consider all 
> bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
> throw new InitialMessageException(
> "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
> throw new InitialMessageException(
> "Read only %s bytes out of %s sent by server %s",
> num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's 
> HostAndPort
> //parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means the request 
> here is to design messages with hdr such that there is no need to bump 
> version number or hack certain fields (i.e figure out if its length of 
> host/port or length of different message etc, in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since total length of the message and length of each variable field is also 
> present it is quite easy to provide backward compatibility, w.r.t to parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revision(s), that wants to keep things compatible, will only append to 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t the introduction of protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode perhaps 
> it is possible to consider this change now?.
> Also I would like to propose to carefully consider the option of using 
> proto

[jira] [Commented] (ZOOKEEPER-1045) Quorum Peer mutual authentication

2016-01-14 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098287#comment-15098287
 ] 

Ivan Kelly commented on ZOOKEEPER-1045:
---

{quote}Two concerns that I have are , is it architecturally ok to enforce ZK to 
talk to an external server(perhaps on regular intervals) to form a quorum and 
if that is ok then is this the most widely used/requested feature by 
users.{quote}

The patch already supports DIGEST-MD5, so Krb isn't required to have auth.

Encryption of the links is a separate concern, and shouldn't be done as part of 
this change. There's already a JIRA for it (ZOOKEEPER-1000).

> Quorum Peer mutual authentication
> -
>
> Key: ZOOKEEPER-1045
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Eugene Koontz
>Assignee: Rakesh R
> Attachments: ZOOKEEPER-1045-00.patch, ZOOKEEPER-1045-Rolling Upgrade 
> Design Proposal.pdf
>
>
> ZOOKEEPER-938 addresses mutual authentication between clients and servers. 
> This bug, on the other hand, is for authentication among quorum peers. 
> Hopefully much of the work done on SASL integration with Zookeeper for 
> ZOOKEEPER-938 can be used as a foundation for this enhancement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-14 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098143#comment-15098143
 ] 

Flavio Junqueira commented on ZOOKEEPER-2353:
-

[~suda] I'd really like to have us using a different serialization framework, 
even if we do it ourselves. There are actually two issues with the current 
serialization for leader election. First, it is mixed the election code and I'd 
rather separate it out. Second, it is not using the same serialization as the 
other protocols (Zab, client-server). We should fix it, but it is enough work 
to be done as separate jira rather than a secondary task here. An important 
question is which framework to use, and every time we raise this issue, we have 
different opinion on what to use. It'd be great if we could agree this time 
around and have it finally done.

> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes length of host and port byte string there is no simple way to 
> append new fields to this hdr anymore. I.e the rx side has to consider all 
> bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
> throw new InitialMessageException(
> "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
> throw new InitialMessageException(
> "Read only %s bytes out of %s sent by server %s",
> num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's 
> HostAndPort
> //parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means the request 
> here is to design messages with hdr such that there is no need to bump 
> version number or hack certain fields (i.e figure out if its length of 
> host/port or length of different message etc, in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since total length of the message and length of each variable field is also 
> present it is quite easy to provide backward compatibility, w.r.t to parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revision(s), that wants to keep things compatible, will only append to 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t the introduction of protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode perhaps 
> it is possible to consider this change now?.
> Also I would like to propose to carefully consider the option of using 
> protobufs for the next protocol version bump. This will prevent issues like 
> this in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-14 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097809#comment-15097809
 ] 

Akihiro Suda commented on ZOOKEEPER-2353:
-

You're right. All the discussion about protobuf is not related to QCM in 3.X 
codebase.
But I thought the link still can help for considering design of future possible 
protocols, even though the codebase differs.
Please feel free to remove the link if it's useless.
Thanks.


> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes length of host and port byte string there is no simple way to 
> append new fields to this hdr anymore. I.e the rx side has to consider all 
> bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
> throw new InitialMessageException(
> "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
> throw new InitialMessageException(
> "Read only %s bytes out of %s sent by server %s",
> num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's 
> HostAndPort
> //parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means the request 
> here is to design messages with hdr such that there is no need to bump 
> version number or hack certain fields (i.e figure out if its length of 
> host/port or length of different message etc, in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since total length of the message and length of each variable field is also 
> present it is quite easy to provide backward compatibility, w.r.t to parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revision(s), that wants to keep things compatible, will only append to 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t the introduction of protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode perhaps 
> it is possible to consider this change now?.
> Also I would like to propose to carefully consider the option of using 
> protobufs for the next protocol version bump. This will prevent issues like 
> this in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)