[jira] [Updated] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2014-09-17 Thread Reed Wanderman-Milne (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reed Wanderman-Milne updated ZOOKEEPER-900:
---
Attachment: ZOOKEEPER-900-part2.patch

I've attached a patch that fixes the blocking issue in connectOne(). I've moved 
much of the connection logic into SendWorker, so all the socket operations are 
done on a separate thread. Some of the code in the two connectOne() methods 
was moved to SendWorker.connectToServer. Additionally, receiveConnection() and 
initiateConnection() were moved to connectOne. As a result, connectOne() 
shouldn't wait for the connection to be established before returning.

One consequence of this is that SendWorker.finish() may block until the 
connection is made if it's called before a connection is established (since 
both finish() and SendWorker.establishConnection() are synchronized). This is 
better than blocking in connectOne(), but does anyone have any ideas for fixing 
this?
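
For readers following along, the hand-off described above can be sketched roughly as follows. This is not the code from the attached patch: `ConnectingWorker`, `connectAsync`, `awaitAttempt` and the timeout parameter are hypothetical names chosen for illustration. The only point it shows is that the blocking `connect()` runs on the worker's own thread, so the caller returns immediately.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a SendWorker that opens its own connection. */
final class ConnectingWorker extends Thread {
    private final SocketAddress peer;
    private final int connectTimeoutMs;
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile Socket socket; // non-null only after a successful connect

    ConnectingWorker(SocketAddress peer, int connectTimeoutMs) {
        this.peer = peer;
        this.connectTimeoutMs = connectTimeoutMs;
        setDaemon(true);
    }

    /** Caller-side analogue of connectOne(): start the worker and return at once. */
    static ConnectingWorker connectAsync(SocketAddress peer, int connectTimeoutMs) {
        ConnectingWorker worker = new ConnectingWorker(peer, connectTimeoutMs);
        worker.start();
        return worker;
    }

    @Override
    public void run() {
        Socket s = new Socket();
        try {
            s.connect(peer, connectTimeoutMs); // the blocking call happens on this thread only
            socket = s;
        } catch (IOException e) {
            try { s.close(); } catch (IOException ignored) { /* nothing to clean up */ }
        } finally {
            finished.countDown();
        }
    }

    /** Waits up to timeoutMs for the attempt to finish; true if it finished in time. */
    boolean awaitAttempt(long timeoutMs) throws InterruptedException {
        return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    boolean isConnected() {
        return socket != null;
    }

    void close() throws IOException {
        Socket s = socket;
        if (s != null) s.close();
    }
}
```

A caller would use `ConnectingWorker.connectAsync(addr, timeoutMs)` and carry on; it only needs `awaitAttempt` if it actually cares about the outcome.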

 FLE implementation should be improved to use non-blocking sockets
 -

 Key: ZOOKEEPER-900
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Vishal Kher
Assignee: Vishal Kher
Priority: Critical
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-900-part2.patch, ZOOKEEPER-900.patch, 
 ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2


 From earlier email exchanges:
 1. Blocking connects and accepts:
 a) The first problem is in manager.toSend(). This invokes connectOne(), which 
 does a blocking connect. While testing, I changed the code so that 
 connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() 
 does a socketChannel.connect(). After starting AsyncConnect, connectOne 
 starts a timer. connectOne continues with normal operations if the connection 
 is established before the timer expires; otherwise, when the timer expires, it 
 interrupts the AsyncConnect thread and returns. In this way, I can have an 
 upper bound on the amount of time we need to wait for connect to succeed. Of 
 course, this was a quick fix for my testing. Ideally, we should use a Selector 
 to do non-blocking connects/accepts. I am planning to do that later once we 
 at least have a quick fix for the problem and consensus from others for the 
 real fix (this problem is a big blocker for us). Note that it is OK to do 
 blocking IO in the SenderWorker and RecvWorker threads since they block only 
 the IO to their respective peer.
 b) The blocking IO problem is not restricted to connectOne(); it also occurs 
 in receiveConnection(). The Listener thread calls receiveConnection() for 
 each incoming connection request. receiveConnection does blocking IO to get 
 the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the 
 peer that had sent the connection request. All of this happens on the 
 Listener thread. In short, if a peer fails after initiating a connection, the 
 Listener thread won't be able to accept connections from other peers, because 
 it would be stuck in read() or connectOne(). Also, the code has an inherent 
 cycle. initiateConnection() and receiveConnection() will have to be very 
 carefully synchronized; otherwise, we could run into deadlocks. This code is 
 going to be difficult to maintain/modify.
 Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
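
 As an illustration of the Selector idea mentioned in (a), here is a minimal sketch of a non-blocking connect with an upper bound on the wait. It is not taken from any of the attached patches; the class name `BoundedConnect`, the method `connect`, and the choice to return null on timeout are assumptions made for the example.

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: non-blocking connect bounded by a timeout. */
final class BoundedConnect {
    /**
     * Returns a connected channel (left in non-blocking mode), or null if
     * timeoutMs elapsed first. Connection errors surface as IOException.
     */
    static SocketChannel connect(SocketAddress peer, long timeoutMs) throws IOException {
        SocketChannel ch = SocketChannel.open();
        boolean connected = false;
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            if (ch.connect(peer)) {
                connected = true; // completed immediately, which can happen on loopback
                return ch;
            }
            ch.register(selector, SelectionKey.OP_CONNECT);
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (true) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return null; // upper bound reached; the finally block closes the channel
                }
                if (selector.select(remainingMs) > 0) {
                    selector.selectedKeys().clear();
                    if (ch.finishConnect()) {
                        connected = true;
                        return ch;
                    }
                }
            }
        } finally {
            if (!connected) {
                ch.close();
            }
        }
    }
}
```

 Unlike the timer-plus-interrupt approach, no extra thread is needed per connection attempt. A real implementation would presumably share one Selector across all peers rather than open one per call.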



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2014-09-11 Thread Reed Wanderman-Milne (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130830#comment-14130830
 ] 

Reed Wanderman-Milne commented on ZOOKEEPER-900:


Hi,

I'm wondering if there's any progress on this JIRA. I'm running into an issue 
similar to that of ZOOKEEPER-1678, which can be solved by fixing this. If no 
one is working on it, I'd be happy to take a stab at it.

[~vishalmlst]'s patch added a timeout for connections to other peers, but it 
still appears that only one connection can be processed at a time. 
Additionally, in connectOne(long), a lock on the QuorumPeer is held, preventing 
other threads from accessing it. Both of these issues seem to contribute to 
ZOOKEEPER-1678. [~vishalmlst] suggested in an earlier comment moving the 
socket operations to SenderWorker and RecvWorker, which would prevent socket 
operations from blocking other connections.
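
To make the locking concern concrete, here is a small sketch of the general pattern of reading shared state under a lock and doing the slow socket work outside it. It does not reflect how QuorumPeer or QuorumCnxManager are actually written; `PeerDirectory`, `NarrowLockConnector` and `Dialer` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared state standing in for the object whose lock is contended. */
final class PeerDirectory {
    private final Map<Long, InetSocketAddress> electionAddrs = new HashMap<>();

    synchronized void put(long sid, InetSocketAddress addr) {
        electionAddrs.put(sid, addr);
    }

    synchronized InetSocketAddress lookup(long sid) {
        return electionAddrs.get(sid);
    }
}

final class NarrowLockConnector {
    /** Hypothetical hook for whatever performs the blocking socket connect. */
    interface Dialer {
        void dial(InetSocketAddress addr) throws IOException;
    }

    private final PeerDirectory peers;
    private final Dialer dialer;

    NarrowLockConnector(PeerDirectory peers, Dialer dialer) {
        this.peers = peers;
        this.dialer = dialer;
    }

    /** Returns true if the peer was known and the dial succeeded. */
    boolean connectOne(long sid) {
        InetSocketAddress addr = peers.lookup(sid); // lock held only for this read
        if (addr == null) {
            return false; // unknown server id
        }
        try {
            dialer.dial(addr); // slow network work runs with no lock held
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

With this shape, a slow or dead peer delays only the thread doing the dial; other threads can still call `lookup` in the meantime.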

Let me know what your thoughts are. Thanks!

 FLE implementation should be improved to use non-blocking sockets
 -

 Key: ZOOKEEPER-900
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Vishal Kher
Assignee: Vishal Kher
Priority: Critical
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, 
 ZOOKEEPER-900.patch2







[jira] [Updated] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-25 Thread Reed Wanderman-Milne (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reed Wanderman-Milne updated ZOOKEEPER-1660:


Attachment: ZOOKEEPER-1660-v3.patch

 Add documentation for dynamic reconfiguration
 -

 Key: ZOOKEEPER-1660
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1660
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Reed Wanderman-Milne
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1660-v2.patch, ZOOKEEPER-1660-v3.patch, 
 ZOOKEEPER-1660.patch


 Update user manual with reconfiguration info.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-25 Thread Reed Wanderman-Milne (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109669#comment-14109669
 ] 

Reed Wanderman-Milne commented on ZOOKEEPER-1660:
-

Hi Alex,

That was a good change; the overview is much clearer now. Note that I had to 
reformat the paper citation slightly, since it appears DocBook doesn't support 
line breaks. Let me know if there are any more changes to the Google Doc.

The patch contains a reference to the reconfig page from the Administrator's 
guide (in the section Configuration Parameters), so a reader should be able to 
figure out how to upgrade to 3.5.0.






[jira] [Updated] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-23 Thread Reed Wanderman-Milne (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reed Wanderman-Milne updated ZOOKEEPER-1660:


Attachment: ZOOKEEPER-1660-v2.patch

 Add documentation for dynamic reconfiguration
 -

 Key: ZOOKEEPER-1660
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1660
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Reed Wanderman-Milne
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1660-v2.patch, ZOOKEEPER-1660.patch


 Update user manual with reconfiguration info.





[jira] [Commented] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-23 Thread Reed Wanderman-Milne (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108195#comment-14108195
 ] 

Reed Wanderman-Milne commented on ZOOKEEPER-1660:
-

Hi Alex,

Thanks for the updates. I made the changes, except for adding the comment about 
local sessions (which I can add later if necessary).

Maybe we should move the "Upgrading to 3.5.0" section to the Administrator's 
guide page, since it doesn't seem directly related to dynamic reconfig. What do 
you think?






[jira] [Updated] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-22 Thread Reed Wanderman-Milne (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reed Wanderman-Milne updated ZOOKEEPER-1660:


Attachment: ZOOKEEPER-1660.patch

Here's a draft of the new documentation. I had to make some minor formatting 
changes from the Google Doc, since the Forrest DocBook plugin doesn't support 
some formatting options.

 Add documentation for dynamic reconfiguration
 -

 Key: ZOOKEEPER-1660
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1660
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Reed Wanderman-Milne
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1660.patch


 Update user manual with reconfiguration info.





[jira] [Commented] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-08-19 Thread Reed Wanderman-Milne (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102974#comment-14102974
 ] 

Reed Wanderman-Milne commented on ZOOKEEPER-1660:
-

I'll start working on the Forrest doc, then; I'll have it done in a few days.

 Add documentation for dynamic reconfiguration
 -

 Key: ZOOKEEPER-1660
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1660
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Reed Wanderman-Milne
Priority: Blocker
 Fix For: 3.5.0


 Update user manual with reconfiguration info.





[jira] [Created] (ZOOKEEPER-1991) zkServer.sh returns with a zero exit status when a ZooKeeper process is already running

2014-07-25 Thread Reed Wanderman-Milne (JIRA)
Reed Wanderman-Milne created ZOOKEEPER-1991:
---

 Summary: zkServer.sh returns with a zero exit status when a 
ZooKeeper process is already running
 Key: ZOOKEEPER-1991
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1991
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.4.6
Reporter: Reed Wanderman-Milne
Priority: Minor


If ZooKeeper is started with zkServer.sh and the script reports that a 
ZooKeeper process is already running, the command exits with a status of 0, 
whereas it should exit with a non-zero status.

Example:
$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/reed/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 25063.
$ echo $?
0

This can make it difficult for automated scripts to check whether starting a 
new ZooKeeper process succeeded, since the exit status does not reveal that an 
instance was already running.
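
To illustrate why the exit status matters to automation, here is a minimal sketch of a launcher that relies on it. The class `ExitStatusCheck` and its methods are hypothetical and not part of ZooKeeper; the sketch assumes only that the caller treats status 0 as success.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical automation helper that trusts a command's exit status. */
final class ExitStatusCheck {
    /** Runs the command, passing its output through, and returns its exit status. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show the script's own messages to the operator
                .start();
        return process.waitFor();
    }

    /** Treats exit status 0 as "the server was started by this call". */
    static boolean started(List<String> command) throws IOException, InterruptedException {
        return run(command) == 0;
    }
}
```

With the behaviour reported here, `started(Arrays.asList("bin/zkServer.sh", "start"))` would return true even when the script prints "already running", so the caller cannot tell the two cases apart.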






[jira] [Commented] (ZOOKEEPER-1660) Add documentation for dynamic reconfiguration

2014-07-24 Thread Reed Wanderman-Milne (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073718#comment-14073718
 ] 

Reed Wanderman-Milne commented on ZOOKEEPER-1660:
-

I spoke to [~shralex] and agreed to create the Forrest docs once the Google 
Doc is updated to its near-final version.

 Add documentation for dynamic reconfiguration
 -

 Key: ZOOKEEPER-1660
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1660
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0


 Update user manual with reconfiguration info.


