Re: Requesting reviews for ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
I can help review this.

On Apr 20, 2017 2:22 PM, "Abraham Fine" wrote:

> Hello-
>
> I have been continuing work on ZOOKEEPER-236 and it would be great to
> get feedback from the community on the patch. The pull request can be
> found here: https://github.com/apache/zookeeper/pull/184
>
> ZOOKEEPER-236 provides the ability to use SSL/TLS to secure
> communication within the ZooKeeper quorum.
>
> Documentation will be handled in another pull request but the usage is
> very similar to our existing Client <-> Quorum functionality, here is an
> overview of the basic configuration.
>
> System properties are set on each member of the quorum, for example:
> -Dzookeeper.ssl.quorum.keyStore.location=keystore.jks
> -Dzookeeper.ssl.quorum.keyStore.password=password
> -Dzookeeper.ssl.quorum.trustStore.location=truststore.jks
>
> A flag is set in the cfg files:
> sslQuorum=true
>
> The best way to see all the functionality provided by this patch is to
> take a look at the integration tests:
> https://github.com/afine/zookeeper/blob/3c6c81b69b7105fa7c5235a0f27718a7eae195de/src/java/test/org/apache/zookeeper/test/QuorumSSLTest.java.
> The integration tests contain examples showing how hostname
> verification, rolling upgrades, cipher configuration, protocol
> configuration, and certificate revocation are handled.
>
> There is a current outstanding question regarding hostname verification,
> please provide input here:
> https://github.com/apache/zookeeper/pull/184#discussion_r111485824
>
> Looking forward to hearing everyone's thoughts.
>
> Thanks,
> Abraham Fine
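For anyone scripting a test quorum, the settings quoted above could be generated roughly as follows. This is only a sketch: the property names and the sslQuorum flag are taken from the email, while the helper names (quorum_ssl_jvm_flags, quorum_ssl_cfg_lines) are hypothetical and not part of the patch. The property names may also change before the patch is merged.

```python
# Hypothetical helpers, for illustration only. The property names and the
# sslQuorum flag come from the email above; everything else is assumed.

QUORUM_SSL_PROPERTIES = {
    "zookeeper.ssl.quorum.keyStore.location": "keystore.jks",
    "zookeeper.ssl.quorum.keyStore.password": "password",
    "zookeeper.ssl.quorum.trustStore.location": "truststore.jks",
}


def quorum_ssl_jvm_flags(properties):
    """Return one -D flag per system property, in insertion order."""
    return [f"-D{name}={value}" for name, value in properties.items()]


def quorum_ssl_cfg_lines(enabled=True):
    """Return the line to add to each quorum member's cfg file."""
    return [f"sslQuorum={'true' if enabled else 'false'}"]
```

The flags would be passed to the JVM of every quorum member, and the cfg line added to each member's configuration file.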
[jira] [Commented] (ZOOKEEPER-2362) ZooKeeper multi / transaction allows partial read
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977383#comment-15977383 ]

Atri Sharma commented on ZOOKEEPER-2362:
----------------------------------------

I see this issue is still there. If it is a release blocker, can I go ahead and take this? [~hanm]

> ZooKeeper multi / transaction allows partial read
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2362
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2362
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>            Reporter: Whitney Sorenson
>            Priority: Critical
>             Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> In this thread
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201602.mbox/%3CCAPbqGzicBkLLyVDm7RFM20z0y3X1v1P-C9-1%3D%3D1DDqRDTzdOmQ%40mail.gmail.com%3E
> , I discussed an issue I've now seen in multiple environments:
> In a multi (using Curator), I write 2 new nodes. At some point, I issue 2
> reads for these new nodes. In one read, I see one of the new nodes. In a
> subsequent read, I fail to see the other new node:
> 1. Starting state : { /foo = , /bar = }
> 2. In a multi, write: { /foo = A, /bar = B}
> 3. Read /foo as A
> 4. Read /bar as
> #3 and #4 are issued 100% sequentially.
> It is not known at what point during #2, #3 starts.
> Note: the reads are getChildren() calls.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
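The anomaly in steps 1 to 4 can be stated as a check over a sequence of reads: once any read has observed one effect of the multi, no later read should still observe pre-multi state for another path in the same multi. Below is a minimal sketch of such a check. It is a simplified model, not ZooKeeper or Curator code: the function name find_partial_read is hypothetical, it compares plain values rather than getChildren() results, and None stands in for the starting values that are elided in the report.

```python
def find_partial_read(multi_writes, initial, reads):
    """Return (earlier, later) read indexes that expose a partial multi, or None.

    multi_writes: {path: value} written atomically by one multi.
    initial:      {path: value} state before the multi.
    reads:        sequential list of (path, observed_value) tuples.
    """
    first_new = None  # index of the first read that saw a post-multi value
    for index, (path, observed) in enumerate(reads):
        if path not in multi_writes:
            continue  # unrelated path, says nothing about this multi
        if observed == multi_writes[path]:
            if first_new is None:
                first_new = index
        elif observed == initial.get(path) and first_new is not None:
            # A later read still sees pre-multi state: the reported anomaly.
            return (first_new, index)
    return None
```

With the history from the report (read /foo as A, then read /bar as its starting value) the function returns the pair of offending read indexes. A history where the old value is read first and the new value afterwards is legal and returns None.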
[jira] [Commented] (ZOOKEEPER-1692) Add support for single member ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961749#comment-15961749 ]

Atri Sharma commented on ZOOKEEPER-1692:
----------------------------------------

Was this ever implemented? [~thawan] Can I help with this?

> Add support for single member ensemble
> --------------------------------------
>
>                 Key: ZOOKEEPER-1692
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1692
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum
>    Affects Versions: 3.4.0
>            Reporter: Thawan Kooburat
>            Assignee: Thawan Kooburat
>            Priority: Minor
>
>
> In the past, we have repeatedly run into a problem where the quorum could
> not be formed. It takes a while to investigate the root cause and fix the
> problem. Our current solution is to make it possible to run a quorum with a
> single member in it. Unlike standalone mode, it has to run as LeaderZooKeeper
> server, so that the observers can connect to it.
> This will allow the operator to use this workaround to bring back the
> ensemble quickly while investigating the problem in the background.
> The main problem here is to allow the observers to connect with the leader
> when the quorum size is reduced to one. We don't want to update the (static)
> configuration on the observer since it requires a server restart. We are
> thinking of allowing the observer to connect to any participant which
> declares that it is the leader, without running the leader election algorithm
> (because it won't have enough votes).
[jira] [Commented] (ZOOKEEPER-1609) Improve ZooKeeper performance under mixed workload
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960675#comment-15960675 ]

Atri Sharma commented on ZOOKEEPER-1609:
----------------------------------------

Folks, I am planning to work on this. Please let me know if it is still relevant and has not been done yet. [~shralex]

> Improve ZooKeeper performance under mixed workload
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-1609
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1609
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Thawan Kooburat
>
> ZOOKEEPER-1505 allows 1 write or N reads to pass through the CommitProcessor
> at any given time. I did a performance experiment similar to
> http://wiki.apache.org/hadoop/ZooKeeper/Performance and found that read
> throughput drops dramatically when there are write requests. After a bit more
> investigation, I found that the biggest bottleneck is at the request queue
> entering the CommitProcessor. When the CommitProcessor sees any write request,
> it needs to block the entire pipeline and wait for the matching commit from
> the leader. This means that all read requests (including ping requests) won't
> be able to go through. The time spent waiting for the commit from the leader
> far exceeds the time spent waiting for 1 write to go through the
> CommitProcessor.
> The current plan is to create multiple request queues at the front of the
> CommitProcessor. Requests are hashed using sessionId and sent to one of the
> queues. Whenever the CommitProcessor sees a write request on one of the
> queues, it moves on to process read requests. It will have to unblock the
> write requests in the same order that it sent them to the leader, so it may
> need to maintain a separate list to keep track of that.
> The correctness is the same as having more learners in the ensemble. Sessions
> which are hashed onto a different queue are similar to sessions connecting to
> a different learner in the ensemble.
> I am hoping that this will improve read throughput and reduce the disconnect
> rate on an ensemble with a large number of clients.
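The plan described in the issue (several request queues, sessionId hashing, a separate list tracking writes in the order they were sent to the leader) can be modelled in a few lines. The sketch below is a toy model for discussion only. It is not ZooKeeper's CommitProcessor, it is single-threaded, and the class and method names (ShardedCommitQueues, submit, drain, commit) are hypothetical.

```python
from collections import deque


class ShardedCommitQueues:
    """Toy model of the multi-queue plan; not the real CommitProcessor."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        # True while a queue is waiting for the leader to commit its write.
        self.blocked = [False] * num_queues
        # (queue_index, request) pairs in the order they went to the leader.
        self.pending_writes = deque()

    def submit(self, session_id, kind, payload):
        """Hash the request onto a queue by session id; return the queue index."""
        index = session_id % len(self.queues)
        self.queues[index].append((session_id, kind, payload))
        return index

    def drain(self):
        """Process every request that is not stuck behind a pending write."""
        processed = []
        for index, queue in enumerate(self.queues):
            while queue and not self.blocked[index]:
                request = queue.popleft()
                if request[1] == "write":
                    # Forwarded to the leader; only this queue has to wait.
                    self.blocked[index] = True
                    self.pending_writes.append((index, request))
                else:
                    processed.append(request)
        return processed

    def commit(self):
        """Apply the oldest outstanding write and unblock its queue."""
        index, request = self.pending_writes.popleft()
        self.blocked[index] = False
        return request
```

In this model a write from session 1 blocks only the queue that session 1 hashes to, so a read from session 2 on another queue still goes through, while a read from session 1 queued behind the write waits until commit() is called. That preserves per-session ordering, which is the same guarantee a session gets when other sessions are connected to a different learner.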
[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950539#comment-15950539 ]

Atri Sharma commented on ZOOKEEPER-2076:
----------------------------------------

Hi Folks,

Is this still valid? [~shralex]

If nobody is working on this, I can take it up.

> Improve Leader Change Mechanism
> -------------------------------
>
>                 Key: ZOOKEEPER-2076
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2076
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.0
>            Reporter: Alexander Shraer
>
> When a leader is removed during a reconfiguration, ZOOKEEPER-107 uses a
> mechanism where the old leader nominates the new one. Although it reduces the
> time for a new leader to be elected, it still takes too long. This JIRA is
> for two things:
> 1. Improve the mechanism, e.g., avoid loading snapshots, etc. during the
> handoff.
> 2. Make it a first-class citizen & export it as a client API. We get
> questions about this once in a while - how do I cause a different leader to
> be elected? Currently the response is to either kill or reconfigure the
> current leader.
> Anyone interested in working on this?
[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946569#comment-15946569 ]

Atri Sharma commented on ZOOKEEPER-900:
---------------------------------------

Ping?

> FLE implementation should be improved to use non-blocking sockets
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-900
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Vishal Kher
>            Assignee: Martin Kuchta
>            Priority: Critical
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-900-part2.patch, ZOOKEEPER-900.patch,
> ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2
>
>
> From earlier email exchanges:
> 1. Blocking connects and accepts:
> a) The first problem is in manager.toSend(). This invokes connectOne(), which
> does a blocking connect. While testing, I changed the code so that
> connectOne() starts a new thread called AsyncConnect. AsyncConnect.run()
> does a socketChannel.connect(). After starting AsyncConnect, connectOne
> starts a timer. connectOne continues with normal operations if the connection
> is established before the timer expires; otherwise, when the timer expires, it
> interrupts the AsyncConnect thread and returns. In this way, I can have an
> upper bound on the amount of time we need to wait for connect to succeed. Of
> course, this was a quick fix for my testing. Ideally, we should use Selector
> to do non-blocking connects/accepts. I am planning to do that later once we
> at least have a quick fix for the problem and consensus from others for the
> real fix (this problem is a big blocker for us). Note that it is OK to do
> blocking IO in SenderWorker and RecvWorker threads since they block IO to the
> respective peer.
> b) The blocking IO problem is not just restricted to connectOne(), but also
> in receiveConnection(). The Listener thread calls receiveConnection() for
> each incoming connection request. receiveConnection does blocking IO to get
> the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to
> the peer that had sent the connection request. All of this is happening from
> the Listener. In short, if a peer fails after initiating a connection, the
> Listener thread won't be able to accept connections from other peers, because
> it would be stuck in read() or connectOne(). Also the code has an inherent
> cycle. initiateConnection() and receiveConnection() will have to be very
> carefully synchronized; otherwise, we could run into deadlocks. This code is
> going to be difficult to maintain/modify.
> Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
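The "real fix" mentioned in (a), a non-blocking connect with an upper bound on the wait, has roughly the following shape. This is a sketch in Python using the standard selectors module, as a stand-in for Java's Selector and SocketChannel. It is not code from QuorumCnxManager or from any of the attached patches, the function name connect_with_deadline is hypothetical, and it assumes POSIX socket semantics (a finished connect attempt is reported as writable).

```python
import errno
import selectors
import socket


def connect_with_deadline(address, timeout_seconds):
    """Return a connected socket, or None if the connect fails or times out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex returns an errno instead of raising; EINPROGRESS is the
    # normal result for a non-blocking connect that has not finished yet.
    status = sock.connect_ex(address)
    if status not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None
    with selectors.DefaultSelector() as selector:
        selector.register(sock, selectors.EVENT_WRITE)
        # Wait at most timeout_seconds; no helper thread or interrupt needed.
        ready = selector.select(timeout_seconds)
    if not ready:
        sock.close()
        return None
    # Writable means the attempt finished; SO_ERROR says whether it succeeded.
    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
        sock.close()
        return None
    return sock
```

Compared with the AsyncConnect thread plus timer described in the issue, the caller here never blocks longer than the deadline and no thread has to be interrupted. A long-lived listener would keep one selector and register many sockets on it rather than creating a selector per call, which is also how the blocking read in receiveConnection() from (b) could be avoided.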
Re: ZOOKEEPER-900
Hi Patrick,

I am fine with either. I tried pinging the owners but got no response.

Please let me know.

Regards,

Atri

On Tue, Mar 28, 2017 at 4:51 AM, Patrick Hunt wrote:

> Hi Atri. Which do you intend to work on? 900, 901, or both? Typically if
> someone is intending to work on something they will be listed as the
> "assigned" in JIRA. They would be the first person to check in with. If
> they no longer intend to work on something it's easy enough to reassign.
>
> Regards,
>
> Patrick
>
> On Sat, Mar 25, 2017 at 7:36 AM, Atri Sharma wrote:
>
>> Hi folks,
>>
>> I was looking to work on ZOOKEEPER-901. Could anybody please let me
>> know if they are working on it?
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>>

--
Regards,

Atri
l'apprenant
ZOOKEEPER-900
Hi folks,

I was looking to work on ZOOKEEPER-901. Could anybody please let me
know if they are working on it?

--
Regards,

Atri
l'apprenant
[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940894#comment-15940894 ]

Atri Sharma commented on ZOOKEEPER-900:
---------------------------------------

Could you please reassign it to me, or point me to another JIRA if this is an umbrella issue?
[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940766#comment-15940766 ]

Atri Sharma commented on ZOOKEEPER-900:
---------------------------------------

Hi Folks,

Is this still being worked on? I was thinking of taking this up. Please let me know.
Re: Contributing to Zookeeper
Thanks, that is very helpful

On Mar 19, 2017 10:13 PM, "Jordan Zimmerman" wrote:

> Atri,
>
> Docs regarding ZooKeeper are abundant on the Internet. Like all Apache
> projects, the best place to start is the project website:
> http://zookeeper.apache.org. Also, like most Apache projects there is a
> wiki with tons of information:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Index
>
> -Jordan
>
> > On Mar 19, 2017, at 11:04 AM, Atri Sharma wrote:
> >
> > Thanks Camille.
> >
> > Could you please point me to some internals/code docs that I could
> > refer to for getting started?
> >
> > Regards,
> >
> > Atri
> >
> > On Sun, Mar 19, 2017 at 1:48 AM, Camille Fournier wrote:
> >> Hi Atri,
> >>
> >> We're always happy to have folks contribute. I would recommend starting by
> >> reading the various documentation on how to contribute, and hanging out on
> >> the mailing lists for a while to get a feel for the project. Once you see
> >> something that makes sense for your skill set, volunteer an answer or a
> >> patch. That's the way most of us have gotten involved here.
> >>
> >> Cheers,
> >> C
> >>
> >> On Sat, Mar 18, 2017 at 10:20 AM, Atri Sharma wrote:
> >>
> >>> Hi folks,
> >>>
> >>> Please advise
> >>>
> >>> On Mar 17, 2017 10:25 PM, "Atri Sharma" wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I am a distributed systems engineer with experience across different
> >>>> spectrum of highly scalable systems and have worked with consistency
> >>>> and quorum protocols.
> >>>>
> >>>> I would be happy to help out on any ongoing/needed feature in
> >>>> Zookeeper. Please let me know.
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Atri
> >>>>
> >>>
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >
Re: Contributing to Zookeeper
Thanks Camille.

Could you please point me to some internals/code docs that I could
refer to for getting started?

Regards,

Atri

On Sun, Mar 19, 2017 at 1:48 AM, Camille Fournier wrote:

> Hi Atri,
>
> We're always happy to have folks contribute. I would recommend starting by
> reading the various documentation on how to contribute, and hanging out on
> the mailing lists for a while to get a feel for the project. Once you see
> something that makes sense for your skill set, volunteer an answer or a
> patch. That's the way most of us have gotten involved here.
>
> Cheers,
> C
>
> On Sat, Mar 18, 2017 at 10:20 AM, Atri Sharma wrote:
>
>> Hi folks,
>>
>> Please advise
>>
>> On Mar 17, 2017 10:25 PM, "Atri Sharma" wrote:
>>
>> > Hi All,
>> >
>> > I am a distributed systems engineer with experience across different
>> > spectrum of highly scalable systems and have worked with consistency
>> > and quorum protocols.
>> >
>> > I would be happy to help out on any ongoing/needed feature in
>> > Zookeeper. Please let me know.
>> >
>> >
>> > Regards,
>> >
>> > Atri
>> >
>>

--
Regards,

Atri
Apache Concerted
Re: Contributing to Zookeeper
Hi folks,

Please advise

On Mar 17, 2017 10:25 PM, "Atri Sharma" wrote:

> Hi All,
>
> I am a distributed systems engineer with experience across different
> spectrum of highly scalable systems and have worked with consistency
> and quorum protocols.
>
> I would be happy to help out on any ongoing/needed feature in
> Zookeeper. Please let me know.
>
>
> Regards,
>
> Atri
>
Contributing to Zookeeper
Hi All,

I am a distributed systems engineer with experience across different
spectrum of highly scalable systems and have worked with consistency
and quorum protocols.

I would be happy to help out on any ongoing/needed feature in
Zookeeper. Please let me know.

Regards,

Atri