Re: Requesting reviews for ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol

2017-04-20 Thread Atri Sharma
I can help review this

On Apr 20, 2017 2:22 PM, "Abraham Fine"  wrote:

> Hello-
>
> I have been continuing work on ZOOKEEPER-236 and it would be great to
> get feedback from the community on the patch. The pull request can be
> found here: https://github.com/apache/zookeeper/pull/184
>
> ZOOKEEPER-236 provides the ability to use SSL/TLS to secure
> communication within the ZooKeeper quorum.
>
> Documentation will be handled in another pull request, but the usage is
> very similar to our existing Client <-> Quorum functionality. Here is an
> overview of the basic configuration.
>
> System properties are set on each member of the quorum, for example:
> -Dzookeeper.ssl.quorum.keyStore.location=keystore.jks
> -Dzookeeper.ssl.quorum.keyStore.password=password
> -Dzookeeper.ssl.quorum.trustStore.location=truststore.jks
>
> A flag is set in the cfg files:
> sslQuorum=true
>
> The best way to see all the functionality provided by this patch is to
> take a look at the integration tests:
> https://github.com/afine/zookeeper/blob/3c6c81b69b7105fa7c5235a0f27718a7eae195de/src/java/test/org/apache/zookeeper/test/QuorumSSLTest.java
> The integration tests contain examples showing how hostname
> verification, rolling upgrades, cipher configuration, protocol
> configuration, and certificate revocation are handled.
>
> There is an outstanding question regarding hostname verification;
> please provide input here:
> https://github.com/apache/zookeeper/pull/184#discussion_r111485824
>
> Looking forward to hearing everyone's thoughts.
>
> Thanks,
> Abraham Fine
>
>
>
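
For reference, the three system properties listed in the quoted message could be checked before a quorum member starts. This is only a minimal sketch: the class QuorumSslPreflight and its missing() helper are hypothetical names made up for illustration and are not part of ZooKeeper or of the pull request. Only the property names are taken from the message above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check; not part of ZooKeeper or of the patch. */
final class QuorumSslPreflight {
    // Property names exactly as quoted in the message above.
    static final String[] REQUIRED = {
        "zookeeper.ssl.quorum.keyStore.location",
        "zookeeper.ssl.quorum.keyStore.password",
        "zookeeper.ssl.quorum.trustStore.location",
    };

    /** Returns the names of the required properties that are missing or blank. */
    static List<String> missing(Properties props) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Checks the -D flags that were passed to this JVM.
        List<String> missing = missing(System.getProperties());
        if (missing.isEmpty()) {
            System.out.println("quorum SSL properties look complete");
        } else {
            System.out.println("missing: " + missing);
        }
    }
}
```

The check only confirms that the flags are present; it does not validate the keystore or truststore files themselves.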


[jira] [Commented] (ZOOKEEPER-2362) ZooKeeper multi / transaction allows partial read

2017-04-20 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977383#comment-15977383
 ] 

Atri Sharma commented on ZOOKEEPER-2362:


This issue still appears to be present. If it is a release blocker, can I go
ahead and take it? [~hanm]

> ZooKeeper multi / transaction allows partial read
> -------------------------------------------------
>
> Key: ZOOKEEPER-2362
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2362
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Whitney Sorenson
> Priority: Critical
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> In this thread 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201602.mbox/%3CCAPbqGzicBkLLyVDm7RFM20z0y3X1v1P-C9-1%3D%3D1DDqRDTzdOmQ%40mail.gmail.com%3E
>  , I discussed an issue I've now seen in multiple environments:
> In a multi (using Curator), I write 2 new nodes. At some point, I issue 2 
> reads for these new nodes. In one read, I see one of the new nodes. In a 
> subsequent read, I fail to see the other new node:
> 1. Starting state : { /foo = , /bar =  }
> 2. In a multi, write: { /foo = A, /bar = B}
> 3. Read /foo as A
> 4. Read /bar as  
> #3 and #4 are issued 100% sequentially.
> It is not known at what point during #2, #3 starts.
> Note: the reads are getChildren() calls.
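
The expectation behind steps 1 to 4 above can be stated as a small invariant: once a reader has seen /foo = A, a later read must see /bar = B. The following is a toy in-memory model of that all-or-nothing visibility, not ZooKeeper code. The class AtomicMultiModel and its methods are hypothetical, and it reads values rather than issuing getChildren() calls as the report does.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model; it does not use or reproduce the ZooKeeper server. */
final class AtomicMultiModel {
    private final Map<String, String> nodes = new HashMap<>();

    /** Applies every write in the batch while holding the lock. */
    synchronized void multi(Map<String, String> writes) {
        nodes.putAll(writes);
    }

    /** Reads one path; returns null if the node has no value yet. */
    synchronized String read(String path) {
        return nodes.get(path);
    }

    /** The expectation from steps 3 and 4: once /foo is A, /bar must be B. */
    static boolean consistent(String foo, String bar) {
        return !"A".equals(foo) || "B".equals(bar);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicMultiModel store = new AtomicMultiModel();
        Map<String, String> batch = new HashMap<>();
        batch.put("/foo", "A");
        batch.put("/bar", "B");

        Thread writer = new Thread(() -> store.multi(batch));
        writer.start();
        // Sequential reads, as in the report: /foo first, then /bar.
        String foo = store.read("/foo");
        String bar = store.read("/bar");
        writer.join();
        System.out.println("consistent = " + consistent(foo, bar));
    }
}
```

The behavior described in the report corresponds to consistent("A", null) returning false, which this model never produces because both writes become visible under one lock.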



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-1692) Add support for single member ensemble

2017-04-08 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961749#comment-15961749
 ] 

Atri Sharma commented on ZOOKEEPER-1692:


Was this ever implemented? [~thawan] Can I help with this?

> Add support for single member ensemble
> --------------------------------------
>
> Key: ZOOKEEPER-1692
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1692
> Project: ZooKeeper
> Issue Type: Improvement
> Components: quorum
> Affects Versions: 3.4.0
> Reporter: Thawan Kooburat
> Assignee: Thawan Kooburat
> Priority: Minor
>
> In the past, we repeatedly ran into a problem where a quorum could not be
> formed. It takes a while to investigate the root cause and fix the problem.
> Our current solution is to make it possible to run a quorum with a single
> member in it. Unlike standalone mode, it has to run as a LeaderZooKeeper
> server, so that the observers can connect to it.
> This will allow the operator to use this workaround to bring back the
> ensemble quickly while investigating the problem in the background.
> The main problem here is to allow the observers to connect to the leader
> when the quorum size is reduced to one. We don't want to update the (static)
> configuration on the observer since that requires a server restart. We are
> thinking of allowing the observer to connect to any participant that
> declares itself the leader, without running the leader election algorithm
> (because it won't have enough votes).





[jira] [Commented] (ZOOKEEPER-1609) Improve ZooKeeper performance under mixed workload

2017-04-07 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960675#comment-15960675
 ] 

Atri Sharma commented on ZOOKEEPER-1609:


Folks, I am planning to work on this. Please let me know whether it is still
relevant and not yet done. [~shralex]

> Improve ZooKeeper performance under mixed workload
> --------------------------------------------------
>
> Key: ZOOKEEPER-1609
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1609
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.3
> Reporter: Thawan Kooburat
>
> ZOOKEEPER-1505 allows 1 write or N reads to pass through the CommitProcessor
> at any given time. I ran a performance experiment similar to
> http://wiki.apache.org/hadoop/ZooKeeper/Performance and found that read
> throughput drops dramatically when there are write requests. After a bit more
> investigation, I found that the biggest bottleneck is at the request queue
> entering the CommitProcessor.
> When the CommitProcessor sees any write request, it needs to block the
> entire pipeline and wait for the matching commit from the leader. This means
> that all read requests (including ping requests) won't be able to go through.
> The time spent waiting for the commit from the leader far exceeds the time
> spent waiting for 1 write to go through the CommitProcessor.
> The current plan is to create multiple request queues at the front of the
> CommitProcessor. Requests are hashed using sessionId and sent to one of the
> queues. Whenever the CommitProcessor sees a write request on one of the
> queues, it moves on to process read requests. It will have to unblock the
> write requests in the same order in which they were sent to the leader, so it
> may need to maintain a separate list to keep track of that.
> The correctness is the same as having more learners in the ensemble. Sessions
> that are hashed onto different queues are similar to sessions connecting to
> different learners in the ensemble.
> I am hoping that this will improve read throughput and reduce the disconnect
> rate on an ensemble with a large number of clients.
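
The queueing plan described above can be sketched roughly as follows. This is a simplified, single-threaded model and not the actual CommitProcessor: the class ShardedCommitQueues, the Request type, and the methods submit, drain, and commitNext are hypothetical names, and the modulo mapping from sessionId to a queue is an assumption standing in for whatever hash the real change would use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of per-session request queues; not ZooKeeper code. */
final class ShardedCommitQueues {
    static final class Request {
        final long sessionId;
        final boolean isWrite;
        final String name;

        Request(long sessionId, boolean isWrite, String name) {
            this.sessionId = sessionId;
            this.isWrite = isWrite;
            this.name = name;
        }
    }

    private final List<Deque<Request>> queues = new ArrayList<>();
    private final boolean[] blocked;                                 // queue i awaits a commit
    private final Deque<Request> pendingWrites = new ArrayDeque<>(); // order sent to the leader
    private final List<String> processed = new ArrayList<>();

    ShardedCommitQueues(int queueCount) {
        blocked = new boolean[queueCount];
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    /** Assumed mapping: sessionId modulo the number of queues. */
    int queueFor(long sessionId) {
        return (int) Math.floorMod(sessionId, (long) queues.size());
    }

    void submit(Request r) {
        queues.get(queueFor(r.sessionId)).addLast(r);
    }

    /** Processes reads on every queue until it is empty or stalls on a write. */
    void drain() {
        for (int i = 0; i < queues.size(); i++) {
            Deque<Request> q = queues.get(i);
            while (!blocked[i] && !q.isEmpty()) {
                if (q.peekFirst().isWrite) {
                    blocked[i] = true;                  // stall only this queue
                    pendingWrites.addLast(q.peekFirst());
                } else {
                    processed.add(q.pollFirst().name);
                }
            }
        }
    }

    /** Simulates the leader's commit for the oldest outstanding write. */
    void commitNext() {
        Request w = pendingWrites.pollFirst();
        if (w == null) {
            return;
        }
        int i = queueFor(w.sessionId);
        queues.get(i).pollFirst();                      // the write is at the head of its queue
        processed.add(w.name);
        blocked[i] = false;
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        ShardedCommitQueues cp = new ShardedCommitQueues(2);
        cp.submit(new Request(0L, true, "W0"));
        cp.submit(new Request(0L, false, "R0"));
        cp.submit(new Request(1L, false, "R1"));
        cp.drain();       // R1 proceeds; R0 waits behind W0 on its own queue
        cp.commitNext();  // the commit for W0 arrives
        cp.drain();       // R0 can now proceed
        System.out.println(cp.processed());
    }
}
```

In this model a write stalls only the queue of its own session, and pendingWrites plays the role of the separate list that keeps writes in the order they were sent to the leader.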





[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2017-03-31 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950539#comment-15950539
 ] 

Atri Sharma commented on ZOOKEEPER-2076:


Hi Folks,

Is this still valid? [~shralex]

If nobody is working on this, I can take it up

> Improve Leader Change Mechanism
> -------------------------------
>
> Key: ZOOKEEPER-2076
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2076
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.5.0
> Reporter: Alexander Shraer
>
> When a leader is removed during a reconfiguration, ZOOKEEPER-107 uses a 
> mechanism where the old leader nominates the new one. Although it reduces the 
> time for a new leader to be elected, it still takes too long. This JIRA is 
> for two things:
> 1. Improve the mechanism, e.g., avoid loading snapshots, etc. during the 
> handoff.
> 2. Make it a first-class citizen & export it as a client API. We get
> questions about this once in a while: how do I cause a different leader to
> be elected? Currently the response is to either kill or reconfigure the
> current leader.
> Is anyone interested in working on this?





[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2017-03-28 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946569#comment-15946569
 ] 

Atri Sharma commented on ZOOKEEPER-900:
---

Ping?

> FLE implementation should be improved to use non-blocking sockets
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-900
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Vishal Kher
> Assignee: Martin Kuchta
> Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-900-part2.patch, ZOOKEEPER-900.patch,
> ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2
>
>
> From earlier email exchanges:
> 1. Blocking connects and accepts:
> a) The first problem is in manager.toSend(). This invokes connectOne(), which
> does a blocking connect. While testing, I changed the code so that
> connectOne() starts a new thread called AsyncConnect. AsyncConnect.run()
> does a socketChannel.connect(). After starting AsyncConnect, connectOne
> starts a timer. connectOne continues with normal operations if the connection
> is established before the timer expires; otherwise, when the timer expires, it
> interrupts the AsyncConnect thread and returns. In this way, I can have an
> upper bound on the amount of time we need to wait for connect to succeed. Of
> course, this was a quick fix for my testing. Ideally, we should use Selector
> to do non-blocking connects/accepts. I am planning to do that later once we
> at least have a quick fix for the problem and consensus from others for the
> real fix (this problem is a big blocker for us). Note that it is OK to do
> blocking IO in SenderWorker and RecvWorker threads since they block only on
> IO to their respective peer.
> b) The blocking IO problem is not restricted to connectOne(); it also occurs
> in receiveConnection(). The Listener thread calls receiveConnection() for
> each incoming connection request. receiveConnection does blocking IO to get
> the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to
> the peer that had sent the connection request. All of this is happening from
> the Listener. In short, if a peer fails after initiating a connection, the
> Listener thread won't be able to accept connections from other peers, because
> it would be stuck in read() or connectOne(). Also, the code has an inherent
> cycle. initiateConnection() and receiveConnection() will have to be very
> carefully synchronized; otherwise, we could run into deadlocks. This code is
> going to be difficult to maintain/modify.
> Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
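
The Selector-based non-blocking connect suggested in (a) could look roughly like this. It is a minimal sketch using only java.nio and is not taken from any of the attached patches: the class BoundedConnect and its connect method are hypothetical names, and a real fix would share one Selector across peers rather than open one per attempt.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: a connect attempt with an upper bound on the wait. */
final class BoundedConnect {
    /**
     * Starts a non-blocking connect and waits at most timeoutMs for it to finish.
     * Returns the connected (still non-blocking) channel, or null on timeout.
     * A refused or failed connect surfaces as an IOException.
     */
    static SocketChannel connect(SocketAddress addr, long timeoutMs) throws IOException {
        SocketChannel ch = SocketChannel.open();
        boolean ok = false;
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            if (ch.connect(addr)) {              // may complete immediately
                ok = true;
                return ch;
            }
            ch.register(selector, SelectionKey.OP_CONNECT);
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (true) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return null;                 // deadline passed; caller can move on
                }
                if (selector.select(remainingMs) > 0) {
                    selector.selectedKeys().clear();
                    if (ch.finishConnect()) {
                        ok = true;
                        return ch;
                    }
                }
            }
        } finally {
            if (!ok) {
                ch.close();                      // do not leak the channel on timeout or error
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo against a local listener on an ephemeral loopback port.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel ch = connect(server.getLocalAddress(), 2000);
            System.out.println(ch != null ? "connected" : "timed out");
            if (ch != null) {
                ch.close();
            }
        }
    }
}
```

Unlike the AsyncConnect thread plus timer described above, no extra thread has to be interrupted here: the calling thread simply stops waiting once the deadline passes.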





Re: ZOOKEEPER-900

2017-03-27 Thread Atri Sharma
Hi Patrick,

I am fine with either. I tried pinging the owners but got no response.

Please let me know.

Regards,

Atri

On Tue, Mar 28, 2017 at 4:51 AM, Patrick Hunt  wrote:
> Hi Atri. Which do you intend to work on? 900, 901, or both? Typically if
> someone is intending to work on something they will be listed as the
> "assigned" in JIRA. They would be the first person to check in with. If
> they no longer intend to work on something it's easy enough to reassign.
>
> Regards,
>
> Patrick
>
> On Sat, Mar 25, 2017 at 7:36 AM, Atri Sharma  wrote:
>
>> Hi folks,
>>
>> I was looking to work on ZOOKEEPER-901. Could anybody please let me
>> know if they are working on it?
>>
>> --
>> Regards,
>>
>> Atri
>> l'apprenant
>>



-- 
Regards,

Atri
l'apprenant


ZOOKEEPER-900

2017-03-25 Thread Atri Sharma
Hi folks,

I was looking to work on ZOOKEEPER-901. Could anybody please let me
know if they are working on it?

-- 
Regards,

Atri
l'apprenant


[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2017-03-24 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940894#comment-15940894
 ] 

Atri Sharma commented on ZOOKEEPER-900:
---

Could you please reassign it to me, or point me to another JIRA if this is an umbrella issue?



[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2017-03-24 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940766#comment-15940766
 ] 

Atri Sharma commented on ZOOKEEPER-900:
---

Hi Folks,

Is this still being worked on? I was thinking of taking this up.

Please let me know



Re: Contributing to Zookeeper

2017-03-19 Thread Atri Sharma
Thanks, that is very helpful

On Mar 19, 2017 10:13 PM, "Jordan Zimmerman" 
wrote:

> Atri,
>
> Docs regarding ZooKeeper are abundant on the Internet. Like all Apache
> projects, the best place to start is the project website:
> http://zookeeper.apache.org. Also, like most Apache projects, there is a
> wiki with tons of information:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Index
>
> -Jordan
>
> > On Mar 19, 2017, at 11:04 AM, Atri Sharma  wrote:
> >
> > Thanks Camille.
> >
> > Could you please point me to some internals/code docs that I could
> > refer to for getting started?
> >
> > Regards,
> >
> > Atri
> >
> > On Sun, Mar 19, 2017 at 1:48 AM, Camille Fournier wrote:
> >> Hi Atri,
> >>
> >> We're always happy to have folks contribute. I would recommend starting by
> >> reading the various documentation on how to contribute, and hanging out on
> >> the mailing lists for a while to get a feel for the project. Once you see
> >> something that makes sense for your skill set, volunteer an answer or a
> >> patch. That's the way most of us have gotten involved here.
> >>
> >> Cheers,
> >> C
> >>
> >> On Sat, Mar 18, 2017 at 10:20 AM, Atri Sharma wrote:
> >>
> >>> Hi folks,
> >>>
> >>> Please advise
> >>>
> >>> On Mar 17, 2017 10:25 PM, "Atri Sharma"  wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I am a distributed systems engineer with experience across a
> >>>> spectrum of highly scalable systems and have worked with consistency
> >>>> and quorum protocols.
> >>>>
> >>>> I would be happy to help out on any ongoing/needed feature in
> >>>> Zookeeper. Please let me know.
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Atri
> >>>>
> >>>
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
>
>


Re: Contributing to Zookeeper

2017-03-19 Thread Atri Sharma
Thanks Camille.

Could you please point me to some internals/code docs that I could
refer to for getting started?

Regards,

Atri

On Sun, Mar 19, 2017 at 1:48 AM, Camille Fournier  wrote:
> Hi Atri,
>
> We're always happy to have folks contribute. I would recommend starting by
> reading the various documentation on how to contribute, and hanging out on
> the mailing lists for a while to get a feel for the project. Once you see
> something that makes sense for your skill set, volunteer an answer or a
> patch. That's the way most of us have gotten involved here.
>
> Cheers,
> C
>
> On Sat, Mar 18, 2017 at 10:20 AM, Atri Sharma  wrote:
>
>> Hi folks,
>>
>> Please advise
>>
>> On Mar 17, 2017 10:25 PM, "Atri Sharma"  wrote:
>>
>> > Hi All,
>> >
>> > I am a distributed systems engineer with experience across a
>> > spectrum of highly scalable systems and have worked with consistency
>> > and quorum protocols.
>> >
>> > I would be happy to help out on any ongoing/needed feature in
>> > Zookeeper. Please let me know.
>> >
>> >
>> > Regards,
>> >
>> > Atri
>> >
>>



-- 
Regards,

Atri
Apache Concerted


Re: Contributing to Zookeeper

2017-03-18 Thread Atri Sharma
Hi folks,

Please advise

On Mar 17, 2017 10:25 PM, "Atri Sharma"  wrote:

> Hi All,
>
> I am a distributed systems engineer with experience across a
> spectrum of highly scalable systems and have worked with consistency
> and quorum protocols.
>
> I would be happy to help out on any ongoing/needed feature in
> Zookeeper. Please let me know.
>
>
> Regards,
>
> Atri
>


Contributing to Zookeeper

2017-03-17 Thread Atri Sharma
Hi All,

I am a distributed systems engineer with experience across a
spectrum of highly scalable systems and have worked with consistency
and quorum protocols.

I would be happy to help out on any ongoing/needed feature in
Zookeeper. Please let me know.


Regards,

Atri