> On 24 Oct 2018, at 17:56, thierry bordaz <tbor...@redhat.com> wrote:
> 
> 
> 
> On 10/21/2018 03:55 AM, William Brown wrote:
>> Thanks for this write up Thierry,
>> 
>>> On 19 Oct 2018, at 17:15, thierry bordaz <tbor...@redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>> C10K is a scalability problem that a server can face when dealing with 
>>> events of thousands of connections (i.e. clients) at the same time. Events 
>>> can be new connections, new operations on the established connections, 
>>> closure of connection (from client or server)
>>> For 389-ds, C10K problem was resolved with a new framework Nunc-Stans [1]. 
>>> Nunc-stans was first enabled in RHDS 7.4 and improved/fixed in 7.5. 
>>> Robustness issues [2] and [3] were reported in 7.5 and it was decided to 
>>> disable Nunc-stans. It is not known if those issues exist or not in 7.4.
>>> 
>>> William posted a PR to fix those two issues [4]. Nunc-stans is a complex 
>>> framework, with its own dynamic. Review of this PR is not easy and even a 
>>> careful review may not guaranty it will fix [2] and [3] and may not 
>>> introduce others unexpected side effects.
>>> 
>>> From there we discussed two options (but there may be others):
>>> 
>>>     • Review and merge the PR [4], then later run some intensive tests 
>>> aiming to verify [2],[3] and checking the robustness in order to reenable NS
>> I think this is the best solution. If NS is “not working” today, then 
>> merging the PR won’t make it “not work” any less ;)
>> 
>> Given we have a reliable way to disabled it at build time, I think that 
>> merging, testing, and then eventually discussion and decision of enabling by 
>> default is the best plan.
> 
> Hi William,
> 
> The existing set of tests reveal a single failure[1]. That shows the PR is 
> really promising but at the same time that tests are crucial with NS. Like 
> the two robustness bugs (conn leak and deadlock) only happening is some 
> specific conditions.
> 
> I think review on large NS PR is bit formal step and the most important are 
> tests/measure. We may agree to push the NS patches base on existing tests 
> results.  New tests/measure (for benefits and [2] and [3]) on top of the 
> current PR could be the option. Let's wait for additional feedback

Sure. I will work on Viktor's comments and found issues with paged search, and 
I’ll commit some more time to running the tests if I can. 

The reason I’m pushing to “just merge and test/fix later” is that at the moment 
I have very little time, and I don’t want to see my work bit-rot over time, and 
get further and further away. It’s going to make future fixes and improvements 
to the space easier once it’s merged.

Saying that I think it would be good to have a deadline or a list of people we 
want comments from. Mark, Simon, Ludwig, Viktor, yourself and myself come to 
mind … we need to know what our approach is, and this ties back to the “feature 
gating” discussion that I hope you saw. We need a way to have features we are 
working on, so we can test and reason about them, but without letting them out 
in production - and having ways to roll the back. 

Thanks, 


> 
> [1] https://pagure.io/389-ds-base/pull-request/49636#comment-66941
> 
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1608746 deadlock
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1605554 connection leaks
> 
> best regards
> thierry
> 
>>>     • Build some tests for
>>>             • measure the benefit of NS as [2] and [3] do not prevent some 
>>> performance tests
>>>             • identify possible reproducers for [2] and [3]
>>>             • create robustness and long duration NS specific tests
>>>             • review and merge the PR [4]
>>> As PR [4] is not intended for perf improvement, the step 2.1 will impact 
>>> the priority according to the performance benefits.
>> I think that the test plan is good. 2 and 3 will be hard to reproduce I 
>> think, they seem like they require complex state and interactions. The goal 
>> of the PR I have open is to reduce the complexity of NS integration into DS, 
>> so that tests for situations like 2 and 3 can be more easily created (and 
>> hopefully the simpler integration is going to resolve some issues).
>> 
>> The “benefit” of NS is hard to say - because the other parts of the server 
>> core are still needing threading improvement, NS may help some aspects 
>> (connection acceptance performance for example), we won’t see all the of the 
>> benefits until we improve other parts. There are many bottlenecks to fix! An 
>> example is lib globs high use of atomics (should be COW), and the plugin 
>> architecture.
>> 
>> At a theoretical level, NS will be faster as we don’t need to lock and poll 
>> over the connection table, but today the CT may not be the bottleneck, so NS 
>> may not make this “faster” just by enabling.
>> 
>>> Comments are welcomed
>>> 
>>> Regarding 2.1 plan we made the following notes for the test plan:
>>> The benefit of Nunc-Stans can only be measure with a large number of 
>>> connections (i.e. client) above 1000. That means a set of clients (sometime 
>>> all) should keep their connection opened. Clients should run on several 
>>> hosts so that clients are not the bootleneck.
>>> 
>>> For the two types of events (new connection and new operations), the 
>>> measurement could be
>>> 
>>>     • Event: New connections
>>>             • Start all clients in parallel to establish connections 
>>> (keeping them opened) take the duration to get 1000, 2000, ... 10000 
>>> connections and check there are drop or not
>>>             • Establish 1000 connections and monitor during to open 100 
>>> more, the same starting with 2000, 10000
>>>             • Client should not run any operations during the monitoring
>>>     • Event: New operations
>>>             • Start all clients and when 1000 connections are established, 
>>> launch simple operations (e.g. search -s base -b "" objectclass) and 
>>> monitor how many of them can be handled. The same with 2000, ... 10000.
>>>             • response time and workqueue length could be monitored to be 
>>> sure the bottleneck are not the worker.
>> At one point I started (but did not finish) ldclt integration with lib389. 
>> It would be great to have a series of stress and acceptance tests like these 
>> available in lib389 for our release validations, so I believe that their 
>> creation is valuable regardless of NS.
>> 
>>> [1] http://www.port389.org/docs/389ds/design/nunc-stans.html
>>> 
>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1608746 deadlock
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1605554 connection leaks
>>> 
>>> [4] https://pagure.io/389-ds-base/pull-request/49636
>>> 
>>> _______________________________________________
>>> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
>>> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
>>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>> List Archives: 
>>> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
>> —
>> Sincerely,
>> 
>> William
>> 
>> 
>> _______________________________________________
>> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
>> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives: 
>> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
> _______________________________________________
> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org

—
Sincerely,

William


_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org

Reply via email to