Hi,

C10K is a scalability problem that a server faces when it must handle events from thousands of connections (i.e. clients) at the same time. Events can be new connections, new operations on established connections, or connection closures (initiated by the client or the server).

For 389-ds, the C10K problem was resolved with a new framework, *Nunc-Stans* [1]. Nunc-Stans was first enabled in RHDS 7.4 and improved/fixed in 7.5. Robustness issues [2] and [3] were reported in 7.5, and it was decided to disable Nunc-Stans. It is not known whether those issues also exist in 7.4.

William posted a PR to fix those two issues [4]. Nunc-Stans is a complex framework with its own dynamics. Reviewing this PR is not easy, and even a careful review cannot guarantee that it fixes [2] and [3] without introducing other unexpected side effects.

From there we discussed two options (but there may be others):

1. Review and merge the PR [4], then later run some intensive tests
   to verify [2] and [3] and to check robustness, in order to
   re-enable NS
2. Build some tests before merging:
    1. measure the benefit of NS, as [2] and [3] do not prevent
       running some performance tests
    2. identify possible reproducers for [2] and [3]
    3. create robustness and long-duration NS-specific tests
    4. then review and merge the PR [4]

As PR [4] is not intended as a performance improvement, the outcome of step 2.1 will determine its priority, according to the measured performance benefit.

Comments are welcome.


   Regarding step 2.1, we made the following notes for the test plan:

   /The benefit of Nunc-Stans can only be measured with a large number
   of connections (i.e. clients), above 1000. That means a set of
   clients (sometimes all of them) should keep their connections
   *open*. Clients should run on several hosts so that the clients are
   not the bottleneck./

   /For the two types of events (new connections and new operations),
   the measurements could be:/

     * /Event: New connections/
         o /Start all clients in parallel to establish connections
           (keeping them open), measure the time taken to reach 1000,
           2000, ... 10000 connections, and check whether any are
           dropped/
         o /Establish 1000 connections and measure the time taken to
           open 100 more; the same starting from 2000 and 10000/
         o /Clients should not run any operations during the monitoring/
     * /Event: New operations/
         o /Start all clients and, when 1000 connections are
           established, launch simple operations (e.g. search -s base
           -b "" objectclass) and monitor how many of them can be
           handled. The same with 2000, ... 10000./
         o /Response time and work queue length could be monitored to
           be sure the bottleneck is not the workers./
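
As an illustration of the "new connections" measurement, here is a minimal sketch of a single client process, not an agreed test tool. It only opens plain TCP connections (no LDAP bind or operation), keeps them open, and records the elapsed time at each multiple of `step`. The names `open_connections` and `close_all` and their parameters are hypothetical, and the loop is sequential, so several such processes on several hosts would be needed to approach a parallel start. The per-process file descriptor limit also has to be raised for large counts.

```python
import socket
import time


def open_connections(host, port, total, step=1000, timeout=5.0):
    """Open `total` TCP connections, keep them open, and time the ramp-up.

    Returns (sockets, checkpoints, failures). `checkpoints` is a list of
    (connection_count, elapsed_seconds) tuples recorded each time the number
    of open connections reaches a multiple of `step`.
    """
    sockets = []
    checkpoints = []
    failures = 0
    start = time.monotonic()
    for _ in range(total):
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Count refused, timed-out, or otherwise failed connections.
            failures += 1
            continue
        sockets.append(conn)
        if len(sockets) % step == 0:
            checkpoints.append((len(sockets), time.monotonic() - start))
    return sockets, checkpoints, failures


def close_all(sockets):
    """Close every connection opened by open_connections()."""
    for conn in sockets:
        conn.close()
```

Calling `open_connections(host, 389, total=10000)` against a test server would give the time to reach 1000, 2000, ... 10000 connections along with the number of failed attempts. Calling it again with `total=100, step=100` while the first set is still open would approximate the "100 more" measurement.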

[1] http://www.port389.org/docs/389ds/design/nunc-stans.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1608746 (deadlock)
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1605554 (connection leaks)
[4] https://pagure.io/389-ds-base/pull-request/49636

_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
