Hi Willy,

Thank you for your detailed answer.
On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> Threading was clearly released with an experimental status, just like
> H2, because we knew we'd be facing some post-release issues in these
> two areas that are hard to get 100% right at once. However I consider
> that the situation has got much better, and to confirm this, both of
> these are now enabled by default in HapTech's products. With this said,
> I expect that over time we'll continue to see a few bugs, but not more
> than what we're seeing in various areas. For example, we didn't get a
> single issue on haproxy.org since it was updated to 1.8.1 or so,
> 3 months ago. So this is getting quite good.

ok, it was not clear to me that it was experimental, since it was quite
widely advertised in several blog posts, but I probably missed something.
Thanks for the clarification though. Since 1.8.1 the only issues we had
with nbthread were indeed related to using it along with seamless reload,
but I will give it a second try with the latest patches you released in
the 1.8 tree today.

> I ran a stress test on this patch, with a single server running with
> "maxconn 1", with a frontend bound to two threads. I measure exactly
> 30000 conn/s with a single thread (keep in mind that there's a single
> connection at once), and 28500 with two threads. Thus the sync point
> takes on average an extra 1.75 microsecond, compared to the 35
> microseconds it takes on average to finish processing the request
> (connect, server processing, response, close).
>
> Also if you're running with nbproc > 1 instead, the maxconn setting is
> not really respected since it becomes per-process. When you run with
> 8 processes it doesn't mean much anymore, or you need to have small
> maxconn settings, implying that sometimes a process might queue some
> requests while there are available slots in other processes.
> Thus I'd argue that the threads here significantly improve the
> situation by allowing all connection slots to be used by all CPUs,
> which is a real improvement which should theoretically show you
> lower latencies.

thanks for these details. We will run some tests on our side as well;
the commit message worried me about the last percentile of requests,
which might sometimes show crazy numbers. I now better understand that
we are talking about 1.75 extra microseconds.

> Note that if this is of interest to you, it's trivial to make haproxy
> run in busy polling mode, and in this case the performance increases to
> 30900 conn/s, at the expense of eating all your CPU (which possibly you
> don't care about if latency is your worst enemy). We can possibly
> even improve this to ensure that it's done only when there are existing
> sessions on a given thread. Let me know if this is something that could
> be of interest to you, as I think we could make this configurable and
> bypass the sync point in this case.

Making it configurable is definitely interesting for us. I will try to
have a look as well.

> No they're definitely not for 1.8 and still really touchy. We're
> progressively attacking locks wherever we can. Some further patches
> will refine the scheduler to make it more parallel, and even the code
> above will continue to change; see it as a first step in the right
> direction.

understood.

> We noticed a nice performance boost on the last one with many cores
> (24 threads, something like +40% on connection rate), but we'll probably
> see even better once the rest is addressed.

indeed, I remember we spoke about those improvements at the last meetup.
Nice work, 1.9 already looks interesting from this point of view!

Cheers,
--
William