Re: SSL, peered sticky tables + nbproc > 1?

2014-05-27 Thread Willy Tarreau
Hi Andy,

On Tue, May 27, 2014 at 06:00:37PM +0100, Andrew Phillips wrote:
 Something I overlooked replying to on this thread;
 
  BTW, I remember you said that you fixed the busy loop by disabling the
  FD in the speculative event cache, but do you remember how you re-enable
  it ? Eg, if all other processes have accepted some connections, your
  first process will have to accept new connections again, so that means
  that its state depends on others'.
 
   We initially just returned from listener_accept(). This caused us to
 go into a busy spin as there were always pending speculative reads, so
 fd_nbspec was non zero in ev_epoll.c which triggered setting
 wait_time=0. 
 
   Looking at the flow in listener_accept(), what we observed happening
 before was that without any of our patches, several processes would wake
 up on a new socket event. The fastest would win and accept() and the
 slower ones would hit the error check in listener.c at line 353. 
353:   if (unlikely(cfd == -1)) {
               switch (errno) {
               case EAGAIN:
               case EINTR:
               case ECONNABORTED:
                       fd_poll_recv(fd);
                       return;   /* nothing more to accept */
               ...
   In this case, chasing fd_poll_recv(fd) through the files indicated it
 cleared the speculative events off the queue, meaning fd_nbspec would
 not be set, and wait_time would not get set to 0. 
 
   So we just added the same call to the shm patch refusal path. Which
 solved our problem. 
 
  Not sure how that relates to your point about the processes' state
 depending on others, which does not seem to be the case.

Got it, thanks for the explanation! I thought you completely disabled
events on this FD, which would be an issue right now. Here with only
disabling speculative events, you only lose the readiness information.
That works for level-triggered pollers, but will not work anymore with
an edge-triggered poller if/when we switch to EPOLLET. But at least
I get the picture now.
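
To make that distinction concrete, here is a small standalone demo (plain
epoll, not haproxy code): with level-triggered registration, readiness the
application "forgot" can simply be re-polled, while with EPOLLET the event
is reported once per state change, so discarding it loses it for good.

/* Standalone demo: level-triggered vs edge-triggered epoll.
 * LT keeps reporting the fd as readable while unread data remains, so
 * dropped readiness can be recovered by polling again. With EPOLLET the
 * event is delivered once per state change and is then gone.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>

static int wait_once(int ep)
{
    struct epoll_event ev;
    return epoll_wait(ep, &ev, 1, 100);      /* 100 ms timeout */
}

static int run(int use_et)
{
    int pfd[2];
    struct epoll_event ev;

    if (pipe(pfd) < 0 || write(pfd[1], "x", 1) != 1)
        return 1;                            /* make the read side readable */

    int ep = epoll_create1(0);
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | (use_et ? EPOLLET : 0);
    ev.data.fd = pfd[0];
    epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);

    int first  = wait_once(ep);              /* both modes report the event */
    int second = wait_once(ep);              /* only LT reports it again */
    printf("%s: first=%d second=%d\n", use_et ? "ET" : "LT", first, second);

    close(ep); close(pfd[0]); close(pfd[1]);
    return 0;
}

int main(void)
{
    run(0);   /* LT: first=1 second=1 */
    run(1);   /* ET: first=1 second=0 */
    return 0;
}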

Thanks!
Willy




Re: SSL, peered sticky tables + nbproc > 1?

2014-05-18 Thread Andrew Phillips
Willy,

 Thanks for the response. I wrote the reply as I read through, so it's
interesting to see that we've pursued similar lines of thought about how
to solve this problem. 

 I think our workload is very different from 'normal'. We have several 
quite long lived connections, with a modest accept() rate of new
connections. 

 I've just checked the kernel code and indeed it's not a real round-robin,
 it's a hash on the 4-tuple (src/dst/spt/dpt), coupled with a pseudo-random
 mix. But that makes a lot of sense, since a round robin would have to perform
 a memory write to store the index. That said, when testing here, I get the
 same distribution on all servers +/- 0.2% or so.
 
 We'll test more with EL6 latest and SO_REUSEPORT - given the
information above it's possible our test rig may not show the above
algorithm to its best, and may not represent production load that well. 

 Ideally though a least conn LB would be best. James has posted our test
numbers - they're better and may be good enough for now. And there is
always the alternative of maintaining the shm_balance patch internally. 


 If you're concerned with long lived connections, then round robin is not the
 proper choice, you should use a leastconn algorithm instead, which will take
 care of disconnected clients.

  Yes, this is essentially the thinking behind the shm_balance patch. 
 
 I definitely agree. I think we should propose some kernel-side improvements
 such as a way to distribute according to number of established connections
 per queue instead of hashing or round-robinning, but it seems there's no
 relation between the listen queues and the inherited sockets, so it looks
 hard to get that information even from the kernel.

  I'd be happy to help here where we can. Any patch that maintains state
about the number of connections sent to each socket is likely to be hard
to merge to kernel. 

 The alternative is for haproxy to maintain the count by process of
active sockets, and somehow poke that back into the kernel as a hint to
send more to a particular socket. That also feels ugly however. It comes
back to either haproxy making routing/load balancing decisions amongst
its children or improving the kernel mechanism that is doing the same
job.
 Haproxy has more information available, and a faster turn around on new
load balancing strategies. 

  So the options are:
   1) Come up with a better stateless LB algorithm for the kernel.
   2) Maintain counts in the kernel for a least-connections algorithm.
   3) Stay as-is kernel-wise, but have haproxy play a more active
      role in distributing connections.
   4) Do nothing, as it's good enough for most people.

If there's a better way for us to track active connections per server
that at least would help simplify the shm balance patch. 

 That's quite common. That's the reason why we have the tune.maxaccept global
 tunable, in order to prevent some processes from grabbing all connections at
 once, but still that's not perfect because as you noticed, the same process can
 be notified several times in a row while another one was doing something else
 or was scheduled out, leaving some room for anything else. By the way, using
 cpu-map to bind processes to CPUs significantly helps, provided of course
 that no other system process is allowed to run on the same CPUs.

  Ok, we'll go back and check that in detail. CPU pinning and SMP IRQ
affinity we do as a matter of course. 

 Don't worry, I know that very well. Some customers are still running some 2.4
 kernels that I built for them years ago for that reason, and our appliances
 still ship with 2.6.32 :-) So you don't have to justify that you cannot 
 upgrade,
 I'm the first one to defend that position.

  Ok, that's reassuring. There are many projects out there that, while
 wonderful, assume you have the latest version of fedora/ubuntu
 available.


  The actconn variable is shared in shared memory. We're using the single
  writer model, so each process is the only process writing to its slot
  in shared memory,
 
 OK found it, indexed based on relative_pid. I thought all processes shared
 the same global actconn which scared me!

  Yes, it wouldn't have worked very well either :-)

  via what should be an atomic write. Locking should not
  be required. If there is a concern about write tearing, we can change it
  to an explicit atomic cmpxchg or similar.
 
 No that's not needed. The only thing is that having them all in the same
 cache line means that cache lines are bouncing back and forth between CPUs,
 causing latencies to update the values, but that's all.

  Good point. We can avoid cache ping pong if we pad the structure
appropriately. 
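
A rough sketch of the layout being discussed (names and sizes are
illustrative, not taken from the shm_balance patch): one slot per process in
anonymous shared memory, each padded to a full cache line, so that a process
updating its own actconn slot never dirties a line another process is using.

/* Illustrative sketch only: per-process connection counters in anonymous
 * shared memory, one cache-line-sized slot per process. Each slot has a
 * single writer (its owning process), so plain aligned stores are enough.
 */
#include <stdio.h>
#include <sys/mman.h>

#define MAX_PROCS  64
#define CACHE_LINE 64

struct proc_slot {
    volatile unsigned long actconn;               /* written only by its owner */
    char pad[CACHE_LINE - sizeof(unsigned long)]; /* keep slots on separate lines */
};

static struct proc_slot *slots;

static int shm_init(void)
{
    /* MAP_SHARED | MAP_ANONYMOUS memory is inherited across fork(), which is
     * how all worker processes end up seeing the same table. */
    slots = mmap(NULL, sizeof(struct proc_slot) * MAX_PROCS,
                 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return slots == MAP_FAILED ? -1 : 0;
}

/* Each process touches only its own slot, indexed by a relative_pid-style
 * number, so no locking is needed for the update itself. */
static void conn_opened(int rel_pid) { slots[rel_pid].actconn++; }
static void conn_closed(int rel_pid) { slots[rel_pid].actconn--; }

int main(void)
{
    if (shm_init() < 0)
        return 1;
    conn_opened(1);
    printf("proc 1 actconn = %lu\n", slots[1].actconn);
    conn_closed(1);
    return 0;
}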


  If this works as mentioned above, then please consider putting a note in
  the haproxy documentation about this for EL customers, and what the
  minimum acceptable kernel revs are to make this work properly.
 
 That's a good point. In fact we've been running with SO_REUSECONN for
 many years, I think my first 

Re: SSL, peered sticky tables + nbproc > 1?

2014-05-18 Thread Willy Tarreau
Hi Andy,

On Sun, May 18, 2014 at 03:16:34PM +0100, Andrew Phillips wrote:
 Willy,
 
  Thanks for the response. I wrote the reply as I read through, so it's
 interesting to see that we've pursued similar lines of thought about how
 to solve this problem. 
 
  I think our workload is very different from 'normal'. We have several 
 quite long lived connections, with a modest accept() rate of new
 connections. 

That's something that RDP providers see as well. I've also seen a Citrix
farm in the past which had to face a difficult issue: supporting many
long-lived connections with a very low average accept rate (eg: a few
hundred connections per day), but with the goal of being able to accept
20 times more if people had to work from home due to problems getting to
work (eg: transportation services on strike), and to accept all of
them at 9am. There was some SSL in the mix to make things funnier.

  I've just checked the kernel code and indeed it's not a real round-robin,
  it's a hash on the 4-tuple (src/dst/spt/dpt), coupled with a pseudo-random
  mix. But that makes a lot of sense, since a round robin would have to 
  perform
  a memory write to store the index. That said, when testing here, I get the
  same distribution on all servers +/- 0.2% or so.
  
  We'll test more with EL6 latest and SO_REUSEPORT - given the
 information above it's possible our test rig may not show the above
 algorithm to its best, and may not represent production load that well. 

It will really depend on the total amount of connections in fact. I would
not be surprised if the load is highly uneven below 100 or so connections
per process due to the hash. But maybe that could be enough already.

  Ideally though a least conn LB would be best. James has posted our test
 numbers - they're better and may be good enough for now. And there is
 always the alternative of maintaining the shm_balance patch internally. 

Sure!

If your traffic is not too high, there's something simple you can do which
can be *very* efficient. It's being used by at least one RDP provider, but
I don't remember which one. The idea was the following : deciphering SSL
costs a lot, especially the handshakes, which you don't want to cause
noticeable pauses for all users when they happen. So instead of randomly
stacking the connections onto each other into a process pool, there was a
front layer
in pure TCP mode doing nothing but distributing connections in leastconn.
The cost is very low in terms of CPU and even lower in terms of latency.
And now with dev25, you have the abstract namespace sockets which are
basically unix sockets with internal names. They're 2.5 times cheaper
than TCP sockets. I'm really convinced you should give that a try. It
would look like this :

listen dispatcher
   bind :1234 process 1
   balance leastconn
   server process2 abns@p2 send-proxy
   server process3 abns@p3 send-proxy
   server process4 abns@p4 send-proxy
   server process5 abns@p5 send-proxy

listen worker
   bind abns@p2 process 2 accept-proxy
   bind abns@p3 process 3 accept-proxy
   bind abns@p4 process 4 accept-proxy
   bind abns@p5 process 5 accept-proxy
   ...

In worker, simply add ssl ... to each line if you need to decipher
SSL. You can (and should) even check that processes are still alive
using a simple check on each line.

(..)
  I definitely agree. I think we should propose some kernel-side improvements
  such as a way to distribute according to number of established connections
  per queue instead of hashing or round-robinning, but it seems there's no
  relation between the listen queues and the inherited sockets, so it looks
  hard to get that information even from the kernel.
 
   I'd be happy to help here where we can. Any patch that maintains state
 about the number of connections sent to each socket is likely to be hard
 to merge to kernel. 

Especially if it requires inflating a structure like struct sock.

  The alternative is for haproxy to maintain the count by process of
 active sockets, and somehow poke that back into the kernel as a hint to
 send more to a particular socket.

I agree.

 That also feels ugly however.

It depends. Said like this yes it feels ugly. However, if you reason
with a budget and processes only accept their budget of incoming
connections, then it's much different. And with a budget it's not
that hard to implement. Basically you raise all budgets to 1 when
they're all 0, you decrease a process's budget when it accepts a
connection, you increase its budget when it closes a connection,
and you subtract the value of the lowest budget when all of them
have a budget greater than 1.
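
A minimal sketch of that budget idea (all names hypothetical; not haproxy
code). In practice the budgets would live in the same kind of shared table
as the actconn slots discussed earlier, but the bookkeeping itself is just
this:

/* Illustrative sketch of the "accept budget" scheme described above. Each
 * process may accept only while its budget is positive, so the busiest
 * process (lowest budget) naturally backs off until the others catch up.
 */
#include <limits.h>
#include <stdio.h>

#define NBPROC 4

static int budget[NBPROC];          /* would live in shared memory in practice */

static void renormalize(void)
{
    int i, min = INT_MAX, all_zero = 1;

    for (i = 0; i < NBPROC; i++) {
        if (budget[i] < min)
            min = budget[i];
        if (budget[i] != 0)
            all_zero = 0;
    }

    if (all_zero)                    /* everyone exhausted: restart the cycle */
        for (i = 0; i < NBPROC; i++)
            budget[i] = 1;
    else if (min > 1)                /* all above 1: shift down, lowest goes to 0 */
        for (i = 0; i < NBPROC; i++)
            budget[i] -= min;
}

static int  may_accept(int proc) { return budget[proc] > 0; }
static void on_accept(int proc)  { budget[proc]--; renormalize(); }
static void on_close(int proc)   { budget[proc]++; renormalize(); }

int main(void)
{
    int i;

    renormalize();                   /* all start at 0 -> everyone gets 1 */
    for (i = 0; i < NBPROC; i++)
        printf("proc %d may_accept=%d\n", i, may_accept(i));
    on_accept(0);                    /* proc 0 takes the first connection... */
    printf("after accept: proc 0 may_accept=%d\n", may_accept(0));
    on_close(0);                     /* ...and becomes eligible again on close */
    return 0;
}

A process would then consult may_accept() before calling accept(), which is
roughly the "haproxy plays a more active role in distributing connections"
option from Andy's list.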

 It comes
 back to either haproxy making routing/load balancing decisions amongst
 its children or improving the kernel mechanism that is doing the same
 job.
  Haproxy has more information available, and a faster turn around on new
 load balancing strategies. 

Yes and load balancing is its job, though 

Re: SSL, peered sticky tables + nbproc > 1?

2014-05-15 Thread Willy Tarreau
Hi James,

On Tue, May 13, 2014 at 06:00:13PM +0100, James Hogarth wrote:
 Hi Willy,
 
 Please see the response from our Head of Systems below.

Thank you. For ease of discussions, I'm copying him. Andy, please tell me
if this is not appropriate.

 On a side note our initial investigations see better behaviour
 (ie one or two processes don't run away with it all) but the current EL6
 kernel
 utilising the SO_REUSEPORT behaviour doesn't appear to do a perfect round
 robin of the processes and consequently can end up a bit unbalanced -

I've just checked the kernel code and indeed it's not a real round-robin,
it's a hash on the 4-tuple (src/dst/spt/dpt), coupled with a pseudo-random
mix. But that makes a lot of sense, since a round robin would have to perform
a memory write to store the index. That said, when testing here, I get the
same distribution on all servers +/- 0.2% or so.
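
To illustrate what such a stateless pick looks like (the mixing function
below is arbitrary and made up, not the actual kernel code), the chosen
queue is a pure function of the connection's 4-tuple, so the kernel never
has to write a shared index anywhere:

/* Toy illustration of a stateless 4-tuple hash pick, as opposed to a true
 * round robin: the choice depends only on the connection's addresses and
 * ports, so no shared "next index" ever needs to be written.
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int pick_queue(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport,
                               unsigned int nb_queues)
{
    /* Simple multiplicative mix of the 4-tuple; deterministic per connection. */
    uint32_t h = saddr * 2654435761u;
    h ^= daddr * 2246822519u;
    h ^= ((uint32_t)sport << 16 | dport) * 3266489917u;
    h ^= h >> 16;
    return h % nb_queues;
}

int main(void)
{
    unsigned int counts[4] = { 0 };
    uint16_t sport;

    /* Same client/server addresses, varying source port: roughly even over
     * many connections, but possibly lopsided for the small numbers of
     * long-lived connections discussed in this thread. */
    for (sport = 1024; sport < 1024 + 1000; sport++)
        counts[pick_queue(0x0a000001, 0x0a000002, sport, 443, 4)]++;

    for (unsigned int i = 0; i < 4; i++)
        printf("queue %u: %u\n", i, counts[i]);
    return 0;
}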

 and this is especially so for the longer lived connections, depending on
 when a client may disconnect.

If you're concerned with long lived connections, then round robin is not the
proper choice, you should use a leastconn algorithm instead, which will take
care of disconnected clients.

 We're in the process of rebasing this code to dev25 and
 cleaning it up as per your suggestions.
 
 To give an idea of the difference in behaviour, counting the connections
 per process every 5 seconds whilst ramping up the connections in the
 background:
 
 Haproxy HEAD on current el6:
  0   0   0   0   0   0   0
  0   0   1   1   1   0   0
  1   0   4   1   2   0   0
  2   2   5   2   3   0   0
  2   2   6   3   4   2   0
  2   2   7   3   4   4   2
  3   3   7   4   5   5   2
  3   6   8   4   6   6   2
  3   8   9   4   7   6   2
  3   10   9   6   7   6   3
  3   12   9   7   9   7   3
 
 Haproxy HEAD with new shm_balance patch on current el6:
 
  0   0   0   0   0   0   0
  0   0   1   1   1   1   0
  1   1   1   1   2   1   2
  2   2   2   2   2   2   2
  3   3   3   2   3   3   3
  4   3   4   4   3   4   3
  5   4   4   4   5   4   4
  5   5   5   5   5   5   6
  6   6   6   6   6   5   6
  7   6   6   7   7   6   7
  7   8   7   7   7   7   7

But these are very small numbers. Are you really running with numbers *that*
low in production or is it just because you wanted to make a test ? I was
assuming that you were dealing with thousands or tens of thousands of
connections per second, where the in-kernel distribution is really good.
I can easily expect that it can be off by a few units in a test involving
just a few tens of connections however.

Responding to Andy below :

  We realise that this patch is rushed, and we appreciate the feedback.
 It also is true that we've been working off dev21 and have been working on
 it
 for a while. If SO_REUSEPORT works well, then it's a far neater solution
 that renders this patch unnecessary.

I definitely agree. I think we should propose some kernel-side improvements
such as a way to distribute according to number of established connections
per queue instead of hashing or round-robinning, but it seems there's no
relation between the listen queues and the inherited sockets, so it looks
hard to get that information even from the kernel.

 There's some extra explanation here that may help answer some of your
 questions.
 
 I must confess I don't really understand well what behaviour this
 shm_balance mode is supposed to provide.
 
 Without this patch, on dev21, on the enterprise linux kernels we have
 tested
 and run in production, we see that the busiest haproxy process will run
 away and
 grab most new connections. In effect we get 160% capacity over a single
 process before we start seeing queueing latency. The load balancing is very
 unequal.

That's quite common. That's the reason why we have the tune.maxaccept global
tunable, in order to prevent some processes from grabbing all connections at
once, but still that's not perfect because as you noticed, the same process can
be notified several times in a row while another one was doing something else
or was scheduled out, leaving some room for anything else. By the way, using
cpu-map to bind processes to CPUs significantly helps, provided of course
that no other system process is allowed to run on the same CPUs.
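
For illustration, the general shape of such a per-wakeup accept cap is shown
below (a simplified standalone sketch, not the actual haproxy listener code;
handle_new_connection is a hypothetical stub):

/* Simplified sketch: each time the poller reports the listening fd as
 * readable, drain at most max_accept connections and then go back to the
 * event loop, so one busy process cannot empty the whole backlog alone.
 */
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stub: a real proxy would register the new connection with its poller. */
static void handle_new_connection(int cfd)
{
    printf("accepted fd %d\n", cfd);
    close(cfd);
}

static void bounded_accept(int listen_fd, int max_accept)
{
    while (max_accept-- > 0) {
        int cfd = accept(listen_fd, NULL, NULL);

        if (cfd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK ||
                errno == EINTR || errno == ECONNABORTED)
                return;              /* backlog drained for now */
            return;                  /* real error: also stop for this wakeup */
        }
        handle_new_connection(cfd);
    }
    /* Budget spent: whatever is left stays in the kernel backlog and may be
     * picked up by another process on the next wakeup. */
}

int main(void)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    fcntl(fd, F_SETFL, O_NONBLOCK);
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                /* any free port */
    bind(fd, (struct sockaddr *)&sin, sizeof(sin));
    listen(fd, 128);

    bounded_accept(fd, 8);           /* returns immediately: nothing pending */
    close(fd);
    return 0;
}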

 We need more capacity than that. With this patch, we get uniform balancing
 across 7
 processes giving us almost 700% usable over a single process.

For sure, but as explained above, in my opinion, round robin is not the
best choice for long lived connections (though it's better than nothing
of course).

 The decision to upgrade to a more recent kernel (particularly if it's not
 an EL kernel)
 is a difficult one for shops running on enterprise versions of linux. Many
 places stick
 on a particular point revision for a while and only upgrade to newer point
 releases
 after requalification.

Don't worry, I know that very well. Some customers are still running some 2.4
kernels that I built for them years ago for that reason, and our appliances

Re: SSL, peered sticky tables + nbproc > 1?

2014-05-09 Thread Willy Tarreau
Hi James,

On Thu, May 08, 2014 at 08:58:59PM +0100, James Hogarth wrote:
 On 2 May 2014 20:10, Willy Tarreau w...@1wt.eu wrote:
 
  You're welcome. I really want to release 1.5-final ASAP, but at least
  with everything in place so that we can safely fix the minor remaining
  annoyances. So if we identify quickly that things are still done wrong
  and need to be addressed before the release (eg: because we'll be forced
  to change the way some config settings are used), better do it ASAP.
  Otherwise if we're sure that a given config behaviour will not change,
  such fixes can happen in -stable because they won't affect users which
  do not rely on them.
 
 
 Alright in light of the above here's a RFC patch that's a little WIP still
 ... we've yet to write the documentation on the shm_balance mode but we are
 running this in a production environment.

I must confess I don't really understand well what behaviour this
shm_balance mode is supposed to provide. I'm seeing that the actconn
variable seems to be shared between all processes and is incremented
and decremented without any form of locking, so I'm a bit scared when
you say that it's running in production !

More comments below.

 Our environment is dev21 at present but I just rebased it to the tarball
 snapshot of last night...
 
 It compiles against that but please note I've not yet tested it against that!
 
 To give you an idea on how to use it here's a sanitised snippet of config:
 
 global
   nbproc 4
   daemon
   maxconn 4000
   stats timeout 1d
   log 127.0.0.1 local2
   pidfile /var/run/haproxy.pid
   stats socket /var/run/haproxy.1.sock level admin
   stats socket /var/run/haproxy.2.sock level admin
   stats socket /var/run/haproxy.3.sock level admin
   stats socket /var/run/haproxy.4.sock level admin
   stats bind-process all
   shm-balance my_shm_balancer
 listen web-stats-1
   bind 0.0.0.0:81
   bind-process 1
   mode http
   log global
   maxconn 10
   clitimeout 10s
   srvtimeout 10s
   contimeout 10s
   timeout queue 10s
   stats enable
   stats refresh 30s
   stats show-node
   stats show-legends
   stats auth admin:password
   stats uri /haproxy?stats
  listen web-stats-2
   bind 0.0.0.0:82
   bind-process 2
   mode http
   log global
   maxconn 10
   clitimeout 10s
   srvtimeout 10s
   contimeout 10s
   timeout queue 10s
   stats enable
   stats refresh 30s
   stats show-node
   stats show-legends
   stats auth admin:password
   stats uri /haproxy?stats
 listen web-stats-3
   bind 0.0.0.0:83
   bind-process 3
   mode http
   log global
   maxconn 10
   clitimeout 10s
   srvtimeout 10s
   contimeout 10s
   timeout queue 10s
   stats enable
   stats refresh 30s
   stats show-node
   stats show-legends
   stats auth admin:password
   stats uri /haproxy?stats
 listen web-stats-4
   bind 0.0.0.0:84
   bind-process 4
   mode http
   log global
   maxconn 10
   clitimeout 10s
   srvtimeout 10s
   contimeout 10s
   timeout queue 10s
   stats enable
   stats refresh 30s
   stats show-node
   stats show-legends
   stats auth admin:password
   stats uri /haproxy?stats
 listen frontendname
   bind 0.0.0.0:52000
   server server 10.0.0.1:27000 id 1 check port 9501
   option httpchk GET /status HTTP/1.0
   mode tcp
 
 The haproxy-shm-client can be used to query the shm to see how things are
 loaded and weight/disable/enable threads from processing queries.

But why not use the stats socket instead of using a second access path to
check the status ?

 Now why did we do this?
 
 When we were testing multiple processes one thing we noted was that the
 most likely process to accept() was actually a bit unintuitive. Rather than
 being busy causing a 'natural' load balancing behaviour it worked out
 against this.

Yes, that's the reason why the tune.maxaccept is divided by the number of
active processes. With recent kernels (3.9+), the system will automatically
round-robin between multiple socket queues bound to the same ip:port.
However this requires multiple sockets. With the latest changes allowing
the bind-process to go down to the listener (at last!), I realized that in
addition to allowing it for the stats socket (primary goal), it provides an
easy way to benefit from this kernel's round robin without having to create
a bind/unbind sequence as I was planning it.
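
For reference, the per-process socket setup that this in-kernel distribution
relies on looks roughly like the following (generic example, not haproxy's
bind code; it needs Linux 3.9+ for SO_REUSEPORT):

/* Generic example: several processes each create their own listening socket
 * on the same ip:port with SO_REUSEPORT, and the kernel (3.9+) spreads
 * incoming connections across those sockets.
 */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int listen_reuseport(unsigned short port)
{
    int one = 1;
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Each process sets SO_REUSEPORT before bind(); without it the second
     * bind() to the same ip:port fails with EADDRINUSE. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 1024) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* In an nbproc-style model each forked worker would call this itself,
     * giving the kernel one queue per process to balance over. */
    int fd = listen_reuseport(8080);
    if (fd < 0) {
        perror("listen_reuseport");
        return 1;
    }
    printf("listening on :8080 with SO_REUSEPORT, fd=%d\n", fd);
    close(fd);
    return 0;
}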

 If a thread was currently on the CPU it was reasonably likely that it would
 be the first to grab the connection due to the need for the 'idle' ones to
 context switch onto the CPU. As a result it was primarily only one or two
 haproxy processes actually picking up the connections and it made for very
 asymmetrical balancing across processes.

You see this pattern even more often when running local benchmarks. Just
run two processes on a dual-core, dual-thread system, then have the load
generator on the same system, and you'll see that the load generator disturbs
one of the process more than the other one.

 The algorithm looks to see if it is in the least busy 'half bucket' and if
 so will 

Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread Willy Tarreau
hi,

On Fri, May 02, 2014 at 11:11:39AM -0600, Jeff Zellner wrote:
 Well, I thought wrong -- I see that peered sticky tables absolutely
 don't work with multiple processes, and sticky rules give a warning.
 
 Would that be a feature on the roadmap? I can see that it's probably
 pretty non-trivial -- but would be super useful, at least for us.

Yes that's clearly on the roadmap. In order of fixing/improvements,
here's what I'd like to see :
  - peers work fine when only one process uses them
  - have the ability to run with explicit peers per process : if you
just have to declare as many peers sections as processes, it's
better than nothing.
  - have stick-table (and peers) work in multi-process mode with a
shared memory system like we do with SSL contexts.

Currently the issue is that all processes try to connect to the remote
and present the same peer name, resulting in the previous connection being
dropped. And incoming connections will only feed one process and not
the other ones.

I'd like to be able to do at least #1 for the release, I do think it's
doable, because I attempted it 18 months ago and ended up in a complex
corner case of inter-proxy dependence calculation, only to realize that
we didn't need to have haproxy automatically deduce everything, just let
it do what the user wants, and document the limits.

Regards,
Willy




Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread Bryan Talbot
It sounds like Jeff ran out of CPU for SSL terminations and that could
be addressed as described by Willy here

https://www.mail-archive.com/haproxy@formilux.org/msg13104.html

and allow him to stay with a single-process stick table for the actual load
balancing.

-Bryan




On Fri, May 2, 2014 at 10:23 AM, Willy Tarreau w...@1wt.eu wrote:

 hi,

 On Fri, May 02, 2014 at 11:11:39AM -0600, Jeff Zellner wrote:
  Well, I thought wrong -- I see that peered sticky tables absolutely
  don't work with multiple processes, and sticky rules give a warning.
 
  Would that be a feature on the roadmap? I can see that it's probably
  pretty non-trivial -- but would be super useful, at least for us.

 Yes that's clearly on the roadmap. In order of fixing/improvements,
 here's what I'd like to see :
   - peers work fine when only one process uses them
   - have the ability to run with explicit peers per process : if you
 just have to declare as many peers sections as processes, it's
 better than nothing.
   - have stick-table (and peers) work in multi-process mode with a
 shared memory system like we do with SSL contexts.

 Currently the issue is that all processes try to connect to the remote
 and present the same peer name, resulting in the previous connection being
 dropped. And incoming connections will only feed one process and not
 the other ones.

 I'd like to be able to do at least #1 for the release, I do think it's
 doable, because I attempted it 18 months ago and ended up in a complex
 corner case of inter-proxy dependence calculation, only to realize that
 we didn't need to have haproxy automatically deduce everything, just let
 it do what the user wants, and document the limits.

 Regards,
 Willy





Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread Willy Tarreau
On Fri, May 02, 2014 at 10:59:00AM -0700, Bryan Talbot wrote:
 It sounds like Jeff ran out of CPU for SSL terminations and that could
 be addressed as described by Willy here
 
 https://www.mail-archive.com/haproxy@formilux.org/msg13104.html
 
 and allow him to stay with a single-process stick table for the actual load
 balancing.

Yes that's perfectly possible. And when we have proxy proto v2 with SSL info,
it'll be even better :-)

Willy




Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread James Hogarth
On 2 May 2014 19:02, Willy Tarreau w...@1wt.eu wrote:

 On Fri, May 02, 2014 at 10:59:00AM -0700, Bryan Talbot wrote:
  It sounds like that Jeff ran out of CPU for SSL terminations and that
could
  be addressed as described by Willy here
 
  https://www.mail-archive.com/haproxy@formilux.org/msg13104.html
 
  and allow him to stay with a single-process stick table for the actual
load
  balancing.

 Yes that's perfectly possible. And when we have proxy proto v2 with SSL
info,
 it'll be even better :-)

 Willy



We've done quite a bit of work on this internally recently to provide SSL
multiprocess with sane load balancing.

There's a couple of small edge cases we've got left then we were intending
to push it up for comments...

I've literally just got home but I'll follow up in the office next week to
see how close we are.

James


Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread Jeff Zellner
Great, we'd love to see that.

And thanks for the other SSL performance trick. We might be able to
make that and some SSL cache tuning work for us, as well.

On Fri, May 2, 2014 at 12:23 PM, James Hogarth james.hoga...@gmail.com wrote:

 On 2 May 2014 19:02, Willy Tarreau w...@1wt.eu wrote:

 On Fri, May 02, 2014 at 10:59:00AM -0700, Bryan Talbot wrote:
  It sounds like that Jeff ran out of CPU for SSL terminations and that
  could
  be addressed as described by Willy here
 
  https://www.mail-archive.com/haproxy@formilux.org/msg13104.html
 
  and allow him to stay with a single-process stick table for the actual
  load
  balancing.

 Yes that's perfectly possible. And when we have proxy proto v2 with SSL
 info,
 it'll be even better :-)

 Willy



 We've done quite a bit of work on this internally recently to provide SSL
 multiprocess with sane load balancing.

 There's a couple of small edge cases we've got left then we were intending
 to push it up for comments...

 I've literally just got home but I'll follow up in the office next week to
 see how close we are.

 James



Re: SSL, peered sticky tables + nbproc > 1?

2014-05-02 Thread Willy Tarreau
Hi James,

On Fri, May 02, 2014 at 07:23:21PM +0100, James Hogarth wrote:
 We've done quite a bit of work on this internally recently to provide SSL
 multiprocess with sane load balancing.
 
 There's a couple of small edge cases we've got left then we were intending
 to push it up for comments...
 
 I've literally just got home but I'll follow up in the office next week to
 see how close we are.

You're welcome. I really want to release 1.5-final ASAP, but at least
with everything in place so that we can safely fix the minor remaining
annoyances. So if we identify quickly that things are still done wrong
and need to be addressed before the release (eg: because we'll be forced
to change the way some config settings are used), better do it ASAP.
Otherwise if we're sure that a given config behaviour will not change,
such fixes can happen in -stable because they won't affect users which
do not rely on them.

Best regards,
Willy