Hi David,
On Wed, Sep 12, 2012 at 10:07:58PM +0000, David Torgerson wrote:
> haproxy SSL termination... Awesome!!!!!!
>
> I have been in the process of replacing our hardware appliances with a
> software-based solution running in a virtualized environment.
>
> We currently have a project running in semi-beta mode to a closed set of
> users. Our current load is around 2500 new SSL TPS with 2048-bit SSL certs.
> We currently have an SSL cache of around 1,000,000 entries. I am not sure
> how many SSL session reuses happen per second because our appliances do not
> capture that info (I am guessing around 4000 based on data in other log
> files). When we open up our project to the public we expect much more
> traffic.
>
> Our software solution is using stunnel + haproxy (with accept-proxy)
> running on 10 virtual machines per host, with 10 CPUs per virtual machine.
> We actually get better throughput using the VMs rather than the physical
> server due to software interrupts etc. I have been able to benchmark this
> solution using full TCP and HTTP requests at 9,500 new SSL TPS with
> 2048-bit certs and around 70,000 SSL reuses per second. We also get over
> 6 Gbps of throughput. (We will have at least two physical servers for
> redundancy etc., so double the numbers above.)
Impressive numbers!
> Our software solution still has a pair of L4 load balancers in front of
> the SSL terminators for redundancy, and we are using stunnel's shared SSL
> session cache so that we can avoid sticky TCP sessions from the L4 load
> balancer.
In fact, even without a shared session cache, you can scale linearly using
SSL-ID stickiness with haproxy. Here's how to achieve this (I'm sure you'll
find it fun to experiment):
- have a first layer of L3/L4 LBs
- have a second layer of haproxy servers which only do TCP-based LB and
  NO SSL offloading. These point to a farm of SSL offloaders using the
  PROXY protocol; they stick on the SSL ID and share their SSL-ID tables.
  Depending on the load, you can have either one or many such front LBs.
- have a third layer of haproxy-based SSL offloaders with accept-proxy.
  These do not need to share their session cache, because each one is
  always accessed with either a brand-new SSL session or an SSL session
  it already knows.
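To make the scheme concrete, here is a toy Python model of the front LB's stickiness logic (the class name, the round-robin policy and the learn() hook are illustrative choices of mine, not haproxy internals): new sessions are balanced round-robin, and once a ServerHello is observed the session ID is pinned to its offloader, mirroring the "stick on" / "stick store-response" pair in the configuration that follows.

```python
import itertools

class SslStickyBalancer:
    """Toy model of the front LB's SSL-ID stickiness.

    New SSL sessions are spread round-robin across offloaders; once
    a ServerHello is observed, the session ID is pinned so that later
    resumptions reach the same offloader.
    """
    def __init__(self, offloaders):
        self._rr = itertools.cycle(offloaders)   # round-robin over the farm
        self._stick = {}                         # session ID -> offloader

    def route(self, session_id: bytes) -> str:
        # a ClientHello carrying a known session ID goes back to its owner
        if session_id and session_id in self._stick:
            return self._stick[session_id]
        # empty or unknown session ID: pick the next offloader
        return next(self._rr)

    def learn(self, session_id: bytes, offloader: str) -> None:
        # a ServerHello reveals which session ID the offloader issued
        if session_id:
            self._stick[session_id] = offloader
```

Resumed sessions always land where they were created, so the offloaders never need to exchange cache entries.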
The first haproxy layer's configuration would probably look approximately
like this:
# 4 front TCP LBs with SSL stickiness
peers front-peers
    # each peer line needs a name; on each box, one peer's name must
    # match that box's local hostname (lb1/lb2 here are example names)
    peer lb1 1.1.1.1:1024
    peer lb2 1.1.1.2:1024

listen ssl-front
    bind :443
    mode tcp
    stick-table type binary len 32 size 1m expire 30m peers front-peers
    acl clienthello req_ssl_hello_type 1
    acl serverhello rep_ssl_hello_type 2
    # use tcp content accepts to detect SSL client and server hellos
    tcp-request inspect-delay 5s
    tcp-request content accept if clienthello
    # no timeout on response inspect delay by default
    tcp-response content accept if serverhello
    # The SSL session ID (SSLID) may be present in a client or server
    # hello. Its length is coded on 1 byte at offset 43 and its value
    # starts at offset 44.
    # Match and learn on the request if it is a client hello.
    stick on payload_lv(43,1) if clienthello
    # Learn on the response if it is a server hello.
    stick store-response payload_lv(43,1) if serverhello
    option ssl-hello-chk
    server offload1 10.10.10.1:443 check send-proxy
    server offload2 10.10.10.2:443 check send-proxy
    server offload3 10.10.10.3:443 check send-proxy
    server offload4 10.10.10.4:443 check send-proxy
    server offload5 10.10.10.5:443 check send-proxy
    server offload6 10.10.10.6:443 check send-proxy
    server offload7 10.10.10.7:443 check send-proxy
    server offload8 10.10.10.8:443 check send-proxy
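For the curious, here is what payload_lv(43,1) actually extracts. This is a minimal Python sketch (the function name is my own) that pulls the session ID out of a raw TLS/SSL handshake record, following the offsets given in the comments of the configuration above:

```python
def extract_ssl_session_id(record: bytes) -> bytes:
    """Return the SSL session ID of a ClientHello/ServerHello record.

    Offsets from the start of the TCP payload, as used by
    payload_lv(43,1) in the configuration above:
      0      record type (0x16 = handshake)
      1-4    record version and length
      5      handshake type (1 = ClientHello, 2 = ServerHello)
      6-8    handshake length
      9-10   protocol version
      11-42  32-byte random
      43     session ID length (1 byte)
      44+    session ID value
    """
    if len(record) < 44 or record[0] != 0x16:
        return b""  # not a handshake record, or too short
    sid_len = record[43]
    return record[44:44 + sid_len]
```

An empty result means the hello carried no session ID (a brand-new session), which is exactly the case where the front LB is free to pick any offloader.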
And the second layer you already know, since you've built it; it's very
basic:
listen ssl-back
    bind :443 accept-proxy ssl crt /etc/haproxy/site.pem
    # note: the ssl keyword requires a crt argument pointing to the
    # certificate; the path above is only an example
That way, as you can see, nothing is shared between the SSL offloaders,
which results in lower network traffic and overhead. You can even spread
them across multiple sites!
> I tested the new haproxy SSL implementation and was able to hit closer to
> 12,000 new TPS (MUCH better performance), but it does not appear that
> haproxy currently shares the session cache across servers. I do know that
> it shares it across multiple processes on the same box. Is this something
> that you are planning on implementing? Or is there some other way that I
> can achieve this?
Yes, this is something we want to implement at Exceliance. We have already
done this for the Stud project, so we already have some working code to
adapt to haproxy. It's just that there is no urgency for us to do it,
because the architecture above scales much better and already works. But
we're certainly going to implement it, because it removes one layer of LBs
and because it's cool :-)
> Thanks for your awesome work!
Thanks for your detailed and interesting feedback!
Regards,
Willy