Drain L4 host that fronts a L7 cluster

2023-05-04 Thread Abhijeet Rastogi
Hi HAProxy community,

We have the following production setup, built from clusters of:

IPVS (DSR) ->
L4 HAProxy (TCP termination) ->
L7 proxy (reached via PROXY protocol (pp); L7 is done here. Not HAProxy today, but it soon will be)
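
For readers following along, the two proxy tiers in this chain could be sketched roughly as below. This is only an illustration under assumptions: the section names, addresses, and ports are hypothetical, the thread does not say which PROXY protocol version is in use ("send-proxy" emits v1; "send-proxy-v2" would be the v2 equivalent), and the L7 tier is not HAProxy yet.

```
# --- on each L4 node (hypothetical names and addresses) ---
# Plain TCP relay that passes the client address on via the PROXY protocol
listen l4_tier
    mode tcp
    bind :443
    server l7_a 192.0.2.11:443 send-proxy check
    server l7_b 192.0.2.12:443 send-proxy check

# --- on each L7 node, once it runs HAProxy (hypothetical) ---
# Accepts the PROXY protocol header sent by the L4 nodes
frontend l7_tier
    mode http
    bind :443 accept-proxy
    default_backend app

backend app
    mode http
    server app1 192.0.2.21:8080 check
```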

The problem is how to gracefully drain a node in the L4 tier: it has no
concept of L7 constructs such as "Connection: close" or HTTP/2 GOAWAY
frames, and it can't send those on the L7 proxy's behalf. In our current L7
proxy, we have custom logic that inspects the client IP of each request
(= the L4 node) and applies the drain behavior only to requests whose
client IP matches the L4 node that is going under maintenance.

However, as we're migrating to HAProxy at L7, I am unable to find a way
to initiate "soft-stop" behavior only for requests coming from a
specific upstream L4 node. Note that we already have measures in place to
stop sending new requests to an L4 node as soon as we initiate maintenance
mode; this is about gracefully draining the existing connections on that
L4 node, which are spread across the whole L7 cluster.

It would be great if I could get some pointers on what to look at to
solve this problem.

* I already checked the HAProxy code around soft-stop,
close-spread-time, etc. It doesn't have any support for draining only
specific clients.
* In Lua, I didn't find any methods for initiating a drain for
specific requests.
* For any map-based solution (L4 client IP lookup), I was unable to
find any http-request operation that sets "drain mode".

Cheers,
Abhijeet (https://abhi.host)



Re: Active session count drop after HAProxy upgrade from 2.0 to 2.4

2023-05-04 Thread Olivier D
Hi Willy,


> That's a bug and it shouldn't be like this.
>
You can find information about this here:
https://www.mail-archive.com/haproxy@formilux.org/msg43291.html
But don't waste too much time on this.


> > For those interested, the (small) necessary config changes were:
> > - option httpchk syntax (use http-check)
> > - some healthchecks not working anymore on servers with
> > "send-proxy-v2-ssl-cn check-ssl", due to an unresolved bug in Apache 2.4
> > (https://bz.apache.org/bugzilla/show_bug.cgi?id=63893).
>
> But why were they working previously?

Yes, I confirm this was working previously with the exact same haproxy
config file.


> Maybe they were sent as dummy
> PROXY commands ? If so maybe we could implement a workaround for such
> broken implementations if that's a big problem (not sure if this is
> feasible, just trying to figure what the desired behavior should be).
>
I don't know what changed in HAProxy 2.2 or 2.4 about this. The
configuration was the following:

listen x:443
    mode tcp
    bind x.x.x.x:443
    option httpchk GET /test.php HTTP/1.0 # to be updated to new format with 2.4
    server sx 192.168.1.19:443 id 12 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none
    server sx2 192.168.1.22:443 id 13 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none

The error reported was L6RSP (+ the above error in Apache log files).
Same error with "mode http" instead of tcp.

Removing "check-ssl" leads to L7RSP, but this is expected (talking plain
text when SSL is required).

Right now, I'm avoiding this issue by making the test on port 80 (http-check
connect port 80).
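
A sketch of what the 2.4-style check with this port 80 workaround might look like. The "http-check send" line is the newer way to express the arguments of the old "option httpchk GET /test.php HTTP/1.0" line. The "http-check expect" line is an assumption added for illustration and was not part of the configuration above. Whether Apache on port 80 expects a PROXY header is not stated here, so the check rule does not request one.

```
listen x:443
    mode tcp
    bind x.x.x.x:443
    option httpchk
    # run the health check against port 80, as in the workaround above
    http-check connect port 80
    # new-style equivalent of "option httpchk GET /test.php HTTP/1.0"
    http-check send meth GET uri /test.php ver HTTP/1.0
    # assumption: only a 200 response counts as healthy
    http-check expect status 200
    server sx 192.168.1.19:443 id 12 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none
    server sx2 192.168.1.22:443 id 13 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none
```

An explicit "http-check connect" rule without the "default" keyword is not expected to inherit the server line's check options, which is presumably why the check no longer hits the SSL + PROXY path; that reading should be verified against the documentation for the version in use.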



> > Everything seems to run smoothly, but on the monitoring, the number of
> > active sessions (scur) dropped significantly (only one third active
> > sessions compared to before), even after several hours. I did not make any
> > change on keep alive or timeouts, that's why I'm wondering if any
> > modifications between 2.0 and 2.4 may explain this behaviour.
>
> If you were running without HTX mode it's very likely because in the
> past it was indicating the number of established sessions while now
> it's reporting the number of active requests (since technically it's
> always a stream that is being accounted for, but in the past they used
> to remain present while in idle state, using all the resources between
> two requests).
>
That's it. I was indeed NOT using HTX in 2.0. Thanks for the explanation.

Olivier


Re: Active session count drop after HAProxy upgrade from 2.0 to 2.4

2023-05-04 Thread Willy Tarreau
Hi Olivier,

On Thu, May 04, 2023 at 03:09:43PM +0200, Olivier D wrote:
> Hello,
> 
> I've finally updated our load balancer, using HAProxy 2.0, to HAProxy 2.4
> \o/

Great!

> I was motivated by both the EOL on 2.0, and by a recurring segfault
> every time we reloaded. btw, that segfault is now gone with 2.4 :)

That's a bug and it shouldn't be like this.

> I did not update to a newer version because we are still heavy users of
> "nbproc" that we still need to convert to thread usage (and replace
> bind-process by 'process').

I thought we killed nbproc much earlier than this! Time flies!

> For those interested, the (small) necessary config changes were:
> - option httpchk syntax (use http-check)
> - some healthchecks not working anymore on servers with
> "send-proxy-v2-ssl-cn check-ssl", due to an unresolved bug in Apache 2.4
> (https://bz.apache.org/bugzilla/show_bug.cgi?id=63893).

But why were they working previously? Maybe they were sent as dummy
PROXY commands ? If so maybe we could implement a workaround for such
broken implementations if that's a big problem (not sure if this is
feasible, just trying to figure what the desired behavior should be).

> Everything seems to run smoothly, but on the monitoring, the number of
> active sessions (scur) dropped significantly (only one third active
> sessions compared to before), even after several hours. I did not make any
> change on keep alive or timeouts, that's why I'm wondering if any
> modifications between 2.0 and 2.4 may explain this behaviour.

If you were running without HTX mode it's very likely because in the
past it was indicating the number of established sessions while now
it's reporting the number of active requests (since technically it's
always a stream that is being accounted for, but in the past they used
to remain present while in idle state, using all the resources between
two requests).

Cheers,
Willy



Active session count drop after HAProxy upgrade from 2.0 to 2.4

2023-05-04 Thread Olivier D
Hello,

I've finally updated our load balancer, using HAProxy 2.0, to HAProxy 2.4
\o/
I was motivated by both the EOL on 2.0, and by a recurring segfault
every time we reloaded. btw, that segfault is now gone with 2.4 :)

I did not update to a newer version because we are still heavy users of
"nbproc" that we still need to convert to thread usage (and replace
bind-process by 'process').
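
As a rough illustration of that conversion (not the actual configuration, which is not shown in this thread), a multi-process setup pinned with bind-process could become a single process with threads, with listeners pinned through the "process" keyword on bind lines. The thread count, section name, port, and thread range below are hypothetical.

```
global
    # before (multi-process): nbproc 4
    nbthread 4

frontend fe_example
    mode http
    # before (multi-process): bind-process 1-2
    # after: restrict this listener to threads 1-2 of process 1
    bind :80 process 1/1-2
```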

For those interested, the (small) necessary config changes were:
- option httpchk syntax (use http-check)
- some healthchecks not working anymore on servers with
"send-proxy-v2-ssl-cn check-ssl", due to an unresolved bug in Apache 2.4
(https://bz.apache.org/bugzilla/show_bug.cgi?id=63893).

Everything seems to run smoothly, but on the monitoring, the number of
active sessions (scur) dropped significantly (only one third active
sessions compared to before), even after several hours. I did not make any
change on keep alive or timeouts, that's why I'm wondering if any
modifications between 2.0 and 2.4 may explain this behaviour.

Cheers,

Olivier