Drain L4 host that fronts a L7 cluster
Hi HAproxy community, We've the following production setup, clusters of: IPVS (dsr) -> L4 HAproxy (tcp termination) -> L7 Proxy (via pp, L7 done here. Not HAproxy today, but it soon will be) Problem statement is, how do you drain a node in L4 tier gracefully as it has no concept of L7 constructs like "Connection: close" or "GOAWAY h2 frames" and can't send thoseon L7 proxy's behalf. In our current L7 proxy, we've custom logic written to inspect client IP of request (=L4 node) and do the drain behavior for only those requests matching the client IP of L4 that's going under maintenance. However, as we're migrating to L7 HAproxy, I am unable to find a way to initiate "soft-stop" behavior only for requests coming from a specific upstream L4 node. Note, we already have measures in place to stop sending new requests to L4 as soon as we initiate the maintenance mode, this is about gracefully draining existing connections on L4 which are spread across the whole L7 cluster. I'll be great if I can get some pointers here on what to look at, to solve this problem. * I already checked the HAproxy code around soft-top and close-spread-time etc, it doesn't have any support to only drain specific clients. * In lua, I didn't find any methods around initiating drain for specific requests. * For any map (L4 client IP lookup) based solution, I was unable to find any http-request operation that sets "drain mode". Cheers, Abhijeet (https://abhi.host)
Re: Active session count drop after HAProxy upgrade from 2.0 to 2.4
Hi Wily, That's a bug and it shouldn't be like this. > You can find information about this here : https://www.mail-archive.com/haproxy@formilux.org/msg43291.html But don't waste too much time on this. > > For those interested, the (small) necessary config changes were : > > - option httpchk syntax (use http-check) > > - some healthchecks not working anymore on servers with > > "send-proxy-v2-ssl-cn ssl-check", due to an unresolved bug in Apache 2.4 > ( > > https://bz.apache.org/bugzilla/show_bug.cgi?id=63893). > > But why were they working previously ? Yes, I confirm this was working previously with the exact same haproxy config file. > Maybe they were sent as dummy > PROXY commands ? If so maybe we could implement a workaround for such > broken implementations if that's a big problem (not sure if this is > feasible, just trying to figure what the desired behavior should be). > I don't know what changed in HAProxy 2.2 or 2.4 about this. The configuration was the following : listen x:443 mode tcp bind x.x.x.x:443 option httpchk GET /test.php HTTP/1.0 # to be updated to new format with 2.4 server sx 192.168.1.19:443 id 12 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none server sx2 192.168.1.22:443 id 13 check weight 5 send-proxy-v2-ssl-cn check-ssl verify none The error reported was L6RSP (+ the above error in Apache log files). Same error with "mode http" instead of tcp. Removing "check-ssl" leads to L7RSP, but this is expected (talking plain text when SSL is required). Right now, I'm avoiding this issue by making the test on port 80 (http-check connect port 80). > > Everything seems to run smoothly, but on the monitoring, the number of > > active sessions (scur) dropped significantly (only one third active > > sessions compared to before), even after several hours. I did not make > any > > change on keep alive or timeouts, that's why I'm wondering if any > > modifications between 2.0 and 2.4 may explain this behaviour. > > If you were running without HTX mode it's very likely because in the > past it was indicating the number of established sessions while now > it's reporting the number of active requests (since technically it's > always a stream that is being accounted for, but in the past they used > to remain present while in idle state, using all the resources between > two requests). > That's it. I was indeed NOT using HTX in 2.0. Thanks for the explanation. Olivier
Re: Active session count drop after HAProxy upgrade from 2.0 to 2.4
Hi Olivier, On Thu, May 04, 2023 at 03:09:43PM +0200, Olivier D wrote: > Hello, > > I've finally updated our load balancer, using HAProxy 2.0, to HAProxy 2.4 > \o/ Great! > I was motivated by both the EOL on 2.0, and by a recurring segfault > everytime we reloaded. btw, that segfault is now gone with 2.4 :) That's a bug and it shouldn't be like this. > I did not update to a newer version because we are still heavy users of > "nbproc" that we still need to convert to thread usage (and replace > bind-process by 'process'). I thought we killed nbproc much earlier than this! Time flies! > For those interested, the (small) necessary config changes were : > - option httpchk syntax (use http-check) > - some healthchecks not working anymore on servers with > "send-proxy-v2-ssl-cn ssl-check", due to an unresolved bug in Apache 2.4 ( > https://bz.apache.org/bugzilla/show_bug.cgi?id=63893). But why were they working previously ? Maybe they were sent as dummy PROXY commands ? If so maybe we could implement a workaround for such broken implementations if that's a big problem (not sure if this is feasible, just trying to figure what the desired behavior should be). > Everything seems to run smoothly, but on the monitoring, the number of > active sessions (scur) dropped significantly (only one third active > sessions compared to before), even after several hours. I did not make any > change on keep alive or timeouts, that's why I'm wondering if any > modifications between 2.0 and 2.4 may explain this behaviour. If you were running without HTX mode it's very likely because in the past it was indicating the number of established sessions while now it's reporting the number of active requests (since technically it's always a stream that is being accounted for, but in the past they used to remain present while in idle state, using all the resources between two requests). Cheers, Willy
Active session count drop after HAProxy upgrade from 2.0 to 2.4
Hello, I've finally updated our load balancer, using HAProxy 2.0, to HAProxy 2.4 \o/ I was motivated by both the EOL on 2.0, and by a recurring segfault everytime we reloaded. btw, that segfault is now gone with 2.4 :) I did not update to a newer version because we are still heavy users of "nbproc" that we still need to convert to thread usage (and replace bind-process by 'process'). For those interested, the (small) necessary config changes were : - option httpchk syntax (use http-check) - some healthchecks not working anymore on servers with "send-proxy-v2-ssl-cn ssl-check", due to an unresolved bug in Apache 2.4 ( https://bz.apache.org/bugzilla/show_bug.cgi?id=63893). Everything seems to run smoothly, but on the monitoring, the number of active sessions (scur) dropped significantly (only one third active sessions compared to before), even after several hours. I did not make any change on keep alive or timeouts, that's why I'm wondering if any modifications between 2.0 and 2.4 may explain this behaviour. Cheers, Olivier