Re: Issues with dynamic inserted servers
Hi Thomas,

On Tue, Feb 07, 2023 at 06:18:26PM +0100, Thomas Pedoussaut wrote:
(...)
> As you can see in the logs, servers are seen, registered and marked as UP.
> But for a request made a few seconds later, the backend can't find a
> suitable server to fulfill the request.
>
> Feb  7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE]   (42442) : CLI :
>     'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
> Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING]  (42442) : Server
>     pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
>     maintenance).
> Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server
>     pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
>     maintenance).
> Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING]  (42442) : Server
>     pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup
>     servers online. 0 sessions requeued, 0 total in queue.
> Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server
>     pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup
>     servers online. 0 sessions requeued, 0 total in queue.
> Feb  7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698
>     [07/Feb/2023:16:35:01.250] www~ pages/<NOSRV> 0/15001/-1/-1/15001 503
>     4793 - - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXX Wget/1.20.3
>     (linux-gnu)
>
> The servers state is like this:
>
> echo "show servers state pages" | netcat -w 2 172.31.33.146
> 1
> # be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
>   srv_uweight srv_iweight srv_time_since_last_change srv_check_status
>   srv_check_result srv_check_health srv_check_state srv_agent_state
>   bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl
>   srv_check_port srv_check_addr srv_agent_addr srv_agent_port
> 5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3
>   4 6 0 0 0 - 80 - 0 0 - - 0
>
> srv_check_result is 3, which indicates the health checks are fine.
>
> I'm a bit baffled by the situation. If someone has a bit more experience
> in inserting servers on the fly with L7 checks, I'll be grateful.

Very interesting. I must confess I have no idea at the moment. Among the
tests you've done, did you always try to add a first server to an empty
backend, or did you also try to add a server to a backend that already had
one? I'm suspecting that something, somewhere, indicates at boot time that
there is no server in this backend, that this "something" is not changed
when adding one later, and that it could be used to entirely bypass the LB
algorithm. That's just a wild guess, of course. Similarly, it would be
interesting to know if starting empty and adding yet another server
unblocks the situation.

Thanks,
Willy
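The experiment Willy suggests (adding a second server next to the one that is UP but never picked, then re-dumping the state) can be sketched as a small script. This is only an illustrative sketch: the second server's name and address (`testsrv2`, `172.31.35.240`) are made up, and `HAP_CLI` defaults to `cat` so the script merely prints the runtime-API commands; point it at the actual CLI (e.g. the netcat invocation from the original report) for real use.

```shell
#!/bin/sh
# Sketch of the suggested test: add a *second* server to the "pages"
# backend, enable it, then re-dump the server state.
# testsrv2 / 172.31.35.240 are hypothetical placeholders.
# HAP_CLI defaults to "cat" (dry run); for real use, e.g.:
#   HAP_CLI="netcat -w 2 172.31.33.146" sh add_second_server.sh
HAP_CLI="${HAP_CLI:-cat}"

hap() {
    # Send one command to the HAProxy runtime API
    echo "$1" | $HAP_CLI
}

hap "add server pages/testsrv2 172.31.35.240:80 weight 10 check inter 5s fall 3 rise 2"
hap "enable health pages/testsrv2"
hap "enable server pages/testsrv2"

# If the LB now picks servers, the "empty at boot" theory gains weight.
hap "show servers state pages"
```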
Issues with dynamic inserted servers
[ I've been using HAProxy for 5 years, but with a static conf reloaded ]

Due to issues with the speed of discovery of backends through DNS on AWS,
I'm writing my own system to insert servers on the fly in my load
balancers. As server names, I'm using a task ID from my cloud provider:

backend pages
    timeout server 120s
    option forwardfor
    http-request redirect scheme https if ! { ssl_fc }
    option httpchk GET /health.php
    default-server inter 5s fall 3 rise 2
    balance random
    server bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 16 check slowstart 10s

With a config built from the cluster status, everything is fine (16:33:58).
When AWS/ECS sends me a new task, I register it with these 3 commands (at
16:34:27):

echo "add server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 32 check inter 5s fall 3 rise 2 slowstart 10s" | netcat -w 2 172.31.33.146
New server registered.

echo "enable health pages/bdb47d1ac9644c5f99c5e90dd4f9b944" | netcat -w 2 172.31.33.146
echo "enable server pages/bdb47d1ac9644c5f99c5e90dd4f9b944" | netcat -w 2 172.31.33.146

As you can see in the logs, servers are seen, registered and marked as UP.
But for a request made a few seconds later, the backend can't find a
suitable server to fulfill the request.

Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE]   (42439) : haproxy
    version is 2.7.2-1ppa1~jammy
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE]   (42439) : path to
    executable is /usr/sbin/haproxy
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE]   (42439) : New
    worker (42442) forked
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE]   (42439) : Loading
    success.
Feb  7 16:33:58 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:57352
    [07/Feb/2023:16:33:57.712] www~ pages/bdb47d1ac9644c5f99c5e90dd4f9b944
    0/0/0/1131/1141 200 67569 - - 1/1/0/0/0 0/0 "GET / HTTP/1.1" www.
    Wget/1.20.3 (linux-gnu)
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: [WARNING]  (42442) : Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for maintenance.
    0 active and 0 backup servers left. 0 sessions active, 0 requeued,
    0 remaining in queue.
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: [ALERT]    (42442) : backend
    'pages' has no server available!
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for maintenance.
    0 active and 0 backup servers left. 0 sessions active, 0 requeued,
    0 remaining in queue.
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: backend pages has no server
    available!
Feb  7 16:34:20 ip-172-31-33-146 haproxy[42442]: [NOTICE]   (42442) : Server
    deleted.
Feb  7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE]   (42442) : CLI :
    'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING]  (42442) : Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
    maintenance).
Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
    maintenance).
Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING]  (42442) : Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup
    servers online. 0 sessions requeued, 0 total in queue.
Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server
    pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup
    servers online. 0 sessions requeued, 0 total in queue.
Feb  7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698
    [07/Feb/2023:16:35:01.250] www~ pages/<NOSRV> 0/15001/-1/-1/15001 503
    4793 - - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXX Wget/1.20.3
    (linux-gnu)

The servers state is like this:

echo "show servers state pages" | netcat -w 2 172.31.33.146
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
  srv_uweight srv_iweight srv_time_since_last_change srv_check_status
  srv_check_result srv_check_health srv_check_state srv_agent_state
  bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl
  srv_check_port srv_check_addr srv_agent_addr srv_agent_port
5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3
  4 6 0 0 0 - 80 - 0 0 - - 0

srv_check_result is 3, which indicates the health checks are fine.

I'm a bit baffled by the situation. If someone has a bit more experience in
inserting servers on the fly with L7 checks, I'll be grateful.

-- 
Thomas Pedoussaut
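The key columns of that state line can be pulled out with a small awk one-liner, matching them against the header positions quoted above. This is only a reading aid for the dump in the post; the comments repeat what the post itself concludes about the values.

```shell
#!/bin/sh
# Decode the relevant fields of the "show servers state" line quoted
# above, using the column order given by its header line.
state_line='5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3 4 6 0 0 0 - 80 - 0 0 - - 0'

echo "$state_line" | awk '{
    print "srv_name=" $4
    print "srv_addr=" $5
    print "srv_op_state=" $6       # 2: server operationally up
    print "srv_admin_state=" $7    # 0: no maintenance flag set
    print "srv_check_result=" $12  # 3: read in the post as a passing check
}'
```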
Re: clarify close behaviour on http-request rules
Hi Christopher,

Thanks for your answer.

On Tue, Feb 7, 2023 at 9:09 AM Christopher Faulet wrote:
> The tarpit action is final. So it cannot be used in addition to a return
> or a deny action. For the wait-for-body action, indeed, it will wait for
> the body or a full buffer for 1 second. Thus in this case, if the whole
> request can be stored in the buffer and is received fast enough, this
> will mitigate your issue.

Ah yes, good point indeed, I forgot the behaviour of tarpit. I will give
it a try with `wait-for-body`.

> FYI, we are refactoring the way errors, aborts and shutdowns are handled
> internally. And the data draining at the mux level is definitely a
> subject we should address. It is a painful work, but we hope to include
> it in 2.8, at least partially.

Thanks!
-- 
William
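For reference, the workaround discussed here would look roughly like this in a frontend or backend section. This is only a sketch under the constraints Christopher describes (the body must fit in the buffer and arrive within the wait time); `CONDITION_A` is a placeholder ACL, and the 1-second value is the one used in the thread, not a recommendation.

```
# Wait for the request body (bounded by "time" and by the buffer size)
# before denying, so the response is less likely to be sent while
# request data is still in flight. CONDITION_A is a placeholder ACL.
http-request wait-for-body time 1s if CONDITION_A
http-request deny if CONDITION_A
```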
Re: clarify close behaviour on http-request rules
On 2/6/23 at 12:08, William Dauchy wrote:
> Hi Christopher,
>
> On Fri, Feb 3, 2023 at 7:59 PM William Dauchy wrote:
>> On Tue, Oct 18, 2022 at 4:15 PM Christopher Faulet wrote:
>>> On all HTX versions, K/A and close modes are handled in the H1
>>> multiplexer. Thus, on these versions, http_reply_and_close() is only
>>> closing the stream. The multiplexer is responsible for closing the
>>> client connection or not. On pre-HTX versions, when
>>> http_reply_and_close() is used, the client connection is also closed.
>>> It is a limitation of HAProxy versions using the legacy HTTP. Note
>>> there is a case where the client connection may be closed: if the
>>> HAProxy response is returned before the end of the request, the client
>>> connection is closed. There is no (not yet) draining mode at the mux
>>> level.
>
> Coming back on this very late: could an `http-request wait-for-body time`
> or an `http-request tarpit` mitigate the draining issue? I am trying to
> find a workaround on a setup where we are behind another L7 LB and where
> we unexpectedly close the connection.
>
> Said another way, what is going to be the behaviour of `http-request
> return` if I have:
>
>   http-request wait-for-body time 1s if CONDITION_A
>   http-request deny if CONDITION_A
>
> Is it going to wait for the request, and so mitigate the mentioned drain
> limitation we currently have in the mux when `CONDITION_A` matches?

Hi William,

The tarpit action is final. So it cannot be used in addition to a return
or a deny action. For the wait-for-body action, indeed, it will wait for
the body or a full buffer for 1 second. Thus in this case, if the whole
request can be stored in the buffer and is received fast enough, this will
mitigate your issue.

FYI, we are refactoring the way errors, aborts and shutdowns are handled
internally. And the data draining at the mux level is definitely a subject
we should address. It is a painful work, but we hope to include it in 2.8,
at least partially.

-- 
Christopher Faulet