Hi Thomas,

On Tue, Feb 07, 2023 at 06:18:26PM +0100, Thomas Pedoussaut wrote:
(...)
> As you can see in the logs, servers are seen, registered and marked as UP.
> But a request made a few seconds later, the backend can't find a suitable
> server to fulfill the request.
>
>
> Feb 7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : CLI :
> 'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
> Feb 7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
> maintenance).
> Feb 7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
> maintenance).
> Feb 7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers
> online. 0 sessions requeued, 0 total in queue.
> Feb 7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers
> online. 0 sessions requeued, 0 total in queue.
> Feb 7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698
> [07/Feb/2023:16:35:01.250] www~ pages/<NOSRV> 0/15001/-1/-1/15001 503 4793 -
> - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXXXXXX Wget/1.20.3 (linux-gnu)
>
>
> The servers state is like this:
>
> echo "show servers state pages" |netcat -w 2 172.31.33.146 9999
> 1
> # be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
> srv_uweight srv_iweight srv_time_since_last_change srv_check_status
> srv_check_result srv_check_health srv_check_state srv_agent_state
> bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl
> srv_check_port srv_check_addr srv_agent_addr srv_agent_port
> 5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3
> 4 6 0 0 0 - 80 - 0 0 - - 0
>
> srv_check_result is 3 which indicates the healthchecks are fine.
>
>
> I'm a bit baffled by the situation. If someone has a bit more experience in
> inserting backends on the fly with L7 checks, I'll be grateful.
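
For reference, the positional state line quoted above is easier to read once
each value is paired with its column name. The following is only a minimal
sketch in Python: the header and data strings are copied from the quoted
output, while the helper name parse_state_line is hypothetical and not part
of HAProxy or any of its tooling.

```python
# Column names and the single data row, copied from the quoted
# "show servers state pages" output (the leading "# " of the header is dropped).
HEADER = (
    "be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state "
    "srv_uweight srv_iweight srv_time_since_last_change srv_check_status "
    "srv_check_result srv_check_health srv_check_state srv_agent_state "
    "bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl "
    "srv_check_port srv_check_addr srv_agent_addr srv_agent_port"
)
LINE = (
    "5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 "
    "15 3 4 6 0 0 0 - 80 - 0 0 - - 0"
)


def parse_state_line(header: str, line: str) -> dict:
    """Pair each whitespace-separated value with its column name (hypothetical helper)."""
    names = header.split()
    values = line.split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


state = parse_state_line(HEADER, LINE)

# Print only the fields that matter for this report; values stay as strings.
for key in ("srv_name", "srv_addr", "srv_port", "srv_op_state",
            "srv_admin_state", "srv_check_result"):
    print(f"{key:>18} = {state[key]}")
```

Read this way, the row shows srv_check_result at 3, which matches the
statement above that the health checks pass, next to srv_op_state 2 and
srv_admin_state 0.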

Very interesting. I must confess I have no idea at the moment. Among the
tests you've done, did you always try to add a first server to an empty
backend or did you also try to add a server to a backend that already had
one?

I'm suspecting that something, somewhere indicates at boot time that there
is no server in this backend and that this "something" is not changed when
adding one later, and could be used to entirely bypass the LB algorithm.
That's just a wild guess of course.

Similarly it would be interesting to know if starting empty and adding yet
another server unblocks the situation.

Thanks,
Willy