Re: Issues with dynamically inserted servers

2023-02-07 Thread Willy Tarreau
Hi Thomas,

On Tue, Feb 07, 2023 at 06:18:26PM +0100, Thomas Pedoussaut wrote:
(...)
> As you can see in the logs, servers are seen, registered and marked as UP.
> But for a request made a few seconds later, the backend can't find a suitable
> server to fulfill it.
> 
> 
> Feb  7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : CLI :
> 'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
> Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
> maintenance).
> Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced
> maintenance).
> Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers
> online. 0 sessions requeued, 0 total in queue.
> Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server
> pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers
> online. 0 sessions requeued, 0 total in queue.
> Feb  7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698
> [07/Feb/2023:16:35:01.250] www~ pages/ 0/15001/-1/-1/15001 503 4793 -
> - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXX Wget/1.20.3 (linux-gnu)
> 
> 
> The server state is like this:
> 
> echo "show servers state pages" | netcat -w 2 172.31.33.146
> 1
> # be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
> srv_uweight srv_iweight srv_time_since_last_change srv_check_status
> srv_check_result srv_check_health srv_check_state srv_agent_state
> bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl
> srv_check_port srv_check_addr srv_agent_addr srv_agent_port
> 5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3
> 4 6 0 0 0 - 80 - 0 0 - - 0
> 
> srv_check_result is 3, which indicates the health checks are fine.
> 
> 
> I'm a bit baffled by the situation. If someone has more experience with
> inserting servers into backends on the fly with L7 checks, I'll be grateful.

Very interesting. I must confess I have no idea at the moment. Among the
tests you've done, did you always add the first server to an empty backend,
or did you also try adding a server to a backend that already had one? I
suspect that something, somewhere, indicates at boot time that there is no
server in this backend, that this "something" is not updated when adding one
later, and that it could be used to entirely bypass the LB algorithm. That's
just a wild guess of course.

Similarly, it would be interesting to know whether, starting from the empty
backend, adding yet another server unblocks the situation. A rough sketch of
both tests is below.
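
Something like this, purely as a sketch (9999 is a hypothetical stats port,
since the port isn't visible in your commands, and 172.31.35.240/241 are
hypothetical additional tasks):

# test 1: the backend already has one working server; add a second one:
echo "add server pages/test2 172.31.35.240:80 weight 10 check" | netcat -w 2 172.31.33.146 9999
echo "enable health pages/test2" | netcat -w 2 172.31.33.146 9999
echo "enable server pages/test2" | netcat -w 2 172.31.33.146 9999

# test 2: starting from the blocked state (503s), add yet another server,
# then immediately retry the request:
echo "add server pages/test3 172.31.35.241:80 weight 10 check" | netcat -w 2 172.31.33.146 9999
echo "enable health pages/test3" | netcat -w 2 172.31.33.146 9999
echo "enable server pages/test3" | netcat -w 2 172.31.33.146 9999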

Thanks,
Willy



Issues with dynamically inserted servers

2023-02-07 Thread Thomas Pedoussaut

[ I've been using HAProxy for 5 years, but with a static conf and reloads ]

Due to issues with the speed of backend discovery through DNS on AWS, I'm
writing my own system to insert servers on the fly into my load balancers.


As server names in the backend, I'm using a task ID from my cloud provider:

backend pages
    timeout server 120s
    option forwardfor
    http-request redirect scheme https if ! { ssl_fc }
    option httpchk GET /health.php
    default-server inter 5s fall 3 rise 2
    balance random
    server bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 16 check slowstart 10s


With a config built from the cluster status, everything is fine (16:33:58).

When AWS/ECS sends me a new task, I register it with these 3 commands (at
16:34:27):


echo "add server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 
weight 10 maxconn 32 check inter 5s fall 3 rise 2 slowstart 10s " 
|netcat -w 2 172.31.33.146 

New server registered.

echo "enable health pages/bdb47d1ac9644c5f99c5e90dd4f9b944" |netcat -w 2 
172.31.33.146 


echo "enable server pages/bdb47d1ac9644c5f99c5e90dd4f9b944" |netcat -w 2 
172.31.33.146 
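
(As an aside, if I read the management doc correctly, the runtime API
accepts multiple commands in a single connection when separated by ';', so
the three steps could be sent in one shot; an untested sketch:

echo "add server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 32 check inter 5s fall 3 rise 2 slowstart 10s; enable health pages/bdb47d1ac9644c5f99c5e90dd4f9b944; enable server pages/bdb47d1ac9644c5f99c5e90dd4f9b944" | netcat -w 2 172.31.33.146

That shouldn't change the outcome, but it removes any window between the add
and the enables.)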



As you can see in the logs, servers are seen, registered and marked as UP.
But for a request made a few seconds later, the backend can't find a
suitable server to fulfill it.



Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE]   (42439) : 
haproxy version is 2.7.2-1ppa1~jammy
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : path 
to executable is /usr/sbin/haproxy
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : New 
worker (42442) forked
Feb  7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : 
Loading success.
Feb  7 16:33:58 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:57352 
[07/Feb/2023:16:33:57.712] www~ pages/bdb47d1ac9644c5f99c5e90dd4f9b944 
0/0/0/1131/1141 200 67569 - -  1/1/0/0/0 0/0 "GET / HTTP/1.1" 
www. Wget/1.20.3 (linux-gnu)
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : 
Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for 
maintenance. 0 active and 0 backup servers left. 0 sessions active, 0 
requeued, 0 remaining in queue.
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: [ALERT] (42442) : 
backend 'pages' has no server available!
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: Server 
pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for maintenance. 0 
active and 0 backup servers left. 0 sessions active, 0 requeued, 0 
remaining in queue.
Feb  7 16:34:15 ip-172-31-33-146 haproxy[42442]: backend pages has no 
server available!
Feb  7 16:34:20 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : 
Server deleted.
Feb  7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : CLI 
: 'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : 
Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving 
forced maintenance).
Feb  7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server 
pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced 
maintenance).
Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : 
Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 
backup servers online. 0 sessions requeued, 0 total in queue.
Feb  7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server 
pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup 
servers online. 0 sessions requeued, 0 total in queue.
Feb  7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698 
[07/Feb/2023:16:35:01.250] www~ pages/ 0/15001/-1/-1/15001 503 
4793 - - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXX Wget/1.20.3 
(linux-gnu)



The server state is like this:

echo "show servers state pages" | netcat -w 2 172.31.33.146
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state 
srv_uweight srv_iweight srv_time_since_last_change srv_check_status 
srv_check_result srv_check_health srv_check_state srv_agent_state 
bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl 
srv_check_port srv_check_addr srv_agent_addr srv_agent_port
5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 
15 3 4 6 0 0 0 - 80 - 0 0 - - 0


srv_check_result is 3, which indicates the health checks are fine.
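
(Decoding the relevant columns, if I read the 2.7 sources (server-t.h and
check-t.h) correctly:

srv_op_state     = 2   # SRV_ST_RUNNING
srv_admin_state  = 0   # no maintenance flags set
srv_check_result = 3   # CHK_RES_PASSED

so both the operational state and the checks look healthy.)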


I'm a bit baffled by the situation. If someone has more experience with
inserting servers into backends on the fly with L7 checks, I'll be grateful.



--

Thomas Pedoussaut





Re: clarify close behaviour on http-request rules

2023-02-07 Thread William Dauchy
Hi Christopher,

Thanks for your answer.

On Tue, Feb 7, 2023 at 9:09 AM Christopher Faulet wrote:
> The tarpit action is final. So it cannot be used in addition to a return or
> a deny action. For the wait-for-body action, indeed, it will wait for the
> body or a full buffer for 1 second. Thus in this case, if the whole request
> can be stored in the buffer and is received fast enough, this will mitigate
> your issue.

Ah yes, good point indeed; I had forgotten the behaviour of tarpit. I will
give it a try with `wait-for-body`.

> FYI, we are refactoring the way errors, aborts and shutdowns are handled
> internally. And the data draining at the mux level is definitely a subject
> we should address. It is painful work, but we hope to include it in 2.8, at
> least partially.

thanks!
-- 
William



Re: clarify close behaviour on http-request rules

2023-02-07 Thread Christopher Faulet

On 2/6/23 at 12:08, William Dauchy wrote:

Hi Christopher,

On Fri, Feb 3, 2023 at 7:59 PM William Dauchy wrote:

On Tue, Oct 18, 2022 at 4:15 PM Christopher Faulet wrote:

On all HTX versions, K/A and close modes are handled in the H1 multiplexer.
Thus, on these versions, http_reply_and_close() is only closing the stream.
The multiplexer is responsible for closing the client connection or not.

On pre-HTX versions, when http_reply_and_close() is used, the client
connection is also closed. It is a limitation of HAProxy versions using the
legacy HTTP.

Note there is a case where the client connection may be closed: if the
HAProxy response is returned before the end of the request, the client
connection is closed. There is no draining mode at the mux level (not yet).


Coming back on this very late: would `http-request wait-for-body time` or
`http-request tarpit` mitigate the draining issue? I am trying to find a
workaround on a setup where we are behind another L7 LB and we unexpectedly
close the connection.

Said another way, what is going to be the behaviour of `http-request
return` if I have:

http-request wait-for-body time 1s if CONDITION_A
http-request deny if CONDITION_A

Is it going to wait for the request, and so mitigate the mentioned
drain limitation we currently have in the mux when `CONDITION_A`
matches?



Hi William,

The tarpit action is final. So it cannot be used in addition to a return or a
deny action. For the wait-for-body action, indeed, it will wait for the body
or a full buffer for 1 second. Thus in this case, if the whole request can be
stored in the buffer and is received fast enough, this will mitigate your
issue. A minimal sketch of the resulting workaround is below.
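
A minimal sketch, with CONDITION_A standing for your own ACL and the deny
status only as an example:

http-request wait-for-body time 1s if CONDITION_A
http-request deny deny_status 403 if CONDITION_A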


FYI, we are refactoring the way errors, aborts and shutdowns are handled
internally. And the data draining at the mux level is definitely a subject we
should address. It is painful work, but we hope to include it in 2.8, at
least partially.


--
Christopher Faulet