Re: HAProxy and TIME_WAIT
Thank you, works like a charm!

2011/11/30 Willy Tarreau:
> On Wed, Nov 30, 2011 at 06:10:29PM +0200, Daniel Rankov wrote:
> > Hi, thank you, these explanations are really helpful.
> > Now, maybe because of a bug or something, "option nolinger" is not
> > working for a backend; it works great for a frontend. And I've tested
> > putting this option all over the config file... That is what had
> > confused me.
>
> Indeed you're right, I can reproduce this behaviour too. It happened when
> we introduced the systematic shutdown() before the close() to avoid
> resetting too many connections. Please apply the attached patch which
> fixes it.
>
> Thanks for your report,
> Willy
Re: HAProxy and TIME_WAIT
On Wed, Nov 30, 2011 at 06:10:29PM +0200, Daniel Rankov wrote:
> Hi, thank you, these explanations are really helpful.
> Now, maybe because of a bug or something, "option nolinger" is not
> working for a backend; it works great for a frontend. And I've tested
> putting this option all over the config file... That is what had
> confused me.

Indeed you're right, I can reproduce this behaviour too. It happened when
we introduced the systematic shutdown() before the close() to avoid
resetting too many connections. Please apply the attached patch which
fixes it.

Thanks for your report,
Willy

From b7e257fe61890e4edc839d76dc0223a8d5bdb0f2 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 30 Nov 2011 18:02:24 +0100
Subject: BUG: tcp: option nolinger does not work on backends

Daniel Rankov reported that "option nolinger" is inefficient on backends.
The reason is that it is set on the file descriptor only, which does not
prevent haproxy from performing a clean shutdown() before closing. We must
set the flag on the stream_interface instead if we want an RST to be
emitted upon active close.
---
 src/proto_tcp.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/proto_tcp.c b/src/proto_tcp.c
index 37d9054..5ccfb81 100644
--- a/src/proto_tcp.c
+++ b/src/proto_tcp.c
@@ -239,7 +239,7 @@ int tcpv4_connect_server(struct stream_interface *si,
 	setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, (char *) &one, sizeof(one));

 	if (be->options & PR_O_TCP_NOLING)
-		setsockopt(fd, SOL_SOCKET, SO_LINGER, (struct linger *) &nolinger, sizeof(struct linger));
+		si->flags |= SI_FL_NOLINGER;

 	/* allow specific binding :
 	 * - server-specific at first
--
1.7.2.3
Re: HAProxy and TIME_WAIT
Hi, thank you, these explanations are really helpful.

Now, maybe because of a bug or something, "option nolinger" is not working
for a backend; it works great for a frontend. And I've tested putting this
option all over the config file... That is what had confused me.

OS is CentOS 6, HA-Proxy version 1.4.18. Here is the simplified config
file; I guess there is nothing wrong with it, but still:

    global
        maxconn 32000
        daemon
        log 127.0.0.1 local1 info

    defaults
        log global
        option tcplog
        maxconn 32000

    frontend https-in
        bind 192.168.2.38:443
        default_backend servers-https
        option nolinger

    backend servers-https
        mode tcp
        balance source
        option redispatch
        option nolinger
        server jetty-1 127.0.0.1:8443

With this configuration haproxy closes the connection to the client with an
RST, but with the backend it does not. Putting "option nolinger" in the
defaults section works the same way. Am I wrong, or is it a bug?

Thank you

2011/11/30 Willy Tarreau:
> On Wed, Nov 30, 2011 at 03:56:14PM +0200, Daniel Rankov wrote:
> > (...)
> > So that shows me that the connections from haproxy to the webserver are
> > closed with FIN/FIN-ACK/ACK.
> > Here is netstat -anpo | grep TIME:
> >
> >     tcp  0  0  127.0.0.1:59302  127.0.0.1:8443  TIME_WAIT  -  timewait (58.73/0/0)
> >
> > Is that the expected behaviour?
>
> Yes, if you're in TCP mode (I thought you were using HTTP mode), it's
> perfectly expected, because in TCP mode there is no way to know whether
> some important data were sent and not received by the other side, so you
> cannot use an RST to force a close.
>
> Also, in TCP mode, haproxy just relays on one side what it sees on the
> other. So as you can see, wget closes the connection to haproxy, then
> haproxy does the same with the server.
>
> If you want to force an RST, you can use "option nolinger" in the
> backend. But then again, this is really not recommended, since it can
> lead to incomplete data being received by the server. In the case of
> HTTPS it should not be an issue thanks to the SSL closing handshake, but
> this is something to keep in mind.
>
> Regards,
> Willy
Re: HAProxy and TIME_WAIT
On Wed, Nov 30, 2011 at 03:56:14PM +0200, Daniel Rankov wrote:
> Ok, now I'm kind of stuck here.
> Let me share my observations on my really simple environment:
> for the client I use wget on a server with IP 192.168.2.30,
> haproxy is on a server with IP 192.168.2.38,
> and haproxy and the web server communicate on 127.0.0.1. haproxy is in
> TCP mode.
> This is the monitored tcpdump for the connection from client to haproxy
> /just the closing part/:
>
>     14:56:40.448210 IP 192.168.2.30.55867 > 192.168.2.38.443: . ack 7983 win 204
>     14:56:40.448849 IP 192.168.2.30.55867 > 192.168.2.38.443: F 618:618(0) ack 7983 win 204
>     14:56:40.449513 IP 192.168.2.38.443 > 192.168.2.30.55867: F 7983:7983(0) ack 619 win 62
>     14:56:40.449656 IP 192.168.2.30.55867 > 192.168.2.38.443: . ack 7984 win 204
>
> and this is the tcpdump for 127.0.0.1 /just the closing part again/:
>
>     14:56:40.447887 IP 127.0.0.1.59302 > 127.0.0.1.8443: . ack 7983 win 386
>     14:56:40.448914 IP 127.0.0.1.59302 > 127.0.0.1.8443: F 618:618(0) ack 7983 win 386
>     14:56:40.449236 IP 127.0.0.1.8443 > 127.0.0.1.59302: F 7983:7983(0) ack 619 win 273
>     14:56:40.449272 IP 127.0.0.1.59302 > 127.0.0.1.8443: . ack 7984 win 386
>
> So that shows me that the connections from haproxy to the webserver are
> closed with FIN/FIN-ACK/ACK.
> Here is netstat -anpo | grep TIME:
>
>     tcp  0  0  127.0.0.1:59302  127.0.0.1:8443  TIME_WAIT  -  timewait (58.73/0/0)
>
> Is that the expected behaviour?

Yes, if you're in TCP mode (I thought you were using HTTP mode), it's
perfectly expected, because in TCP mode there is no way to know whether
some important data were sent and not received by the other side, so you
cannot use an RST to force a close.

Also, in TCP mode, haproxy just relays on one side what it sees on the
other. So as you can see, wget closes the connection to haproxy, then
haproxy does the same with the server.

If you want to force an RST, you can use "option nolinger" in the backend.
But then again, this is really not recommended, since it can lead to
incomplete data being received by the server. In the case of HTTPS it
should not be an issue thanks to the SSL closing handshake, but this is
something to keep in mind.

Regards,
Willy
Re: HAProxy and TIME_WAIT
Ok, now I'm kind of stuck here.
Let me share my observations on my really simple environment:
for the client I use wget on a server with IP 192.168.2.30,
haproxy is on a server with IP 192.168.2.38,
and haproxy and the web server communicate on 127.0.0.1. haproxy is in TCP
mode.
This is the monitored tcpdump for the connection from client to haproxy
/just the closing part/:

    14:56:40.448210 IP 192.168.2.30.55867 > 192.168.2.38.443: . ack 7983 win 204
    14:56:40.448849 IP 192.168.2.30.55867 > 192.168.2.38.443: F 618:618(0) ack 7983 win 204
    14:56:40.449513 IP 192.168.2.38.443 > 192.168.2.30.55867: F 7983:7983(0) ack 619 win 62
    14:56:40.449656 IP 192.168.2.30.55867 > 192.168.2.38.443: . ack 7984 win 204

and this is the tcpdump for 127.0.0.1 /just the closing part again/:

    14:56:40.447887 IP 127.0.0.1.59302 > 127.0.0.1.8443: . ack 7983 win 386
    14:56:40.448914 IP 127.0.0.1.59302 > 127.0.0.1.8443: F 618:618(0) ack 7983 win 386
    14:56:40.449236 IP 127.0.0.1.8443 > 127.0.0.1.59302: F 7983:7983(0) ack 619 win 273
    14:56:40.449272 IP 127.0.0.1.59302 > 127.0.0.1.8443: . ack 7984 win 386

So that shows me that the connections from haproxy to the webserver are
closed with FIN/FIN-ACK/ACK.
Here is netstat -anpo | grep TIME:

    tcp  0  0  127.0.0.1:59302  127.0.0.1:8443  TIME_WAIT  -  timewait (58.73/0/0)

Is that the expected behaviour?

All the best!

2011/11/29 Willy Tarreau:
> Hi Daniel,
>
> On Tue, Nov 29, 2011 at 06:10:46PM +0200, Daniel Rankov wrote:
> > (...)
> > IV) haproxy closes the connection to the backend (2) with
> > FIN/FIN-ACK/ACK. Now this ESTABLISHED connection goes to TIME_WAIT
> > state, and the socket that is taken is between haproxy and the backend
> > server.
>
> I agree on this point, and this is why it does not happen :-)
>
> Haproxy uses an RST to close the connection to the backend server
> precisely because of this, otherwise it would not work at all. You can
> strace it, you will notice that it does a setsockopt(SO_LINGER, {0})
> before closing. In fact, you cannot even configure it not to do this,
> because it would cause too much harm.
>
> (...)
>
> Best regards,
> Willy
Re: HAProxy and TIME_WAIT
Hi Daniel,

On Tue, Nov 29, 2011 at 06:10:46PM +0200, Daniel Rankov wrote:
> For sure TIME_WAIT connections are not an issue when they keep
> information about sockets to clients, but when TIME_WAIT connections keep
> sockets busy from the host where HAProxy is deployed to the backend, the
> limit can be reached - it's defined by ip_local_port_range.
> Here is what I mean:
>
> Client -(1)-> HAProxy -(2)-> Webserver
> /it doesn't matter if the web server and haproxy are on the same server/
>
> I) The client connects to haproxy.
> A socket is taken: clientIP:random_port:haproxy_ip:haproxy_port
>
> II) haproxy connects to the webserver.
> A socket is taken: haproxy_local_ip:random_port:backend_ip:backend_port
>
> III) The client closes the connection to haproxy (1) in the normal way -
> FIN/FIN-ACK/ACK.
> This way we have one connection that goes from ESTABLISHED to TIME_WAIT
> state. We don't really care about this TIME_WAIT connection, because the
> socket that is taken is between the client and haproxy:
> clientIP:random_port:haproxy_ip:haproxy_port
>
> IV) haproxy closes the connection to the backend (2) with
> FIN/FIN-ACK/ACK.
> Now this ESTABLISHED connection goes to TIME_WAIT state, and the socket
> that is taken is between haproxy and the backend server.

I agree on this point, and this is why it does not happen :-)

Haproxy uses an RST to close the connection to the backend server precisely
because of this, otherwise it would not work at all. You can strace it, you
will notice that it does a setsockopt(SO_LINGER, {0}) before closing. In
fact, you cannot even configure it not to do this, because it would cause
too much harm.

(...)

> I believe that the common architecture is that backend servers are
> physically close to haproxy and are on high-speed networks where no
> packet loss is expected. So we don't really need the TIME_WAIT state
> here. It's not needed on localhost for sure.

When haproxy closes the connection to a server, we never need the TIME_WAIT
anyway, because if it closes, it means it has nothing left to say to the
server and is not interested in getting its response. So even if some data
were lost, it would not be an issue.

For instance, one situation where you can observe this close is when you
enable forceclose or http-server-close. You'll see that as soon as the last
byte of payload is received, haproxy sends an RST to the server to release
the connection so that another pending request may immediately reuse it.

Even if the RST was lost, a packet from the server would reach the haproxy
machine and match no known connection, causing an RST in return.

That said, I completely agree with all your analysis, and that's what
caused me a lot of gray hair when implementing the client-side keep-alive,
precisely because I needed a way to make haproxy close the server
connection without being affected by the TIME_WAIT on this side.

Best regards,
Willy
Re: HAProxy and TIME_WAIT
For sure TIME_WAIT connections are not an issue when they keep information
about sockets to clients, but when TIME_WAIT connections keep sockets busy
from the host where HAProxy is deployed to the backend, the limit can be
reached - it's defined by ip_local_port_range.

Here is what I mean:

Client -(1)-> HAProxy -(2)-> Webserver
/it doesn't matter if the web server and haproxy are on the same server/

I) The client connects to haproxy.
A socket is taken: clientIP:random_port:haproxy_ip:haproxy_port

II) haproxy connects to the webserver.
A socket is taken: haproxy_local_ip:random_port:backend_ip:backend_port

III) The client closes the connection to haproxy (1) in the normal way -
FIN/FIN-ACK/ACK.
This way we have one connection that goes from ESTABLISHED to TIME_WAIT
state. We don't really care about this TIME_WAIT connection, because the
socket that is taken is between the client and haproxy:
clientIP:random_port:haproxy_ip:haproxy_port

IV) haproxy closes the connection to the backend (2) with FIN/FIN-ACK/ACK.
Now this ESTABLISHED connection goes to TIME_WAIT state, and the socket
that is taken is between haproxy and the backend server. It looks like
haproxy_local_ip:random_port:backend_ip:backend_port.

If we say that haproxy and the webserver communicate on 127.0.0.1 and the
web server works on port 8080, then we have a socket like this taken:
127.0.0.1:RANDOM_PORT:127.0.0.1:8080. This RANDOM_PORT is in the range
defined in the sysctl ip_local_port_range. This connection on CentOS will
be kept for 60 seconds. As you see, on a loaded server this limit of open
ports might be reached. (Some math: by default we have about 30000 usable
ports for 60 seconds, which is about 500 new connections/second.)

That is why it would be great to be able to configure haproxy to reset the
connection to the backend. I believe that the common architecture is that
backend servers are physically close to haproxy and are on high-speed
networks where no packet loss is expected. So we don't really need the
TIME_WAIT state here. It's not needed on localhost for sure.

All the best!

2011/11/29 Willy Tarreau:
> On Tue, Nov 29, 2011 at 09:41:30AM -0500, James Bardin wrote:
> > From looking into this, I don't see an option in HAProxy to RST all
> > closed connections on a backend, though the documentation makes it
> > sound like the nolinger option does do this. Hopefully one of the devs
> > (Willy?) can chime in with some advice.
>
> Indeed, nolinger does this, but it's strongly advised not to use it,
> because it precisely kills the TCP connection (the reason why there is no
> TIME_WAIT left), which causes truncated objects on the remote server if
> the last segments are lost. The reason is that these lost segments will
> not be retransmitted, and the client will get an RST instead.
>
> TIME_WAIT sockets are not an issue on a server. The only trouble they
> cause is that they pollute the "netstat -a" output. But that's all.
> These sockets are totally normal and expected. My record is 5 million on
> a heavily loaded server :-)
>
> There is absolutely no reason to worry about these sockets: they're
> closed and waiting for either the TCP timer to expire, a SYN, or an RST.
>
> Best regards,
> Willy
Re: HAProxy and TIME_WAIT
I would prefer not to use tw_reuse, because that will affect the whole
server's TCP communication, not just one process (HAProxy in this case).

So I've tested nolinger, but what it does isn't completely the solution.
Here is what happens when using it: let's say the client closes the
connection to HAProxy with an RST - then an RST is sent to the backend. But
when a client closes the connection normally with FIN/FIN-ACK/ACK, then
FIN/FIN-ACK/ACK is used by HAProxy to close the connection to the backend.
That way we hit the core problem again.

I'm looking for a solution as described in TCP/IP Illustrated: "it's also
possible to abort a connection by sending a reset instead of a FIN. This is
sometimes called an abortive release." - so that no matter how the client
closes a connection, an RST is always sent to the backend. And it's
interesting whether, when the backend closes a connection with a FIN, an
RST can be sent from HAProxy to the backend. This way no useless resources
will be taken.

Greetings

2011/11/28 James Bardin:
> On Mon, Nov 28, 2011 at 12:28 PM, Daniel Rankov wrote:
> > Yep, I'm aware of net.ipv4.tcp_tw_reuse and the need for the TIME_WAIT
> > state, but still, if there is a way to send an RST /either a
> > configuration or compile parameter/, the connection will be destroyed.
>
> TIME_WAIT is usually not a problem if port reuse is enabled (I haven't
> seen an example otherwise), and you will usually have FIN_WAIT1 sockets
> if there is a problem with connections terminating badly.
>
> Now that I recall it, the socket option to send an RST on close is
> SO_LINGER (with a zero timeout); I noticed that there is an
> 'option nolinger' for both frontends and backends in haproxy.
>
> -jim
Re: HAProxy and TIME_WAIT
On Mon, Nov 28, 2011 at 12:28 PM, Daniel Rankov wrote:
> Yep, I'm aware of net.ipv4.tcp_tw_reuse and the need for the TIME_WAIT
> state, but still, if there is a way to send an RST /either a
> configuration or compile parameter/, the connection will be destroyed.

TIME_WAIT is usually not a problem if port reuse is enabled (I haven't seen
an example otherwise), and you will usually have FIN_WAIT1 sockets if there
is a problem with connections terminating badly.

Now that I recall it, the socket option to send an RST on close is
SO_LINGER (with a zero timeout); I noticed that there is an
'option nolinger' for both frontends and backends in haproxy.

-jim
Re: HAProxy and TIME_WAIT
Yep, I'm aware of net.ipv4.tcp_tw_reuse and the need for the TIME_WAIT
state, but still, if there is a way to send an RST /either a configuration
or compile parameter/, the connection will be destroyed.

2011/11/28 James Bardin:
> On Mon, Nov 28, 2011 at 11:50 AM, Daniel Rankov wrote:
> > And on a loaded server this will cause trouble. Isn't there a chance
> > for HAProxy to send an RST, so that the connection will be dropped?
>
> An RST packet won't make the TIME_WAIT socket disappear. It's part of the
> TCP protocol, and a socket will sit in that state for 2 minutes after
> closing.
>
> You can put `net.ipv4.tcp_tw_reuse = 1` in your sysctl.conf to allow
> sockets in TIME_WAIT to be reused as needed.
>
> -jim
Re: HAProxy and TIME_WAIT
On Mon, Nov 28, 2011 at 11:50 AM, Daniel Rankov wrote:
> And on a loaded server this will cause trouble. Isn't there a chance for
> HAProxy to send an RST, so that the connection will be dropped?

An RST packet won't make the TIME_WAIT socket disappear. It's part of the
TCP protocol, and a socket will sit in that state for 2 minutes after
closing.

You can put `net.ipv4.tcp_tw_reuse = 1` in your sysctl.conf to allow
sockets in TIME_WAIT to be reused as needed.

-jim