Re: pf, relayd, TCP keep alive and NAT, oh my!
On 2021-06-02, Cameron Simpson wrote:
> On 01Jun2021 20:43, Stuart Henderson wrote:
>>On 2021-06-01, Cameron Simpson wrote:
>>> If I had TCP keep alive turned on, both ends might tidy themselves up.
>>> I can't enable that on the clients (various mail readers) or,
>>> apparently, on the server configuration. I can't do it in PF because PF
>>> just copies packets. I can't seem to do it in relayd either, though that
>>> seems the obvious way to intercept the connection for this purpose.
>>
>>It looks like courier-imap does enable SO_KEEPALIVE if available.
>
> Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is
> 5.1.something.

A long time - it was there in the initial git commit when the files
were imported from svn, certainly before 5.0.6.

https://github.com/svarshavchik/courier-libs/blame/142f42378608e593eb36ceb33895db99948427aa/tcpd/tcpd.c#L1238

>>$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
>>/proc/sys/net/ipv4/tcp_keepalive_intvl:75
>>/proc/sys/net/ipv4/tcp_keepalive_probes:9
>>/proc/sys/net/ipv4/tcp_keepalive_time:7200
>>
>>7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
>>long too; 14400 "slowhz" intervals = also 2h).
>
> Ah. A long time indeed. Yes, winding these down will help - the above
> times are in the same magnitude as the time required to hit the
> connection limits.

Yes - these defaults were set in the days before stateful firewalls and
NAT devices with limited memory were common, so the only thing they
really needed to protect against was connections building up from
clients that had crashed/powered off or sat behind broken network paths.
Re: pf, relayd, TCP keep alive and NAT, oh my!
On 01Jun2021 20:43, Stuart Henderson wrote:
>On 2021-06-01, Cameron Simpson wrote:
>> If I had TCP keep alive turned on, both ends might tidy themselves up.
>> I can't enable that on the clients (various mail readers) or,
>> apparently, on the server configuration. I can't do it in PF because PF
>> just copies packets. I can't seem to do it in relayd either, though that
>> seems the obvious way to intercept the connection for this purpose.
>
>It looks like courier-imap does enable SO_KEEPALIVE if available.

Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is
5.1.something.

>By default, keepalive timers are long; on a random Linux I had handy:
>
>$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
>/proc/sys/net/ipv4/tcp_keepalive_intvl:75
>/proc/sys/net/ipv4/tcp_keepalive_probes:9
>/proc/sys/net/ipv4/tcp_keepalive_time:7200
>
>7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
>long too; 14400 "slowhz" intervals = also 2h).

Ah. A long time indeed. Yes, winding these down will help - the above
times are in the same magnitude as the time required to hit the
connection limits.

>> Plan B is to build the latest courier-imap from source if I find the
>> time, but there may be no build option for this. I guess a single
>> setsockopt() call in the source would be enough, _if_ that can be done
>> on the accept end, which I haven't checked.
>
>https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
>you'll need it.

Ta.

>So you probably just need to lower tcp_keepalive_time, and perhaps adjust
>tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
>clients; they will need to wake and transmit more often, so faster
>keepalives will result in more battery/data use.

I expect I can wind it down to a handful of minutes without any serious
impact.

>> Plan B0 might be to disable IMAP IDLE support. Hmm.
>
>Depends on timings whether that will help; think it's a last ditch effort
>though, I think it will make things noticeably worse for clients.

Courier lets me change the advertised capabilities (it is not clear if
that affects the actual capabilities). No joy; possibly some clients
will try IDLE even if it isn't advertised and just cope if it is not
supported, so maybe some clients are using IDLE successfully anyway. At
any rate, dropping IDLE from the advertised list didn't help, and my
hourly "restart imapd" cron is live again :-(

I'll look at the keepalive settings on the server, many thanks!

Cheers,
Cameron Simpson
Re: pf, relayd, TCP keep alive and NAT, oh my!
On 01Jun2021 11:04, Claudio Jeker wrote:
>Make sure you use 'block return' at least for the imap connections.

I already do:

    set block-policy return
    [... and the first rule ...]
    # reject everything except as detailed below
    block return log

>This
>way when the state is dropped the firewall will issue a RST packet to the
>server which will close the connection.

Alas, no. I believe that the _modem_ is dropping its NAT state (or some
upstream stateful switch is getting likewise bored) and that the
connection is idle. The firewall's modem is probably sending an RST to
the client if it tries to use the connection after the modem forgets it,
or something, causing the client to make a new connection to recover.

The state table on the firewall itself seems fine (about 30 connections,
in keeping with the staff and devices in the office).

The problem is server side (cloud mail server). The connection goes
idle, the office modem forgets the NAT, and the server never sees _any_
indication that the TCP connection is no longer valid because it's idle.

>On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to
>enable keepalive by default. So that is something you can enable on the IMAP
>server to force keep-alive on there. Other systems have similar knobs.

The IMAP server is Linux, so I'll look at that. Thanks!

Setting this on the firewall and interposing relayd would also do the
same trick. So that will be my fallback plan.

Thanks,
Cameron Simpson
Re: pf, relayd, TCP keep alive and NAT, oh my!
On 01Jun2021 08:53, Dirk Coetzee wrote:
>As a first guess, I would consider changing / implementing "set
>optimization". This made a massive difference on our customer's satellite
>internet connection.

The customer has a terrestrial ISP connection. I've got satellite at
home, and do indeed use this setting. I'm not sure it will help my
client though.

Cheers,
Cameron Simpson
Re: pf, relayd, TCP keep alive and NAT, oh my!
On 2021-06-01, Cameron Simpson wrote:
> If I had TCP keep alive turned on, both ends might tidy themselves up.
> I can't enable that on the clients (various mail readers) or,
> apparently, on the server configuration. I can't do it in PF because PF
> just copies packets. I can't seem to do it in relayd either, though that
> seems the obvious way to intercept the connection for this purpose.

It looks like courier-imap does enable SO_KEEPALIVE if available.

By default, keepalive timers are long; on a random Linux I had handy:

$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
long too; 14400 "slowhz" intervals = also 2h).

> Plan B is to build the latest courier-imap from source if I find the
> time, but there may be no build option for this. I guess a single
> setsockopt() call in the source would be enough, _if_ that can be done
> on the accept end, which I haven't checked.

https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
you'll need it.

So you probably just need to lower tcp_keepalive_time, and perhaps adjust
tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
clients; they will need to wake and transmit more often, so faster
keepalives will result in more battery/data use.

> Plan B0 might be to disable IMAP IDLE support. Hmm.

Depends on timings whether that will help; think it's a last ditch effort
though, I think it will make things noticeably worse for clients.
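
For a rough sense of what the three Linux values above add up to: a dead
peer is only given up on after the initial idle time plus every
unanswered probe, so approximately tcp_keepalive_time +
tcp_keepalive_probes * tcp_keepalive_intvl. The sketch below is only that
arithmetic. The function name is made up for illustration, and the
"lowered" values (300s idle, 30s interval, 4 probes) are an assumed
example, not settings anyone in the thread recommended.

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds of silence before the kernel drops a dead connection.

    keepalive_time:   idle seconds before the first probe is sent
    keepalive_intvl:  seconds between unanswered probes
    keepalive_probes: number of unanswered probes before giving up
    """
    return keepalive_time + keepalive_probes * keepalive_intvl


# Values from the grep output quoted above.
linux_defaults = dead_peer_detection_seconds(7200, 75, 9)

# Hypothetical lowered settings, purely for comparison.
lowered = dead_peer_detection_seconds(300, 30, 4)

print("defaults: %d s, lowered example: %d s" % (linux_defaults, lowered))
```

With the defaults this comes to a little over two hours before a stale
server-side connection is cleaned up, which is why lowering
tcp_keepalive_time has the largest effect.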
Re: pf, relayd, TCP keep alive and NAT, oh my!
On Tue, Jun 01, 2021 at 10:25:38AM +1000, Cameron Simpson wrote:
> Can I enforce or implement TCP keep alives on a TCP stream via my
> firewall?
>
> Background:
>
> I've got a client with an OpenBSD firewall and a Telstra NBN modem as
> their modem.
>
> Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I
> have this odd problem which I am beginning to suspect is the NBN modem
> getting bored and dropping its NAT entries. Let me explain...
>
> At the firewall end I see about 30 ESTABLISHED connections to the IMAP
> server. At the IMAP server I see over 500, which is about where the IMAP
> service stops accepting new connections, leading to errors from the
> client mail readers.
>
> My current theory is that the IMAP client connections issue the IMAP
> IDLE command and go passive, waiting for email notifications from the
> server. So we have an idle TCP connection across the firewall and
> across the NBN modem (which NATs).
>
> My conjecture is that at some point the modem discards idle connection
> states. (This could just as well happen at any other intermediate
> stateful router too.) After that event, the client end does something
> which tries to use the connection, gets an RST from the modem, clean
> tidyup happens on the client and in the firewall.
>
> At the server end, none of this is seen and the imapd just sits around
> idle, never releasing the connection and never stopping the matching
> daemon process. This gradually rises to hit the server's configured
> connection limit and it stops accepting new things.
>
> If I had TCP keep alive turned on, both ends might tidy themselves up.
> I can't enable that on the clients (various mail readers) or,
> apparently, on the server configuration. I can't do it in PF because PF
> just copies packets. I can't seem to do it in relayd either, though that
> seems the obvious way to intercept the connection for this purpose.
>
> Any suggestions?

Make sure you use 'block return' at least for the imap connections. This
way when the state is dropped the firewall will issue an RST packet to the
server which will close the connection.

On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to
enable keepalive by default. So that is something you can enable on the IMAP
server to force keep-alive on there. Other systems have similar knobs.

-- 
:wq Claudio
Re: pf, relayd, TCP keep alive and NAT, oh my!
Hi Cameron,

As a first guess, I would consider changing / implementing "set
optimization". This made a massive difference on our customer's satellite
internet connection.

man pf.conf

     set optimization environment
             Optimize state timeouts for one of the following network
             environments:

             aggressive
                     Aggressively expire connections. This can greatly
                     reduce the memory usage of the firewall at the cost of
                     dropping idle connections early.
             conservative
                     Extremely conservative settings. Avoid dropping
                     legitimate connections at the expense of greater
                     memory utilization (possibly much greater on a busy
                     network) and slightly increased processor utilization.
             high-latency
                     A high-latency environment (such as a satellite
                     connection).
             normal
                     A normal network environment. Suitable for almost all
                     networks.
             satellite
                     Alias for high-latency.

             The default value is normal.

-Original Message-
From: owner-m...@openbsd.org On Behalf Of Cameron Simpson
Sent: Tuesday, 1 June 2021 8:26 AM
To: misc@openbsd.org
Subject: pf, relayd, TCP keep alive and NAT, oh my!

Can I enforce or implement TCP keep alives on a TCP stream via my
firewall?

Background:

I've got a client with an OpenBSD firewall and a Telstra NBN modem as
their modem.

Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I
have this odd problem which I am beginning to suspect is the NBN modem
getting bored and dropping its NAT entries. Let me explain...

At the firewall end I see about 30 ESTABLISHED connections to the IMAP
server. At the IMAP server I see over 500, which is about where the IMAP
service stops accepting new connections, leading to errors from the
client mail readers.

My current theory is that the IMAP client connections issue the IMAP
IDLE command and go passive, waiting for email notifications from the
server. So we have an idle TCP connection across the firewall and
across the NBN modem (which NATs).

My conjecture is that at some point the modem discards idle connection
states. (This could just as well happen at any other intermediate
stateful router too.) After that event, the client end does something
which tries to use the connection, gets an RST from the modem, clean
tidyup happens on the client and in the firewall.

At the server end, none of this is seen and the imapd just sits around
idle, never releasing the connection and never stopping the matching
daemon process. This gradually rises to hit the server's configured
connection limit and it stops accepting new things.

If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.

Any suggestions?

I haven't fully validated my conjecture yet, BTW. It just fits the
symptoms I see.

Plan B is to build the latest courier-imap from source if I find the
time, but there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done
on the accept end, which I haven't checked.

Plan B0 might be to disable IMAP IDLE support. Hmm.

Cheers,
Cameron Simpson
pf, relayd, TCP keep alive and NAT, oh my!
Can I enforce or implement TCP keep alives on a TCP stream via my
firewall?

Background:

I've got a client with an OpenBSD firewall and a Telstra NBN modem as
their modem.

Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I
have this odd problem which I am beginning to suspect is the NBN modem
getting bored and dropping its NAT entries. Let me explain...

At the firewall end I see about 30 ESTABLISHED connections to the IMAP
server. At the IMAP server I see over 500, which is about where the IMAP
service stops accepting new connections, leading to errors from the
client mail readers.

My current theory is that the IMAP client connections issue the IMAP
IDLE command and go passive, waiting for email notifications from the
server. So we have an idle TCP connection across the firewall and
across the NBN modem (which NATs).

My conjecture is that at some point the modem discards idle connection
states. (This could just as well happen at any other intermediate
stateful router too.) After that event, the client end does something
which tries to use the connection, gets an RST from the modem, clean
tidyup happens on the client and in the firewall.

At the server end, none of this is seen and the imapd just sits around
idle, never releasing the connection and never stopping the matching
daemon process. This gradually rises to hit the server's configured
connection limit and it stops accepting new things.

If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.

Any suggestions?

I haven't fully validated my conjecture yet, BTW. It just fits the
symptoms I see.

Plan B is to build the latest courier-imap from source if I find the
time, but there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done
on the accept end, which I haven't checked.

Plan B0 might be to disable IMAP IDLE support. Hmm.

Cheers,
Cameron Simpson
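
On the setsockopt() question above: SO_KEEPALIVE is a per-socket option,
so it can be set on the socket returned by accept(). The following is a
minimal Python sketch of that idea, not courier-imap's actual code (which
is C). The function names and the default timer values are hypothetical,
and the TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT per-socket overrides
are platform specific (available on Linux), so the sketch only applies
the ones the local Python build defines.

```python
import socket


def enable_keepalive(sock, idle=300, interval=30, count=4):
    """Turn on SO_KEEPALIVE and, where supported, tune the per-socket timers.

    The default values are illustrative assumptions only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants do not exist on every platform, so look each one
    # up and skip it when missing; the system-wide defaults then apply.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def accept_with_keepalive(listener):
    """Accept one connection and enable keepalive on the accepted socket."""
    conn, addr = listener.accept()
    enable_keepalive(conn)
    return conn, addr
```

A server loop would call accept_with_keepalive(listener) in place of
listener.accept(). Without the per-socket overrides, the connection
simply follows the system-wide keepalive timers.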