Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-02 Thread Stuart Henderson
On 2021-06-02, Cameron Simpson  wrote:
> On 01Jun2021 20:43, Stuart Henderson  wrote:
>>On 2021-06-01, Cameron Simpson  wrote:
>>> If I had TCP keep alive turned on, both ends might tidy themselves up.
>>> I can't enable that on the clients (various mail readers) or,
>>> apparently, on the server configuration. I can't do it in PF because PF
>>> just copies packets. I can't seem to do it in relayd either, though that
>>> seems the obvious way to intercept the connection for this purpose.
>>
>>It looks like courier-imap does enable SO_KEEPALIVE if available.
>
> Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is 
> 5.1.something.

A long time - it was there in the initial git commit when the files were
imported from svn, certainly before 5.0.6. 

https://github.com/svarshavchik/courier-libs/blame/142f42378608e593eb36ceb33895db99948427aa/tcpd/tcpd.c#L1238

>>$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
>>/proc/sys/net/ipv4/tcp_keepalive_intvl:75
>>/proc/sys/net/ipv4/tcp_keepalive_probes:9
>>/proc/sys/net/ipv4/tcp_keepalive_time:7200
>>
>>7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
>>long too; 14400 "slowhz" intervals = also 2h).
>
> Ah. A long time indeed. Yes, winding these down will help - the above 
> times are in the same magnitude as the time required to hit the 
> connection limits.

Yes - set in the days before stateful firewalls and NAT devices with limited
memory were common, so the only thing they really needed to protect against
was connections building up from clients that had crashed/powered off, or
that sat behind broken network paths.




Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Cameron Simpson
On 01Jun2021 20:43, Stuart Henderson  wrote:
>On 2021-06-01, Cameron Simpson  wrote:
>> If I had TCP keep alive turned on, both ends might tidy themselves up.
>> I can't enable that on the clients (various mail readers) or,
>> apparently, on the server configuration. I can't do it in PF because PF
>> just copies packets. I can't seem to do it in relayd either, though that
>> seems the obvious way to intercept the connection for this purpose.
>
>It looks like courier-imap does enable SO_KEEPALIVE if available.

Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is 
5.1.something.

>By default, keepalive timers are long; on a random Linux I had handy:
>
>$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
>/proc/sys/net/ipv4/tcp_keepalive_intvl:75
>/proc/sys/net/ipv4/tcp_keepalive_probes:9
>/proc/sys/net/ipv4/tcp_keepalive_time:7200
>
>7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
>long too; 14400 "slowhz" intervals = also 2h).

Ah. A long time indeed. Yes, winding these down will help - the above 
times are in the same magnitude as the time required to hit the 
connection limits.

>> Plan B is to build the latest courier-imap from source if I find the
>> time, but there may be no build option for this. I guess a single
>> setsockopt() call in the source would be enough, _if_ that can be done
>> on the accept end, which I haven't checked.
>
>https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
>you'll need it.

Ta.

>So you probably just need to lower tcp_keepalive_time, and perhaps adjust
>tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
>clients; they will need to wake and transmit more often, so faster
>keepalives will result in more battery/data use.

I'd expect I can wind it down to a handful of minutes without any 
serious impact.

>> Plan B0 might be to disable IMAP IDLE support. Hmm.
>
>Depends on timings whether that will help; think it's a last ditch effort
>though, I think it will make things noticeably worse for clients.

Courier lets me change the advertised capabilities (it is not clear if 
that affects the actual capabilities). No joy; possibly some clients 
will try IDLE even if it isn't advertised and just cope if not 
supported, so maybe some clients are using IDLE successfully anyway.

At any rate, dropping IDLE from the advertised list didn't help, and my 
hourly "restart imapd" cron is live again :-(

I'll look at the keepalive settings on the server, many thanks!

Cheers,
Cameron Simpson 



Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Cameron Simpson
On 01Jun2021 11:04, Claudio Jeker  wrote:
>Make sure you use 'block return' at least for the imap connections. 

I already do:

set block-policy return
[... and the first rule ...]
# reject everything except as detailed below
block return log

>This
>way when the state is dropped the firewall will issue a RST packet to the
>server which will close the connection.


Alas, no. I believe that the _modem_ is dropping its NAT state (or some 
upstream stateful switch is getting likewise bored) and that the 
connection is idle.  The firewall's modem is probably sending an RST to 
the client if it tries to use the connection after the modem forgets it, 
or something, causing the client to make a new connection to recover.

The state table on the firewall itself seems fine (about 30 connections, 
in keeping with the staff and devices in the office).

The problem is server side (cloud mail server). The connection goes 
idle, the office modem forgets the NAT, the server never sees _any_ 
indication that the TCP is no longer valid because it's idle.

>On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to 
>enable keepalive by default. So that is something you can enable on the IMAP
>server to force keep-alive on there. Other systems have similar knobs.

The IMAP server is Linux, so I'll look at that. Thanks!

Also, setting this on the firewall and interposing relayd would do 
the same trick. So that will be my fallback plan.

Thanks,
Cameron Simpson 



Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Cameron Simpson
On 01Jun2021 08:53, Dirk Coetzee  wrote:
>As a first guess, I would consider changing / implementing "set 
>optimization". This made a massive difference on our customer's satellite 
>internet connection.

The customer has a terrestrial ISP connection.

I've got satellite at home, and do indeed use this setting.

I'm not sure it will help my client though.

Cheers,
Cameron Simpson 



Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Stuart Henderson
On 2021-06-01, Cameron Simpson  wrote:
> If I had TCP keep alive turned on, both ends might tidy themselves up.  
> I can't enable that on the clients (various mail readers) or, 
> apparently, on the server configuration. I can't do it in PF because PF 
> just copies packets. I can't seem to do it in relayd either, though that 
> seems the obvious way to intercept the connection for this purpose.

It looks like courier-imap does enable SO_KEEPALIVE if available.
By default, keepalive timers are long; on a random Linux I had handy:

$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
long too; 14400 "slowhz" intervals = also 2h). 
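
To put a number on "long": with those defaults, the time from the last traffic until the kernel finally gives up on a dead peer is roughly the idle time plus one interval per unanswered probe. A minimal sketch of that arithmetic (the function name is made up for illustration, and the formula is the usual approximation rather than an exact kernel guarantee):

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds of silence before a dead peer is dropped.

    The first probe goes out after keepalive_time seconds of idleness; each
    of the keepalive_probes unanswered probes then costs keepalive_intvl.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Linux defaults quoted above: 7200s idle, 75s interval, 9 probes.
linux_default = dead_peer_detection_seconds(7200, 75, 9)
print(linux_default)       # 7875
print(linux_default / 60)  # 131.25 (minutes)
```

So a server running the defaults would hold a dead connection for a bit over two hours before cleaning it up.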

> Plan B is to build the latest courier-imap from source if I find the 
> time, but there may be no build option for this. I guess a single 
> setsockopt() call in the source would be enough, _if_ that can be done 
> on the accept end, which I haven't checked.

https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
you'll need it.
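
For reference, keepalive can be switched on for a socket on the accept end as well. The sketch below is in Python rather than C and is not courier-imap's code; the function name and the 300/60/5 values are illustrative assumptions. The per-socket timer options are platform-specific (the names used here are the Linux ones), so they are only applied where the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=300, interval=60, probes=5):
    """Turn on TCP keepalive for one socket and, where supported, tune it.

    idle/interval/probes are example values, not recommendations.
    """
    # Portable part: ask the kernel to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of the system-wide timers. Skip any option this
    # platform's socket module does not define.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


# Typical use in an accept loop:
#   conn, addr = listener.accept()
#   enable_keepalive(conn)
```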

So you probably just need to lower tcp_keepalive_time, and perhaps adjust
tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
clients; they will need to wake and transmit more often, so faster
keepalives will result in more battery/data use. 
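
A rough way to weigh that tradeoff: on an otherwise idle connection with a live peer, one probe (and one ACK back) goes out every tcp_keepalive_time seconds. The sketch below computes that rate and renders a sysctl fragment for the three Linux settings. The function names and the 300/60/5 example values are assumptions for illustration, and the settings only affect sockets that have SO_KEEPALIVE enabled.

```python
def keepalives_per_hour(keepalive_time):
    """Probes per hour sent on one idle connection whose peer is alive."""
    return 3600 / keepalive_time


def sysctl_fragment(keepalive_time, keepalive_intvl, keepalive_probes):
    """Render the three Linux keepalive sysctls in sysctl.conf syntax."""
    return (
        f"net.ipv4.tcp_keepalive_time = {keepalive_time}\n"
        f"net.ipv4.tcp_keepalive_intvl = {keepalive_intvl}\n"
        f"net.ipv4.tcp_keepalive_probes = {keepalive_probes}\n"
    )


print(keepalives_per_hour(7200))  # 0.5  (Linux default)
print(keepalives_per_hour(300))   # 12.0 (example: 5 minute idle time)
print(sysctl_fragment(300, 60, 5), end="")
```

The rendered lines can go in a file under /etc/sysctl.d/ or be applied one at a time with sysctl -w.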

> Plan B0 might be to disable IMAP IDLE support. Hmm.

Depends on timings whether that will help; think it's a last ditch effort
though, I think it will make things noticeably worse for clients.




Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Claudio Jeker
On Tue, Jun 01, 2021 at 10:25:38AM +1000, Cameron Simpson wrote:
> Can I enforce or implement TCP keep alives on a TCP stream via my 
> firewall?
> 
> Background:
> 
> I've got a client with an OpenBSD firewall and a Telstra NBN modem as 
> their modem.
> 
> Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I 
> have this odd problem which I am beginning to suspect is the NBN modem 
> getting bored and dropping its NAT entries. Let me explain...
> 
> At the firewall end I see about 30 ESTABLISHED connections to the IMAP 
> server. At the IMAP server I see over 500, which is about where the IMAP 
> service stops accepting new connections, leading to errors from the 
> client mail readers.
> 
> My current theory is that the IMAP client connections issue the IMAP 
> IDLE command and go passive, waiting for email notifications from the 
> server.  So we have an idle TCP connection across the firewall and 
> across the NBN modem (which NATs).
> 
> My conjecture is that at some point the modem discards idle connection 
> states. (This could just as well happen at any other intermediate 
> stateful router too.) After that event, the client end does something 
> which tries to use the connection, gets an RST from the modem, clean 
> tidyup happens on the client and in the firewall.
> 
> At the server end, none of this is seen and the imapd just sits around 
> idle, never releasing the connection and never stopping the matching 
> daemon process. This gradually rises to hit the server's configured 
> connection limit and it stops accepting new things.
> 
> If I had TCP keep alive turned on, both ends might tidy themselves up.  
> I can't enable that on the clients (various mail readers) or, 
> apparently, on the server configuration. I can't do it in PF because PF 
> just copies packets. I can't seem to do it in relayd either, though that 
> seems the obvious way to intercept the connection for this purpose.
> 
> Any suggestions?

Make sure you use 'block return' at least for the imap connections. This
way when the state is dropped the firewall will issue a RST packet to the
server which will close the connection.

On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to enable
keepalive by default. So that is something you can enable on the IMAP
server to force keep-alive on there. Other systems have similar knobs.

-- 
:wq Claudio



Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Dirk Coetzee
Hi Cameron,

As a first guess, I would consider changing / implementing "set optimization". 
This made a massive difference on our customer's satellite internet connection. 

man pf.conf



set optimization environment
        Optimize state timeouts for one of the following network
        environments:

        aggressive
                Aggressively expire connections.  This can greatly reduce
                the memory usage of the firewall at the cost of dropping
                idle connections early.
        conservative
                Extremely conservative settings.  Avoid dropping
                legitimate connections at the expense of greater memory
                utilization (possibly much greater on a busy network) and
                slightly increased processor utilization.
        high-latency
                A high-latency environment (such as a satellite
                connection).
        normal  A normal network environment.  Suitable for almost all
                networks.
        satellite
                Alias for high-latency.

        The default value is normal.

-----Original Message-----
From: owner-m...@openbsd.org On Behalf Of Cameron Simpson
Sent: Tuesday, 1 June 2021 8:26 AM
To: misc@openbsd.org
Subject: pf, relayd, TCP keep alive and NAT, oh my!

Can I enforce or implement TCP keep alives on a TCP stream via my firewall?

Background:

I've got a client with an OpenBSD firewall and a Telstra NBN modem as their 
modem.

Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I have this 
odd problem which I am beginning to suspect is the NBN modem getting bored and 
dropping its NAT entries. Let me explain...

At the firewall end I see about 30 ESTABLISHED connections to the IMAP server. 
At the IMAP server I see over 500, which is about where the IMAP service stops 
accepting new connections, leading to errors from the client mail readers.

My current theory is that the IMAP client connections issue the IMAP IDLE 
command and go passive, waiting for email notifications from the server.  So we 
have an idle TCP connection across the firewall and across the NBN modem (which 
NATs).

My conjecture is that at some point the modem discards idle connection states. 
(This could just as well happen at any other intermediate stateful router too.) 
After that event, the client end does something which tries to use the 
connection, gets an RST from the modem, clean tidyup happens on the client and 
in the firewall.

At the server end, none of this is seen and the imapd just sits around idle, 
never releasing the connection and never stopping the matching daemon process. 
This gradually rises to hit the server's configured connection limit and it 
stops accepting new things.

If I had TCP keep alive turned on, both ends might tidy themselves up.  
I can't enable that on the clients (various mail readers) or, apparently, on 
the server configuration. I can't do it in PF because PF just copies packets. I 
can't seem to do it in relayd either, though that seems the obvious way to 
intercept the connection for this purpose.

Any suggestions?

I haven't fully validated my conjecture yet, BTW. It just fits the symptoms I 
see.

Plan B is to build the latest courier-imap from source if I find the time, but 
there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done on the 
accept end, which I haven't checked.

Plan B0 might be to disable IMAP IDLE support. Hmm.

Cheers,
Cameron Simpson 



pf, relayd, TCP keep alive and NAT, oh my!

2021-05-31 Thread Cameron Simpson
Can I enforce or implement TCP keep alives on a TCP stream via my 
firewall?

Background:

I've got a client with an OpenBSD firewall and a Telstra NBN modem as 
their modem.

Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I 
have this odd problem which I am beginning to suspect is the NBN modem 
getting bored and dropping its NAT entries. Let me explain...

At the firewall end I see about 30 ESTABLISHED connections to the IMAP 
server. At the IMAP server I see over 500, which is about where the IMAP 
service stops accepting new connections, leading to errors from the 
client mail readers.
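
One way to get the server-side figure is to count ESTABLISHED sockets on the IMAP port straight from /proc/net/tcp. A rough sketch, assuming a Linux server and plain IMAP on port 143 (this message does not say which port is in use; 993 would be the usual IMAPS port). The function name is made up for illustration.

```python
TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp


def count_established(proc_net_tcp_text, local_port):
    """Count ESTABLISHED sockets whose local port is local_port.

    proc_net_tcp_text is the content of /proc/net/tcp (or /proc/net/tcp6,
    which uses the same columns with longer addresses).
    """
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # Addresses look like "0100007F:008F": hex IP, colon, hex port.
        port = int(local_address.rsplit(":", 1)[1], 16)
        if state == TCP_ESTABLISHED and port == local_port:
            count += 1
    return count


# On the server:
#   with open("/proc/net/tcp") as f:
#       print(count_established(f.read(), 143))
```

Comparing that number over time with the firewall's own state count shows how quickly the two drift apart.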

My current theory is that the IMAP client connections issue the IMAP 
IDLE command and go passive, waiting for email notifications from the 
server.  So we have an idle TCP connection across the firewall and 
across the NBN modem (which NATs).

My conjecture is that at some point the modem discards idle connection 
states. (This could just as well happen at any other intermediate 
stateful router too.) After that event, the client end does something 
which tries to use the connection, gets an RST from the modem, clean 
tidyup happens on the client and in the firewall.

At the server end, none of this is seen and the imapd just sits around 
idle, never releasing the connection and never stopping the matching 
daemon process. This gradually rises to hit the server's configured 
connection limit and it stops accepting new things.

If I had TCP keep alive turned on, both ends might tidy themselves up.  
I can't enable that on the clients (various mail readers) or, 
apparently, on the server configuration. I can't do it in PF because PF 
just copies packets. I can't seem to do it in relayd either, though that 
seems the obvious way to intercept the connection for this purpose.

Any suggestions?

I haven't fully validated my conjecture yet, BTW. It just fits the 
symptoms I see.
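
A possible way to test the conjecture from a machine behind the modem: open a connection, leave it completely idle for a chosen time, then send one command and see whether an answer comes back. This is only a sketch under assumptions: plain-text IMAP (no TLS), a server that sends a greeting on connect, and a made-up function name and probe command tag.

```python
import socket
import time


def survives_idle(host, port, idle_seconds, probe=b"a1 NOOP\r\n", timeout=10.0):
    """Return True if the connection still works after idle_seconds of silence.

    Connection or greeting failures propagate as exceptions, since they say
    nothing about idle handling.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.recv(4096)  # consume the server greeting so it is not mistaken for a reply
        time.sleep(idle_seconds)
        try:
            sock.sendall(probe)
            # Empty bytes means the peer closed; a reset or timeout raises OSError.
            return len(sock.recv(4096)) > 0
        except OSError:
            return False


# Example: step the idle time up until the connection stops surviving.
#   for minutes in (5, 10, 20, 40):
#       print(minutes, survives_idle("imap.example.com", 143, minutes * 60))
```

The idle time at which the result flips to False would give an estimate of the NAT timeout, and so an upper bound for a useful keepalive interval. The host name in the comment is a placeholder.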

Plan B is to build the latest courier-imap from source if I find the 
time, but there may be no build option for this. I guess a single 
setsockopt() call in the source would be enough, _if_ that can be done 
on the accept end, which I haven't checked.

Plan B0 might be to disable IMAP IDLE support. Hmm.

Cheers,
Cameron Simpson