As Ian writes, the problem is on the server, not at the client. If the client wakes up with something to send after a long silence, it can decide to just resume the connection. But the server can't. If the client connection is dropped, the server is stuck.

The current solution is to use keep-alives. But this is painful for both clients and servers. For clients, it means that each of the 17 messenger applications on the phone sends its own keep-alive, waking up the radio and draining the battery every time. For servers, it means receiving messages from every client every 15 seconds, even if a client only has actual messages to receive every 15 minutes, which increases CPU load and power consumption. Not great.

There are a few alternatives. The client could use protocols like PCP or UPnP IGD to open a port in the local NAT. That's fine if the local router supports it. It can work very well if the network supports IPv6 and the client just needs to set a pinhole in the local firewall. But it will not work if the local ISP is using some combination of IPv4 and Carrier Grade NAT, unless the CGNAT supports PCP and the client has a plausible way to discover the address of the CGNAT. Maybe the IETF could work on that, but I am not holding my breath.
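To make the PCP case concrete, here is a minimal sketch of an RFC 6887 MAP request, hand-packed rather than taken from any real PCP library, assuming a UDP mapping for an IPv4 client behind the NAT:

```python
import os
import socket
import struct

def build_pcp_map_request(client_ip: str, internal_port: int,
                          lifetime: int = 900, protocol: int = 17) -> bytes:
    """Build a PCP (RFC 6887) MAP request asking the NAT to hold a
    mapping open for `lifetime` seconds. protocol 17 = UDP."""
    # IPv4 addresses are carried as IPv4-mapped IPv6 (::ffff:a.b.c.d).
    client_v4 = socket.inet_pton(socket.AF_INET, client_ip)
    client_v6 = b"\x00" * 10 + b"\xff\xff" + client_v4
    # Common header: version 2, R=0 (request) + opcode 1 (MAP),
    # 16 reserved bits, requested lifetime, client IP.
    header = struct.pack("!BBHI", 2, 1, 0, lifetime) + client_v6
    # MAP opcode body: 96-bit nonce, protocol, 24 reserved bits,
    # internal port, then suggested external port/IP left zero so
    # the NAT picks them.
    nonce = os.urandom(12)
    map_body = nonce + struct.pack("!B3xHH", protocol, internal_port, 0) + b"\x00" * 16
    return header + map_body
```

The request would be sent over UDP to port 5351 of the default gateway, which is exactly where the discovery problem above bites: behind a CGNAT, the default gateway is usually not the PCP server.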

Another way to reduce the impact on the client is to make sure that all applications doing keep-alives do it at exactly the same time. If they do, then the radio wakes up only once, sends a train of messages, and maybe waits for the ACKs. Not perfect, but at least it preserves the battery a bit. Of course, that solution does not help the server at all.
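The alignment trick is simple to state: instead of each app arming a timer one interval after its own last activity, everyone rounds up to a shared grid, so all timers fire in the same radio wake-up. A sketch, with the 15-second interval purely illustrative:

```python
import math

def aligned_next_keepalive(now: float, interval: float = 15.0) -> float:
    """Return the next keep-alive deadline aligned to a shared grid
    (multiples of `interval` since the epoch). Two applications that
    call this within the same interval get the same deadline, so the
    radio wakes up once for both."""
    return math.ceil(now / interval) * interval
```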

Yet another solution that has been tried is to have a system-level process do the keep-alives on behalf of all applications in the box. I won't go into the details, but we could maybe do a variant of that with MASQUE. Have the client use MASQUE for all outgoing connections, connecting to a MASQUE server outside the CGNAT. Then the client only needs to keep the MASQUE session alive -- one keep-alive instead of N. The end-to-end QUIC session could use IPv6 and long idle timers. Maybe something we could actually ship!
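Back-of-the-envelope, the win is just the number of NAT bindings that need refreshing. A hypothetical tally (the function name and parameters are mine, not from any spec):

```python
def keepalives_per_hour(n_flows: int, nat_timeout_s: float,
                        tunneled: bool) -> int:
    """Keep-alive messages per hour needed to hold NAT state open.
    Direct: each flow refreshes its own NAT binding. Via a proxy
    tunnel that terminates outside the CGNAT: only the single tunnel
    session's binding needs refreshing."""
    bindings = 1 if tunneled else n_flows
    return int(bindings * 3600 // nat_timeout_s)
```

With 17 apps and a 30-second binding timeout, that is the difference between thousands of wake-ups an hour and a hundred or so.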

-- Christian Huitema

On 9/1/2024 1:00 PM, Ian Swett wrote:
This is a real problem, but I'm unsure what the best way to approach it is.

I think you're suggesting that a large server operator could try to infer
NAT timeouts for clients of different IP prefixes and communicate that to
the client as a suggested keepalive/ping timeout? I'm curious how one would
infer NAT timeouts. Our servers detect a dead connection, but I'm not sure
how to tell what the reason was, and more specifically whether it was a NAT
timeout. Sometimes devices just drop off the network.
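One way such inference might work, purely as a sketch: record the idle gap that preceded each unexplained connection death, bucket by prefix, and take a conservative low percentile, accepting that devices simply leaving the network pollute the samples. All names and thresholds here are invented for illustration:

```python
from collections import defaultdict
from ipaddress import ip_network

class NatTimeoutEstimator:
    """Per-/24 estimate of NAT binding timeout, inferred from the idle
    gap observed just before a connection went unreachable. The low
    percentile is deliberately conservative because some failures are
    devices dropping off the network, not NAT expiry."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record_failure(self, client_ip: str, idle_before_failure_s: float):
        prefix = ip_network(f"{client_ip}/24", strict=False)
        self.samples[prefix].append(idle_before_failure_s)

    def suggested_keepalive(self, client_ip: str, default: float = 30.0) -> float:
        prefix = ip_network(f"{client_ip}/24", strict=False)
        data = sorted(self.samples.get(prefix, []))
        if len(data) < 10:            # not enough evidence; fall back
            return default
        p10 = data[len(data) // 10]   # roughly the 10th percentile
        return max(5.0, p10 * 0.8)    # keep-alive a bit under the estimate
```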

As you may know, Chrome will send a PING as a keepalive after 15 seconds of
idle, but only if there are outstanding requests (i.e., hanging GETs). The
number was chosen somewhat arbitrarily and is certainly not optimal, but it
did fix some use cases where hanging GETs were otherwise failing.
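That policy reduces to a two-condition check; a sketch of the behavior as described, not Chrome's actual code:

```python
def should_send_ping(idle_s: float, outstanding_requests: int,
                     keepalive_s: float = 15.0) -> bool:
    """PING only when the connection has been idle past the threshold
    AND a request is still in flight (e.g. a hanging GET) that the
    keep-alive is protecting. An idle connection with no outstanding
    requests is left alone to save battery."""
    return outstanding_requests > 0 and idle_s >= keepalive_s
```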

Thanks, Ian

On Wed, Jul 24, 2024 at 5:55 PM Martin Thomson <[email protected]> wrote:

The intent of the idle timeout was to have that reflect *endpoint*
policy.  That is, it is independent of path.

It's certainly very interesting to consider what you might do about paths
and keep-alives (or not).  But that's a separable problem.  Having a way
for endpoints to share their information about timeouts might work, but I
worry that that will lead to wasteful keepalive traffic.  How would we
ensure that keepalives are not wasteful?

Is there a better way, such as a quick connection continuation?

On Wed, Jul 24, 2024, at 11:24, Lucas Pardue wrote:
Hi folks,

Wearing no hats.

There's been some chatter this week during IETF about selecting QUIC
idle timeouts in the face of Internet paths that might have shorter
timeouts, such as NAT.

This isn't necessarily a new topic; there's past work that's been done
on measurements and attempts to capture that in IETF documents. For
example, Lars highlighted a study of home gateway characteristics from
2010 [1]. Then there's RFC 4787 [2], and our very own RFC 9308 [3].

There's likely other work that's happened in the meantime that has
provided further insights.

All the discussion got me wondering whether there might be room for a
QUIC extension that could hint at the path timeout to the peer. For
instance, as a server operator, I might have a wide view of network
characteristics that a client doesn't. Sending keepalive pings from the
server is possible but it might not be in the client's interest to
force it to ACK them, especially if there are power saving
considerations that would be hard for the server to know. Instead, a
hint to the peer would allow it to decide what to do. That could allow
us to maintain large QUIC idle timeouts, as befits the application use
case, while adapting to the needs of the path for improved connection
reliability.

Such an extension could provide a hint for each and every path, and would
therefore benefit multipath, which has some additional per-path idle
timeout considerations [4].
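For concreteness, such a hint could ride in a frame or transport parameter as ordinary QUIC varints (RFC 9000, Section 16). The frame type below is entirely made up for illustration; only the varint encoding is real:

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16): the top
    two bits of the first byte give the length (1, 2, 4, or 8 bytes)."""
    if v < 2**6:
        return v.to_bytes(1, "big")
    if v < 2**14:
        return (v | (1 << 14)).to_bytes(2, "big")
    if v < 2**30:
        return (v | (2 << 30)).to_bytes(4, "big")
    return (v | (3 << 62)).to_bytes(8, "big")

# Hypothetical frame type -- no such frame is defined anywhere.
HINT_FRAME_TYPE = 0x3f5a

def encode_path_timeout_hint(timeout_ms: int) -> bytes:
    """Hypothetical PATH_TIMEOUT_HINT frame: the frame type followed
    by the sender's best estimate of the path's binding timeout in
    milliseconds, both as QUIC varints."""
    return encode_varint(HINT_FRAME_TYPE) + encode_varint(timeout_ms)
```

The receiver would treat the value purely as advice, which is the point: the peer with the better view of the path shares what it knows, and the other endpoint decides whether a keep-alive is worth its battery.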

Thoughts?

[1] - https://dl.acm.org/doi/10.1145/1879141.1879174
[2] - https://www.rfc-editor.org/rfc/rfc4787.html
[3] - https://www.rfc-editor.org/rfc/rfc9308.html#section-3.2
[4] -

https://www.ietf.org/archive/id/draft-ietf-quic-multipath-10.html#name-idle-timeout


