On 8/4/2024 3:37 PM, Cameron Steel wrote:
Hi QUIC experts,

I've just completed a writeup of an issue I was experiencing with websites 
using QUIC through my ISP's CGNAT. In short, the issue was due to the CGNAT 
having a rather short UDP timeout of 20 seconds, in combination with the fact 
that Google Chrome seems to use zero-length connection IDs, which prevents 
connection migration.

In the process of checking the behaviour I was observing against the QUIC RFCs, 
I came across a few oddities that I'd like to bring up:

Both RFC 9000 and 9308 fairly plainly state that connections using zero-length 
IDs will not be resilient to NAT rebinding, however RFC 9000 section 5.1.1 does 
have this passage which vaguely implies that multiple network paths are 
possible with zero-length IDs:

An endpoint that selects a zero-length connection ID during the handshake 
cannot issue a new connection ID. A zero-length Destination Connection ID field 
is used in all packets sent toward such an endpoint over any network path.

As this is only implied the once that I can find, I'm assuming it's just 
ambiguous wording and that the intended behaviour is what I observed, that 
connection migration is not permitted when using a zero-length connection ID.

It is a bit more complicated than that. First, let's get the naming right. "Connection migration" describes a voluntary action in which the client tries to reach the server using a different 5-tuple and a different connection ID. What you are encountering here is "NAT Rebinding", i.e., the effect of an uncoordinated decision by the NAT to forget the binding between the 5-tuple used by the client and the "external" 5-tuple.

After the NAT rebinding, all packets sent by the server to the old 5-tuple will be lost: either there is no mapping for that 5-tuple any more and the packets are dropped by the NAT, or the mapping has been reused for a new client and the packets are dropped by that client because they cannot be decrypted.

The solution is for the server to somehow learn the new value of the client's 5-tuple. It can only do that by receiving packets from the client. All the packets that the server sends after the NAT rebinding and before it receives a new packet from the client will be lost, whether connection IDs are used or not. For example, if the application pattern is to send a request and then wait a long time before the server replies, the long wait will increase the risk of NAT rebinding, and the eventual response of the server will be lost.
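
To make the timing concrete, here is a toy model of a NAT binding table with an idle timeout. It is only a sketch: the class ToyNat, its method names, and the port numbering are made up for illustration, and I am assuming that only outbound packets refresh a binding. The 20 second value is the timeout reported in the original message.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyNat:
    """Hypothetical NAT model: one external port per internal address."""
    idle_timeout: float = 20.0  # seconds, as reported for the CGNAT
    _next_port: count = field(default_factory=lambda: count(40000))
    # internal address -> (external port, time of last outbound packet)
    _bindings: dict = field(default_factory=dict)

    def _expire(self, now: float) -> None:
        # Forget every binding that has been idle longer than the timeout.
        stale = [k for k, (_, last) in self._bindings.items()
                 if now - last > self.idle_timeout]
        for k in stale:
            del self._bindings[k]

    def outbound(self, internal, now: float) -> int:
        """Client sends a packet; returns the external port it appears from."""
        self._expire(now)
        port, _ = self._bindings.get(internal, (None, None))
        if port is None:
            port = next(self._next_port)  # rebinding: a fresh external port
        self._bindings[internal] = (port, now)
        return port

    def inbound(self, external_port: int, now: float):
        """Server sends a packet; returns the internal address, or None if dropped."""
        self._expire(now)
        for internal, (port, _last) in self._bindings.items():
            if port == external_port:
                return internal
        return None


nat = ToyNat()
client = ("10.0.0.2", 5000)
old_port = nat.outbound(client, now=0.0)
print(nat.inbound(old_port, now=10.0))   # still mapped to the client
print(nat.inbound(old_port, now=25.0))   # binding expired, packet dropped
new_port = nat.outbound(client, now=30.0)
print(old_port != new_port)              # the client now appears from a new port
```

In this model, everything the server sends to the old port between t=20 and the next client packet is dropped, and the server cannot know the new port until the client sends again.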

If the traffic is a series of HTTP GET requests triggering immediate responses, there is hope. The server could learn the new 5-tuple when receiving the next GET request. But it needs to associate the arriving packet with the old connection, and it can only do that if the arriving packet carries a connection ID.
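
The lookup logic can be sketched as follows. This is a simplified illustration, not how any particular server is written: ToyServer and its methods are hypothetical names, and a real QUIC server would also validate the new path (RFC 9000 section 9) before fully trusting the new address.

```python
class ToyServer:
    """Hypothetical demultiplexer: by connection ID if present, else by address."""

    def __init__(self):
        self.by_cid = {}     # server-chosen connection ID -> connection name
        self.by_addr = {}    # client (ip, port) -> connection name (zero-length CID)
        self.peer_addr = {}  # connection name -> address the server sends to

    def add_connection(self, name, client_addr, server_cid=b""):
        if server_cid:
            self.by_cid[server_cid] = name
        else:
            self.by_addr[client_addr] = name
        self.peer_addr[name] = client_addr

    def receive(self, src_addr, dcid):
        """Return the connection an arriving packet belongs to, or None."""
        if dcid:
            name = self.by_cid.get(dcid)
        else:
            # Zero-length CID: the source address is the only key available.
            name = self.by_addr.get(src_addr)
        if name is not None:
            # Simplification: adopt the new address immediately.
            self.peer_addr[name] = src_addr
        return name


server = ToyServer()
server.add_connection("with_cid", ("203.0.113.7", 40000), server_cid=b"\x01" * 8)
server.add_connection("no_cid", ("203.0.113.7", 40001))

# After a rebinding, both clients show up from new ports.
print(server.receive(("203.0.113.7", 40002), b"\x01" * 8))  # found via the CID
print(server.receive(("203.0.113.7", 40003), b""))           # None: no way to match
```

Only the connection whose client-to-server packets carry a non-empty connection ID survives the change of source port.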


Given that, I'd be very curious to hear any insight into why Chrome has chosen 
not to use connection IDs.

Recovery from NAT rebinding will work if connection IDs are used in the client-to-server direction. I was under the impression that Chrome uses zero-length CIDs in the server-to-client direction, but that Google servers use 8-byte CIDs in the client-to-server direction. If that's the case, recovery from NAT rebinding should work.

If anyone is interested in reading my full writeup, you can find it here: 
https://blog.tugzrida.xyz/2024/08/04/too-quic-for-chrome-troubleshooting-udp-nat-rebinding/

Can you attach some kind of packet log so we can see what is really happening? QLOG would be great.

-- Christian Huitema
