On 8/4/2024 3:37 PM, Cameron Steel wrote:
Hi QUIC experts,
I've just completed a writeup of an issue I was experiencing with websites
using QUIC through my ISP's CGNAT. In short, the issue was due to the CGNAT
having a rather short UDP timeout of 20 seconds, in combination with the fact
that Google Chrome seems to use zero-length connection IDs, which prevents
connection migration.
In the process of checking the behaviour I was observing against the QUIC RFCs,
I came across a few oddities that I'd like to bring up:
Both RFC 9000 and 9308 fairly plainly state that connections using zero-length
IDs will not be resilient to NAT rebinding, however RFC 9000 section 5.1.1 does
have this passage which vaguely implies that multiple network paths are
possible with zero-length IDs:
An endpoint that selects a zero-length connection ID during the handshake
cannot issue a new connection ID. A zero-length Destination Connection ID field
is used in all packets sent toward such an endpoint over any network path.
As this is only implied the once that I can find, I'm assuming it's just
ambiguous wording and that the intended behaviour is what I observed, that
connection migration is not permitted when using a zero-length connection ID.
It is a bit more complicated than that. First, let's get the naming
right. "Connection migration" describes a voluntary action in which the
client tries to reach the server using a different 5-tuple and a
different connection ID. What you are encountering here is "NAT
Rebinding", i.e., the effect of an uncoordinated decision by the NAT to
forget the binding between the 5-tuple used by the client and the
"external" 5-tuple.
After the NAT rebinding, all packets sent by the server to the old
5-tuple will be lost: either there is no mapping for them and the packets
are dropped by the NAT, or the mapping has been reused for a new
client and the packets are dropped by that client because they cannot be
decrypted.
The solution is for the server to somehow learn the new value of the
client's 5-tuple. It can only do that by receiving packets from the
client. All the packets that the server sends after the NAT rebinding and
before it receives a new packet from the client will be lost, whether connection IDs
are used or not. For example, if the application pattern is to send a
request, then wait some long time before the server replies, the long
wait will increase the risk of NAT rebinding, and the eventual response
of the server will be lost.
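
To make the timing concrete, here is a toy Python model of a NAT with an idle
timeout. It is only a sketch: the class ToyNat, its port numbering and its
expiry rule are hypothetical, and it assumes that only outbound packets
refresh a binding. Real NATs differ in all of these details.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Hypothetical NAT model: internal endpoint -> external port, with idle timeout."""
    timeout: float                 # idle timeout in seconds (20 s in the report)
    next_port: int = 40000         # arbitrary start of the external port pool
    bindings: dict = field(default_factory=dict)  # internal -> (ext_port, last_used)

    def _expire(self, now):
        # Forget every binding that has been idle for `timeout` seconds or more.
        self.bindings = {k: v for k, v in self.bindings.items()
                         if now - v[1] < self.timeout}

    def outbound(self, internal, now):
        """Client sends a packet; return the external port the server sees."""
        self._expire(now)
        if internal in self.bindings:
            port = self.bindings[internal][0]
        else:
            port = self.next_port  # rebinding: a fresh external port
            self.next_port += 1
        self.bindings[internal] = (port, now)
        return port

    def inbound(self, ext_port, now):
        """Server sends to ext_port; return the internal endpoint, or None if dropped."""
        self._expire(now)
        for internal, (port, _) in self.bindings.items():
            if port == ext_port:
                return internal
        return None


nat = ToyNat(timeout=20.0)
client = ("10.0.0.2", 5000)
old_port = nat.outbound(client, now=0.0)    # request sent at t = 0
late = nat.inbound(old_port, now=30.0)      # response at t = 30 s finds no binding
new_port = nat.outbound(client, now=31.0)   # next client packet gets a new port
```

In this model the late response is dropped no matter what the QUIC header
carries; connection IDs only matter once the client sends again.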
If the traffic is a series of HTTP GET requests triggering immediate
responses, there is hope. The server could learn the new 5-tuple when
receiving the GET request. But it needs to associate the arriving packet
with the old connection, and it can only do that if the arriving packet
carries a connection ID.
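
The lookup problem can be sketched in a few lines of Python. This is an
illustration only, not how any particular server is written: ToyServer and its
two tables are hypothetical, and a real implementation would also authenticate
the packet and validate the new path before sending much data to it.

```python
class ToyServer:
    """Hypothetical packet demultiplexer for a QUIC-like server."""

    def __init__(self):
        self.by_cid = {}    # destination CID (bytes) -> connection state
        self.by_tuple = {}  # client (addr, port) -> connection state

    def open(self, name, cid, client_addr):
        conn = {"name": name, "peer": client_addr}
        if cid:
            self.by_cid[cid] = conn
        else:
            # Zero-length CID: the client address is the only lookup key.
            self.by_tuple[client_addr] = conn
        return conn

    def receive(self, dcid, src_addr):
        """Return the connection a packet belongs to, or None if unknown."""
        conn = self.by_cid.get(dcid) if dcid else self.by_tuple.get(src_addr)
        if conn is not None:
            conn["peer"] = src_addr  # learn the client's new address
        return conn


server = ToyServer()
cid = bytes(range(8))  # an arbitrary 8-byte connection ID
server.open("with-cid", cid, ("203.0.113.7", 40000))
server.open("zero-len", b"", ("203.0.113.9", 40000))

# After a NAT rebinding, both clients show up from a new source port.
found = server.receive(cid, ("203.0.113.7", 40001))   # matched by CID
lost = server.receive(b"", ("203.0.113.9", 40001))    # no match: None
```

With a non-empty destination CID the packet from the new address still finds
its connection; with a zero-length one it looks like traffic from a stranger.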
Given that, I'd be very curious to hear any insight into why Chrome has chosen
not to use connection IDs.
NAT traversal will work if connection IDs are used in the client to
server direction. I was under the impression that Chrome uses 0-length
CIDs in the server to client direction, but Google servers use 8-byte
CIDs in the client to server direction. If that's the case, recovery
from NAT rebinding should work.
If anyone is interested in reading my full writeup, you can find it here:
https://blog.tugzrida.xyz/2024/08/04/too-quic-for-chrome-troubleshooting-udp-nat-rebinding/
Can you attach some kind of packet log so we can see what is really
happening? QLOG would be great.
-- Christian Huitema