As Christian said, what matters here is the length of the connection ID used in the client-to-server direction. Almost all servers use non-zero-length connection IDs, so this should work fine once your servers are fixed.
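To illustrate why that direction is the one that matters, here is a minimal sketch of how a server might match an incoming packet to a connection. It is not taken from nginx, Caddy, or any real QUIC stack: `Server`, `Connection`, `accept` and `on_datagram` are names made up for this example, and it skips path validation (RFC 9000 section 8.2) and everything else a real server must do before trusting a new address.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (source IP, source UDP port) as seen by the server


@dataclass
class Connection:
    """Hypothetical per-connection state, not any real stack's type."""
    cid: bytes  # connection ID the client puts in packets sent to the server
    peer: Addr  # address the server currently sends to


class Server:
    def __init__(self) -> None:
        self.by_cid: Dict[bytes, Connection] = {}
        self.by_addr: Dict[Addr, Connection] = {}

    def accept(self, cid: bytes, peer: Addr) -> Connection:
        conn = Connection(cid, peer)
        if cid:
            self.by_cid[cid] = conn    # non-zero-length CID: key on the CID
        else:
            self.by_addr[peer] = conn  # zero-length CID: only the address is left
        return conn

    def on_datagram(self, dcid: bytes, src: Addr) -> Optional[Connection]:
        if dcid:
            conn = self.by_cid.get(dcid)
            if conn is not None and conn.peer != src:
                # Source address changed (e.g. NAT rebinding). A real server
                # would validate the new path before switching to it.
                conn.peer = src
            return conn
        # With a zero-length CID, a packet from a new address matches nothing.
        return self.by_addr.get(src)


server = Server()
with_cid = server.accept(b"\x01" * 8, ("203.0.113.7", 40000))
no_cid = server.accept(b"", ("203.0.113.8", 40000))

# Both clients come back from a new port after the NAT mapping expired.
found = server.on_datagram(b"\x01" * 8, ("203.0.113.7", 51234))  # finds with_cid
lost = server.on_datagram(b"", ("203.0.113.8", 51234))           # finds nothing
```

The connection ID the server chose survives the change of source port, so the first lookup still succeeds and the server learns the new address. The second lookup has nothing to key on but the address, which is exactly what changed.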
On Sun, Aug 4, 2024 at 8:07 PM Cameron Steel <ietfquic= [email protected]> wrote:

> Hi Christian,
> You are of course correct, I had been conflating terminology in my mind,
> apologies.
>
> The situation I was experiencing the issue in is as you describe: the NAT
> mapping times out and the first packet after the timeout is an HTTP GET
> which receives a new NAT mapping.
>
> You are also correct that Chrome does use 0-length id's for
> server > client, and the servers I've been using in my testing do use
> non-zero-length id's for the other direction.
>
> Given the error nginx logs when experiencing the issue ("quic no available
> client ids for new path while handling decrypted packet"), I had put the
> issue down to a suboptimal interpretation of RFC 9000 section 9.1 ("an
> endpoint MUST NOT reuse a connection ID when sending to more than one
> destination address").
>
> I have also seen this behaviour when the server is Caddy, and with a site
> behind Cloudflare, so the interpretation seems to be somewhat widespread on
> the server side.
>
> I've attached a pcap from the internal side of a NAT and Chrome net-export
> of the issue, let me know if more details would be helpful.
>
> Cameron Steel.
>
> On Mon, Aug 5, 2024, at 09:34, Christian Huitema wrote:
> > On 8/4/2024 3:37 PM, Cameron Steel wrote:
> > > Hi QUIC experts,
> > >
> > > I've just completed a writeup of an issue I was experiencing with
> > > websites using QUIC through my ISP's CGNAT. In short, the issue was
> > > due to the CGNAT having a rather short UDP timeout of 20 seconds, in
> > > combination with the fact that Google Chrome seems to use zero-length
> > > connection IDs, which prevents connection migration.
> > >
> > > In the process of checking the behaviour I was observing against the
> > > QUIC RFCs, I came across a few oddities that I'd like to bring up:
> > >
> > > Both RFC 9000 and 9308 fairly plainly state that connections using
> > > zero-length IDs will not be resilient to NAT rebinding, however RFC
> > > 9000 section 5.1.1 does have this passage which vaguely implies that
> > > multiple network paths are possible with zero-length IDs:
> > >
> > > > An endpoint that selects a zero-length connection ID during the
> > > > handshake cannot issue a new connection ID. A zero-length
> > > > Destination Connection ID field is used in all packets sent toward
> > > > such an endpoint over any network path.
> > >
> > > As this is only implied the once that I can find, I'm assuming it's
> > > just ambiguous wording and that the intended behaviour is what I
> > > observed, that connection migration is not permitted when using a
> > > zero-length connection ID.
> >
> > It is a bit more complicated than that. First, let's get the naming
> > right. "Connection migration" describes a voluntary action in which the
> > client tries to reach the server using a different 5-tuple and a
> > different connection ID. What you are encountering here is "NAT
> > Rebinding", i.e., the effect of an uncoordinated decision by the NAT to
> > forget the binding between the 5-tuple used by the client and the
> > "external" 5-tuple.
> >
> > After the NAT rebinding, all packets sent by the server to the old
> > 5-tuple will be lost: there is no mapping for that and packets are
> > dropped by the NAT, or maybe the mapping has been reused for a new
> > client and packets are dropped by that client because they cannot be
> > decrypted.
> >
> > The solution is for the server to somehow learn the new value of the
> > client's 5-tuple. It can only do that by receiving packets from the
> > client. All the packets sent after the NAT rebinding and before a new
> > packet is received from the client will be lost, whether connection IDs
> > are used or not.
> > For example, if the application pattern is to send a
> > request, then wait some long time before the server replies, the long
> > wait will increase the risk of NAT rebinding, and the eventual response
> > of the server will be lost.
> >
> > If the traffic is a series of HTTP GET triggering immediate responses,
> > there is hope. The server could learn the new 5-tuple when receiving the
> > GET command. But it needs to associate the arriving packet with the old
> > connection, and it can only do that if the arriving packet carries a
> > connection ID.
> >
> > > Given that, I'd be very curious to hear any insight into why Chrome
> > > has chosen not to use connection IDs.
> >
> > NAT Traversal will work if connection IDs are used in the client to
> > server direction. I was under the impression that Chrome uses 0-length
> > CID in the server to client direction, but Google servers use 8 bytes
> > CID in the client to server direction. If that's the case, NAT rebinding
> > should work.
> >
> > > If anyone is interested in reading my full writeup, you can find it
> > > here:
> > > https://blog.tugzrida.xyz/2024/08/04/too-quic-for-chrome-troubleshooting-udp-nat-rebinding/
> >
> > Can you attach some kind of packet log so we can see what is really
> > happening? QLOG would be great.
> >
> > -- Christian Huitema
