Like Christian said, it's the client-to-server connection ID length that
matters here. Almost all servers use non-zero-length connection IDs, so if
you fix your servers this should work fine.
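
To make that concrete, here is a minimal sketch of how a server might route
incoming packets. It is an illustration only, not taken from any real QUIC
stack: the Demux and Connection classes, the table names and the addresses
are all made up. It assumes the server keys on the Destination Connection ID
when it is non-empty, and falls back to the source address otherwise.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (source IP, source UDP port) as seen by the server


@dataclass
class Connection:
    name: str
    peer: Addr  # address the server currently sends to for this connection


class Demux:
    """Hypothetical server-side lookup table, for illustration only."""

    def __init__(self) -> None:
        self.by_cid: Dict[bytes, Connection] = {}
        self.by_addr: Dict[Addr, Connection] = {}

    def add(self, conn: Connection, server_cid: bytes) -> None:
        # server_cid is the ID chosen by the server, i.e. what the client
        # places in the Destination Connection ID of client-to-server packets.
        if server_cid:
            self.by_cid[server_cid] = conn
        else:
            # Zero-length ID: the address is the only key left.
            # (The server's own address is left out to keep the sketch short.)
            self.by_addr[conn.peer] = conn

    def route(self, dcid: bytes, src: Addr) -> Optional[Connection]:
        if dcid:
            conn = self.by_cid.get(dcid)
            if conn is not None and conn.peer != src:
                # The packet arrived from a new address, e.g. after a NAT
                # rebinding. A real stack would also validate the new path
                # before trusting it; that step is omitted here.
                conn.peer = src
            return conn
        # No connection ID in the packet: a rebound address matches nothing.
        return self.by_addr.get(src)


if __name__ == "__main__":
    demux = Demux()
    cid = bytes(range(8))  # an 8-byte server-chosen ID
    demux.add(Connection("with-cid", ("203.0.113.7", 40000)), cid)
    demux.add(Connection("no-cid", ("203.0.113.7", 40001)), b"")

    rebound = ("203.0.113.7", 51234)  # new external port picked by the NAT
    print(demux.route(cid, rebound))
    print(demux.route(b"", rebound))
```

With a non-zero-length client-to-server ID the first lookup still finds the
connection after the port changes; with a zero-length ID the second lookup
finds nothing, and the GET looks like a packet for an unknown connection.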

On Sun, Aug 4, 2024 at 8:07 PM Cameron Steel wrote:

> Hi Christian,
> You are of course correct, I had been conflating terminology in my mind,
> apologies.
>
> The situation I was experiencing the issue in is as you describe: the NAT
> mapping times out and the first packet after the timeout is an HTTP GET
> which receives a new NAT mapping.
>
> You are also correct that Chrome does use 0-length IDs for the
> server-to-client direction, and the servers I've been using in my testing
> do use non-zero-length IDs for the other direction.
>
> Given the error nginx logs when experiencing the issue ("quic no available
> client ids for new path while handling decrypted packet"), I had put the
> issue down to a suboptimal interpretation of RFC 9000 section 9.1 ("an
> endpoint MUST NOT reuse a connection ID when sending to more than one
> destination address").
>
> I have also seen this behaviour when the server is Caddy, and with a site
> behind Cloudflare, so the interpretation seems to be somewhat widespread on
> the server side.
>
> I've attached a pcap from the internal side of a NAT and Chrome net-export
> of the issue, let me know if more details would be helpful.
>
> Cameron Steel.
>
> On Mon, Aug 5, 2024, at 09:34, Christian Huitema wrote:
>
>
>
> On 8/4/2024 3:37 PM, Cameron Steel wrote:
> > Hi QUIC experts,
> >
> > I've just completed a writeup of an issue I was experiencing with
> > websites using QUIC through my ISP's CGNAT. In short, the issue was due to
> > the CGNAT having a rather short UDP timeout of 20 seconds, in combination
> > with the fact that Google Chrome seems to use zero-length connection IDs,
> > which prevents connection migration.
> >
> > In the process of checking the behaviour I was observing against the
> > QUIC RFCs, I came across a few oddities that I'd like to bring up:
> >
> > Both RFC 9000 and 9308 fairly plainly state that connections using
> > zero-length IDs will not be resilient to NAT rebinding, however RFC 9000
> > section 5.1.1 does have this passage which vaguely implies that multiple
> > network paths are possible with zero-length IDs:
> >
> >> An endpoint that selects a zero-length connection ID during the
> >> handshake cannot issue a new connection ID. A zero-length Destination
> >> Connection ID field is used in all packets sent toward such an endpoint
> >> over any network path.
> >
> > As this is only implied the once that I can find, I'm assuming it's just
> > ambiguous wording and that the intended behaviour is what I observed, that
> > connection migration is not permitted when using a zero-length connection
> > ID.
>
> It is a bit more complicated than that. First, let's get the naming
> right. "Connection migration" describes a voluntary action in which the
> client tries to reach the server using a different 5-tuple and a
> different connection ID. What you are encountering here is "NAT
> Rebinding", i.e., the effect of an uncoordinated decision by the NAT to
> forget the binding between the 5-tuple used by the client and the
> "external" 5-tuple.
>
> After the NAT rebinding, all packets sent by the server to the old
> 5-tuple will be lost: either there is no mapping for them and the packets
> are dropped by the NAT, or the mapping has been reused for a new client
> and the packets are dropped by that client because they cannot be
> decrypted.
>
> The solution is for the server to somehow learn the new value of the
> client's 5-tuple. It can only do that by receiving packets from the
> client. All the packets the server sends after the NAT rebinding and
> before it receives a new packet from the client will be lost, whether
> connection IDs are used or not. For example, if the application pattern
> is to send a request and then wait a long time before the server
> replies, the long wait will increase the risk of NAT rebinding, and the
> eventual response from the server will be lost.
>
> If the traffic is a series of HTTP GETs triggering immediate responses,
> there is hope. The server could learn the new 5-tuple when receiving the
> GET command. But it needs to associate the arriving packet with the old
> connection, and it can only do that if the arriving packet carries a
> connection ID.
>
> >
> > Given that, I'd be very curious to hear any insight into why Chrome has
> > chosen not to use connection IDs.
>
> Recovery from NAT rebinding will work if connection IDs are used in the
> client-to-server direction. I was under the impression that Chrome uses
> 0-length CIDs in the server-to-client direction, but Google servers use
> 8-byte CIDs in the client-to-server direction. If that's the case, NAT
> rebinding should be handled correctly.
>
> > If anyone is interested in reading my full writeup, you can find it here:
> > https://blog.tugzrida.xyz/2024/08/04/too-quic-for-chrome-troubleshooting-udp-nat-rebinding/
>
> Can you attach some kind of packet log so we can see what is really
> happening? QLOG would be great.
>
> -- Christian Huitema
>
>
>
