On 14. 09. 22 16:56, Kazunori Fujiwara wrote:
From: Petr Špaček <pspa...@isc.org>
On 15. 08. 22 12:18, Kazunori Fujiwara wrote:

I assume section 3.2 means the EDNS bufsize in the request when it
says
"their payload size", but I am not sure. The text could be clearer on
that.

    *  UDP requestors MAY probe to discover the real MTU value per
       destination.
How?
For example, recent BIND 9 starts with a small EDNS requestor's maximum
DNS/UDP payload size (512) and increases it gradually.

Correction:
Recent BIND starts with an EDNS buffer size of 1232 bytes, and it does not
raise the value to "probe" the destination address.

FTR I'm testing on 9.19.5-dev commit b13d973, but I believe it has been like
that for a long time already.

Thanks very much.
commit bb990030d344dafe40a62fe5ed2741de28b8ca66 removed the probing heuristics.

BIND 9.17.6 and later

5516.   [func]      The default EDNS buffer size has been changed from 4096
                     to 1232 bytes, the EDNS buffer size probing has been
                     removed, and named now sets the DF (Don't Fragment) flag
                     on outgoing UDP packets. [GL #2183]
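
For readers unsure what "EDNS buffer size" means on the wire: it is the value a requestor advertises in the CLASS field of the OPT pseudo-RR (RFC 6891). The following is a minimal sketch, not BIND's code; write_opt_rr and the constant names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OPT_RR_LEN   = 11,   /* size of an OPT RR with empty RDATA */
    EDNS_BUFSIZE = 1232  /* default quoted in the changelog entry above */
};

static_assert(EDNS_BUFSIZE <= 0xFFFF, "the CLASS field is only 16 bits wide");

/* Hypothetical helper: write a minimal OPT pseudo-RR advertising
 * udp_payload_size. Returns the number of bytes written, or 0 if the
 * buffer is too small. */
static size_t write_opt_rr(uint8_t *buf, size_t buflen, uint16_t udp_payload_size)
{
    if (buflen < OPT_RR_LEN)
        return 0;
    buf[0] = 0;                                  /* NAME: root */
    buf[1] = 0;
    buf[2] = 41;                                 /* TYPE: OPT (41) */
    buf[3] = (uint8_t)(udp_payload_size >> 8);   /* CLASS: requestor's UDP */
    buf[4] = (uint8_t)(udp_payload_size & 0xff); /*        payload size    */
    buf[5] = 0;                                  /* TTL: extended RCODE */
    buf[6] = 0;                                  /* TTL: EDNS version 0 */
    buf[7] = 0;                                  /* TTL: flags, DO bit clear */
    buf[8] = 0;
    buf[9] = 0;                                  /* RDLENGTH: no options */
    buf[10] = 0;
    return OPT_RR_LEN;
}
```

A caller would append these bytes to the additional section of a query and increment ARCOUNT; write_opt_rr(buf, sizeof(buf), EDNS_BUFSIZE) puts 0x04 0xD0 in the CLASS field.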

I think the draft as it currently stands does not have enough information
for implementers to follow it in a safe way.


I'm against publication as it is.

There should be running code, experiments, and measurements to back up
data in this draft. I can't see them at the moment.

Then, do you agree with the following requirements (as DNS software developers)?

1. SHOULD set the DF bit on outgoing UDP packets on IPv4,
    and SHOULD NOT use the Fragment header on IPv6.

Theoretically yes, but it might not be achievable depending on the OS API. We tried many iterations in BIND and discovered that the APIs (at least in Linux) are horrible and there are traps everywhere.

Here we _also_ need to protect against "old" PMTU discovery attacks with spoofed ICMP messages which cause fragmentation on the source host (as opposed to fragmentation along the path), which are potentially more dangerous because an off-path attacker can mount them more easily.

For reasons I cannot remember now, BIND currently uses the socket option IP_PMTUDISC_OMIT, defined in tools/include/uapi/linux/in.h as

#define IP_PMTUDISC_PROBE               3       /* Ignore dst pmtu      */
/* Always use interface mtu (ignores dst pmtu) but don't set DF flag.
 * Also incoming ICMP frag_needed notifications will be ignored on
 * this socket to prevent accepting spoofed ones.
 */
#define IP_PMTUDISC_INTERFACE           4
/* weaker version of IP_PMTUDISC_INTERFACE, which allows packets to get
 * fragmented if they exeed the interface mtu
 */
#define IP_PMTUDISC_OMIT                5

I _think_, and I might easily be wrong, that this was done to eliminate the impact of spoofed "path MTU exceeded" ICMP messages.
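
To make the option above concrete, here is a minimal sketch of how a Linux application can select a path MTU discovery mode on an IPv4 UDP socket. It is not taken from BIND; set_pmtudisc_mode is a hypothetical helper, and the fallback #define assumes the value 5 quoted from linux/in.h above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for older libc headers; 5 is the value quoted from linux/in.h. */
#ifndef IP_PMTUDISC_OMIT
#define IP_PMTUDISC_OMIT 5
#endif
static_assert(IP_PMTUDISC_OMIT == 5, "must match the kernel header");

/* Hypothetical helper: apply a Linux PMTU discovery mode to an IPv4 socket.
 *   IP_PMTUDISC_DO    always sets DF on outgoing packets.
 *   IP_PMTUDISC_OMIT  per the header comments: ignores the destination's
 *                     cached path MTU and incoming ICMP frag_needed, does
 *                     not set DF, and lets packets be fragmented locally
 *                     if they exceed the interface MTU.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
static int set_pmtudisc_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}
```

A server would call set_pmtudisc_mode(fd, IP_PMTUDISC_OMIT) right after socket(AF_INET, SOCK_DGRAM, 0). IPv6 sockets use a separate family of options (for example IPV6_MTU_DISCOVER and IPV6_DONTFRAG), which this sketch does not cover.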

Personally, I got lost several times when attempting to understand the history of this, so I hesitate to formulate universal advice or even say how we ended up here.


2. Limit the DNS payload size to 1232 without path MTU discovery.
    (After DNSFlagDay2020, many implementations use 1232)

As Paul wrote down thread, it is a random number - and so far it works for us.


3. If path MTU discovery works, UDP responders can send larger (>1232)
    responses that fit in the path MTU.

Possibly, but I think it is moot advice without knowing how it can be done. Right now there is no RFC 8899 equivalent for DNS, so we don't even know whether it would work for DNS as we know it. A quick glance at RFC 8899 section 3 is not encouraging in that regard: point 3, to take a single example, shows that RFC 8899 does not match the current state of DNS, because an authoritative server gets no indication from the resolver of whether the large response got through.


4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
    (many TCP implementations already set DF bit)

I doubt we have control over this from the application. Is there even an API to control that on TCP sockets?


# If there is a link whose MTU is smaller than 1260 (on IPv4),
# the link may be a blackhole.

Definitely. If the default is "too wrong" the whole thing falls apart.
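
The numbers in this exchange fit together arithmetically. The sketch below assumes the commonly cited derivation of 1232 (IPv6 minimum MTU of 1280, minus a 40-byte IPv6 header and an 8-byte UDP header) and an IPv4 header without options; max_dns_udp_payload is a hypothetical helper, not something from the draft or from BIND.

```c
#include <assert.h>

enum {
    IPV4_HEADER  = 20,   /* IPv4 header without options */
    IPV6_HEADER  = 40,   /* fixed IPv6 header */
    UDP_HEADER   = 8,
    IPV6_MIN_MTU = 1280  /* minimum link MTU required by IPv6 */
};

/* 1232 is what fits in a minimum-MTU IPv6 packet ... */
static_assert(IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER == 1232, "IPv6 case");
/* ... and over IPv4 the same payload needs a 1260-byte packet. */
static_assert(1232 + IPV4_HEADER + UDP_HEADER == 1260, "IPv4 case");

/* Hypothetical helper: largest DNS message that fits in a single
 * unfragmented UDP datagram on a link with the given MTU. */
static int max_dns_udp_payload(int link_mtu, int is_ipv6)
{
    int overhead = (is_ipv6 ? IPV6_HEADER : IPV4_HEADER) + UDP_HEADER;
    return link_mtu > overhead ? link_mtu - overhead : 0;
}
```

Under these assumptions, a 1232-byte response over IPv4 needs every link on the path to carry 1260 bytes; on a link with MTU 1259, max_dns_udp_payload(1259, 0) is already below 1232, which is the blackhole case described above when DF is set.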


I'm sorry for not being more informative. The only thing I know for certain is that we had multiple iterations in BIND, are not happy with any of them, and it is an order of magnitude more complex than we thought.

--
Petr Špaček
Internet Systems Consortium

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
