Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-27 Thread Kazunori Fujiwara
Abley-san, thanks very much for your comments.

> From: Joe Abley 
> Fujiwara-san,
> 
> On Sep 22, 2022, at 11:05, Kazunori Fujiwara  wrote:
> 
>> Thanks. "Path MTU Discovery" API and setting IP_DF API are complex and
>> they often don't work as expected.
>> 
>> However, it may be easy to avoid using the Fragment Header on IPv6.
>> (limit IPv6 response packet smaller than interface MTU.)
>> (Or, is it not easy ?)
> 
> I think it's easier if we just recommend a maximum message size for UDP 
> transport that is likely to avoid truncation in the majority of cases. 
> Perhaps that is the simplest formulation of your ultimate goal? Just update 
> the base specification and say it clearly. 
> 
> This will make UDP transport only usable for a subset of DNS messages that 
> are ever sent on the Internet. Other transports can remain as-is and do not 
> need this limitation. People who ignore the recommendation are on their own.  

The minimum MTU on IPv6 is 1280 octets.
So, for example, we can specify the maximum message size for UDP
transport on IPv6 as 1232 (1280 minus the 40-octet IPv6 header and
the 8-octet UDP header).

However, the minimum MTU on IPv4 is 68 (Section 3.2 of RFC 791).
We need to arbitrarily specify the maximum message size for UDP/IPv4,
for example 512, 1232, or 1400.

> UDP then becomes a convenient choice for DNS messages that are small and that 
> do not have requirements for confidentiality, but not a default choice in the 
> protocol sense (despite the fact that it will probably remain the choice for 
> most messages).

> The default transport becomes TCP, since that is the alternative, 
> must-implement transport available in all parts of the system.

We can say that the standard transports for DNS are TCP/DoT/DoQ and
the standard address family is IPv6. However, UDP transport for DNS
and IPv4 will remain in use in reality, forever.

I believe that the BCP document improves many current use cases.

>> Then, to allow larger than 1232/1400 and smaller than interface MTU
>> response packets, recommendations for UDP responders are changed as:
>> 
>>  UDP responders can send responses that fit in all of
>>  the result of path MTU discovery (if available),
>>  the interface MTU, and the UDP requestor's payload size.
> 
> I think a formulation to avoid magic numbers is probably better but to be 
> honest I don't find magic numbers to be so terrible if they make the advice 
> more clear.
> 
> (I think the current draft's advice is not particularly clear since it 
> contains a lot of if then else, but perhaps others think differently.)

From the EDNS spec [RFC6891], response size <= UDP requestor's payload size.

As new recommendations:
  Set the DF bit on IPv4.
  Compose response packets that fit in the interface MTU -> no Fragment header on IPv6.
  Then we don't need the result of path MTU discovery.
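
The two constraints above (the requestor's EDNS payload size and the interface MTU) can be combined into one size limit. The following is only a minimal sketch, assuming fixed header sizes with no IPv4 options or IPv6 extension headers; the function name max_udp_response_size is hypothetical and does not come from the draft.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
IPV6_HEADER = 40  # octets, assuming no extension headers
UDP_HEADER = 8    # octets


def max_udp_response_size(interface_mtu: int, requestor_payload_size: int,
                          ipv6: bool) -> int:
    """Largest DNS message that fits both the interface MTU and the
    requestor's advertised EDNS payload size."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    fits_interface = interface_mtu - ip_header - UDP_HEADER
    return min(fits_interface, requestor_payload_size)


print(max_udp_response_size(1500, 4096, ipv6=True))   # 1452
print(max_udp_response_size(1500, 1232, ipv6=False))  # 1232
```

A response larger than this limit would be sent with TC=1 instead.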

> The DNS is used in private networks as well as on the Internet. I think it's 
> ok to say simply that UDP transport is NOT RECOMMENDED for large messages 
> since fragmentation SHOULD be assumed to be unavailable.

I agree. In that case, TC=1 responses will be returned.

> That leaves wriggle room for implementations who have more knowledge of their 
> network path or who don't care about delivery failures (for example) to do 
> their own thing. I don't think we should spend too much time imagining what 
> those things should be.
> 
>>>> 4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
>>>>    (many TCP implementations already set DF bit)
>>> 
>>> I doubt we have control over this from the application. Is there even
>>> API to control that on TCP sockets?
> 
> I don't think we should write as if UDP and TCP are the only transports 
> available. I also think it's a mistake to dive down into rabbit holes 
> relating to any particular transport other than UDP.

I agree. But until draft-ietf-dprive-unilateral-probing is
published as an RFC, UDP and TCP are the only transports to authoritative
servers.

> UDP is the only transport we have in which the DNS protocol needs to care 
> about message size. I think the current draft does a good job in restricting 
> its discussion to UDP.
> 
>> At least, I would like to disable IPv6 fragmentation.
>> and I would like to make "avoid IPv4 fragmentation" our goal.
> 
> I think we should just be bold and declare a RECOMMENDED maximum message size 
> when using UDP transport and make TCP the default choice of transport.

The minimum MTU for IPv4 is 68 octets.
So even a 512-octet DNS response may be fragmented (when the DF bit is not set).
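
As a worked illustration of that point, the sketch below counts how many IPv4 packets one DNS/UDP message would need at a given MTU. It assumes a 20-octet IPv4 header without options; the function name ipv4_fragment_count is hypothetical.

```python
import math

IPV4_HEADER = 20  # octets, assuming no IP options
UDP_HEADER = 8    # octets


def ipv4_fragment_count(dns_message_size: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one DNS/UDP message."""
    datagram = UDP_HEADER + dns_message_size  # IP payload before fragmentation
    if IPV4_HEADER + datagram <= mtu:
        return 1  # fits in a single unfragmented packet
    # Every fragment except the last carries a multiple of 8 octets.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


print(ipv4_fragment_count(512, 68))     # 11
print(ipv4_fragment_count(512, 576))    # 1
print(ipv4_fragment_count(1400, 1500))  # 1
```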

> UDP becomes an acceptable alternative to TCP alongside other transports that 
> might be available, if suitable for a particular message, according to local 
> policy. 
> 
> The maximum that is specified could be a magic number (like the original 
> specification's 512 bytes) or it could be a formulation based on particular 
> address families' minimum capabilities. Clearly some people prefer the 
> latter. 
> 
> The question of how to construct a minimum sized response in cases where a 
> DNS responder really wants to avoid a trip through 

Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-22 Thread Brian Dickson
On Thu, Sep 22, 2022 at 2:05 AM Kazunori Fujiwara 
wrote:

> > From: Petr Špaček 
> >> Then, do you agree the following requirements ? (as DNS software
> >> developers)
> >> 1. SHOULD set DF bit on outgoing UDP packets on IPv4,
> >> and SHOULD not use FRAGMENT header on IPv6.
> >
> > Theoretically yes, but it might not be achievable depending on OS
> > API. We tried many iterations in BIND, and discovered that APIs (at
> > least in Linux) are horrible and there are traps everywhere.
>
> Thanks. "Path MTU Discovery" API and setting IP_DF API are complex and
> they often don't work as expected.
>
[...]

> On IPv4, I would like to write it in such a way that "setting the DF
> bit" is a goal, and that OS implementors and DNS server
> software developers will do their best.
>

We (GoDaddy) have actually had some very recent experience with real-world
situations, and have some results to share.
The TL;DR is: it may not be a good idea to set the DF bit on UDP
if the intention is to improve reliability while avoiding fragmentation,
and the interface MTU may need to be set lower.

Here is what we have observed:

   - IF the client and server BOTH have settings (EDNS bufsize and
     server-side UDP max size configuration) that are larger than the
     real-world path MTU end-to-end
   - AND UDP responses have DF=1 set
   - AND UDP responses with TC=1 still exceed the minimum path MTU
   - THEN:
      - Packet-too-big ICMP messages are sent, that are useless/ignored or
        even blocked/dropped by ACLs
      - These over-MTU UDP packets are dropped (because DF=1)
      - The TC=1 response is never received
      - Result:
         - Depending on the DNS software implementation, fallback to
           smaller size UDP or to TCP may not occur
   - Additionally, TCP itself may need to be an in-scope issue for
     recommendations (not sure if it is), specifically because:
      - IF the CLIENT has its configured MSS larger than the smallest MTU
        on the path to the server
         - MSS normally defaults to interface MTU
      - AND the client, server, or path do not enable/support PMTUD (e.g.
        blocking of ICMP packet too big messages, or NAT issues)
      - AND the SERVER sets DF=1
      - AND the first data packet in the response exceeds the actual path
        MTU
      - THEN:
         - TCP will not work (the first packet will never reach the client)
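
The UDP part of these conditions can be written as two small predicates. This is only a sketch of the logic described above, assuming IPv4 without options (28 octets of IP plus UDP headers); the function names are hypothetical.

```python
OVERHEAD = 28  # IPv4 header (20) + UDP header (8), assuming no IP options


def limits_allow_oversized_packets(edns_bufsize: int, server_max_udp: int,
                                   path_mtu: int) -> bool:
    """True if BOTH size settings still permit a packet larger than the path MTU."""
    return min(edns_bufsize, server_max_udp) + OVERHEAD > path_mtu


def response_is_lost(dns_size_on_wire: int, path_mtu: int, df_set: bool) -> bool:
    """True if a response of this size is dropped on the path.

    With DF=0 an oversized packet may be fragmented instead of dropped,
    so only the DF=1 case is reported as lost here.
    """
    return df_set and dns_size_on_wire + OVERHEAD > path_mtu


print(limits_allow_oversized_packets(4096, 4096, 1400))  # True
print(response_is_lost(1450, 1400, df_set=True))         # True
print(response_is_lost(1450, 1400, df_set=False))        # False
print(response_is_lost(1232, 1400, df_set=True))         # False
```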

The recommendations might end up being much simpler, as a result.

The only successful approach we have found is to address both the UDP and
TCP issues.
The easiest way to do both of these is to set the server MTU to a lower
value (i.e. whatever value makes sense, based on observed traffic).
And for clients, if the client is aware of a commonly-used network path
with lower MTU, set the client's interface MTU to that value, as well as
configure the EDNS bufsize to that same value (or lower).

It may be the case that DNS software cannot unilaterally achieve these
things, and it may be necessary to check whether those conditions are met.
How the DNS software behaves if the conditions are not met is likely a
question to answer:

   - Is it better to prevent the software from running (give a fatal
   error), or to give a warning (that might be ignored) and run anyway?
   - How can the condition itself be checked (interface MTU vs path MTU vs
   EDNS bufsize)?
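
One possible shape for such a check is sketched below. It takes the values as plain numbers, because obtaining the interface MTU is platform-specific, and it offers both behaviours from the first question (warn or fail). All names here are hypothetical. Subtracting the IP and UDP headers before comparing is an assumption of this sketch; the text above only says "that same value (or lower)".

```python
from typing import List


class MtuConfigError(Exception):
    """Raised when fatal=True and the configuration looks unsafe."""


def check_mtu_config(interface_mtu: int, edns_bufsize: int,
                     expected_path_mtu: int, ipv6: bool = False,
                     fatal: bool = False) -> List[str]:
    """Return a list of warnings; raise instead if fatal is requested."""
    overhead = (40 if ipv6 else 20) + 8  # IP header + UDP header (assumption)
    problems: List[str] = []
    if interface_mtu > expected_path_mtu:
        problems.append(
            f"interface MTU {interface_mtu} exceeds expected path MTU "
            f"{expected_path_mtu}")
    if edns_bufsize + overhead > interface_mtu:
        problems.append(
            f"EDNS bufsize {edns_bufsize} plus {overhead} header octets "
            f"exceeds interface MTU {interface_mtu}")
    if problems and fatal:
        raise MtuConfigError("; ".join(problems))
    return problems


print(check_mtu_config(1500, 4096, 1400))  # two warnings
print(check_mtu_config(1400, 1232, 1400))  # []
```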

Brian
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-22 Thread Joe Abley
Fujiwara-san,

On Sep 22, 2022, at 11:05, Kazunori Fujiwara  wrote:

> Thanks. "Path MTU Discovery" API and setting IP_DF API are complex and
> they often don't work as expected.
> 
> However, it may be easy to avoid using the Fragment Header on IPv6.
> (limit IPv6 response packet smaller than interface MTU.)
> (Or, is it not easy ?)

I think it's easier if we just recommend a maximum message size for UDP 
transport that is likely to avoid truncation in the majority of cases. Perhaps 
that is the simplest formulation of your ultimate goal? Just update the base 
specification and say it clearly. 

This will make UDP transport only usable for a subset of DNS messages that are 
ever sent on the Internet. Other transports can remain as-is and do not need 
this limitation. People who ignore the recommendation are on their own.  

UDP then becomes a convenient choice for DNS messages that are small and that 
do not have requirements for confidentiality, but not a default choice in the 
protocol sense (despite the fact that it will probably remain the choice for 
most messages).

The default transport becomes TCP, since that is the alternative, 
must-implement transport available in all parts of the system.

> Then, to allow larger than 1232/1400 and smaller than interface MTU
> response packets, recommendations for UDP responders are changed as:
> 
>  UDP responders can send responses that fit in all of
>  the result of path MTU discovery (if available),
>  the interface MTU, and the UDP requestor's payload size.

I think a formulation to avoid magic numbers is probably better but to be 
honest I don't find magic numbers to be so terrible if they make the advice 
more clear.

(I think the current draft's advice is not particularly clear since it contains 
a lot of if then else, but perhaps others think differently.)

The DNS is used in private networks as well as on the Internet. I think it's ok 
to say simply that UDP transport is NOT RECOMMENDED for large messages since 
fragmentation SHOULD be assumed to be unavailable.

That leaves wriggle room for implementations who have more knowledge of their 
network path or who don't care about delivery failures (for example) to do 
their own thing. I don't think we should spend too much time imagining what 
those things should be.

>>> 4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
>>>    (many TCP implementations already set DF bit)
>> 
>> I doubt we have control over this from the application. Is there even
>> API to control that on TCP sockets?

I don't think we should write as if UDP and TCP are the only transports 
available. I also think it's a mistake to dive down into rabbit holes relating 
to any particular transport other than UDP.

UDP is the only transport we have in which the DNS protocol needs to care about 
message size. I think the current draft does a good job in restricting its 
discussion to UDP.

> At least, I would like to disable IPv6 fragmentation.
> and I would like to make "avoid IPv4 fragmentation" our goal.

I think we should just be bold and declare a RECOMMENDED maximum message size 
when using UDP transport and make TCP the default choice of transport.

UDP becomes an acceptable alternative to TCP alongside other transports that 
might be available, if suitable for a particular message, according to local 
policy. 

The maximum that is specified could be a magic number (like the original 
specification's 512 bytes) or it could be a formulation based on particular 
address families' minimum capabilities. Clearly some people prefer the latter. 

The question of how to construct a minimum sized response in cases where a DNS 
responder really wants to avoid a trip through another transport might ideally 
live in a separate document.

I appreciate that it's a bit late in the process to be suggesting such a change 
in approach to what is quite a mature document. Sorry about that. 


Joe



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-22 Thread Kazunori Fujiwara
> From: Petr Špaček 
>> Then, do you agree the following requirements ? (as DNS software
>> developers)
>> 1. SHOULD set DF bit on outgoing UDP packets on IPv4,
>> and SHOULD not use FRAGMENT header on IPv6.
> 
> Theoretically yes, but it might not be achievable depending on OS
> API. We tried many iterations in BIND, and discovered that APIs (at
> least in Linux) are horrible and there are traps everywhere.

Thanks. "Path MTU Discovery" API and setting IP_DF API are complex and
they often don't work as expected.

However, it may be easy to avoid using the Fragment Header on IPv6
(limit IPv6 response packets to be smaller than the interface MTU).
(Or is it not easy?)
Then, do you agree with "SHOULD not use FRAGMENT header on IPv6"?

On IPv4, I would like to write it in such a way that "setting the DF
bit" is a goal, and that OS implementors and DNS server
software developers will do their best.

(For example, UDP responders are RECOMMENDED to set the DF bit on IPv4.)

However, from RFC 2119 keyword definitions, SHOULD==RECOMMENDED,
and there may exist valid reasons to ignore a particular item.
 (When the OS does not have a function to set the DF bit,
  it is a valid reason not to set the DF bit.)

| 3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
| may exist valid reasons in particular circumstances to ignore a
| particular item, but the full implications must be understood and
| carefully weighed before choosing a different course.


>> 2. limit DNS payload size 1232 without path MTU discovery.
>> (After DNSFlagDay2020, many implementations use 1232)
> 
> As Paul wrote down thread, it is a random number - and so far it works
> for us.

OK. Then, to allow response packets larger than 1232/1400 and smaller
than the interface MTU, recommendations for UDP responders are changed as:

  UDP responders can send responses that fit in all of
  the result of path MTU discovery (if available),
  the interface MTU, and the UDP requestor's payload size.

Recommendations for UDP requestors are changed as:

  - remove the "Don't Fragment" way.
  - UDP requestors SHOULD limit the requestor's payload size to 1400.
    [ UDP requestors MAY limit the requestor's payload size to 1232.
      1232 is the IPv6 minimum MTU size
      minus the IPv6 header size minus the UDP header size. ]

>> 3. If path MTU discovery works, UDP responders can send larger (>1232)
>> responses fit in the path MTU.
> 
> Possibly, but I think it is kind of moot advice without knowing how it
> can be done. Right now there is no such thing as RFC 8899-equivalent
> for DNS, so we don't even know if it would work for DNS as we know
> it. Quick glance at RFC 8899 section 3 is not encouraging in that
> regard, e.g. point 3, as a single example, shows that 8899 does not
> match the current state of DNS (because auth does not get answer from
> a resolver if the large response got through or not).

Then, I would like to change it as follows:

  UDP requestors MAY perform "Path MTU discovery" per destination
  to use a requestor's UDP payload size larger than 1400.
  They then calculate their maximum DNS/UDP payload size as
  the reported path MTU
  minus the IPv4/IPv6 header size (20 or 40) minus the UDP header size (8).
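
The calculation on the requestor side could look like the sketch below. It assumes the path MTU is supplied by some discovery mechanism that is not shown, and falls back to 1400 when no value is known; the function name requestor_payload_size is hypothetical.

```python
from typing import Optional

DEFAULT_PAYLOAD_SIZE = 1400  # default proposed in the text above
UDP_HEADER = 8               # octets


def requestor_payload_size(path_mtu: Optional[int], ipv6: bool) -> int:
    """EDNS payload size to advertise, given an optional path MTU report."""
    if path_mtu is None:
        return DEFAULT_PAYLOAD_SIZE  # no path MTU known for this destination
    ip_header = 40 if ipv6 else 20   # assumes no options / extension headers
    return path_mtu - ip_header - UDP_HEADER


print(requestor_payload_size(None, ipv6=True))   # 1400
print(requestor_payload_size(1500, ipv6=False))  # 1472
print(requestor_payload_size(1500, ipv6=True))   # 1452
print(requestor_payload_size(1280, ipv6=True))   # 1232
```

The last line reproduces the 1232 figure: the IPv6 minimum MTU of 1280 minus 40 and 8.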

>> 4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
>> (many TCP implementations already set DF bit)
> 
> I doubt we have control over this from the application. Is there even
> API to control that on TCP sockets?

I agree. I leave it unwritten.

>> # If there is a link whose MTU is smaller than 1260 (on IPv4),
>> # the link may be a blackhole.
> 
> Definitely. If the default is "too wrong" the whole thing falls apart.
> 
> 
> I'm sorry for not being more informative. The only thing I know for certain
> is that we had multiple iterations in BIND, are not happy with any of
> them, and it is an order of magnitude more complex than we thought.

At least, I would like to disable IPv6 fragmentation.
and I would like to make "avoid IPv4 fragmentation" our goal.

--
Kazunori Fujiwara, JPRS 


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-15 Thread Petr Špaček

On 14. 09. 22 16:56, Kazunori Fujiwara wrote:
>> From: Petr Špaček
>> On 15. 08. 22 12:18, Kazunori Fujiwara wrote:
>>>
>>>> I assume section 3.2 means the EDNS bufsize in the request when it
>>>> says "their payload size", but I am not sure. The text could be
>>>> clearer on that.
>>>>
>>>>> *  UDP requestors MAY probe to discover the real MTU value per
>>>>>    destination.
>>>> How?
>>> For example, recent BIND 9 starts small EDNS requestors maximum
>>> DNS/UDP payload size (512), and increases gradually.
>>
>> Correction:
>> Recent BIND starts with EDNS buffer size 1232 bytes, and it does not
>> raise the value to "probe" the destination address.
>>
>> FTR I'm testing on 9.19.5-dev commit b13d973, but I believe it is like
>> that for a long time already.
>
> Thanks very much.
> commit bb990030d344dafe40a62fe5ed2741de28b8ca66 removed the probing heuristics.
>
> BIND 9.17.6 and later
>
> 5516.   [func]  The default EDNS buffer size has been changed from 4096
>                 to 1232 bytes, the EDNS buffer size probing has been
>                 removed, and named now sets the DF (Don't Fragment) flag
>                 on outgoing UDP packets. [GL #2183]
>
>> I think the draft as it is currently does not have enough information
>> for implementers to be followed in safe way.
>>
>> I'm against publication as it is.
>>
>> There should be running code, experiments, and measurements to back up
>> data in this draft. I can't see them at the moment.
>
> Then, do you agree the following requirements ? (as DNS software developers)
>
> 1. SHOULD set DF bit on outgoing UDP packets on IPv4,
>    and SHOULD not use FRAGMENT header on IPv6.


Theoretically yes, but it might not be achievable depending on OS API. 
We tried many iterations in BIND, and discovered that APIs (at least in 
Linux) are horrible and there are traps everywhere.


Here we _also_ need to protect against "old" PMTU discovery attacks with 
spoofed ICMP messages which cause fragmentation on the source host (as 
opposed to fragmentation along the path), which are potentially more 
dangerous because an off-path attacker can mount them more easily.


For reasons I cannot remember now BIND currently uses socket option 
IP_PMTUDISC_OMIT defined in tools/include/uapi/linux/in.h as



#define IP_PMTUDISC_PROBE       3   /* Ignore dst pmtu  */
/* Always use interface mtu (ignores dst pmtu) but don't set DF flag.
 * Also incoming ICMP frag_needed notifications will be ignored on
 * this socket to prevent accepting spoofed ones.
 */
#define IP_PMTUDISC_INTERFACE   4
/* weaker version of IP_PMTUDISC_INTERFACE, which allows packets to get
 * fragmented if they exeed the interface mtu
 */
#define IP_PMTUDISC_OMIT        5


I _think_, and I might be easily wrong, that this was done to eliminate 
impact of spoofed "path MTU exceeded" ICMP messages.


Personally I got lost several times when attempting to understand the 
history of this, so I hesitate to formulate universal advice or even 
say how we ended up here.
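
For readers unfamiliar with this option, the sketch below shows how an application could request IP_PMTUDISC_OMIT on a Linux IPv4 UDP socket. It is an illustration only, not BIND's code. The numeric constants are copied from the Linux header linux/in.h and defined locally, on the assumption that the Python socket module may not export them; the function name is hypothetical.

```python
import socket
import sys

# Values from the Linux UAPI header linux/in.h (Linux-specific).
IP_MTU_DISCOVER = 10
IP_PMTUDISC_OMIT = 5


def udp_socket_with_pmtudisc_omit() -> socket.socket:
    """Create an IPv4 UDP socket that uses the interface MTU, does not
    set DF, and ignores incoming ICMP frag_needed notifications."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_OMIT)
    return sock


if sys.platform.startswith("linux"):
    try:
        s = udp_socket_with_pmtudisc_omit()
        # Read the mode back to confirm the kernel accepted it.
        print(s.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER))
        s.close()
    except OSError as exc:
        # Older kernels or restricted sandboxes may reject the option.
        print("IP_PMTUDISC_OMIT not available:", exc)
```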




> 2. limit DNS payload size 1232 without path MTU discovery.
>    (After DNSFlagDay2020, many implementations use 1232)


As Paul wrote down thread, it is a random number - and so far it works for us.



> 3. If path MTU discovery works, UDP responders can send larger (>1232)
>    responses fit in the path MTU.


Possibly, but I think it is kind of moot advice without knowing how it 
can be done. Right now there is no such thing as RFC 8899-equivalent for 
DNS, so we don't even know if it would work for DNS as we know it. Quick 
glance at RFC 8899 section 3 is not encouraging in that regard, e.g. 
point 3, as a single example, shows that 8899 does not match the current 
state of DNS (because auth does not get answer from a resolver if the 
large response got through or not).




> 4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
>    (many TCP implementations already set DF bit)


I doubt we have control over this from the application. Is there even 
API to control that on TCP sockets?




> # If there is a link whose MTU is smaller than 1260 (on IPv4),
> # the link may be a blackhole.


Definitely. If the default is "too wrong" the whole thing falls apart.


I'm sorry for not being more informative. The only thing I know for certain is 
that we had multiple iterations in BIND, are not happy with any of them, 
and it is an order of magnitude more complex than we thought.


--
Petr Špaček
Internet Systems Consortium



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-14 Thread paul=40redbarn . org
1232 is an arbitrary size based on a multi generational misunderstanding. We 
should not repeat it or promote it. 


p vixie 


On Sep 14, 2022 15:56, Kazunori Fujiwara  wrote: 

> From: Petr Špaček  
> On 15. 08. 22 12:18, Kazunori Fujiwara wrote: 
>> 
>>> I assume section 3.2 means the EDNS bufsize in the request when it 
>>> says 
>>> "their payload size", but I am not sure. The text could be clearer on 
>>> that. 
>>> 
>>>> *  UDP requestors MAY probe to discover the real MTU value per
>>>>    destination.
>>> How?
>> For example, recent BIND 9 starts small EDNS requestors maximum
>> DNS/UDP payload size (512), and increases gradually.
>
> Correction:
> Recent BIND starts with EDNS buffer size 1232 bytes, and it does not
> raise the value to "probe" the destination address.
> 
> FTR I'm testing on 9.19.5-dev commit b13d973, but I believe it is like 
> that for a long time already. 

Thanks very much.
commit bb990030d344dafe40a62fe5ed2741de28b8ca66 removed the probing heuristics. 

BIND 9.17.6 and later 

5516.   [func]  The default EDNS buffer size has been changed from 4096 
to 1232 bytes, the EDNS buffer size probing has been 
removed, and named now sets the DF (Don't Fragment) flag 
on outgoing UDP packets. [GL #2183] 

> I think the draft as it is currently does not have enough information 
> for implementers to be followed in safe way. 
> 
> 
> I'm against publication as it is. 
> 
> There should be running code, experiments, and measurements to back up 
> data in this draft. I can't see them at the moment. 

Then, do you agree the following requirements ? (as DNS software developers)

1. SHOULD set DF bit on outgoing UDP packets on IPv4, 
   and SHOULD not use FRAGMENT header on IPv6. 

2. limit DNS payload size 1232 without path MTU discovery. 
   (After DNSFlagDay2020, many implementations use 1232) 

3. If path MTU discovery works, UDP responders can send larger (>1232) 
   responses fit in the path MTU. 

4. TCP implementations SHOULD set DF bit / not use FRAGMENT header. 
   (many TCP implementations already set DF bit) 

# If there is a link whose MTU is smaller than 1260 (on IPv4), 
# the link may be a blackhole. 

Regards, 

-- 
Kazunori Fujiwara, JPRS  


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-14 Thread Kazunori Fujiwara
> From: Petr Špaček 
> On 15. 08. 22 12:18, Kazunori Fujiwara wrote:
>> 
>>> I assume section 3.2 means the EDNS bufsize in the request when it
>>> says
>>> "their payload size", but I am not sure. The text could be clearer on
>>> that.
>>>
>>>> *  UDP requestors MAY probe to discover the real MTU value per
>>>>    destination.
>>> How?
>> For example, recent BIND 9 starts small EDNS requestors maximum
>> DNS/UDP payload size (512), and increases gradually.
>
> Correction:
> Recent BIND starts with EDNS buffer size 1232 bytes, and it does not
> raise the value to "probe" the destination address.
> 
> FTR I'm testing on 9.19.5-dev commit b13d973, but I believe it is like
> that for a long time already.

Thanks very much.
commit bb990030d344dafe40a62fe5ed2741de28b8ca66 removed the probing heuristics.

BIND 9.17.6 and later 

5516.   [func]  The default EDNS buffer size has been changed from 4096
to 1232 bytes, the EDNS buffer size probing has been
removed, and named now sets the DF (Don't Fragment) flag
on outgoing UDP packets. [GL #2183]

> I think the draft as it is currently does not have enough information
> for implementers to be followed in safe way.
> 
> 
> I'm against publication as it is.
> 
> There should be running code, experiments, and measurements to back up
> data in this draft. I can't see them at the moment.

Then, do you agree the following requirements ? (as DNS software developers)

1. SHOULD set DF bit on outgoing UDP packets on IPv4,
   and SHOULD not use FRAGMENT header on IPv6.

2. limit DNS payload size 1232 without path MTU discovery.
   (After DNSFlagDay2020, many implementations use 1232)

3. If path MTU discovery works, UDP responders can send larger (>1232)
   responses fit in the path MTU.

4. TCP implementations SHOULD set DF bit / not use FRAGMENT header.
   (many TCP implementations already set DF bit)

# If there is a link whose MTU is smaller than 1260 (on IPv4),
# the link may be a blackhole.

Regards,

--
Kazunori Fujiwara, JPRS 


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-09-14 Thread Petr Špaček

On 15. 08. 22 12:18, Kazunori Fujiwara wrote:
>
>> I assume section 3.2 means the EDNS bufsize in the request when it says
>> "their payload size", but I am not sure. The text could be clearer on
>> that.
>>
>>>    *  UDP requestors MAY probe to discover the real MTU value per
>>>       destination.
>> How?
> For example, recent BIND 9 starts small EDNS requestors maximum
> DNS/UDP payload size (512), and increases gradually.

Correction:
Recent BIND starts with EDNS buffer size 1232 bytes, and it does not 
raise the value to "probe" the destination address.


FTR I'm testing on 9.19.5-dev commit b13d973, but I believe it is like 
that for a long time already.


I think the draft as it is currently does not have enough information 
for implementers to be followed in safe way.



I'm against publication as it is.

There should be running code, experiments, and measurements to back up 
data in this draft. I can't see them at the moment.


--
Petr Špaček
Internet Systems Consortium



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-17 Thread Petr Špaček

On 17. 08. 22 17:09, Daisuke HIGASHI wrote:
> Peter van Dijk :
>
> Thank you for reviewing my implementation.
>
>> Note that the function called "probe_pmtu" does not really probe. At
>> best, it finds some data the kernel cached recently. At worst (i.e.
>> usually), it tells you the MTU of your local networking interface.
>
> That's correct.
>
>>> - A first response (to requester with small PMTU) can be lost because
>>> nobody knows PMTU for destination that a large packet was never sent.
>>> It slows down name resolution - Fortunately this is not a big problem
>>> because 1) will be recovered by retransmission by the requestor
>>
>> (a) why would a requestor retransmit? (b) why would the retransmit help?
>
> 1) Responder receives the first request and sends a response (DF=1) whose
> size exceeds the PMTU, so it is not received by the requester.
>
> 2) Responder should receive ICMP NEEDFRAG and learn the PMTU.

And at this step we opened the attack window to ICMP-based fragmentation 
attacks again.

> 3) Requester (resolver) _would_ retransmit the same request after a
> timeout if no response is received.

Possibly. Or it can retransmit to another address, or possibly routed to 
another node in anycast cloud. Or via another path. Or just fallback to 
TCP and don't bother with UDP anymore.

> 4) Responder composes a DNS message that fits in the PMTU, or sets TC=1
> (if it does not fit).
>
> I admit that my current implementation (the responder's behavior that the
> I-D describes, in my understanding) relies on the requester's timeout /
> retransmission strategy, and when the PMTU cache information expires the
> same timeout event would occur again.


This is completely unreliable, I'm afraid.

--
Petr Špaček



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-17 Thread Daisuke HIGASHI
Peter van Dijk :

Thank you for reviewing my implementation.

> Note that the function called "probe_pmtu" does not really probe. At
> best, it finds some data the kernel cached recently. At worst (i.e.
> usually), it tells you the MTU of your local networking interface.


That's correct.

>
> > - A first response (to requester with small PMTU) can be lost because
> > nobody knows PMTU for destination that a large packet was never sent.
> > It slows down name resolution - Fortunately this is not a big problem
> > because 1) will be recovered by retransmission by the requestor
>
> (a) why would a requestor retransmit? (b) why would the retransmit help?


1) Responder receives the first request and sends a response (DF=1) whose
size exceeds the PMTU, so it is not received by the requester.
2) Responder should receive ICMP NEEDFRAG and learn the PMTU.
3) Requester (resolver) _would_ retransmit the same request after a
timeout if no response is received.
4) Responder composes a DNS message that fits in the PMTU, or sets TC=1
(if it does not fit).

I admit that my current implementation (the responder's behavior that the
I-D describes, in my understanding) relies on the requester's timeout /
retransmission strategy, and when the PMTU cache information expires the
same timeout event would occur again.
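
The four steps can be traced with a toy simulation. This is a sketch under strong assumptions: the ICMP NEEDFRAG message always arrives and is accepted, the requester retransmits to the same responder, and a TC=1 reply has a made-up fixed size. None of the names below come from the NSD patch.

```python
from typing import Dict, Tuple

OVERHEAD = 28         # IPv4 header (20) + UDP header (8), assuming no IP options
TRUNCATED_SIZE = 100  # hypothetical size of a TC=1 reply, for illustration only


class Responder:
    def __init__(self, interface_mtu: int) -> None:
        self.interface_mtu = interface_mtu
        self.pmtu_cache: Dict[str, int] = {}  # per-client PMTU learned from ICMP

    def respond(self, client: str, answer_size: int,
                edns_bufsize: int) -> Tuple[int, bool]:
        """Return (DNS message size sent, TC bit)."""
        mtu = self.pmtu_cache.get(client, self.interface_mtu)
        limit = min(edns_bufsize, mtu - OVERHEAD)
        if answer_size <= limit:
            return answer_size, False
        return TRUNCATED_SIZE, True


def deliver(responder: Responder, client: str, path_mtu: int,
            dns_size: int) -> bool:
    """Send one DF=1 packet; on failure the responder learns the PMTU (step 2)."""
    if dns_size + OVERHEAD <= path_mtu:
        return True
    responder.pmtu_cache[client] = path_mtu  # ICMP NEEDFRAG assumed to arrive
    return False


r = Responder(interface_mtu=1500)
client, path_mtu = "192.0.2.1", 1300
size1, tc1 = r.respond(client, answer_size=1400, edns_bufsize=1452)  # step 1
ok1 = deliver(r, client, path_mtu, size1)                            # steps 1-2
size2, tc2 = r.respond(client, answer_size=1400, edns_bufsize=1452)  # steps 3-4
ok2 = deliver(r, client, path_mtu, size2)
print(ok1, (size1, tc1), ok2, (size2, tc2))
# False (1400, False) True (100, True)
```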

-- 
Daisuke Higashi


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-16 Thread Kazunori Fujiwara
> From: "Andrew McConachie" 
>> Path MTU discovery remains widely undeployed due to
>> security issues, and IP fragmentation has exposed weaknesses in
>> application protocols.
> 
> PMTUD doesn’t work through NAT and that’s probably the main reason
> why it doesn’t work on the Internet. I think that’s less of a
> security issue than just a general issue with PMTUD not working on the
> modern Internet.

I will remove "Path MTU discovery remains widely undeployed due to security
issues", because the text is only present in the abstract.

>> Currently, DNS is known to be the largest
>> user of IP fragmentation.

And I will remove this line, because the text is only present in the abstract.

> Compared to what? I would just drop this sentence because it doesn’t
> add anything to the document and it’s trying to make a point that
> doesn’t need to be made.

--
Kazunori Fujiwara, JPRS 


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-15 Thread Peter van Dijk
On Sat, 2022-08-13 at 21:49 +0900, Daisuke HIGASHI wrote:
> I wrote an experimental "avoid-fragmentation" patch for NSD (as per
> section 3.1 and Appendix C). Due to dependency on getsockopt(IP_MTU),
> currently it should work on Linux only.
> 
> https://github.com/hdais/nsd-avoid-fragmentation#avoid-fragmentation-implementation-for-nsd
> https://github.com/hdais/nsd-avoid-fragmentation/commit/e34931ece95d4bcc20d71d3f3a18e037d2772f23
> 
> I did several tests on avoid-fragmentation, and got some findings or 
> questions:
> 
> - avoid-fragmentation (current draft) can be implemented by small
> modifications as you can see above.

Note that the function called "probe_pmtu" does not really probe. At
best, it finds some data the kernel cached recently. At worst (i.e.
usually), it tells you the MTU of your local networking interface.

> - A first response (to a requester with a small PMTU) can be lost because
> nobody knows the PMTU for a destination to which a large packet was never
> sent. It slows down name resolution. Fortunately this is not a big problem
> because 1) it will be recovered by retransmission by the requestor

(a) why would a requestor retransmit? (b) why would the retransmit help?

(I can imagine answers to these, but they're incomplete - so I'm curious
about your thought process here)

>  2)
> This rarely occurs. Most advertised EDNS bufsize fits in most MTU
> (slightly smaller than 1500) thanks to DNS flag day 2020.
> 
> - Is an API to get the PMTU to any destination available on many platforms
> (other than Linux)?

As far as such APIs exist, they rely on the few bits of data your kernel
happens to have learned recently. Usually, the data you want will not be
there.


Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-15 Thread Kazunori Fujiwara
> From: Peter van Dijk 
> Avoiding fragmentation is good. Putting that in a document is also good.
> But this document is not ready for publication. It also most definitely
> does not describe Best Current Practice; it also does not prescribe a
> Best Current Practice I can agree with or even really implement.
> 
> I'll call out a few specific problems below, but this list is not
> complete.
> 
> The (normative!) reference to RFC8900 is very vague.

I will change the reference to RFC8900 to an informative one.

> The IP_DONTFRAG reference (well, not really a reference) is handwavy and
> ill-defined. The discussion of socket options is also incomplete. (See
> also: Petr's email)
> (That said, the advice is good.)

I would like to change the IP_DONTFRAG/IPV6_DONTFRAG related description as follows.
If there is better text, please suggest it.

| 2.1 "Don't Fragment" way
|
|  In this document, the term "Don't Fragment" way
|  implies
|  to set "Don't Fragment flag (DF) bit" [RFC0791] on IPv4
|  and not to use "Fragment header" [RFC8200] on IPv6.
|
|  How to set "Don't Fragment flag (DF) bit" on IPv4 varies between
|  implementations.  The IP_DONTFRAG option is used on BSD systems to set
|  the Don't Fragment bit.  On Linux systems this is done via
|  IP_MTU_DISCOVER and IP_PMTUDISC_DO.
|
|  The way without "Fragment header" on IPv6 varies between
|  implementations.  On BSD systems, use IPV6_DONTFRAG socket option
|  defined in [RFC3542].
|
| 3.1.  Recommendations for UDP responders

-   *  UDP responders SHOULD send DNS responses with IP_DONTFRAG /
-  IPV6_DONTFRAG [RFC3542] options.
+   *  UDP responders SHOULD send DNS responses in the "Don't Fragment" way.
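
A minimal sketch of the "Don't Fragment" way on a Linux UDP socket, assuming
the numeric option values from the Linux headers where the Python socket
module does not export the names. The helper name set_dont_fragment is
hypothetical, and BSD systems would use IP_DONTFRAG instead for IPv4.

```python
import socket

# Linux values (<linux/in.h>, <linux/in6.h>), used as fallbacks when the
# Python socket module does not export these names. Linux-only assumption.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IPV6_DONTFRAG = getattr(socket, "IPV6_DONTFRAG", 62)


def set_dont_fragment(sock, family):
    """Ask the kernel not to fragment datagrams sent on this UDP socket."""
    if family == socket.AF_INET:
        # Sets DF=1 on outgoing IPv4 packets; oversized sends fail with EMSGSIZE.
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    elif family == socket.AF_INET6:
        # Prevents the sender from adding an IPv6 Fragment header (RFC 3542).
        sock.setsockopt(socket.IPPROTO_IPV6, IPV6_DONTFRAG, 1)
    else:
        raise ValueError("unsupported address family")
```

It would be called once after creating the socket, for example
set_dont_fragment(sock, socket.AF_INET) on a socket.SOCK_DGRAM socket.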

>>   *  UDP responders MAY probe to discover the real MTU value per
>>  destination.
> 
> I have no clue how a responder would do this. If I'm wrong, and this is
> possible at all, the document should explain how this would be done.

The third and fourth paragraphs may be merged, I think.

> I assume section 3.2 means the EDNS bufsize in the request when it says
> "their payload size", but I am not sure. The text could be clearer on
> that.
> 
>>   *  UDP requestors MAY probe to discover the real MTU value per
>>  destination.
> 
> How?

For example, recent BIND 9 starts with a small requestor's maximum
DNS/UDP payload size (512) and increases it gradually.
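
A rough sketch of such a "start small, grow gradually" strategy, assuming a
fixed ladder of sizes. The ladder values (512, 1232, 1400) and the function
name are illustrative assumptions, not BIND 9's actual algorithm.

```python
def next_bufsize(current, succeeded, steps=(512, 1232, 1400)):
    """Pick the next advertised EDNS buffer size for a given server.

    Move one step up the ladder after a successful UDP exchange and one
    step down after a timeout; stay put at either end of the ladder.
    """
    i = steps.index(current)
    if succeeded:
        return steps[min(i + 1, len(steps) - 1)]
    return steps[max(i - 1, 0)]
```

A requestor would keep one such value per destination and advertise it in
the OPT record of the next query.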

Do you have good text?

>>  To avoid name resolution fails, UDP requestors need to retry using
>>  TCP, or UDP with smaller maximum DNS/UDP payload size.
> 
> This lacks 2119/8174 keywords. "need" sounds like SHOULD or MUST, but I
> do not think this behaviour should be mandated of implementations.
> Several resolver implementations currently do none of this, and they work
> fine on the existing Internet. We should not be adding code compensating
> for potential breakage we can imagine. So, I suggest this would at most
> be a MAY, or a lowercase "can retry using ...".

Ok, change to "MAY".

>>The proposed method supports incremental deployment.
> 
> In its current shape, this document does not really propose a method for
> anything. If the document gets updated to provide implementable advice,
> it should get an Implementation Status section.

Thanks. Section 4 Incremental deployment may be removed.

--
Kazunori Fujiwara, JPRS 

> Section 5 is solid advice.
> 
> I also agree with the full text of Petr's response.
>
> Kind regards,
> -- 
> Peter van Dijk
> PowerDNS.COM BV - https://www.powerdns.com/


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-13 Thread Daisuke HIGASHI
I wrote an experimental "avoid-fragmentation" patch for NSD (as per
section 3.1 and Appendix C). Due to dependency on getsockopt(IP_MTU),
currently it should work on Linux only.

https://github.com/hdais/nsd-avoid-fragmentation#avoid-fragmentation-implementation-for-nsd
https://github.com/hdais/nsd-avoid-fragmentation/commit/e34931ece95d4bcc20d71d3f3a18e037d2772f23

I did several tests on avoid-fragmentation, and got some findings or questions:

- avoid-fragmentation (current draft) can be implemented by small
modifications as you can see above.

- A first response (to a requester with a small PMTU) can be lost because
nobody knows the PMTU for a destination to which a large packet was never
sent. It slows down name resolution. Fortunately this is not a big problem
because 1) it will be recovered by retransmission by the requestor, and 2)
this rarely occurs. Most advertised EDNS bufsizes fit in most MTUs
(slightly smaller than 1500) thanks to DNS flag day 2020.

- Possible TCP fallback attack. An attacker can spoof the PMTU by
sending a fake ICMP NEEDFRAG with a small MTU (like 512) and trigger
TCP fallback for any requester/responder session (e.g. DNS sessions
between large DNS authoritative services and large ISP DNS resolvers).

- Is an API to get the PMTU to any destination available on many platforms
(other than Linux)?

I am really concerned about the TCP fallback attack. Most DNS servers /
resolvers are not yet prepared with the resources to handle many TCP
requests. We need some protection or a PMTU probing method that does not
depend on ICMP.

>   Section 3.4 of [RFC1122] specifies FIND_MAXSIZES() as one of
>   "INTERNET/TRANSPORT LAYER INTERFACEs".

That should be GET_MAXSIZES() in RFC1122. (But is it available on many platforms?)

-- 
Daisuke Higashi



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-05 Thread Paul Vixie




Brian Dickson wrote on 2022-07-31 13:46:
> On Sun, Jul 31, 2022 at 11:54 AM Paul Vixie wrote:
>
>> https://datatracker.ietf.org/wg/plpmtud/about/
>
> (I would note that the above wg is "status: closed".)

don't we all just love it when something reaches successful conclusion?

>> i suggest further reading and perhaps reconsideration. we've got to
>> break out of the MTU 1500 jail some day or the internet will end in
>> header processing related heat death. some work is being done and some
>> results are already known. we should be open to the possibility of
>> improvement.
>
> If there is interest in pursuing DNS-specific methods for learning and
> using path MTU values, where would the right venue for that be?

there used to be a dns-research@ mailing list over at oarc. maybe ask
them to rekindle same? applying PLPMTUD to DNS should be
straightforward, and would be a great IETF Hackathon goal.

> Concrete suggestions on the subject itself:
>
>   ...

none of that is on-topic for the draft being considered on-thread.

> Having said all that, I still am in favor of preventing UDP
> fragmentation, regardless of the MTU used, and support this draft.

tyvm.

--
P Vixie



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-05 Thread Paul Vixie

see inline.

Andrew McConachie wrote on 2022-08-04 06:32:
> On 31 Jul 2022, at 20:53, Paul Vixie wrote:
>
>> https://datatracker.ietf.org/wg/plpmtud/about/
>>
>> i suggest further reading and perhaps reconsideration. we've got to
>> break out of the MTU 1500 jail some day or the internet will end in
>> header processing related heat death. some work is being done and some
>> results are already known. we should be open to the possibility of
>> improvement.
>
> I apologize for derailing this conversation by bringing up NAT. My point
> was that the document makes a claim that PMTUD ‘remains widely
> undeployed due to security issues’. Yet it makes no reference to
> anything that might back up that claim. I would suggest the document not
> make any claim as to why PMTUD remains widely undeployed. If it must
> make such a claim then there should be some supporting evidence for it.

the claim isn't essential, but i think it's valuable to understanding.
would you accept this edit: "was never widely deployed due to perceived
security issues having to do with ICMP and IP Options"?

separately we ought to add a reference to PLPMTUD (which != PMTUD) but
that's an answer to a different part of this thread.


--
P Vixie



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-04 Thread Mukund Sivaraman
On Thu, Aug 04, 2022 at 03:49:48PM +0200, Joe Abley wrote:
> Hi Andrew,
> 
> On Aug 4, 2022, at 15:33, Andrew McConachie  wrote:
> 
> > I apologize for derailing this conversation by bringing up NAT. My point 
> > was that the document makes a claim that PMTUD ‘remains widely undeployed 
> > due to security issues’. Yet it makes no reference to anything that might 
> > back up that claim.
> 
> I think the concern about the assertion and the lack of citation are 
> reasonable and it ought to be possible to improve the text.
> 
> Anecdotally, the problems I have observed with pMTUd is with networks that 
> blindly and misguidedly block all ICMP inbound "because security" which 
> breaks the signalling path that pMTUd relies upon to know that an interface 
> with a small MTU has been found by an outbound packet sent with DF=1. This 
> used to be [*] overwhelmingly common when sending large responses back from 
> servers to client devices attached through tunnels, e.g. CPE routers using 
> upstream PPPoE: servers that are "protected" by over-suppression of 
> downstream ICMP have no way of knowing that a packet has been dropped and 
> sessions stall.

With CVE-2021-20322, such blocking as mitigation got worse. That was one
heck of a clever UDP source port probing attack too.

Mukund




Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-04 Thread Joe Abley
Hi Andrew,

On Aug 4, 2022, at 15:33, Andrew McConachie  wrote:

> I apologize for derailing this conversation by bringing up NAT. My point was 
> that the document makes a claim that PMTUD ‘remains widely undeployed due to 
> security issues’. Yet it makes no reference to anything that might back up 
> that claim.

I think the concern about the assertion and the lack of citation are reasonable 
and it ought to be possible to improve the text.

Anecdotally, the problems I have observed with pMTUd is with networks that 
blindly and misguidedly block all ICMP inbound "because security" which breaks 
the signalling path that pMTUd relies upon to know that an interface with a 
small MTU has been found by an outbound packet sent with DF=1. This used to be 
[*] overwhelmingly common when sending large responses back from servers to 
client devices attached through tunnels, e.g. CPE routers using upstream PPPoE: 
servers that are "protected" by over-suppression of downstream ICMP have no way 
of knowing that a packet has been dropped and sessions stall.

For TCP this is mitigatable on the client side (in this example) by techniques 
like MSS-clamping in the CPE. For other layer-4 protocols, not so much. This 
makes failures difficult to troubleshoot since "the internet" seems fine for 
some users/applications but broken for others.

I do not have a convenient reference for this, but searching for "operational 
problems with pmtud" seems to yield a healthy number of results, so perhaps 
something suitable can be found quite easily. RFC 2923 contains descriptions of 
various problems that are relevant, although its clearly-stated focus is with 
TCP transport, not UDP.


Joe

[*] might still be, but I'm more distant from the operational front line than I 
used to be, these days


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-08-04 Thread Andrew McConachie



On 31 Jul 2022, at 20:53, Paul Vixie wrote:

> Andrew McConachie wrote on 2022-07-28 03:24:
>>> Path MTU discovery remains widely undeployed due to
>>> security issues, and IP fragmentation has exposed weaknesses in
>>> application protocols.
>>
>> PMTUD doesn’t work through NAT and that’s probably the main
>> reason why it doesn’t work on the Internet. I think that’s less
>> of a security issue than just a general issue with PMTUD not working
>> on the modern Internet.
>
> path mtu discovery has been significantly rethought in the modern
> internet:
>
> https://www.rfc-editor.org/rfc/rfc8899.html
>
> apparently, it sometimes works:
>
> https://developers.redhat.com/articles/2022/05/23/plpmtud-delivers-better-path-mtu-discovery-sctp-linux
>
> see also:
>
> <>
>
> https://datatracker.ietf.org/wg/plpmtud/about/
>
> i suggest further reading and perhaps reconsideration. we've got to
> break out of the MTU 1500 jail some day or the internet will end in
> header processing related heat death. some work is being done and some
> results are already known. we should be open to the possibility of
> improvement.

I apologize for derailing this conversation by bringing up NAT. My point
was that the document makes a claim that PMTUD ‘remains widely
undeployed due to security issues’. Yet it makes no reference to
anything that might back up that claim. I would suggest the document not
make any claim as to why PMTUD remains widely undeployed. If it must
make such a claim then there should be some supporting evidence for it.


—Andrew



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-31 Thread Brian Dickson
On Sun, Jul 31, 2022 at 11:54 AM Paul Vixie  wrote:

>
>
> Andrew McConachie wrote on 2022-07-28 03:24:
> >> Path MTU discovery remains widely undeployed due to
> >>security issues, and IP fragmentation has exposed weaknesses in
> >>application protocols.
> >
> > PMTUD doesn’t work through NAT and that’s probably the main reason why
> > it doesn’t work on the Internet. I think that’s less of a security issue
> > than just a general issue with PMTUD not working on the modern Internet.
> path mtu discovery has been significantly rethought in the modern internet:
>
> https://www.rfc-editor.org/rfc/rfc8899.html
>
> apparently, it sometimes works:
>
>
> https://developers.redhat.com/articles/2022/05/23/plpmtud-delivers-better-path-mtu-discovery-sctp-linux
>
> see also:
>
> < network (so it is not subject to the problems described in RFC2923).
> Instead it finds the proper MTU by starting with relatively small
> packets and searching upwards by probing with test packets.>>
>
> https://datatracker.ietf.org/wg/plpmtud/about/


(I would note that the above wg is "status: closed".)


>
>
> i suggest further reading and perhaps reconsideration. we've got to
> break out of the MTU 1500 jail some day or the internet will end in
> header processing related heat death. some work is being done and some
> results are already known. we should be open to the possibility of
> improvement.
>

If there is interest in pursuing DNS-specific methods for learning and
using path MTU values, where would the right venue for that be?

Concrete suggestions on the subject itself:

   - Could clients learn per-server PMTU from TCP, and (with some added
   active measurement/management of those results) use that in EDNS bufsize
   for UDP?
   - Maybe some additional explicit signaling for support for whatever
   PMTUD mechanisms to use?
   - Since TCP sessions might not be long lived, would a parallel (rather
   than sequential) packet size attempt mechanism be possible?
      - E.g. break the DNS message into a number of first-packet sizes that
      are common or viable MTU sizes, and send them all with the same sequence
      number but different lengths? Have the server ACK the largest one
      received within some nominal window of time? (Might not work in a
      standard TCP stack, but perhaps a DNS-specific TCP implementation?)
   - I am interested in keeping as much DNS traffic on UDP and unencrypted
   as possible, at the authoritative side. Use DNSSEC to protect the content.
   I am not convinced of the necessity of ADoX generally.

Having said all that, I still am in favor of preventing UDP fragmentation,
regardless of the MTU used, and support this draft.

Brian


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-31 Thread Paul Vixie



Andrew McConachie wrote on 2022-07-28 03:24:
>> Path MTU discovery remains widely undeployed due to
>> security issues, and IP fragmentation has exposed weaknesses in
>> application protocols.
>
> PMTUD doesn’t work through NAT and that’s probably the main reason why
> it doesn’t work on the Internet. I think that’s less of a security issue
> than just a general issue with PMTUD not working on the modern Internet.

path mtu discovery has been significantly rethought in the modern internet:

https://www.rfc-editor.org/rfc/rfc8899.html

apparently, it sometimes works:

https://developers.redhat.com/articles/2022/05/23/plpmtud-delivers-better-path-mtu-discovery-sctp-linux

see also:

<>

https://datatracker.ietf.org/wg/plpmtud/about/

i suggest further reading and perhaps reconsideration. we've got to
break out of the MTU 1500 jail some day or the internet will end in
header processing related heat death. some work is being done and some
results are already known. we should be open to the possibility of
improvement.


--
P Vixie



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-29 Thread Peter van Dijk
Hello,

On Tue, 2022-07-26 at 21:13 +, Suzanne Woolf wrote:
> Dear colleagues,
> 
> 
> This message starts the Working Group Last Call for 
> draft-ietf-dnsop-avoid-fragmentation 
> (https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/). The 
> requested status is BCP.
> 
> Since we're starting the Last Call during the IETF week, and many folks are 
> on holidays in the next few weeks, the WGLC will end in three weeks (instead 
> of the usual two), on August 16.
> 
> Substantive comments to the list, please. It’s fine for minor edits to go 
> direct to the authors. We need to hear positive support to advance it, or 
> your comments on what still needs to be done. 

Avoiding fragmentation is good. Putting that in a document is also good.
But this document is not ready for publication. It also most definitely
does not describe Best Current Practice; it also does not prescribe a
Best Current Practice I can agree with or even really implement.

I'll call out a few specific problems below, but this list is not
complete.

The (normative!) reference to RFC8900 is very vague.

The IP_DONTFRAG reference (well, not really a reference) is handwavy and
ill-defined. The discussion of socket options is also incomplete. (See
also: Petr's email)

>   *  If the UDP responder detects immediate error that the UDP packet
>  cannot be sent beyond the path MTU size (EMSGSIZE), the UDP
>  responder MAY recreate response packets fit in path MTU size, or
>  TC bit set.

If an answer does not fit, there is often no legitimate smaller answer
that will fit, as convincingly argued by draft-ietf-dnsop-glue-is-not-
optional. Some additionals may be an exception to this. Furthermore, this
situation (a responder receiving a message size error from the kernel) is
extremely unlikely, unless there is a requester that talks to this
responder a lot.

(That said, the advice is good.)
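
A minimal sketch of the TC=1 fallback that the quoted bullet describes,
assuming a well-formed message whose question names are not compressed. The
function name is hypothetical, and a real responder would normally also keep
the OPT record, which this sketch drops for brevity.

```python
import struct

TC_FLAG = 0x0200  # TC bit in the DNS header flags word (RFC 1035, 4.1.1)


def truncated_response(message: bytes) -> bytes:
    """Return header plus question section with TC=1 and no records."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", message[:12])
    offset = 12
    for _ in range(qdcount):
        while message[offset] != 0:   # walk the labels of QNAME
            offset += 1 + message[offset]
        offset += 1 + 4               # root label, then QTYPE and QCLASS
    # Keep ID, flags and QDCOUNT; zero ANCOUNT, NSCOUNT and ARCOUNT.
    header = struct.pack("!6H", ident, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + message[12:offset]
```

A responder could send this reply when the original send fails with
EMSGSIZE, leaving the requester to retry over TCP.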

>   *  UDP responders MAY probe to discover the real MTU value per
>  destination.

I have no clue how a responder would do this. If I'm wrong, and this is
possible at all, the document should explain how this would be done.

I assume section 3.2 means the EDNS bufsize in the request when it says
"their payload size", but I am not sure. The text could be clearer on
that.

>   *  UDP requestors MAY probe to discover the real MTU value per
>  destination.

How?

>  To avoid name resolution fails, UDP requestors need to retry using
>  TCP, or UDP with smaller maximum DNS/UDP payload size.

This lacks 2119/8174 keywords. "need" sounds like SHOULD or MUST, but I
do not think this behaviour should be mandated of implementations.
Several resolver implementations currently do none of this, and they work
fine on the existing Internet. We should not be adding code compensating
for potential breakage we can imagine. So, I suggest this would at most
be a MAY, or a lowercase "can retry using ...".

>The proposed method supports incremental deployment.

In its current shape, this document does not really propose a method for
anything. If the document gets updated to provide implementable advice,
it should get an Implementation Status section.

Section 5 is solid advice.

I also agree with the full text of Petr's response.

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-29 Thread Joe Abley
Hi Andrew,

On Jul 29, 2022, at 11:14, Andrew McConachie  wrote:

> We don’t need a useful standard for NAT to recognize that most 
> implementations break PMTUD, and that those implementations of NAT are 
> deployed enough to make PMTUD significantly broken.

I was really just suggesting that some measurement to support the assertion 
might be nice.

>> So perhaps it's reasonable to say that the IETF use of MTU pre-dates 
>> Ethernet switch vendors' usage, since it pre-dates Ethernet switches, since 
>> it pre-dates Ethernet.
> 
> Ok. But the text still isn’t clear on how many bytes are assumed to be 
> consumed by layer-2 protocols.

I think the point is that it's not necessary to know that.

> We don’t need to have a super tight definition of MTU to progress this 
> document. Implementors just need to know how big of packets they can transmit.

The answer to that question for any particular interface (attached to any 
layer-2 network) is "that interface's MTU".


Joe


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-29 Thread Andrew McConachie



On 28 Jul 2022, at 13:19, Joe Abley wrote:

> On Jul 28, 2022, at 12:24, Andrew McConachie wrote:
>
>> PMTUD doesn’t work through NAT
>
> That's a very definitive statement considering that there's no useful
> standard for NAT.
>
> If there's actual research on this to demonstrate that, pragmatically
> speaking, no implementations use the payload of a type 3 code 4 ICMP
> message to identify a translated target for the packet I would like to
> read it, because that sounds interesting.

The document makes the claim that PMTUD “remains widely undeployed due
to security issues.” My contention is that it has little to do with
security and more to do with the current structure of the Internet. We
don’t need a useful standard for NAT to recognize that most
implementations break PMTUD, and that those implementations of NAT are
deployed enough to make PMTUD significantly broken. Firewalls break
PMTUD as well, and I guess that’s a security thing, but currently the
sentence reads like operators don’t deploy PMTUD in favor of security
and I don’t think that’s true.

>>> Currently, DNS is known to be the largest
>>> user of IP fragmentation.
>>
>> Compared to what? I would just drop this sentence because it
>> doesn’t add anything to the document and it’s trying to make a
>> point that doesn’t need to be made.
>
> I'd also like to see a citation for this one if there has been a
> study. I agree that it's probably the most familiar example of
> fragmentation for an audience mainly preoccupied with the DNS, but
> that's probably not a helpful observation :-)
>
>> Before I was interested in the DNS I worked for an ethernet switch
>> vendor for 8 years, and I often find the way MTU gets talked about in
>> IETF documents simply weird.
>
> RFC 791 introduces the term "maximum transmission unit" to be the
> maximum size of a datagram, not the maximum size of a frame whose
> payload is a datagram.
>
>   The maximum sized datagram that can be transmitted through the
>   next network is called the maximum transmission unit (MTU).
>
>> MTU is a measurement of maximum frame size for a network segment
>> starting at Layer 2.
>
> I have also heard MTU used in that way. I have always assumed it was
> just sloppy writing.
>
> There may be prior use of the phrase that I'm not aware of (prior to
> 1981) but even if that's the case I think it's reasonable to use the
> IETF definition of the phrase in the IETF.
>
> I think Ethernet was not standardised until the publication of IEEE
> 802.3 in 1983. I also think the original specification did not
> anticipate switches but described a multi-access network with a
> broader collision domain.
>
> So perhaps it's reasonable to say that the IETF use of MTU pre-dates
> Ethernet switch vendors' usage, since it pre-dates Ethernet switches,
> since it pre-dates Ethernet.

Ok. But the text still isn’t clear on how many bytes are assumed to be
consumed by layer-2 protocols. We don’t need to have a super tight
definition of MTU to progress this document. Implementors just need to
know how big of packets they can transmit.

> Joe





Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-28 Thread Petr Špaček

On 26. 07. 22 23:13, Suzanne Woolf wrote:
> Dear colleagues,
>
> This message starts the Working Group Last Call for
> draft-ietf-dnsop-avoid-fragmentation
> (https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/). The
> requested status is BCP.
>
> Since we're starting the Last Call during the IETF week, and many folks are on
> holidays in the next few weeks, the WGLC will end in three weeks (instead of
> the usual two), on August 16.
>
> Substantive comments to the list, please. It’s fine for minor edits to go
> direct to the authors. We need to hear positive support to advance it, or your
> comments on what still needs to be done.


Given the fact that most open-source implementations already avoid 
fragmentation in one way or another, there is no harm in publishing 
"don't fragment" as a RFC, but there is also only limited benefit. 
Attacks and problems were years ahead and implementers had to patch 
software years ago.


After a quick re-read I agree with the idea but there are implementation 
details which are not as simple as laid out in the current text.



>   This document proposes to set IP_DONTFRAG / IPV6_DONTFRAG in DNS/UDP
>   messages in order to avoid IP fragmentation, and describes how to
>   avoid packet losses due to IP_DONTFRAG / IPV6_DONTFRAG.


IP_DONTFRAG and its interaction with networking stack and socket 
interface is complex and possibly OS-dependent.


I don't remember all the gory details but BIND went through several 
iterations of this and currently we use:


- IP_DONTFRAG - socket option disabled (i.e. fragmentation allowed)
- IP_MTU_DISCOVER = IP_PMTUDISC_OMIT (which is woefully undocumented in 
man pages, see https://lists.openwall.net/netdev/2014/02/26/4)


State in other resolvers is somewhat summarized here:
- https://github.com/PowerDNS/pdns/pull/7410/files#r250252209
- https://www.yadifa.eu/archives/yadifa-users/2019-March/000133.html
(From a quick glance it seems we all do the same, or at least try.)

IIRC this was needed to protect from attacks using spoofed Path MTU.

This risk really should be mentioned in the document and discussed. RFC 
8201 section 6 is not theoretical, we have seen weird things happen in 
real networks. CVE-2021-25218 is my witness :-)
Maybe the currently empty section 8.  Security Considerations needs to 
copy RFC 8201 section 6 and then conclude that it is insecure anyway, so 
don't do it?



I'm not sure how to proceed. On one hand IP_DONTFRAG sounds like a good 
idea. On the other hand it does weird things when combined with
IP_MTU_DISCOVER IP_PMTUDISC_OMIT/_DONT. Maybe DNS is even the right 
place to start and we should be poking OS people to fix/improve it? I 
don't know.



Except for the fundamental problem mentioned above I have just smaller 
things:



> 3.2.  Recommendations for UDP requestors
>
>    *  DNS responses may be dropped by IP fragmentation.  Upon a timeout,
>       UDP requestors may retry using TCP or UDP, per local policy.
>
>       To avoid name resolution fails, UDP requestors need to retry using
>       TCP, or UDP with smaller maximum DNS/UDP payload size.

The last sentence does not have normative language, which is IMHO
good, but it slightly errs on the side of working around brokenness ("need
to retry ...").

I think the document should not have the last sentence at all and should
leave it up to implementations to handle timeouts as they like.




> 4.  Incremental deployment

I'm not sure if it belongs here or in a new "Implementation Status"
section, but I think it's worth mentioning that this document is almost
a no-op after https://dnsflagday.net/2020/ .

The default EDNS buffer sizes have been changed to, sometimes, even lower
values so the fragmentation should not happen anymore.

> 6.1.  Protocol compliance

+1 (or all the votes I can possibly get my hands on!)

--
Petr Špaček



Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-28 Thread Joe Abley
On Jul 28, 2022, at 12:24, Andrew McConachie  wrote:

> PMTUD doesn’t work through NAT

That's a very definitive statement considering that there's no useful standard 
for NAT.

If there's actual research on this to demonstrate that, pragmatically speaking, 
no implementations use the payload of a type 3 code 4 ICMP message to identify 
a translated target for the packet I would like to read it, because that sounds 
interesting. 
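
For reference, a small sketch of what that ICMP payload contains, assuming an
ICMPv4 "fragmentation needed" message (type 3, code 4) laid out as in RFC 792
and RFC 1191, with an embedded UDP packet. The function name is hypothetical,
and this only shows the fields a translator could use; it says nothing about
what deployed NAT implementations actually do.

```python
import socket
import struct


def parse_frag_needed(icmp: bytes):
    """Parse an ICMPv4 type 3 code 4 message carrying a UDP packet.

    Returns (next_hop_mtu, src, sport, dst, dport) taken from the embedded
    original packet, or None if the message is not type 3 code 4.
    """
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None
    inner = icmp[8:]                       # original IP header + 8 bytes
    ihl = (inner[0] & 0x0F) * 4            # embedded IPv4 header length
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return mtu, src, sport, dst, dport
```

The embedded source address and port are the translated ones as seen outside
the NAT, which is the information needed to look up the internal host.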

>> Currently, DNS is known to be the largest
>>   user of IP fragmentation.
> 
> Compared to what? I would just drop this sentence because it doesn’t add 
> anything to the document and it’s trying to make a point that doesn’t need to 
> be made.

I'd also like to see a citation for this one if there has been a study. I agree 
that it's probably the most familiar example of fragmentation for an audience 
mainly preoccupied with the DNS, but that's probably not a helpful observation 
:-)

> Before I was interested in the DNS I worked for an ethernet switch vendor for 
> 8 years, and I often find the way MTU gets talked about in IETF documents 
> simply weird.

RFC 791 introduces the term "maximum transmission unit" to be the maximum size 
of a datagram, not the maximum size of a frame whose payload is a datagram.

The maximum sized datagram that can be transmitted through the
  next network is called the maximum transmission unit (MTU).

> MTU is a measurement of maximum frame size for a network segment starting at 
> Layer 2.

I have also heard MTU used in that way. I have always assumed it was just 
sloppy writing.

There may be prior use of the phrase that I'm not aware of (prior to 1981) but 
even if that's the case I think it's reasonable to use the IETF definition of 
the phrase in the IETF.

I think Ethernet was not standardised until the publication of IEEE 802.3 in 
1983. I also think the original specification did not anticipate switches but 
described a multi-access network with a broader collision domain.

So perhaps it's reasonable to say that the IETF use of MTU pre-dates Ethernet 
switch vendors' usage, since it pre-dates Ethernet switches, since it pre-dates 
Ethernet.


Joe


Re: [DNSOP] WGLC for draft-ietf-dnsop-avoid-fragmentation

2022-07-28 Thread Andrew McConachie

> Path MTU discovery remains widely undeployed due to
>    security issues, and IP fragmentation has exposed weaknesses in
>    application protocols.


PMTUD doesn’t work through NAT and that’s probably the main reason 
why it doesn’t work on the Internet. I think that’s less of a 
security issue than just a general issue with PMTUD not working on the 
modern Internet.



> Currently, DNS is known to be the largest
>    user of IP fragmentation.


Compared to what? I would just drop this sentence because it doesn’t 
add anything to the document and it’s trying to make a point that 
doesn’t need to be made.



>  Most of the Internet and especially the inner core has an MTU of
>  at least 1500 octets.  Maximum DNS/UDP payload size for IPv6 on
>  MTU 1500 ethernet is 1452 (1500 minus 40 (IPv6 header size) minus
>  8 (UDP header size)).  To allow for possible IP options and
>  distant tunnel overhead, authors' recommendation of default
>  maximum DNS/UDP payload size is 1400.


Before I was interested in the DNS I worked for an ethernet switch 
vendor for 8 years, and I often find the way MTU gets talked about in 
IETF documents simply weird. MTU is a measurement of maximum frame size 
for a network segment starting at Layer 2. Yet there’s no discussion 
of layer 2 here. The discussion starts at layer 3 and because of that 
the math doesn’t make any sense to me.


Is there just an assumption that layer 2 will consume 18 bytes? 
(6+6+2+4) (DA+SA+ET+FCS) Can we state this assumption in the document? 
As I read it now it’s not clear how many bytes are assumed to be 
consumed by layer 2 headers.
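
The arithmetic in question can be sketched as follows. This assumes the RFC 791 sense of MTU (the maximum datagram size, so layer 2 framing is not counted inside it), untagged Ethernet, and IP headers without options or extension headers; the constant and function names are hypothetical.

```python
# Layer 2 framing for untagged Ethernet: DA + SA + EtherType + FCS.
ETHERNET_L2_OVERHEAD = 6 + 6 + 2 + 4  # 18 bytes

IPV4_HEADER = 20  # no IP options (assumption)
IPV6_HEADER = 40  # no extension headers (assumption)
UDP_HEADER = 8


def max_udp_payload(ip_mtu: int, ip_header: int) -> int:
    """Largest UDP payload that fits in one unfragmented datagram."""
    return ip_mtu - ip_header - UDP_HEADER


def frame_size(ip_mtu: int) -> int:
    """Size on the wire of an Ethernet frame carrying a full-MTU datagram."""
    return ip_mtu + ETHERNET_L2_OVERHEAD


print(max_udp_payload(1500, IPV6_HEADER))  # 1452, the figure in the draft
print(max_udp_payload(1500, IPV4_HEADER))  # 1472
print(max_udp_payload(1280, IPV6_HEADER))  # 1232, for the IPv6 minimum MTU
print(frame_size(1500))                    # 1518
```

Under these assumptions the 18 bytes sit outside the 1500-octet MTU rather than being subtracted from it, which is why the draft's calculation starts at layer 3.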


I said many of these same things in a mail to this list on August 12, 
2020 but never received a response.


Thanks,
Andrew

On 26 Jul 2022, at 23:13, Suzanne Woolf wrote:


Dear colleagues,


This message starts the Working Group Last Call for 
draft-ietf-dnsop-avoid-fragmentation 
(https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/). 
The requested status is BCP.


Since we're starting the Last Call during the IETF week, and many 
folks are on holidays in the next few weeks, the WGLC will end in 
three weeks (instead of the usual two), on August 16.


Substantive comments to the list, please. It’s fine for minor edits 
to go direct to the authors. We need to hear positive support to 
advance it, or your comments on what still needs to be done.




Thanks,
Suzanne
For the chairs



