[DNSOP] Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-06-06 Thread Benno Overeinder

Hi all,

Speaking as one of the DNS implementers and as part of providing 
feedback on the current draft revision, we have reformulated 
recommendation R2.  It expresses the intention not to fragment UDP 
packets and points out that different operating systems have different 
ways of achieving this.


The current concern of open-source DNS software developers is that, on 
Linux, IP_MTU_DISCOVER is poorly documented, its behavior has changed 
over time, one has to read the kernel code to see what is really going 
on, and it is fragile.


New text for R2:

-

R2.  UDP responders should configure their systems to prevent 
fragmentation of UDP packets when sending replies, provided it can be 
done safely. The mechanisms to achieve this vary across different 
operating systems.


For BSD-like operating systems, the IP "Don't Fragment flag (DF) bit" 
[RFC0791] can be used to prevent fragmentation. In contrast, Linux 
systems do not expose a direct API for this purpose and require the use 
of Path MTU socket options (IP_MTU_DISCOVER) to manage fragmentation 
settings. However, it is important to note that enabling IPv4 Path MTU 
Discovery for UDP in current Linux versions is considered harmful and 
dangerous. For more details, refer to Appendix C.


-
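To make the Linux side of the new R2 text concrete, here is a minimal 
sketch of how an application can ask for DF via the Path MTU socket 
option. The function name request_df_on_linux is made up for 
illustration, and the numeric fallbacks are assumed from Linux's 
<linux/in.h> because not every Python build exports these names. Note 
that this is exactly the mode the text above warns about: it also turns 
on Path MTU Discovery for the socket, so treat it as a demonstration of 
the mechanism, not as a recommendation.

```python
import socket
import sys

# Numeric fallbacks assumed from Linux's <linux/in.h>; used only when the
# socket module of this Python build does not export the names itself.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def request_df_on_linux(sock: socket.socket) -> bool:
    """Ask Linux to set DF on outgoing IPv4 datagrams of this socket.

    Returns True if the option was applied, False otherwise.
    Caveat from the thread: this implicitly enables Path MTU Discovery,
    so the kernel will act on ICMP messages for this socket.
    """
    if not sys.platform.startswith("linux"):
        return False  # other systems use different mechanisms
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    except OSError:
        return False  # kernel or sandbox refused the option
    return True
```

A caller would apply this once to the UDP listening socket after 
creating it, and fall back to size limits alone when it returns False.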


On 06/05/2024 15:59, Petr Špaček wrote:

Hello dnsop,

Warren asked implementers to provide feedback on the current text, so 
I'm doing just that.


I'm not an apt copywriter, but hopefully the following notes will provide 
material for other people to formulate commentary to supplement the 
recommendations.

Cheers,

-- Benno

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-05-21 Thread Paul Wouters

On Mon, 6 May 2024, Petr Špaček wrote:


 R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].


Operational impact of this recommendation is unclear.

Why? Because clients belong to several sets:
- One set of clients cannot receive fragmented answers,


Good because it has been proven to be very insecure.

- another set of clients cannot use TCP to overcome unfragmented UDP size 
limitations,


TCP is a mandatory part of DNS now, so I'm not sure how much sympathy I
would have. If I were a flagday person, I'd call a flagday for this :P

- yet another set of clients actually depend on large answers to function 
(say because they DNSSEC validate, or need to follow huge NS sets generated by 
MS AD, or they need large RRs to deliver e-mail, or ...).


You mean exactly those records that are valuable to attack using DNS
fragments. Is the right operational choice to keep these clients
vulnerable, rather than breaking them so that they get fixed and the
security issue goes away? Why wait for a specific attack to come out
before giving up on these dangerously broken clients?

It's unclear what proportion of clients belong to the intersection of these 
three sets. Banning fragmentation on the **outgoing** side might break these 
clients, and it's extremely hard to measure and debug from the server side.


Breaking them _also_ ensures they can't be victims of fragmentation attacks.


 R2. Where supported, UDP responders SHOULD set IP "Don't Fragment flag
 (DF) bit" [RFC0791] on IPv4. At the time of writing, most DNS server
 software did not set the DF bit for IPv4, and many operating systems'
 kernel constraints make it difficult to set the DF bit in all cases.


E.g. on Linux the socket API does not expose the DF bit directly. An 
application can request that the DF bit be turned on in outgoing packets, but 
at the same time this implicitly enables receipt and processing of 
unauthenticated ICMP messages. These messages can be used to manipulate Path 
MTU records in the kernel and to mount attacks misusing this technique.


That's clear, and someone should take this up with the linux-net people?


 R3. UDP responders SHOULD compose response packets that fit in the minimum
 of the offered requestor's maximum UDP payload size [RFC6891], the
 interface MTU, the network MTU value configured by the knowledge of the
 network operators, and the RECOMMENDED maximum DNS/UDP payload size 1400.
 (See Appendix A for more information.)


In practice, doing a syscall to determine an MTU _estimate_ for every single 
peer address is impractical, and in most cases the value exposed by the kernel 
is just garbage anyway. It's more practical to assume that the outgoing EDNS 
buffer size is configured to a reasonable lower bound by the system admin.


I don't think it is asking for a syscall here, is it? It is saying the
minimum of:

1) EDNS0 option value received
2) interface MTU
3) Preset network MTU by admin in config
4) 1400

Only 2) would require some syscalls but those are per interface so not
per packet, and one could listen for interface changes to reread these.

What syscalls do you think are impractical?
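
For what it's worth, the "minimum of" rule above can be sketched in a 
few lines. This is only an illustration under stated assumptions: the 
function and parameter names are made up, the MTU values are converted 
to a UDP payload size by subtracting header overhead (20 bytes for IPv4 
without options or 40 for IPv6, plus 8 for UDP), and a missing or 
too-small EDNS0 value is treated as 512 as in RFC 1035 / RFC 6891. None 
of those details come from the R3 text itself.

```python
from typing import Optional

RECOMMENDED_MAX_PAYLOAD = 1400  # the value named in R3


def max_udp_response_size(edns_payload_size: Optional[int],
                          interface_mtu: int,
                          configured_mtu: Optional[int] = None,
                          ipv6: bool = False) -> int:
    """Return the largest DNS/UDP payload to send, per the R3 'minimum of' rule."""
    # Assumption: MTUs are IP packet sizes, so subtract IP + UDP headers.
    overhead = (40 if ipv6 else 20) + 8
    candidates = [RECOMMENDED_MAX_PAYLOAD, interface_mtu - overhead]
    if configured_mtu is not None:
        candidates.append(configured_mtu - overhead)
    if edns_payload_size is None:
        candidates.append(512)  # no EDNS0 in the query: classic DNS limit
    else:
        candidates.append(max(512, edns_payload_size))
    return min(candidates)
```

The interface MTU would be read once per interface (and re-read on 
interface changes), not per packet, so the per-query cost is a handful 
of integer comparisons.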


 R4. If the UDP responder detects an immediate error indicating that the
 UDP packet cannot be sent beyond the path MTU size, the UDP responder MAY
 recreate response packets fit in the path MTU size, or with the TC bit
 set.


Same note about MTU determination applies here. TC=1 sounds reasonable and 
does not require more guesswork or reconstructing and recompressing the 
answer packet.


Once you did the above calculation, wouldn't you just use that result?

I think you two are not saying very different things? E.g. you are
building the packet, know the max size (from above) and start adding
additional records until you run out of space?
Or, if you are still writing mandatory data (e.g. Answer or Authority
Section), you set TC=1?
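
The packet-building loop described above could look roughly like this. 
It is a sketch only: pack_response and PackedResponse are hypothetical 
names, records are assumed to be already encoded as bytes, name 
compression is ignored, and keeping the mandatory records that did fit 
when TC=1 is set is a choice made here, not something the thread or the 
draft specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackedResponse:
    records: List[bytes]  # encoded RRs that made it into the packet
    truncated: bool       # True means the TC bit should be set


def pack_response(header_and_question_len: int,
                  mandatory: List[bytes],
                  additional: List[bytes],
                  max_size: int) -> PackedResponse:
    """Fill a response up to max_size; set TC only if mandatory data is cut."""
    used = header_and_question_len
    kept: List[bytes] = []
    for rr in mandatory:  # Answer and Authority sections
        if used + len(rr) > max_size:
            return PackedResponse(kept, True)  # ran out of space: TC=1
        kept.append(rr)
        used += len(rr)
    for rr in additional:  # optional data: drop what does not fit
        if used + len(rr) > max_size:
            break
        kept.append(rr)
        used += len(rr)
    return PackedResponse(kept, False)
```

With max_size computed once up front, there is no need to react to a 
send error and rebuild the packet afterwards.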


 R5. UDP requestors SHOULD limit the requestor's maximum UDP payload size.
 It SHOULD use a limit of 1400 bytes, but a smaller limit MAY be used. (See
 Appendix A for more information.)


Some operators have better experience with 1400, others with other values. We 
at ISC go with the lower value of 1232 because it's easier to have a 
conservative value which is more likely to work. Debugging this in production 
is a total pain, and using a somewhat smaller value is, in our limited 
experience, not causing new issues. That's why we went with lower values.


Let the implementers pick the value. They have the most experience
dealing with support calls. I was assuming the WG discussed this at
length, but perhaps it didn't :)
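
On the requestor side, the limit from R5 ends up in the CLASS field of 
the EDNS0 OPT pseudo-record (RFC 6891). A minimal sketch, assuming a 
made-up helper name build_opt_rr, no EDNS options, and all flag bits 
left at zero; the default of 1400 is the value from R5, and an 
implementer preferring 1232 would simply pass that instead:

```python
import struct

OPT_TYPE = 41  # OPT pseudo-RR type code from RFC 6891


def build_opt_rr(requested_size: int, limit: int = 1400) -> bytes:
    """Encode an OPT RR advertising at most `limit` bytes of UDP payload."""
    # Clamp to the configured limit, but never advertise less than 512.
    size = max(512, min(requested_size, limit))
    # Fields: root name (1 byte), TYPE, CLASS (= payload size),
    # TTL (extended RCODE, version, flags; all zero here), RDLENGTH.
    return struct.pack("!BHHIH", 0, OPT_TYPE, size, 0, 0)
```

For example, build_opt_rr(4096) advertises 1400, while 
build_opt_rr(4096, limit=1232) advertises 1232.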


 R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP
 reassembly to avoid cache poisoning attacks.


AFAIK this is impossible to do using the normal socket API. The application 
has no access to information about UDP reassembly.


I imagine some userland stacks like DPDK could possibly enforce this.

Having said that, even if it was implementable it's IMHO not the best advice 
for a requestor.


IF the requestor is able to detect that a fragment was 

[DNSOP] Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-05-07 Thread Petr Špaček

On 07. 05. 24 2:54, C. M. Heard wrote:

On Mon, May 6, 2024 at 6:59 AM Petr Špaček wrote:

Warren asked implementers to provide feedback on the current text, so
I'm doing just that.


[ ... ]


  3.1. Recommendations for UDP responders

R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].


Operational impact of this recommendation is unclear.

Why? Because clients belong to several sets:
- One set of clients cannot receive fragmented answers,
- another set of clients cannot use TCP to overcome unfragmented UDP
size limitations,
- yet another set of clients actually depend on large answers to
function (say because they DNSSEC validate, or need to follow huge NS
sets generated by MS AD, or they need large RRs to deliver e-mail, or ...).

It's unclear what proportion of clients belong to the intersection of
these three sets. Banning fragmentation on the **outgoing** side might
break these clients, and it's extremely hard to measure and debug from
the server side.


This complaint is really unclear. The recommendation is specifically
for responders, i.e., servers. It's not clear a priori whether the term
"outgoing" means the requestor to responder direction or the responder
to requestor direction. I presume the latter, but it would be better
if this was made obvious by using the same terminology as the draft.

What I think you are saying is that clients that cannot re-send
truncated queries using TCP will be hurt by the recommendation. Aren't
such clients non-conformant with current DNS standards? If so, are they
sufficiently prevalent that it is necessary to continue using
workarounds to accommodate them?


I said:
"Operational impact of this recommendation is unclear."

That means that the answer to your question is unknown.

This recommendation is not backed with data. If the data exist they are 
not linked. To the best of my knowledge there is no significant 
operational experience with it. If the experience exists I have not seen 
it published.



On paper the recommendation does not sound bad. Maybe it's good enough 
as an aspirational, forward-looking recommendation...


But that's not what the document does. Version 17 currently says it's:
- Best
- Current
- Practice

As an implementer I claim these three words are either not supported by 
data or outright incorrect:

- Best - impact is unknown, experience is lacking
- Current - not deployed at scale
- Practice - well, not even implementable with current OS APIs!


Wasn't the whole point of DNS Flag Day
to break what was broken and get it fixed?


There was not a flag day for TCP support (yet?).

If you are up for organizing one, I'm happy to share first-hand 
experience from organizing the previous two DNS Flag Days!




[ ... ]


R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP 
reassembly to avoid cache poisoning attacks.


AFAIK this is impossible to do using the normal socket API. The
application has no access to information about UDP reassembly.

Having said that, even if it was implementable it's IMHO not the best
advice for a requestor.

IF the requestor is able to detect that a fragment was received then it
would be MUCH better to trigger retry using different protocol right
away. Just dropping the packet:
a] causes timeouts
b] leaves a time window open for another attack attempt


I wondered about this after I read the draft (which was after WG last
call, or I would have commented). I'm not aware of any stack that
allows the application to disable IP reassembly, nor any that indicates
whether a received UDP datagram was received in a single IP datagram or
in multiple IP fragments. If that is indeed the case, this
recommendation should be removed, since it is not actionable.

Additionally, my understanding of the motivation for this is to prevent
off-path cache poisoning attacks. If I correctly understand what I
have read, these are a problem for IPv4 (which has only a 16-bit
datagram ID) or for IPv6 stacks that emit predictable datagram IDs.
It seems to me that the advice to avoid reassembly would need to be
more nuanced, even if it were actionable.


Generally I agree. Having said that, paradoxically I think the R6 advice 
is much better than R1... **if** it were practically implementable. 
Again, this can be an aspirational, forward-looking recommendation.


If we can get an API to detect fragmented (even partial) datagrams, we 
can harden the client side and most of the other recommendations will 
be moot.


Example: The requestor could treat any fragmented answer as equivalent 
to a TC=1 answer with no data inside. That should take care of all known 
fragmentation-based attacks (I think) and it does not depend on the 
responder side at all.
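
In code, the idea would be roughly the following. This is a sketch 
under a big assumption: the was_fragmented flag stands for information 
that, as discussed above, the normal socket API does not provide today. 
The names classify_udp_response and Action are made up for 
illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"        # process the UDP answer normally
    RETRY_TCP = "retry_tcp"  # discard the data and re-ask over TCP


TC_MASK = 0x0200  # TC bit within the 16-bit DNS header flags field


def classify_udp_response(message: bytes, was_fragmented: bool) -> Action:
    """Treat a fragmented answer exactly like an empty TC=1 answer."""
    if was_fragmented:
        # Hypothetical signal: no portable API exposes this today.
        return Action.RETRY_TCP
    if len(message) < 12:
        raise ValueError("shorter than a DNS header")
    flags = int.from_bytes(message[2:4], "big")
    return Action.RETRY_TCP if flags & TC_MASK else Action.ACCEPT
```

Compared with silently dropping the packet, the immediate TCP retry 
avoids the timeout and closes the window for another attempt.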


--
Petr Špaček
Internet Systems Consortium



[DNSOP] Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-05-06 Thread C. M. Heard
Greetings,

I am replying from the POV of an outsider to DNSOP.

On Mon, May 6, 2024 at 6:59 AM Petr Špaček wrote:
>
> Hello dnsop,
>
> Warren asked implementers to provide feedback on the current text, so
> I'm doing just that.

[ ... ]

> >  3.1. Recommendations for UDP responders
> >
> > R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].
>
> Operational impact of this recommendation is unclear.
>
> Why? Because clients belong to several sets:
> - One set of clients cannot receive fragmented answers,
> - another set of clients cannot use TCP to overcome unfragmented UDP
> size limitations,
> - yet another set of clients actually depend on large answers to
> function (say because they DNSSEC validate, or need to follow huge NS
> sets generated by MS AD, or they need large RRs to deliver e-mail, or ...).
>
> It's unclear what proportion of clients belong to intersection of these
> three sets. Banning fragmentation on the **outgoing** side might break
> these clients, and it's extremely hard to measure and debug from the
> server side.

This complaint is really unclear. The recommendation is specifically
for responders, i.e., servers. It's not clear a priori whether the term
"outgoing" means the requestor to responder direction or the responder
to requestor direction. I presume the latter, but it would be better
if this was made obvious by using the same terminology as the draft.

What I think you are saying is that clients that cannot re-send
truncated queries using TCP will be hurt by the recommendation. Aren't
such clients non-conformant with current DNS standards? If so, are they
sufficiently prevalent that it is necessary to continue using
workarounds to accommodate them? Wasn't the whole point of DNS Flag Day
to break what was broken and get it fixed?

[ ... ]

> > R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP 
> > reassembly to avoid cache poisoning attacks.
>
> AFAIK this is impossible to do using normal socket API. The application
> has no access to information about UDP reassembly.
>
> Having said that, even if it was implementable it's IMHO not the best
> advice for requestor.
>
> IF the requestor is able to detect that a fragment was received then it
> would be MUCH better to trigger retry using different protocol right
> away. Just dropping the packet:
> a] causes timeouts
> b] leaves a time window open for another attack attempt

I wondered about this after I read the draft (which was after WG last
call, or I would have commented). I'm not aware of any stack that
allows the application to disable IP reassembly, nor any that indicates
whether a received UDP datagram was received in a single IP datagram or
in multiple IP fragments. If that is indeed the case, this
recommendation should be removed, since it is not actionable.

Additionally, my understanding of the motivation for this is to prevent
off-path cache poisoning attacks. If I correctly understand what I
have read, these are a problem for IPv4 (which has only a 16-bit
datagram ID) or for IPv6 stacks that emit predictable datagram IDs.
It seems to me that the advice to avoid reassembly would need to be
more nuanced, even if it were actionable.

Thanks and regards,

Mike Heard
