[DNSOP] Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-05-06 Thread C. M. Heard
Greetings,

I am replying from the POV of an outsider to DNSOP.

On Mon, May 6, 2024 at 6:59 AM Petr Špaček wrote:
>
> Hello dnsop,
>
> Warren asked implementers to provide feedback on the current text, so
> I'm doing just that.

[ ... ]

> >  3.1. Recommendations for UDP responders
> >
> > R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].
>
> Operational impact of this recommendation is unclear.
>
> Why? Because clients belong to several sets:
> - One set of clients cannot receive fragmented answers,
> - another set of clients cannot use TCP to overcome unfragmented UDP
> size limitations,
> - yet another set of clients actually depend on large answers to
> function (say because they DNSSEC validate, or need to follow huge NS
> sets generated by MS AD, or they need large RRs to deliver e-mail, or ...).
>
> It's unclear what proportion of clients belong to the intersection of these
> three sets. Banning fragmentation on the **outgoing** side might break
> these clients, and it's extremely hard to measure and debug from the
> server side.

This complaint is really unclear. The recommendation is specifically
for responders, i.e., servers. It's not clear a priori whether the term
"outgoing" means the requestor-to-responder direction or the
responder-to-requestor direction. I presume the latter, but it would be
better if this were made obvious by using the same terminology as the
draft.

What I think you are saying is that clients that cannot retry over TCP
after receiving a truncated response will be hurt by the
recommendation. Aren't such clients non-conformant with current DNS
standards? If so, are they sufficiently prevalent that it is necessary
to continue using workarounds to accommodate them? Wasn't the whole
point of DNS Flag Day to break what was broken and get it fixed?

[ ... ]

> > R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP 
> > reassembly to avoid cache poisoning attacks.
>
> AFAIK this is impossible to do using the normal socket API. The
> application has no access to information about UDP reassembly.
>
> Having said that, even if it were implementable, it's IMHO not the best
> advice for a requestor.
>
> IF the requestor is able to detect that a fragment was received then it
> would be MUCH better to trigger a retry using a different protocol right
> away. Just dropping the packet:
> a] causes timeouts
> b] leaves a time window open for another attack attempt

I wondered about this after I read the draft (which was after WG last
call, or I would have commented). I'm not aware of any stack that
allows the application to disable IP reassembly, nor any that indicates
whether a received UDP datagram was received in a single IP datagram or
in multiple IP fragments. If that is indeed the case, this
recommendation should be removed, since it is not actionable.

Additionally, my understanding of the motivation for this is to prevent
off-path cache poisoning attacks. If I correctly understand what I
have read, these are a problem for IPv4 (which has only a 16-bit
datagram ID) or for IPv6 stacks that emit predictable datagram IDs.
It seems to me that the advice to avoid reassembly would need to be
more nuanced, even if it were actionable.

Thanks and regards,

Mike Heard



[DNSOP] draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

2024-05-06 Thread Petr Špaček

Hello dnsop,

Warren asked implementers to provide feedback on the current text, so 
I'm doing just that.


I'm not an apt copywriter, but hopefully the following notes will
provide material for other people to formulate commentary to supplement
the recommendations.



On 01. 03. 24 3:54, internet-dra...@ietf.org wrote:

Internet-Draft draft-ietf-dnsop-avoid-fragmentation-17.txt is now available.
It is a work item of the Domain Name System Operations (DNSOP) WG of the IETF.

Title:   IP Fragmentation Avoidance in DNS over UDP
Authors: Kazunori Fujiwara
         Paul Vixie
Name:    draft-ietf-dnsop-avoid-fragmentation-17.txt
Pages:   14
Dates:   2024-02-29




 3.1. Recommendations for UDP responders

R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].


Operational impact of this recommendation is unclear.

Why? Because clients belong to several sets:
- One set of clients cannot receive fragmented answers,
- another set of clients cannot use TCP to overcome unfragmented UDP
size limitations,
- yet another set of clients actually depend on large answers to
function (say because they DNSSEC validate, or need to follow huge NS
sets generated by MS AD, or they need large RRs to deliver e-mail, or ...).


It's unclear what proportion of clients belong to the intersection of these
three sets. Banning fragmentation on the **outgoing** side might break 
these clients, and it's extremely hard to measure and debug from the 
server side.




R2. Where supported, UDP responders SHOULD set IP "Don't Fragment flag (DF) 
bit" [RFC0791] on IPv4. At the time of writing, most DNS server software did not set 
the DF bit for IPv4, and many operating systems' kernel constraints make it difficult to
set the DF bit in all cases.


E.g. on Linux the socket API does not expose the DF bit directly. An
application can request that the DF bit be set on outgoing packets, but
this implicitly enables receipt and processing of unauthenticated
ICMP messages. These messages can be used to manipulate Path MTU records
in the kernel, and attacks can be mounted by misusing this technique.
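
For readers unfamiliar with the Linux mechanism referred to above, here
is a minimal sketch of how an application typically requests DF on a
UDP socket. It assumes Linux; the helper name enable_df is made up for
illustration, and the numeric fallbacks are the values from
<linux/in.h>, used only if this Python build does not export the names.
Note that the option being set is the path MTU discovery mode, not a DF
flag as such, which is why the ICMP processing comes along with it.

```python
import socket

# Linux values from <linux/in.h>; fallbacks in case the socket module
# of this Python build does not export these names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def enable_df(sock: socket.socket) -> None:
    """Ask the Linux kernel to set DF on IPv4 packets sent from sock.

    This selects the "always do path MTU discovery" mode. Sends larger
    than the kernel's current path MTU estimate then fail with EMSGSIZE
    instead of being fragmented locally.
    """
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
```

A caller would create an AF_INET/SOCK_DGRAM socket, call enable_df(sock)
once, and be prepared to handle OSError with errno EMSGSIZE on send.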




R3. UDP responders SHOULD compose response packets that fit in the minimum of 
the offered requestor's maximum UDP payload size [RFC6891], the interface MTU, 
the network MTU value configured by the knowledge of the network operators, and 
the RECOMMENDED maximum DNS/UDP payload size 1400. (See Appendix A for more 
information.)


In practice, making a syscall to determine an MTU _estimate_ for every
single peer address is impractical, and in most cases the value exposed
by the kernel is just garbage anyway. It's more practical to assume that
the outgoing EDNS buffer size is configured to a reasonable lower bound
by the system admin.
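
To make the "minimum of" rule in R3 concrete, here is a rough sketch of
the computation. The function and parameter names are hypothetical, and
subtracting IP and UDP header sizes from the MTU values is my
assumption about how an MTU would be turned into a payload limit; the
quoted text of R3 does not spell that out. With the approach suggested
above, interface_mtu and network_mtu would simply be admin-configured
constants rather than per-peer lookups.

```python
RECOMMENDED_MAX = 1400  # R3's RECOMMENDED maximum DNS/UDP payload size
UDP_HEADER = 8
IPV4_HEADER = 20        # assumes no IP options
IPV6_HEADER = 40        # assumes no extension headers


def max_udp_payload(offered_edns_size: int, interface_mtu: int,
                    network_mtu: int, ipv6: bool = False) -> int:
    """Return the largest DNS/UDP payload a responder would send under R3."""
    overhead = (IPV6_HEADER if ipv6 else IPV4_HEADER) + UDP_HEADER
    # RFC 6891: offered sizes below 512 are treated as 512.
    offered = max(offered_edns_size, 512)
    return min(offered,
               interface_mtu - overhead,
               network_mtu - overhead,
               RECOMMENDED_MAX)
```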




R4. If the UDP responder detects an immediate error indicating that the UDP 
packet cannot be sent beyond the path MTU size, the UDP responder MAY recreate 
response packets fit in the path MTU size, or with the TC bit set.


The same note about MTU determination applies here. TC=1 sounds
reasonable and does not require more guesswork, or reconstructing and
recompressing the answer packet.
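
As an illustration of why the TC=1 route is cheap, here is a minimal
sketch that turns an already-built response into a truncated one by
keeping only the header and question section. The function names are
made up for this example. It is deliberately simplistic: it also drops
any OPT record from the additional section, which a real server would
normally keep, and it does no bounds checking on malformed input.

```python
import struct

HEADER_LEN = 12
FLAG_TC = 0x0200  # TC bit in the DNS header flags word (RFC 1035)


def _skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the domain name starting at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends name
            return pos + 2
        pos += 1 + length


def truncated_reply(response: bytes) -> bytes:
    """Keep header and question, set TC=1, zero the other section counts."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack(
        "!6H", response[:HEADER_LEN])
    pos = HEADER_LEN
    for _ in range(qdcount):
        pos = _skip_name(response, pos) + 4  # skip QTYPE and QCLASS
    header = struct.pack("!6H", ident, flags | FLAG_TC, qdcount, 0, 0, 0)
    return header + response[HEADER_LEN:pos]
```

No name recompression is needed because the question section is copied
byte for byte.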




R5. UDP requestors SHOULD limit the requestor's maximum UDP payload size. It 
SHOULD use a limit of 1400 bytes, but a smaller limit MAY be used. (See 
Appendix A for more information.)


Some operators have better experience with 1400, others with other
values. We at ISC go with the lower value of 1232 because a
conservative value is more likely to work. Debugging this in
production is a total pain, and in our limited experience a slightly
smaller value does not cause new issues.
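
The note above does not say where 1232 comes from. The usual
derivation of that number, shown here only as background arithmetic, is
the IPv6 minimum link MTU minus the fixed IPv6 and UDP headers. The
function name is hypothetical.

```python
IPV6_MIN_MTU = 1280  # minimum link MTU required by RFC 8200
IPV6_HEADER = 40     # fixed IPv6 header, no extension headers
UDP_HEADER = 8


def ipv6_safe_payload(mtu: int = IPV6_MIN_MTU) -> int:
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    return mtu - IPV6_HEADER - UDP_HEADER


print(ipv6_safe_payload())  # 1232
```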




R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP 
reassembly to avoid cache poisoning attacks.


AFAIK this is impossible to do using the normal socket API. The
application has no access to information about UDP reassembly.


Having said that, even if it were implementable, it's IMHO not the best
advice for a requestor.


IF the requestor is able to detect that a fragment was received then it
would be MUCH better to trigger a retry using a different protocol right
away. Just dropping the packet:

a] causes timeouts
b] leaves a time window open for another attack attempt



R7. DNS responses may be dropped by IP fragmentation. Upon a timeout, to avoid 
resolution failures, UDP requestors SHOULD retry using TCP or UDP with a 
smaller EDNS requestor's maximum UDP payload size per local policy. UDP 
requestors SHOULD observe [RFC8961] in setting their timeout.


Problem:
There is no indication whether a timeout was caused by fragmentation -
it might have been caused by other factors. The server might simply be
dead.


The server selection algorithm in DNS is currently undefined and each
implementation has its own retry strategy. TCP might or might not be
the first choice. I don't see a compelling reason why this should be
prescribed.
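
For concreteness, here is one possible local policy of the kind R7
leaves open, sketched with hypothetical transport hooks (send_udp and
send_tcp are placeholders supplied by the caller, not real library
calls). The order and the sizes are arbitrary choices for illustration,
not a recommendation. The sketch also shows the problem described
above: a None result only means "timed out", so the code cannot tell
fragmentation loss from a dead server.

```python
from typing import Callable, Optional, Sequence

# Hypothetical hooks: each returns the response, or None on timeout.
UdpSend = Callable[[bytes, int], Optional[bytes]]  # (query, edns_size)
TcpSend = Callable[[bytes], Optional[bytes]]       # (query)


def resolve_with_fallback(query: bytes,
                          send_udp: UdpSend,
                          send_tcp: TcpSend,
                          edns_sizes: Sequence[int] = (1400, 1232),
                          ) -> Optional[bytes]:
    """Try UDP with progressively smaller EDNS sizes, then fall back to TCP."""
    for size in edns_sizes:
        response = send_udp(query, size)
        if response is not None:
            return response
        # Timeout: the cause is unknown, so just move to the next option.
    return send_tcp(query)
```

An implementation that prefers TCP first would simply reorder the
steps, which is part of why prescribing one strategy is questionable.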



Proposed change - replace current text with:

DN