Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
Duane/Evan/Mukund/All,

What do you feel is the consensus on lowering the value to 1 second?

From the previous suggested text:

   Resolvers MUST cache resolution failures for at least 1 second.
   The initial duration SHOULD be configurable by the operator. A
   longer cache duration for resolution failures will reduce the
   processing burden from repeated queries, but will also lengthen
   the recovery period from transitory issues.

It does sound like this paragraph works better.

   Resolvers SHOULD employ an exponential or linear backoff algorithm to
   increase the amount of time for subsequent resolution failures. For
   example, the initial time for negatively caching a resolution failure
   is set to 5 seconds. The time is increased after each retry that
   results in another resolution failure. Consistent with [RFC2308],
   resolution failures MUST NOT be cached for longer than 5 minutes.

May we get some feedback on this?

thanks
tim

On Mon, Jul 24, 2023 at 11:41 AM Evan Hunt wrote:

> On Mon, Jul 24, 2023 at 06:26:46PM +, Wessels, Duane wrote:
> > It was not our intention that “2” would be the only possible exponent in
> > the backoff algorithm. Would this slightly revised text be more
> > agreeable?
> >
> >    Resolvers SHOULD employ an exponential or linear backoff algorithm to
> >    increase the amount of time for subsequent resolution failures. For
> >    example, the initial time for negatively caching a resolution failure
> >    is set to 5 seconds. The time is increased after each retry that
> >    results in another resolution failure. Consistent with [RFC2308],
> >    resolution failures MUST NOT be cached for longer than 5 minutes.
>
> That's definitely an improvement, yes.
>
> --
> Evan Hunt -- e...@isc.org
> Internet Systems Consortium, Inc.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
On Mon, Jul 24, 2023 at 06:26:46PM +, Wessels, Duane wrote:
> It was not our intention that “2” would be the only possible exponent in
> the backoff algorithm. Would this slightly revised text be more
> agreeable?
>
>    Resolvers SHOULD employ an exponential or linear backoff algorithm to
>    increase the amount of time for subsequent resolution failures. For
>    example, the initial time for negatively caching a resolution failure
>    is set to 5 seconds. The time is increased after each retry that
>    results in another resolution failure. Consistent with [RFC2308],
>    resolution failures MUST NOT be cached for longer than 5 minutes.

That's definitely an improvement, yes.

--
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
Evan,

> On Jul 24, 2023, at 10:34 AM, Evan Hunt wrote:
>
> The original text says a series of seven resolution failures would increase
> the duration before a retry to five minutes: 5 seconds to 10 to 20 to 40 to
> 80 to 160 to 300. Lowering the starting value to one second means it would
> take nine failures to reach 300.

It was not our intention that “2” would be the only possible exponent in
the backoff algorithm. Would this slightly revised text be more agreeable?

   Resolvers SHOULD employ an exponential or linear backoff algorithm to
   increase the amount of time for subsequent resolution failures. For
   example, the initial time for negatively caching a resolution failure
   is set to 5 seconds. The time is increased after each retry that
   results in another resolution failure. Consistent with [RFC2308],
   resolution failures MUST NOT be cached for longer than 5 minutes.

DW
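For readers following the thread, the revised wording (exponential with any factor, or linear, capped at 5 minutes) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the function name `next_failure_ttl`, the `mode`, `factor` and `step` parameters, and their defaults are hypothetical and do not come from the draft or from any resolver implementation.

```python
MAX_FAILURE_TTL = 300  # seconds; the 5-minute ceiling in the quoted text


def next_failure_ttl(current, mode="exponential", factor=2.0, step=5.0):
    """Return the cache duration to use after one more resolution failure.

    `mode`, `factor` and `step` are illustrative knobs, not draft terminology.
    """
    if mode == "exponential":
        grown = current * factor   # any factor, not only 2
    elif mode == "linear":
        grown = current + step     # fixed increment per failure
    else:
        raise ValueError(f"unknown backoff mode: {mode!r}")
    # Never exceed the 5-minute limit, whatever the growth rule.
    return min(grown, MAX_FAILURE_TTL)
```

With these assumed defaults, `next_failure_ttl(5)` doubles to 10, `next_failure_ttl(5, factor=3)` gives 15, and `next_failure_ttl(5, mode="linear")` adds a fixed 5 seconds.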
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
On Mon, Jul 24, 2023 at 10:00:37AM +0530, Mukund Sivaraman wrote:
> When seeing prescriptive text, implementors often want to know the
> rationale behind it. If the value of 5 is changed to 1, please mention
> and have the authors include in the document why the lower limit is
> 1s. Is it an arbitrary change? Is this change based on the default value
> of BIND's servfail-ttl named.conf option?

Yes, it is.

For background: BIND implemented a SERVFAIL cache in 2014 with a default
cache duration of 10 seconds; after a slew of complaints, in 2015 we
lowered it to 1 second, and also reduced the configurable maximum from
5 minutes to 30 seconds. The reason was that certain common failure
conditions are transitory, and it's not unreasonable to prioritize rapid
recovery.

Now, to be clear, the comparison isn't exactly apples to apples: the BIND
SERVFAIL cache is a somewhat stupider mechanism than the one outlined in
the draft. It caches *all* SERVFAIL responses, regardless of the reason
they were generated. For example: when the cache is cold, a query may time
out or hit DDoS mitigation limits before it's finished getting through the
whole iteration process; an immediate retry would start further along the
delegation chain and would succeed. Such problems weren't noticeable until
we implemented the 10-second cache, but became very noticeable afterward.

If we were able to selectively cache *only* those SERVFAILs that are
unlikely to recover soon, then five seconds might indeed be a good starting
point. But, with our relatively dumb cache, we found that one second did
a fairly good job reducing the processing burden from repeated queries, and
eliminated the user complaints about the resolver taking forever to recover
from short-lived problems. It's been working well enough that it hasn't
been a priority to develop a more complex failure cache.

In any case, even with the assumption that future implementations *will*
have better selectiveness, I'm leery of using 5 seconds as a hard minimum
in an RFC. I think it's likely that some operators will find that excessive
and want the option to tune it to a lower value. Also, if you *are* doing
exponential backoff, then two failures in a row will get your duration up
to 4 seconds anyway, so the difference between starting at 1 and starting
at 5 isn't really all that significant.

> > * Note that the original text has this as SHOULD. I've heard reasons for
> > both SHOULD and MAY.
>
> What are these reasons?

I suggested MAY because I think exponential backoff is a pretty specific
(and rather aggressive) approach to cache timing, and I'm not entirely
comfortable with it having the almost-mandatory force of a SHOULD.

The original text says a series of seven resolution failures would increase
the duration before a retry to five minutes: 5 seconds to 10 to 20 to 40 to
80 to 160 to 300. Lowering the starting value to one second means it would
take nine failures to reach 300.

IMHO, keeping the recovery period flat, or increasing it linearly (5, 10,
15, etc.), could also be operationally reasonable choices, so I'm not sure
why we need to be so emphatic about *this* particular backoff strategy in
the RFC. I have no objection to mentioning it, but it felt like a MAY to me.

It's a mild preference though, and if I'm the only one who feels that way,
I won't argue about it further.

--
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.
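The arithmetic in the message above can be checked with a short Python sketch. The helper name `backoff_schedule` and the two growth functions are hypothetical, introduced only for this illustration. Counting each listed duration as one failure, a 5-second start reaches the 300-second cap on the seventh entry (six doublings), while a 1-second start reaches it on the tenth entry (nine doublings), so the "nine" quoted above seems to count doublings rather than failures.

```python
def backoff_schedule(initial, grow, cap=300):
    """List successive cache durations, one per consecutive failure,
    stopping once the cap (300 seconds, i.e. 5 minutes) is reached."""
    durations = [initial]
    while durations[-1] < cap:
        durations.append(min(grow(durations[-1]), cap))
    return durations


def double(seconds):
    """Exponential backoff with factor 2."""
    return seconds * 2


def add_five(seconds):
    """Linear backoff in 5-second steps (5, 10, 15, ...)."""
    return seconds + 5


from_five = backoff_schedule(5, double)
from_one = backoff_schedule(1, double)
linear = backoff_schedule(5, add_five)

print(from_five)  # [5, 10, 20, 40, 80, 160, 300]
print(from_one)   # [1, 2, 4, 8, 16, 32, 64, 128, 256, 300]
print(len(linear))  # 60
```

The linear schedule needs 60 consecutive failures to reach the same cap, which illustrates how much gentler that option is than doubling.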
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
Hi Tim

On Sun, Jul 23, 2023 at 09:00:58PM -0700, Tim Wicinski wrote:
> There was some operational feedback that suggests 1 second is also
> a very reasonable value here. With some discussion, here is some
> suggested text:
>
> Resolvers MUST cache resolution failures for at least 1 second.

When seeing prescriptive text, implementors often want to know the
rationale behind it. If the value of 5 is changed to 1, please mention
and have the authors include in the document why the lower limit is
1s. Is it an arbitrary change? Is this change based on the default value
of BIND's servfail-ttl named.conf option?

Sometimes the reason for decisions is found in the mailing list
archives, but not always.

> The initial duration SHOULD be configurable by the operator. A

[snip]

> * Note that the original text has this as SHOULD. I've heard reasons for
> both SHOULD and MAY.

What are these reasons?

Mukund
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
Duane

On Sun, Jul 23, 2023 at 9:20 PM Wessels, Duane wrote:

> Tim,
>
> You said you received some operational feedback. I wonder if it would be
> appropriate to add this operational (or implementation?) feedback to the
> (currently empty) Implementation Status section that Peter van Dijk
> suggested we add, in his DNS directorate review?

This seems very reasonable. I'll work with them on this.

> I’m not necessarily opposed to reducing the minimum caching time from 5 to
> 1, especially if we can document valid reasons for doing so. However, I do
> think it is going a bit too far to weaken both the minimum caching time and
> the requirement level for exponential backoff. So I would really argue to
> keep the SHOULD in the second paragraph.
>
> Alternatively, we might consider something like 5 seconds without an
> exponential backoff implementation OR an initial 1 second cache time with
> an exponential backoff.

My first opinion is to keep the SHOULD, and let folks speak up if they feel
this is wrong. I liked the way you worded the timeout range in section 3.1.

I do think/feel the value should be configurable by the operator.

thanks
tim

> DW
>
> > On Jul 23, 2023, at 9:00 PM, Tim Wicinski wrote:
> >
> > All,
> >
> > We had a discussion this morning during the hackathon about a value with
> > the document caching-resolution-failures. The current text in 3.2 says:
> >
> >    Resolvers MUST cache resolution failures for at least 5 seconds. The
> >    value of 5 seconds is chosen as a reasonable amount of time that an
> >    end user could be expected to wait.
> >
> >    Resolvers SHOULD employ an exponential backoff algorithm to increase
> >    the amount of time for subsequent resolution failures. For example,
> >    the initial time for negatively caching a resolution failure is set
> >    to 5 seconds. The time is doubled after each retry that results in
> >    another resolution failure. Consistent with [RFC2308], resolution
> >    failures MUST NOT be cached for longer than 5 minutes.
> >
> > There was some operational feedback that suggests 1 second is also
> > a very reasonable value here. With some discussion, here is some
> > suggested text:
> >
> >    Resolvers MUST cache resolution failures for at least 1 second.
> >    The initial duration SHOULD be configurable by the operator. A
> >    longer cache duration for resolution failures will reduce the
> >    processing burden from repeated queries, but will also lengthen
> >    the recovery period from transitory issues.
> >
> >    Resolvers MAY* employ an exponential backoff algorithm to increase
> >    the cache duration when resolution failures are persistent. For
> >    example, the initial time for negatively caching a resolution
> >    failure could be set to 5 seconds, and doubled after each retry
> >    that results in another resolution failure, up to a configurable
> >    maximum.
> >
> >    Consistent with [RFC2308], resolution failures MUST NOT be cached
> >    for longer than 5 minutes.
> > ---
> >
> > * Note that the original text has this as SHOULD. I've heard reasons for
> > both SHOULD and MAY.
> >
> > We'd like to hear from the working group on this value, and what the
> > working group thinks of this change
> >
> > thanks
> > tim
Re: [DNSOP] A question on values in draft-dnsop-caching-resolution-failures
Tim,

You said you received some operational feedback. I wonder if it would be
appropriate to add this operational (or implementation?) feedback to the
(currently empty) Implementation Status section that Peter van Dijk
suggested we add, in his DNS directorate review?

I’m not necessarily opposed to reducing the minimum caching time from 5 to
1, especially if we can document valid reasons for doing so. However, I do
think it is going a bit too far to weaken both the minimum caching time and
the requirement level for exponential backoff. So I would really argue to
keep the SHOULD in the second paragraph.

Alternatively, we might consider something like 5 seconds without an
exponential backoff implementation OR an initial 1 second cache time with
an exponential backoff.

DW

> On Jul 23, 2023, at 9:00 PM, Tim Wicinski wrote:
>
> All,
>
> We had a discussion this morning during the hackathon about a value with
> the document caching-resolution-failures. The current text in 3.2 says:
>
>    Resolvers MUST cache resolution failures for at least 5 seconds. The
>    value of 5 seconds is chosen as a reasonable amount of time that an
>    end user could be expected to wait.
>
>    Resolvers SHOULD employ an exponential backoff algorithm to increase
>    the amount of time for subsequent resolution failures. For example,
>    the initial time for negatively caching a resolution failure is set
>    to 5 seconds. The time is doubled after each retry that results in
>    another resolution failure. Consistent with [RFC2308], resolution
>    failures MUST NOT be cached for longer than 5 minutes.
>
> There was some operational feedback that suggests 1 second is also
> a very reasonable value here. With some discussion, here is some
> suggested text:
>
>    Resolvers MUST cache resolution failures for at least 1 second.
>    The initial duration SHOULD be configurable by the operator. A
>    longer cache duration for resolution failures will reduce the
>    processing burden from repeated queries, but will also lengthen
>    the recovery period from transitory issues.
>
>    Resolvers MAY* employ an exponential backoff algorithm to increase
>    the cache duration when resolution failures are persistent. For
>    example, the initial time for negatively caching a resolution
>    failure could be set to 5 seconds, and doubled after each retry
>    that results in another resolution failure, up to a configurable
>    maximum.
>
>    Consistent with [RFC2308], resolution failures MUST NOT be cached
>    for longer than 5 minutes.
> ---
>
> * Note that the original text has this as SHOULD. I've heard reasons for both
> SHOULD and MAY.
>
> We'd like to hear from the working group on this value, and what the working
> group thinks of this change
>
> thanks
> tim
[DNSOP] A question on values in draft-dnsop-caching-resolution-failures
All,

We had a discussion this morning during the hackathon about a value with
the document caching-resolution-failures. The current text in 3.2 says:

   Resolvers MUST cache resolution failures for at least 5 seconds. The
   value of 5 seconds is chosen as a reasonable amount of time that an
   end user could be expected to wait.

   Resolvers SHOULD employ an exponential backoff algorithm to increase
   the amount of time for subsequent resolution failures. For example,
   the initial time for negatively caching a resolution failure is set
   to 5 seconds. The time is doubled after each retry that results in
   another resolution failure. Consistent with [RFC2308], resolution
   failures MUST NOT be cached for longer than 5 minutes.

There was some operational feedback that suggests 1 second is also
a very reasonable value here. With some discussion, here is some
suggested text:

   Resolvers MUST cache resolution failures for at least 1 second.
   The initial duration SHOULD be configurable by the operator. A
   longer cache duration for resolution failures will reduce the
   processing burden from repeated queries, but will also lengthen
   the recovery period from transitory issues.

   Resolvers MAY* employ an exponential backoff algorithm to increase
   the cache duration when resolution failures are persistent. For
   example, the initial time for negatively caching a resolution
   failure could be set to 5 seconds, and doubled after each retry
   that results in another resolution failure, up to a configurable
   maximum.

   Consistent with [RFC2308], resolution failures MUST NOT be cached
   for longer than 5 minutes.
---

* Note that the original text has this as SHOULD. I've heard reasons for
both SHOULD and MAY.

We'd like to hear from the working group on this value, and what the
working group thinks of this change.

thanks
tim
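To make the suggested text concrete, here is a minimal Python sketch of a failure cache that follows it: a floor of 1 second, an operator-configurable initial duration, doubling on each repeated failure, and a ceiling of 5 minutes. Everything here is an assumption made for illustration: the class name `FailureCache`, its method names, and the choice of a query tuple as the key are hypothetical and are not taken from the draft or from any resolver.

```python
import time


class FailureCache:
    """Toy negative cache for resolution failures, keyed by query tuple."""

    def __init__(self, initial=1.0, maximum=300.0, clock=time.monotonic):
        if initial < 1.0:
            raise ValueError("initial duration must be at least 1 second")
        if maximum > 300.0:
            raise ValueError("maximum duration must not exceed 5 minutes")
        self.initial = initial    # operator-configurable starting duration
        self.maximum = maximum    # operator-configurable ceiling
        self.clock = clock        # injectable so tests need not sleep
        self._entries = {}        # key -> (expiry time, duration used)

    def record_failure(self, key):
        """Cache a failure; double the duration if the previous try also failed."""
        previous = self._entries.get(key)
        if previous is None:
            duration = self.initial
        else:
            duration = min(previous[1] * 2, self.maximum)
        self._entries[key] = (self.clock() + duration, duration)
        return duration

    def is_cached(self, key):
        """True while the cached failure for `key` has not yet expired."""
        entry = self._entries.get(key)
        return entry is not None and self.clock() < entry[0]

    def record_success(self, key):
        """Forget the failure history once resolution works again."""
        self._entries.pop(key, None)
```

A resolver built along these lines would check `is_cached(key)` before retrying upstream, call `record_failure(key)` when a retry fails again, and call `record_success(key)` when it succeeds. Keeping the expired entry until a success is one possible design choice; it is what lets the next failure see the previous duration and double it.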