Thanks very much for the review, Mukund! Puneet has already incorporated the editorial feedback into the GitHub copy.
Mukund Sivaraman writes:
>> "It is predicated on the observation that authoritative server
>> unavailability can cause outages even when the underlying data
>> those servers would return is typically unchanged."
>
> While reading this, I wonder whether the last sentence was meant to be
> written as: "It is predicated on the observation that zone data returned
> by authoritative nameservers during a resolver refresh would typically
> be unchanged."

I agree that it's a reasonable rephrasing, and I'm wondering whether you see a practical difference. Not that I'm particularly wed to the original text; I'm just wondering if I'm missing something, in grammar or semantics, that makes the rewrite superior.

> > issues, and so on. If the recursive server is unable to contact the
> > authoritative servers for a name but still has relevant data that has
>
> s/for a name/for a query/

Puneet made this change, but I'll observe that since names are how authoritative servers get delegated, I think "servers for a name" is an acceptable, natural way to refer to the process.

> It is a curious thing why it was decided that [TTL] values with the
> high order bit set are not clamped but set to zero. Possibly because
> it can be thought that such high values are bogus and assumed to be
> made in error, and so a resolver should attempt to re-query such
> records instead of caching them for a very long time. OTOH, one can
> think the same of a TTL=2147483647 answer too. :P

Yeah, I don't know the reasoning, and I haven't searched the dnsext archives to see whether there was discussion of it back in the mid-90s when the treat-it-as-zero clarification was decided. Neither zero nor 2^31 seems like a good idea, really, and the issue will be raised in the presentation to dnsop today.

> This option seems to me too complicated to generate, parse and make use
> of.
> RRs are re-ordered very late during message rendering in some DNS
> implementations, and updating this syntax in the EDNS option just looks
> too painful. It does not appear parsing (by resolvers) will be easy
> either, and whether this fine granularity in determining staleness is
> generally useful.

I do agree that it is somewhat more complicated, but I'm not sure about "too complicated". My thinking when I first offered it was that if I'm using an option for diagnostic purposes, I want explicit information returned. So for something like "dig +any +edns-stale example.com" (or whatever) when I'm debugging, I can count off the indices well enough. I would not expect automated systems that receive the option to care much about specifically which RRsets were stale, if they were concerned about staleness at all. And if they do care ... they can count off the indices well enough too.

Here's another fun idea for specifically identifying the relevant parts of the message: name compression pointers! Okay, okay ... yeah, it makes me feel a little skeevy too. On the other hand, you could also iterate over each name/type with either recursion disabled or the other EDNS option, so there's that way of dealing with diagnostics too.

> Would it be better to limit fetches by the resolver for 30 seconds,
> while still returning TTL=0 answers?

It's an interesting thought, but besides my general wariness of TTL 0 records, I note that this would mean keeping more state in the resolver than it currently has to keep, and when I think about how I'd implement that, in BIND at least, there's a fair bit of complexity.

> Do all implementations mentioned earlier supporting the idea of this
> draft attempt to refresh stale data before serving it? Does this draft
> prescribe if resolvers SHOULD/MUST do so? Because the two approaches
> result in quite different behaviors.

To the best of my knowledge, no, though hopefully I've missed news of an update to Unbound.
My understanding of how Unbound implements its feature is that it's basically "shoot first and ask questions later". That is, I think it will use any data from the cache first, stale or not, before trying the resolution. That's part of the reason I wrote explicitly in the draft about honoring the intent of the TTL and using stale data only in unusual circumstances.

> I think some implementations of this draft do not implement the client
> response timer, and so waiting for the query resolution timer (which may
> be a large duration) may result in application getaddrinfo() timeouts.

That would circumvent pretty much the whole intent of adding resiliency for clients waiting for answers.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop