Re: [DNSOP] I-D Action: draft-ietf-dnsop-nxdomain-cut-01.txt

Paul Vixie Tue, 15 Mar 2016 18:08:05 -0700


Ted Lemon wrote:

this is getting pretty good. anyone who stopped reading before now may
want to delve back in at this point.


I, on the other hand, am a little frustrated because a while back I
thought we agreed, and now it appears that we don't.


i wrongly remembered this as an optimization rather than a clarification.

You can't really purge lazily. TTL or LRU are your best bets. Purging
lazily requires you to take the performance hit John and I were
arguing about. Sure, it can be done, but ouch.

if the purge burden is one label-stripping cache management action per
transmitted response, then the cost is small compared to the network i/o
cost. even given your live-lock observation, this level of
implementation detail is not something that dns as a system notices.

however, back to the topic at hand, the implementation costs are not
germane to system correctness. if nominum CNS and/or nlnetlabs Unbound
can't implement the clarified nxdomain semantics, then they just won't.
your customers and your investors can decide what this means to them.


This is the core of our disagreement, I think. You appear now to be
saying that system correctness demands that a cache purge subdomains of
an NXDOMAIN.  ...

no. it requires the effect of a purge, or "an effective purge". as long
as what happens on the wire is the same as what would have happened in
the event of an actual purge, then whether you actually purged or not
makes no difference to dns at the system level. besides which, the
nominum and nlnetlabs software teams are at the top echelon of the
field, and i think they will be better than i at designing their code.
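to make "effective purge" concrete, here is a minimal sketch, assuming a cache built from flat hash tables, of one way to get the on-the-wire effect of a purge without ever scanning the table for subdomains. caching an NXDOMAIN is a single insert; each lookup strips labels off the query name and probes for an NXDOMAIN at each ancestor (one hash probe per label). the class and method names, and the timestamp scheme, are hypothetical and are not taken from CNS, Unbound, or any real resolver. TTLs are plain integers and `now` is passed in explicitly so the behaviour is deterministic.

```python
NXDOMAIN = "NXDOMAIN"  # sentinel returned for a cached negative answer


def norm(name: str) -> str:
    # case-fold and drop the trailing root dot so keys compare consistently
    return name.lower().rstrip(".")


def ancestors(name: str):
    # label stripping: "www.gone.example" -> "www.gone.example", "gone.example", "example"
    labels = norm(name).split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


class FlatCache:
    """Hypothetical flat-hash cache; an NXDOMAIN never triggers a table scan."""

    def __init__(self):
        self.positive = {}  # (name, rrtype) -> (cached_at, expires_at, rrset)
        self.negative = {}  # name -> (cached_at, expires_at) of the NXDOMAIN

    def add(self, name, rrtype, rrset, ttl, now):
        self.positive[(norm(name), rrtype)] = (now, now + ttl, rrset)

    def add_nxdomain(self, name, ttl, now):
        # O(1): record the NXDOMAIN; descendants are handled at lookup time.
        # The marker is kept after it expires so that older positive data
        # stays suppressed (a real cache would garbage-collect markers older
        # than its maximum positive TTL; omitted here).
        self.negative[norm(name)] = (now, now + ttl)

    def lookup(self, name, rrtype, now):
        hit = self.positive.get((norm(name), rrtype))
        for anc in ancestors(name):  # one hash probe per label
            mark = self.negative.get(anc)
            if mark is None:
                continue
            nx_cached_at, nx_expires_at = mark
            if nx_expires_at > now:
                return NXDOMAIN  # a live NXDOMAIN covers this name
            if hit is not None and hit[0] <= nx_cached_at:
                hit = None  # data predates the NXDOMAIN: treat it as purged
        if hit is not None and hit[1] > now:
            return hit[2]
        return None  # miss: the resolver must ask an authority
```

the point of keeping the creation time on both kinds of entry is that, once the NXDOMAIN for gone.example expires, a query for www.gone.example goes back to the authority instead of resurrecting the older A record, which is what an actual purge would also have produced on the wire. because `ancestors()` includes the name itself, the sketch also answers the <name,type=1> question below one particular way: it drops the older data.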

... That simply isn't true. An answer whose TTL hasn't expired is a
valid answer. A caching server that responds to a query for a cached
name with the data that was cached is operating correctly. ...

two days ago i might have agreed with you. but putting resimprove-00
into the blender and making me drink it has made me obstreperous as
follows: if an answer of <name,type=1,rrset> is in cache and you hear a
question for <name,type=2> and forward it and you then hear nxdomain,
then would you purge the <name,type=1,rrset> element when caching the
nxdomain for <name>, or would you retain the old type=1 element, such
that when the nxdomain expires, you could once again answer questions
for <name,type=1> using the <name,type=1,rrset> you started out with?

if so, then we should talk about what best efforts means when applied to
distributed cache coherency. if not, then we should talk about why you
would treat previously cached <sub.name,type,rrset> values differently,
since the meaning of the nxdomain in the context of <sub.name> is the
same as the meaning of the nxdomain in the context of <name>. in other
words, let's first agree on what exactly it is that we disagree about.

inconsistency is in that sense a known hazard, but not a benefit. we
call the system "best efforts" because we know it won't be consistent
but we want everyone involved to do their best anyway.


It's true that inconsistency is not a benefit, but it's standard
operating procedure, and we have to deal with it every day.

well, so is spam, and so are outages, but we do all we can reasonably do
to minimize those activities. i'm not saying minimize at all imaginable
costs, of course. engineering economics says minimize at all reasonable
or in-budget costs. where a cost would require upgrading somebody else's
network, we say that's not in-budget. where a cost would require adding
logic to our own product or patching or upgrading technology in our own
network, we can at least argue as to what the budget should be.

saying that the dns specification ought not specify visible on-the-wire
behaviour by a recursive server after it receives an nxdomain, because
that will make life harder for recursive servers whose caches are
implemented as flat hash tables, when there are extant recursive servers
who don't use flat hash tables and thus see no special burden, would
strike me as at best "odd." since the specification as written already
implies this behaviour for subdomains, and is universally already
implemented for domains, you are arguing from a position of weakness:
you're asking for lassitude due to code you already deployed, and to
accommodate you, we would have to loosen, not merely clarify, the
specification.
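for contrast, a minimal sketch of a cache that is not a flat hash table, assuming a tree keyed by DNS labels as one example of such a design. here caching an NXDOMAIN at a node drops that node's data and detaches its entire subtree in a single assignment, so the "purge" is an actual one and costs about the same as any other cache write. all names are hypothetical, and TTLs and DNSSEC are omitted to keep it short; this is illustrative only, not how any particular server is built.

```python
from dataclasses import dataclass, field

NXDOMAIN = "NXDOMAIN"  # sentinel returned for a cached negative answer


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # next label -> Node
    rrsets: dict = field(default_factory=dict)    # rrtype -> rrset at this name
    nxdomain: bool = False                        # NXDOMAIN cached at this name


def labels(name: str) -> list:
    # "www.gone.example." -> ["example", "gone", "www"] (root-first order)
    return [l for l in name.lower().rstrip(".").split(".") if l][::-1]


class TreeCache:
    """Hypothetical label-tree cache; TTLs omitted to keep the sketch short."""

    def __init__(self):
        self.root = Node()

    def _node(self, name):
        # walk down from the root, creating nodes as needed
        node = self.root
        for label in labels(name):
            node = node.children.setdefault(label, Node())
        return node

    def add(self, name, rrtype, rrset):
        self._node(name).rrsets[rrtype] = rrset

    def add_nxdomain(self, name):
        node = self._node(name)
        node.nxdomain = True
        node.rrsets = {}    # purge <name,type,rrset> for every type
        node.children = {}  # detach the whole subtree in a single assignment

    def lookup(self, name, rrtype):
        node = self.root
        for label in labels(name):
            node = node.children.get(label)
            if node is None:
                return None  # miss: the resolver must ask an authority
            if node.nxdomain:
                return NXDOMAIN  # the name or one of its ancestors does not exist
        return node.rrsets.get(rrtype)
```

the lookup still stops at the first NXDOMAIN node on the way down, so data added later under a nonexistent name is not served either.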

... One reason we have to deal with it is that, quite annoyingly, DNS
caches cache NXDOMAINs.   Ping a nonexistent host, add its name to
the authoritative server, and ping it again, and you will get a
"host unknown" error both times, because the NXDOMAIN from the first
ping attempt was cached.   This is bad behavior that we have codified
into a standard because it improves performance.

since it is identically true of positive cache elements that they can be
annoying since they allow use of old data even after the authority has
changed its zone contents, i must be missing the point, because i do not
see this as "bad behavior". the system was designed to operate at scale,
and it has. if you can design a system with equal or better scale that
has better coherency, i promise to review your work and comment on it.
be aware that trying to remember who heard what so that you can send
them change notifications asynchronously requires even more
crypto-authentication work than dnssec, which is Hard To Deploy.

So if you want to tell me that, for the sake of correctness, I have
to go purge entries out of my hashed cache because I got an NXDOMAIN,
then I will tell you for the sake of correctness that we should never
cache NXDOMAINs, and we will both glare at each other sternly until
one of us cracks a smile.  ...

negative caching is like response rate limiting: we can't operate the
network at current or projected scale without them. the incorrectness
(specifically: the incoherence) we allow due to caching (of both
positive and negative elements) was a trade-off for scale. if you want a
different trade-off, please propose one, being aware that i'll be
reviewing its scaling properties.

caching is in some sense optional, since authority servers can send low
TTLs (usually 1 second, since 0 is badly implemented by a lot of the
older servers i wrote which are still in production use.) an authority
server can also limit negative caching intervals by transmitting its
SOA with negative responses, and setting SOA.MINIMUM to a low value.
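the negative-caching knob above is spelled out in RFC 2308, section 5: the TTL that governs a negative answer is the smaller of the SOA record's own TTL and its MINIMUM field, which is why a low SOA.MINIMUM shortens negative caching. a minimal sketch follows; the function name and the `cap` parameter are hypothetical, with `cap` standing for a resolver-side ceiling (RFC 2308 suggests one to three hours as a sensible default limit), not anything carried on the wire.

```python
def negative_ttl(soa_ttl: int, soa_minimum: int, cap: int = 3 * 3600) -> int:
    """Seconds a resolver may cache a negative answer that carried this SOA."""
    # RFC 2308 section 5: min(SOA TTL, SOA.MINIMUM)
    ttl = min(soa_ttl, soa_minimum)
    # hypothetical local ceiling; never let a huge MINIMUM pin an NXDOMAIN for days
    return max(0, min(ttl, cap))
```

with SOA.MINIMUM set to 1, `negative_ttl(3600, 1)` gives 1 second, which is the "turn it off from the authority" case described above.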

i don't advise turning off caching; i'm simply saying that if you want
to turn it off, you can do so, either by transmitting low-cacheability
data from an authority, or by not implementing a cache in a recursive.

note that i'm not requiring that you actually purge, only that you
effectively purge. i don't know if that changes your appreciation.


i cracked a smile at about 2am. thanks for that.

--
P Vixie

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
