Ted Lemon wrote:
this is getting pretty good. anyone who stopped reading before now may
want to delve back in at this point.

I on the other hand am a little frustrated because a while back I
thought we agreed, and now it appears that we don't.

i wrongly remembered this as an optimization rather than a clarification.

You can't really purge lazily. TTL or LRU are your best bets. Purging
lazily requires you to take the performance hit John and I were
arguing about. Sure, it can be done, but ouch.

if the purge burden is one label-stripping cache management action per transmitted response, then the cost is small compared to the network i/o cost. even given your live-lock observation, this level of implementation detail is not something that dns as a system notices.

however, back to the topic at hand, the implementation costs are not
germane to system correctness. if nominum CNS and/or nlnetlabs Unbound
can't implement the clarified nxdomain semantics, then they just won't.
your customers and your investors can decide what this means to them.

This is the core of our disagreement, I think. You appear now to be
saying that system correctness demands that a cache purge subdomains of
an NXDOMAIN.  ...

no. it requires the effect of a purge, or "an effective purge". as long as what happens on the wire is the same as what would have happened in the event of an actual purge, then whether you actually purged or not makes no difference to dns at the system level. besides which, the nominum and nlnetlabs software teams are at the top echelon of the field, and i think they will be better than i at designing their code.
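
To make "effective purge" concrete, here is a minimal sketch of one way a flat hash-table cache could produce the same answers on the wire as an actual purge, using one label-stripping pass per lookup. It is not how CNS, Unbound, or any other resolver is implemented. The class name `FlatCache`, its methods, and the injected `clock` are all hypothetical, and the sketch never evicts expired NXDOMAIN markers, which a real cache would have to bound.

```python
import time

NXDOMAIN = object()  # sentinel returned when a cached name error applies


class FlatCache:
    """Hypothetical flat hash cache; names are dotted strings, no trailing dot."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._rrsets = {}    # (name, rtype) -> (expiry, stored_at, rrset)
        self._nxdomain = {}  # name -> (expiry, stored_at)

    def put(self, name, rtype, rrset, ttl):
        now = self._clock()
        self._rrsets[(name.lower(), rtype)] = (now + ttl, now, rrset)

    def put_nxdomain(self, name, ttl):
        # No subtree walk here: nothing is purged at insertion time.
        now = self._clock()
        self._nxdomain[name.lower()] = (now + ttl, now)

    def get(self, name, rtype):
        now = self._clock()
        name = name.lower()
        key = (name, rtype)
        entry = self._rrsets.get(key)
        labels = name.split(".")
        # Strip one label at a time: the name itself, then each ancestor.
        for i in range(len(labels)):
            nx = self._nxdomain.get(".".join(labels[i:]))
            if nx is None:
                continue
            nx_expiry, nx_stored_at = nx
            if nx_expiry > now:
                return NXDOMAIN  # a live name error covers this name
            if entry is not None and entry[1] <= nx_stored_at:
                # The entry predates a (now expired) NXDOMAIN at or above it.
                # A real purge would have removed it, so drop it lazily.
                del self._rrsets[key]
                entry = None
        if entry is None or entry[0] <= now:
            return None  # cache miss: the caller would have to re-query
        return entry[2]
```

With this shape the cost lands on lookup (one hash probe per label) rather than on NXDOMAIN insertion, which is the trade-off being argued about above.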

... That simply isn't true. An answer whose TTL hasn't expired is a
valid answer. A caching server that responds to a query for a cached
name with the data that was cached is operating correctly. ...

two days ago i might have agreed with you. but putting resimprove-00 into the blender and making me drink it has made me obstreperous as follows: if an answer of <name,type=1,rrset> is in cache and you hear a question for <name,type=2> and forward it and you then hear nxdomain, then would you purge the <name,type=1,rrset> element when caching the nxdomain for <name>, or would you retain the old type=1 element, such that when the nxdomain expires, you could once again answer questions for <name,type=1> using the <name,type=1,rrset> you started out with?

if so, then we should talk about what best efforts means when applied to distributed cache coherency. if not, then we should talk about why you would treat previously cached <sub.name,type,rrset> values differently, since the meaning of the nxdomain in the context of <sub.name> is the same as the meaning of the nxdomain in the context of <name>. in other words, let's first agree on what exactly it is that we disagree about.
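
The <name,type=1> question can be played out as a small timeline. This is only an illustrative sketch: the helper functions, the `purge` flag, the name `name.example`, and the TTL values are all made up for the example, and the cache here is a plain dict rather than any real resolver's data structure.

```python
def store_rrset(cache, name, rtype, rrset, now, ttl):
    """Cache a positive answer for <name, rtype>."""
    cache.setdefault(name, {})[rtype] = (now + ttl, rrset)


def store_nxdomain(cache, name, now, ttl, purge):
    """Cache a name error; optionally drop what was cached for the name."""
    if purge:
        cache[name] = {}
    cache.setdefault(name, {})["NXDOMAIN"] = (now + ttl, None)


def lookup(cache, name, rtype, now):
    node = cache.get(name, {})
    nx = node.get("NXDOMAIN")
    if nx and nx[0] > now:
        return "NXDOMAIN"       # live negative entry wins
    hit = node.get(rtype)
    if hit and hit[0] > now:
        return hit[1]           # unexpired positive entry
    return None                 # miss


def timeline(purge):
    cache = {}
    # t=0: <name,type=1,rrset> enters the cache with a long TTL.
    store_rrset(cache, "name.example", 1, ["rrset-for-type-1"], now=0, ttl=3600)
    # t=10: a forwarded <name,type=2> question comes back NXDOMAIN.
    store_nxdomain(cache, "name.example", now=10, ttl=60, purge=purge)
    during = lookup(cache, "name.example", 1, now=30)   # NXDOMAIN still live
    after = lookup(cache, "name.example", 1, now=100)   # NXDOMAIN has expired
    return during, after
```

Under these assumptions `timeline(purge=True)` misses once the NXDOMAIN expires, while `timeline(purge=False)` goes back to serving the original type=1 rrset, which is the resurrection case the question asks about.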

inconsistency is in that sense a known hazard, but not a benefit. we
call the system "best efforts" because we know it won't be consistent
but we want everyone involved to do their best anyway.

It's true that inconsistency is not a benefit, but it's standard
operating procedure, and we have to deal with it every day.

well, so is spam, and so are outages, but we do all we can reasonably do to minimize those activities. i'm not saying minimize at all imaginable costs, of course. engineering economics says minimize at all reasonable or in-budget costs. where a cost would require upgrading somebody else's network, we say that's not in-budget. where a cost would require adding logic to our own product or patching or upgrading technology in our own network, we can at least argue as to what the budget should be.

saying that the dns specification ought not specify visible on-the-wire behaviour by a recursive server after it receives an nxdomain, because that will make life harder for recursive servers whose caches are implemented as flat hash tables, when there are extant recursive servers who don't use flat hash tables and thus see no special burden, would strike me as at best "odd." since the specification as written already implies this behaviour for subdomains, and is universally already implemented for domains, you are arguing from a position of weakness: you're asking for latitude due to code you already deployed, and to accommodate you, we would have to loosen, not merely clarify, the specification.

... One reason we have to deal with it is that, quite annoyingly, DNS
caches cache NXDOMAINs.   Ping a nonexistent host, add its name to
the authoritative server, and ping it again, and you will get a
"host unknown" error both times, because the NXDOMAIN from the first
ping attempt was cached.   This is bad behavior that we have codified
into a standard because it improves performance.

since it is identically true of positive cache elements that they can be annoying since they allow use of old data even after the authority has changed its zone contents, i must be missing the point, because i do not see this as "bad behavior". the system was designed to operate at scale, and it has. if you can design a system with equal or better scale that has better coherency, i promise to review your work and comment on it. be aware that trying to remember who heard what so that you can send them change notifications asynchronously requires even more crypto-authentication work than dnssec, which is Hard To Deploy.

So if you want to tell me that, for the sake of correctness, I have
to go purge entries out of my hashed cache because I got an NXDOMAIN,
then I will tell you for the sake of correctness that we should never
cache NXDOMAINs, and we will both glare at each other sternly until
one of us cracks a smile.  ...

negative caching is like response rate limiting: we can't operate the network at current or projected scale without them. the incorrectness (specifically: the incoherence) we allow due to caching (of both positive and negative elements) was a trade-off for scale. if you want a different trade-off, please propose one, being aware that i'll be reviewing its scaling properties.

caching is in some sense optional, since authority servers can send low TTLs (usually 1 second, since 0 is badly implemented by a lot of the older servers i wrote which are still in production use). an authority server can also limit negative caching intervals by transmitting its SOA with negative responses, and setting SOA.MINIMUM to a low value.
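
As a rough sketch of how SOA.MINIMUM bounds negative caching: RFC 2308 has the cache use the smaller of the SOA record's own TTL and its MINIMUM field as the lifetime of the negative entry. The function name and the optional `resolver_cap` parameter below are hypothetical; a local cap of that kind is a common resolver setting but is not something this thread specifies.

```python
def negative_ttl(soa_ttl, soa_minimum, resolver_cap=None):
    """Seconds a cache may hold a negative answer, per the RFC 2308 rule."""
    ttl = min(soa_ttl, soa_minimum)
    if resolver_cap is not None:
        # Hypothetical local policy limit applied by the recursive server.
        ttl = min(ttl, resolver_cap)
    return ttl
```

So an authority that sends its SOA with MINIMUM set to 1 limits negative caching to about one second, whatever the SOA's own TTL is.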

i don't advise turning off caching; i'm simply saying that if you want to turn it off, you can do so, either by transmitting low-cacheability data from an authority, or by not implementing a cache in a recursive.

note that i'm not requiring that you actually purge, only that you effectively purge. i don't know if that changes your appreciation.

i cracked a smile at about 2am. thanks for that.

--
P Vixie

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop