Ted Lemon wrote:
this is getting pretty good. anyone who stopped reading before now may
want to delve back in at this point.
I on the other hand am a little frustrated because a while back I
thought we agreed, and now it appears that we don't.
i wrongly remembered this as an optimization rather than a clarification.
You can't really purge lazily. TTL or LRU are your best bets. Purging
lazily requires you to take the performance hit John and I were
arguing about. Sure, it can be done, but ouch.
if the purge burden is one label-stripping cache management action per
transmitted response, then the cost is small compared to the network i/o
cost. even given your live-lock observation, this level of
implementation detail is not something that dns as a system notices.
however, back to the topic at hand, the implementation costs are not
germane to system correctness. if nominum CNS and/or nlnetlabs Unbound
can't implement the clarified nxdomain semantics, then they just won't.
your customers and your investors can decide what this means to them.
This is the core of our disagreement, I think. You appear now to be
saying that system correctness demands that a cache purge subdomains of
an NXDOMAIN. ...
no. it requires the effect of a purge, or "an effective purge". as long
as what happens on the wire is the same as what would have happened in
the event of an actual purge, then whether you actually purged or not
makes no difference to dns at the system level. besides which, the
nominum and nlnetlabs software teams are at the top echelon of the
field, and i think they will be better than i at designing their code.
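
Here is a sketch of an "effective purge" in a flat hash-table cache. It assumes a hypothetical FlatCache class keyed by (name, rrtype); none of these names come from CNS, Unbound, or any other real server. Each lookup strips one label at a time and answers NXDOMAIN if the queried name or any ancestor has a live cached NXDOMAIN. A per-name purge mark ensures that data cached before the NXDOMAIN is discarded, not resurrected, once the NXDOMAIN itself expires. That matches the on-the-wire behaviour an actual purge would produce. The walk starts at the queried name itself, so an NXDOMAIN for <name> masks a cached <name,type=1,rrset> just as it masks <sub.name> entries.

```python
import itertools
import time


def norm(name):
    """Canonical cache key form: lower case, no trailing dot."""
    return name.lower().rstrip(".")


def self_and_ancestors(name):
    """Yield the name, then each parent, stripping one label at a time."""
    labels = norm(name).split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


class FlatCache:
    """Hypothetical flat hash-table cache that purges *effectively*."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.seq = itertools.count()  # insertion order, immune to clock ties
        self.rrsets = {}      # (name, rrtype) -> (rrset, expiry, seq)
        self.nxdomain = {}    # name -> expiry of the cached NXDOMAIN
        self.purge_mark = {}  # name -> seq at which an NXDOMAIN was cached

    def put_rrset(self, name, rrtype, rrset, ttl):
        self.rrsets[(norm(name), rrtype)] = (
            rrset, self.clock() + ttl, next(self.seq))

    def put_nxdomain(self, name, ttl):
        key = norm(name)
        self.nxdomain[key] = self.clock() + ttl
        # The mark outlives the NXDOMAIN, so older data stays dead
        # even after the negative entry has expired.
        self.purge_mark[key] = next(self.seq)

    def lookup(self, name, rrtype):
        now = self.clock()
        chain = list(self_and_ancestors(name))
        # A live NXDOMAIN at the name or any ancestor answers the query.
        for owner in chain:
            expiry = self.nxdomain.get(owner)
            if expiry is not None and expiry > now:
                return "NXDOMAIN"
        key = (norm(name), rrtype)
        entry = self.rrsets.get(key)
        if entry is None:
            return None  # cache miss: go ask an authority
        rrset, expiry, seq = entry
        purged = any(self.purge_mark.get(owner, -1) > seq for owner in chain)
        if expiry <= now or purged:
            del self.rrsets[key]  # the purge happens here, lazily
            return None
        return rrset
```

The per-lookup cost is a few dictionary probes per label of the query name. In this sketch the purge marks are never trimmed. A real cache could drop a mark once every entry older than it has expired.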
... That simply isn't true. An answer whose TTL hasn't expired is a
valid answer. A caching server that responds to a query for a cached
name with the data that was cached is operating correctly. ...
two days ago i might have agreed with you. but putting resimprove-00
into the blender and making me drink it has made me obstreperous as
follows: if an answer of <name,type=1,rrset> is in cache and you hear a
question for <name,type=2> and forward it and you then hear nxdomain,
then would you purge the <name,type=1,rrset> element when caching the
nxdomain for <name>, or would you retain the old type=1 element, such
that when the nxdomain expires, you could once again answer questions
for <name,type=1> using the <name,type=1,rrset> you started out with?
if so, then we should talk about what best efforts means when applied to
distributed cache coherency. if not, then we should talk about why you
would treat previously cached <sub.name,type,rrset> values differently, since the
meaning of the nxdomain in the context of <sub.name> is the same as the
meaning of the nxdomain in the context of <name>. in other words, let's
first agree on what exactly it is that we disagree about.
inconsistency is in that sense a known hazard, but not a benefit. we
call the system "best efforts" because we know it won't be consistent
but we want everyone involved to do their best anyway.
It's true that inconsistency is not a benefit, but it's standard
operating procedure, and we have to deal with it every day.
well, so is spam, and so are outages, but we do all we can reasonably do
to minimize those activities. i'm not saying minimize at all imaginable
costs, of course. engineering economics says minimize at all reasonable
or in-budget costs. where a cost would require upgrading somebody else's
network, we say that's not in-budget. where a cost would require adding
logic to our own product or patching or upgrading technology in our own
network, we can at least argue as to what the budget should be.
saying that the dns specification ought not specify visible on-the-wire
behaviour by a recursive server after it receives an nxdomain, because
that will make life harder for recursive servers whose caches are
implemented as flat hash tables, when there are extant recursive servers
who don't use flat hash tables and thus see no special burden, would
strike me as at best "odd." since the specification as written already
implies this behaviour for subdomains, and is universally already
implemented for domains, you are arguing from a position of weakness:
you're asking for latitude due to code you already deployed, and to
accommodate you, we would have to loosen, not merely clarify, the
specification.
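
To make the contrast between flat hash tables and other cache layouts concrete, here is a sketch built on hypothetical TreeCache and purge_flat helpers, not taken from any real server. In a tree keyed by label, an actual purge is a single unlink at the node that received the NXDOMAIN. A flat dict keyed by (owner, rrtype), with no index by ancestor, has to examine every key. TTLs are left out to keep the structure visible.

```python
class Node:
    """One label in a hypothetical tree-structured cache."""

    def __init__(self):
        self.children = {}     # child label -> Node
        self.rrsets = {}       # rrtype -> rrset
        self.nxdomain = False  # TTL bookkeeping omitted in this sketch


def labels_root_first(name):
    # "www.example.com." -> ["com", "example", "www"]
    return list(reversed(name.lower().rstrip(".").split(".")))


class TreeCache:
    def __init__(self):
        self.root = Node()

    def _walk(self, name, create=False):
        node = self.root
        for label in labels_root_first(name):
            child = node.children.get(label)
            if child is None:
                if not create:
                    return None
                child = node.children[label] = Node()
            node = child
        return node

    def put_rrset(self, name, rrtype, rrset):
        node = self._walk(name, create=True)
        node.nxdomain = False
        node.rrsets[rrtype] = rrset

    def put_nxdomain(self, name):
        node = self._walk(name, create=True)
        # Actual purge: clearing the children unlinks the whole subtree
        # in one step; no other part of the cache is examined.
        node.children.clear()
        node.rrsets.clear()
        node.nxdomain = True

    def lookup(self, name, rrtype):
        node = self.root
        for label in labels_root_first(name):
            node = node.children.get(label)
            if node is None:
                return None        # cache miss
            if node.nxdomain:
                return "NXDOMAIN"  # this name or an ancestor does not exist
        return node.rrsets.get(rrtype)


def purge_flat(table, name):
    """Actual purge in a flat dict keyed by (owner, rrtype).

    Without an index by ancestor, every key must be examined.
    """
    name = name.lower().rstrip(".")
    suffix = "." + name
    doomed = [k for k in table if k[0] == name or k[0].endswith(suffix)]
    for key in doomed:
        del table[key]
```

The leading "." in the suffix test keeps a purge of example.com from also removing notexample.com.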
... One reason we have to deal with it is that, quite annoyingly, DNS
caches cache NXDOMAINs. Ping a nonexistent host, add its name to
the authoritative server, and ping it again, and you will get a
"host unknown" error both times, because the NXDOMAIN from the first
ping attempt was cached. This is bad behavior that we have codified
into a standard because it improves performance.
since it is identically true of positive cache elements that they can be
annoying since they allow use of old data even after the authority has
changed its zone contents, i must be missing the point, because i do not
see this as "bad behavior". the system was designed to operate at scale,
and it has. if you can design a system with equal or better scale that has
better coherency, i promise to review your work and comment on it. be
aware that trying to remember who heard what so that you can send them
change notifications asynchronously requires even more
crypto-authentication work than dnssec, which is Hard To Deploy.
So if you want to tell me that, for the sake of correctness, I have
to go purge entries out of my hashed cache because I got an NXDOMAIN,
then I will tell you for the sake of correctness that we should never
cache NXDOMAINs, and we will both glare at each other sternly until
one of us cracks a smile. ...
negative caching is like response rate limiting: we can't operate the
network at current or projected scale without them. the incorrectness
(specifically: the incoherence) we allow due to caching (of both
positive and negative elements) was a trade-off for scale. if you want a
different trade-off, please propose one, being aware that i'll be
reviewing its scaling properties.
caching is in some sense optional, since authority servers can send low
TTLs (usually 1 second, since 0 is badly implemented by a lot of the
older servers i wrote which are still in production use.) an authority
server can also limit negative caching intervals by transmitting its
SOA with negative responses, and setting SOA.MINIMUM to a low value.
i don't advise turning off caching; i'm simply saying that if you want
to turn it off, you can do so, either by transmitting low-cacheability
data from an authority, or by not implementing a cache in a recursive.
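
Here is a worked version of the negative-caching arithmetic. RFC 2308 sets the negative TTL to the smaller of the SOA record's own TTL and its MINIMUM field. The authority applies this rule when it builds the response, and taking the minimum again on receipt is a defensive choice made in this sketch. RFC 2308 also advises against caching a negative response that carries no SOA. The Soa type, the negative_ttl function, and the three-hour local_cap are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Soa(NamedTuple):
    ttl: int      # TTL of the SOA record in the authority section
    minimum: int  # the SOA MINIMUM field


def negative_ttl(soa: Optional[Soa], local_cap: int = 3 * 3600) -> int:
    """Seconds a recursive server may cache an NXDOMAIN (sketch).

    Smaller of the SOA's TTL and MINIMUM, per RFC 2308, further clamped
    by local_cap, a hypothetical resolver-side ceiling.
    """
    if soa is None:
        return 0  # no SOA in the authority section: do not cache
    return min(soa.ttl, soa.minimum, local_cap)
```

For example, an authority that sends Soa(ttl=3600, minimum=1) limits negative caching to 1 second. Soa(ttl=300, minimum=86400) still yields 300, because the SOA's own TTL is the smaller value.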
note that i'm not requiring that you actually purge, only that you
effectively purge. i don't know if that changes your appreciation.
i cracked a smile at about 2am. thanks for that.
--
P Vixie
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop