On 3/2/2011 1:57 PM, David Sparro wrote:
> On 3/2/2011 1:20 PM, Kevin Darcy wrote:
>> I'm not saying I agree with this perspective, only that I've dealt with
>> load-balancer vendors enough (Cisco in particular) to understand that
>> this is where they're coming from.
>> Besides, what alternative is there? If the load-balancer returns an
>> address that it knows to not be working, then it's purposely causing the
>> client to go into a relatively-slow connection-timeout failure mode. Is
>> that responsible behavior?
>
> Short answer: yes. The DNS side of the load-balancer doesn't know
> why it got the query. Maybe I was trying to ping the endpoint, or I
> could have been trying to make an FTP connection, or HTTPS, etc. In
> order for it to be consistent, it would have to be able to figure out
> that a SERVFAIL should be returned for the query from my gopher://
> connection, but an IP should be returned for http://.

That's an implementation decision. If an implementor decides to run a
bunch of disparate services under a single FQDN (as opposed to, say,
www.example.com/ftp.example.com/gopher.example.com and so forth), then
they'd need to come up with a reasonable way, using their load-balancer
keepalives, to decide whether the whole thing is "down". If the vast
majority of their traffic is web-based (typical), they may choose to
call the whole thing "down" if the web part is down, and the other parts
(FTP, gopher, whatever) will just have to suffer. That's the price to be
paid for the convenience of having a single name for a bunch of
different services -- lack of granularity.
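
As a rough sketch of that kind of policy (the function name, the service
labels, and the idea of a per-name "critical" list are all made up for
illustration; this is not any vendor's configuration model), the keepalive
results for one FQDN could be collapsed into a single up/down verdict like
this:

```python
from typing import Iterable, Mapping


def name_is_up(checks: Mapping[str, bool],
               critical: Iterable[str] = ("http",)) -> bool:
    """Collapse per-service keepalive results into one verdict for the name.

    `checks` maps a service label to whether its keepalive passed.
    Only the services listed in `critical` are consulted; a critical
    service with no keepalive result is treated as failed.
    """
    return all(checks.get(service, False) for service in critical)


# Web is down, FTP and gopher are fine: the whole name is called "down".
print(name_is_up({"http": False, "ftp": True, "gopher": True}))

# Web is up, FTP is down: the name stays "up" and FTP clients suffer.
print(name_is_up({"http": True, "ftp": False, "gopher": True}))
```

Whichever services go in the critical list, the non-critical ones get no say,
which is exactly the loss of granularity described above.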
Things would be better, of course, if clients used SRV records for
accessing resources -- then a single "service" name could be
differentiated by protocol. But for whatever reason client software
authors have not, by and large, embraced this idea.
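
To illustrate what SRV would buy here, a minimal sketch assuming the
_service._proto.name owner-name convention from RFC 2782. The record data
below is invented, and the ordering only looks at priority (the weighted
selection among equal-priority targets is left out):

```python
from typing import Iterable, List, NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_owner_name(service: str, proto: str, domain: str) -> str:
    """Build the name a client would query, e.g. _http._tcp.example.com."""
    return f"_{service}._{proto}.{domain.rstrip('.')}."


def by_priority(records: Iterable[Srv]) -> List[Srv]:
    """Order targets lowest priority value first (tried first).

    Simplification: weights within one priority level are ignored.
    """
    return sorted(records, key=lambda r: r.priority)


# Hypothetical records: one "service" domain, separate answers per protocol.
zone = {
    srv_owner_name("http", "tcp", "example.com"): [
        Srv(20, 0, 80, "web-backup.example.com."),
        Srv(10, 0, 80, "web1.example.com."),
    ],
    srv_owner_name("ftp", "tcp", "example.com"): [
        Srv(10, 0, 21, "ftp1.example.com."),
    ],
}

for owner, records in zone.items():
    print(owner, [r.target for r in by_priority(records)])
```

With names split out this way, the HTTP and FTP lookups are different queries,
so a server could answer one and fail the other independently.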

>> If it gives a "normal" response that is
>> lacking answer information (NODATA, NXDOMAIN), then this response gets
>> negatively cached, and the negative cache entry may delay clients from
>> re-trying the resource even after it recovers. So, what's left? NOTIMP?
>> FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a
>> strictly functional perspective, and are even more misleading and
>> confusing with respect to the real source of the problem.
>
> SERVFAIL caching is coming to a BIND server release this year. (I
> listened to the BIND 9.8 features webinar this morning. I don't
> remember which version (9.9 or 9.10) had this attached to it on the
> What's Next slide.)

I think Mark has the right approach: return a "special" address (e.g.
0.0.0.0 or the IPv6 equivalent) in this situation, instead of messing
with the RCODE.
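
A minimal sketch of that idea, with hypothetical function names and no claim
about how any real load-balancer implements it: the answering side falls back
to the unspecified address when nothing is healthy, and a client that knows
the convention can recognize the marker and fail fast instead of waiting out
a connection timeout. (How a client that does not check behaves when it
connects to 0.0.0.0 varies by platform.)

```python
import ipaddress
from typing import List, Sequence


def answer_addresses(healthy: Sequence[str], family: int = 4) -> List[str]:
    """Return the healthy addresses, or a "down" marker if there are none.

    The marker is the unspecified address: 0.0.0.0 for IPv4, :: for IPv6.
    The RCODE stays NOERROR either way.
    """
    if healthy:
        return list(healthy)
    return ["0.0.0.0"] if family == 4 else ["::"]


def is_down_marker(addr: str) -> bool:
    """True if addr is the unspecified address in either family."""
    return ipaddress.ip_address(addr).is_unspecified


answers = answer_addresses([])  # no healthy back ends
if all(is_down_marker(a) for a in answers):
    print("service marked down; not attempting a connection")
```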
- Kevin
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users