On 3/2/2011 1:57 PM, David Sparro wrote:
On 3/2/2011 1:20 PM, Kevin Darcy wrote:

I'm not saying I agree with this perspective, only that I've dealt with
load-balancer vendors enough (Cisco in particular) to understand that
this is where they're coming from.

Besides, what alternative is there? If the load-balancer returns an
address that it knows to not be working, then it's purposely causing the
client to go into a relatively-slow connection-timeout failure mode. Is
that responsible behavior?

Short answer: yes. The DNS side of the load-balancer doesn't know why it got the query. Maybe I was trying to ping the endpoint; maybe I was trying to make an FTP connection, or HTTPS, etc. To be consistent, it would have to figure out that a SERVFAIL should be returned for the query from my gopher:// connection, but an IP address should be returned for the one from http://.
That's an implementation decision. If an implementor decides to run a bunch of disparate services under a single FQDN (as opposed to, say, www.example.com/ftp.example.com/gopher.example.com and so forth), then they need to use their load-balancer keepalives to come up with a reasonable way of deciding when the whole thing is "down" or not. If the vast majority of their traffic is web-based (typical), they may choose to call the whole thing "down" if the web part is down, and the other parts (FTP, gopher, whatever) will just have to suffer. That's the price to be paid for the convenience of having a single name for a bunch of different services: lack of granularity.
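To make the "call the whole thing down" policy concrete, here is a minimal sketch of one way keepalive results could be rolled up into a single up/down verdict for a shared FQDN. The traffic-share weighting, the 0.5 threshold, and the function name `name_is_up` are all hypothetical; real load-balancers have their own (vendor-specific) configuration for this.

```python
from typing import Mapping


def name_is_up(
    healthy: Mapping[str, bool],
    traffic_share: Mapping[str, float],
    threshold: float = 0.5,
) -> bool:
    """Hypothetical policy: the shared name counts as "up" only if the
    services that are still healthy carry at least `threshold` of its traffic."""
    total = sum(traffic_share.values())
    if total <= 0:
        return False
    up_share = sum(
        share for svc, share in traffic_share.items() if healthy.get(svc, False)
    )
    return up_share / total >= threshold


# Web traffic dominates, so a web outage marks the whole name down,
# while FTP/gopher outages alone do not.
shares = {"http": 0.9, "ftp": 0.07, "gopher": 0.03}
print(name_is_up({"http": False, "ftp": True, "gopher": True}, shares))   # False
print(name_is_up({"http": True, "ftp": False, "gopher": False}, shares))  # True
```

The FTP and gopher users in the second case are exactly the ones who "just have to suffer".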

Things would be better, of course, if clients used SRV records for accessing resources -- then a single "service" name could be differentiated by protocol. But for whatever reason client software authors have not, by and large, embraced this idea.
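For reference, an SRV-aware client would look up a protocol-specific name such as `_http._tcp.example.com` (the `_Service._Proto.Name` convention from RFC 2782) and then choose among the returned targets. The sketch below shows the target-ordering rule RFC 2782 describes: lowest priority first, weighted random choice within a priority. The record values and hostnames are made up for illustration, and the actual DNS lookup is left out.

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: Iterable[SrvRecord], rng=random) -> List[SrvRecord]:
    """Order SRV targets for connection attempts per RFC 2782:
    lowest priority first, weighted random selection within a priority."""
    ordered: List[SrvRecord] = []
    by_priority = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        # Zero-weight records go first so they keep a small chance of selection.
        remaining = sorted(group, key=lambda r: r.weight != 0)
        while remaining:
            total = sum(r.weight for r in remaining)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, rec in enumerate(remaining):
                running += rec.weight
                if running >= pick:
                    ordered.append(remaining.pop(i))
                    break
    return ordered


# Hypothetical records returned for _http._tcp.example.com.
records = [
    SrvRecord(10, 60, 80, "www1.example.com."),
    SrvRecord(10, 40, 80, "www2.example.com."),
    SrvRecord(20, 0, 80, "backup.example.com."),
]
for rec in order_srv(records, random.Random(2011)):
    print(rec.priority, rec.target)
```

Because `_ftp._tcp.example.com` would be a separate name with its own records, the load-balancer could withdraw just the FTP targets without touching the web ones.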

If it gives a "normal" response that is
lacking answer information (NODATA, NXDOMAIN), then this response gets
negatively cached, and the negative cache entry may delay clients from
re-trying the resource even after it recovers. So, what's left? NOTIMP?
FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a
strictly functional perspective, and are even more misleading and
confusing with respect to the real source of the problem.
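The negative-caching delay is easy to quantify. Per RFC 2308 (section 5), a resolver caches NXDOMAIN/NODATA for the lesser of the SOA record's own TTL and the SOA MINIMUM field; many resolvers also clamp this (BIND has `max-ncache-ttl`). The sketch below assumes example SOA values of 3600/900 seconds and a hypothetical `resolver_cap` parameter standing in for such a clamp.

```python
from typing import Optional


def negative_ttl(soa_ttl: int, soa_minimum: int, resolver_cap: Optional[int] = None) -> int:
    """Seconds a resolver may cache a negative answer (RFC 2308, section 5):
    min(SOA TTL, SOA MINIMUM), optionally clamped by a resolver-side limit."""
    ttl = min(soa_ttl, soa_minimum)
    if resolver_cap is not None:
        ttl = min(ttl, resolver_cap)
    return ttl


def stale_window_after_recovery(cached_at: int, recovered_at: int, ttl: int) -> int:
    """How long clients of that resolver keep getting the negative answer
    after the service has already come back."""
    return max(0, cached_at + ttl - recovered_at)


ttl = negative_ttl(soa_ttl=3600, soa_minimum=900)
print(ttl)  # 900
# NXDOMAIN cached at t=0; the pool recovers at t=30.
print(stale_window_after_recovery(cached_at=0, recovered_at=30, ttl=ttl))  # 870
```

So with these (assumed) numbers, a 30-second outage can turn into roughly 15 minutes of unreachability for everyone behind a resolver that cached the NXDOMAIN.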

SERVFAIL caching is coming to a BIND server release this year. (I listened to the BIND 9.8 features webinar this morning; I don't remember whether 9.9 or 9.10 was the version attached to it on the What's Next slide.)
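As a rough illustration of what "SERVFAIL caching" means in a resolver, here is a minimal sketch: remember a failed (name, type) lookup for a short time so repeated queries are answered immediately instead of re-querying a broken upstream. This is not BIND's implementation; the class name, the 1-second default, and the injectable clock are assumptions for the example.

```python
import time
from typing import Callable, Dict, Tuple


class ServfailCache:
    """Hypothetical short-lived cache of SERVFAIL results, keyed by (qname, qtype)."""

    def __init__(self, ttl: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expiry: Dict[Tuple[str, str], float] = {}

    def record_failure(self, qname: str, qtype: str) -> None:
        # DNS names compare case-insensitively, so normalize the key.
        self._expiry[(qname.lower(), qtype)] = self.clock() + self.ttl

    def is_cached_failure(self, qname: str, qtype: str) -> bool:
        key = (qname.lower(), qtype)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[key]  # entry expired; allow a fresh lookup
            return False
        return True


now = [0.0]
cache = ServfailCache(ttl=1.0, clock=lambda: now[0])
cache.record_failure("www.example.com.", "A")
print(cache.is_cached_failure("WWW.example.com.", "A"))  # True
now[0] = 1.5
print(cache.is_cached_failure("www.example.com.", "A"))  # False
```

Keeping the TTL very short is what separates this from negative caching: it damps query storms without holding clients off for minutes after recovery.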

I think Mark has the right approach: return a "special" address (e.g. 0.0.0.0 or the IPv6 equivalent) in this situation, instead of messing with the RCODE.
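A minimal sketch of that approach on the load-balancer's DNS side: answer with healthy pool members of the right address family, and when none are healthy, answer NOERROR with the unspecified address (0.0.0.0 for A, :: for AAAA) rather than changing the RCODE. The pool addresses (documentation ranges) and the function name `answer_for` are hypothetical, and building the actual DNS response is out of scope.

```python
import ipaddress
from typing import List, Mapping, Sequence

# Unspecified addresses used as the "nothing is healthy" sentinel.
SENTINEL = {"A": "0.0.0.0", "AAAA": "::"}


def answer_for(qtype: str, pool: Sequence[str], healthy: Mapping[str, bool]) -> List[str]:
    """Return the addresses to put in the answer section (RCODE stays NOERROR)."""
    if qtype not in SENTINEL:
        raise ValueError(f"unsupported qtype: {qtype}")
    version = 4 if qtype == "A" else 6
    live = [
        addr
        for addr in pool
        if healthy.get(addr, False) and ipaddress.ip_address(addr).version == version
    ]
    return live if live else [SENTINEL[qtype]]


pool = ["192.0.2.10", "192.0.2.11", "2001:db8::10"]
print(answer_for("A", pool, {}))                     # ['0.0.0.0']
print(answer_for("AAAA", pool, {}))                  # ['::']
print(answer_for("A", pool, {"192.0.2.11": True}))   # ['192.0.2.11']
```

Presumably the sentinel answer would also carry a short TTL, so resolvers pick up the real addresses soon after the pool recovers.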

- Kevin


_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
