On Mar 2, 2011, at 1:20 PM, Kevin Darcy wrote:

On 3/2/2011 10:34 AM, David Sparro wrote:


On 3/1/2011 5:27 PM, Kevin Darcy wrote:
See my other post. This is designed-in behavior for Cisco GSSes, since
there is no "service unavailable, try again later" RCODE.


When the question is "what is the ip address of 'foo'", an answer of "the web server is down" is nonsensical.

Hmmm... matter of perspective I suppose. Load-balancer architecture sees DNS as just the externally-visible portion of a whole subsystem. The SERVFAIL, in their view, does not communicate a DNS problem _per_se_, but a problem with the whole subsystem. It's more of a "what you're trying to get to is unavailable right now" message, communicated, in their view, _through_ DNS (as a sort of conduit), not necessarily _about_ DNS. They don't see it as specifically meaning "I've got a DNS problem".

But, everyone else *will*.


I'm not saying I agree with this perspective, only that I've dealt with load-balancer vendors enough (Cisco in particular) to understand that this is where they're coming from.

Besides, what alternative is there? If the load-balancer returns an address that it knows to not be working, then it's purposely causing the client to go into a relatively-slow connection-timeout failure mode. Is that responsible behavior? If it gives a "normal" response that is lacking answer information (NODATA, NXDOMAIN), then this response gets negatively cached, and the negative cache entry may delay clients from re-trying the resource even after it recovers. So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a strictly functional perspective, and are even more misleading and confusing with respect to the real source of the problem.

A few options:
1: Once the LB knows that all back-ends are down, it can continue to answer with the correct A record, but drop the TTL to be much shorter -- this allows things to recover faster.
2: Have the LB itself serve a 'sorry' page -- the ability to serve static content locally should be simple, but if it is not able to do so, it can always return a set of 'sorry' servers optimized for this purpose.
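
A rough sketch of how those two options might combine in the LB's answer logic. This is illustrative only, not how any particular product works: the names (choose_answer, NORMAL_TTL, DEGRADED_TTL), the TTL values, and the addresses are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical TTLs in seconds; real values depend on the deployment.
NORMAL_TTL = 300
DEGRADED_TTL = 5


@dataclass
class Answer:
    addresses: List[str]  # A records to return
    ttl: int              # TTL to attach to them


def choose_answer(backends: Dict[str, bool], sorry_servers: List[str]) -> Answer:
    """Pick the A records and TTL instead of answering SERVFAIL.

    backends maps each back-end address to its current health (True = up).
    sorry_servers lists addresses that serve a static 'sorry' page.
    """
    healthy = [ip for ip, up in backends.items() if up]
    if healthy:
        # Normal case: hand out only the working back-ends.
        return Answer(healthy, NORMAL_TTL)
    if sorry_servers:
        # Option 2: everything is down, so point clients at the 'sorry' servers.
        # A short TTL lets them move back quickly once the service recovers.
        return Answer(list(sorry_servers), DEGRADED_TTL)
    # Option 1: no 'sorry' servers, so keep returning the correct A records
    # with a much shorter TTL.
    return Answer(list(backends), DEGRADED_TTL)
```

For example, choose_answer({"192.0.2.10": False, "192.0.2.11": False}, ["198.51.100.5"]) hands back the 'sorry' address with the short TTL. Either way the client gets a real, positively cached answer rather than an error that looks like a DNS fault.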

You shouldn't be breaking both your serving *and* 'sorry' backends often enough for there to be special handling needed (and, if you are, you shouldn't make things worse by making other folk waste their time debugging your problem).

W


                                                                                
                                                                                
                                                   - Kevin


_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


--
I had no shoes and wept. Then I met a man who had no feet. So I said, "Hey man, got any shoes you're not using?"

