Yesterday's incident with the FB properties demonstrated that some ISP and enterprise full-service resolver farms are better prepared than others to handle this kind of situation without impacting resolution for all other domains.
We do have documents that explain BCPs for dealing with negative answers and for selecting secondaries for your domains, but none that I know of on how to handle what I'm describing as the Perfect Storm of Pending Upstream Queries.

On the software architecture side, I expect this is related to how to join outstanding upstream queries and how to cache a total timeout of the auth servers. On the operational side, I know that some operators opted, as a last-resort hack, to configure auth zones to deal with the situation.

Anyway, I think that even though the incident was not DNS related, "we", as the DNS community, could probably do better in future events.

I would like to start a discussion, or to hear from implementers and operators of full-service resolvers, on what would be the best software architecture or best current configuration practice to handle the traffic pattern that arises when a very popular name enters a scenario where all of its auth servers are timing out or are network unreachable.

[]s
Fred

_______________________________________________
dns-operations mailing list
[email protected]
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
