Re: that MIT paper again
Regarding both Paul's message below and Simon Waters' earlier message on this topic...

Simon Waters scribed:

> I'm slightly concerned that the authors think web traffic is the big
> source of DNS, they may well be right (especially given one of the
> authors is talking about his own network), but my quick glance at the [...]

Two things. First, the paper breaks down the DNS traffic by the protocol that generated it -- see section III C, which notes that a small percentage of these lookups are related to reverse block-lists such as rbl.maps.vix.com -- but remember that the study was published in 2001, based upon measurements made in January and December of 2000. RBL traffic wasn't nearly the proportion of DNS queries that it is today. As the person responsible for our group's spam filtering (one mailserver among many that were measured as part of the study), we didn't start using SpamAssassin until late 2001, and I believe we were one of the more aggressive spam-filtering groups in our lab. Also note that they found that about 20% of the TCP connections were FTP connections, mostly to/from mirror sites hosted in our lab. The sendmail of five years ago also wasn't as aggressive about performing reverse verification of sender addresses.

I asked Jaeyeon about this (we share an office), and she noted that:

> In our follow-up measurement study, [we found] that DNSBL-related DNS
> lookups at CSAIL in February 2004 account for 14% of all DNS lookups.
> In comparison, DNSBL-related traffic accounted for merely 0.4% of all
> DNS lookups at CSAIL in December 2000.

Your question was right on the money for contemporary DNS data.

Simon also wrote:

> The abstract doesn't mention that the TTL on NS records is found to be
> important for scalability of the DNS. Probably the main point Paul wants
> us to note. Just because the DNS is insensitive to slight changes in A
> record TTL doesn't mean TTL doesn't matter on other records.
This is a key observation, and it seems like it's definitely missing from the abstract (alas, space constraints...). They're not talking about the NS records, and they're not talking about the associated A records for _nameservers_.

On Sat, Aug 07, 2004 at 04:55:00PM +, Paul Vixie scribed:

> here's what i've learned by watching nanog's reaction to this paper,
> and by re-reading the paper itself.
> 1. almost nobody has time to invest in reading this kind of paper.
> 2. almost everybody is willing to form a strong opinion regardless of that.
> 3. people from #2 use the paper they didn't read in #1 to justify an
>    opinion. :) human nature.
> 4. folks who need academic credit will write strong self-consistent papers.
> 5. those papers do not have to be inclusive or objective to get published.
> 6. on the internet, many folks by nature think locally and act globally.
> 7. #6 includes manufacturers, operators, endusers, spammers, and researchers.
> 8. the confluence of bad science and disinterested operators is disheartening.
> 9. good actual policy must often fly in the face of accepted mantra.

I'm not quite sure how to respond to this part (because I'm not quite sure what you meant...). It's possible that the data analyzed in the paper may not be representative of, say, commercial Internet traffic, but how is its objectivity in question? The conclusions of the paper are actually pretty consistent with what informed intuition might suggest. First:

> If NS records had lower TTL values, essentially all of the DNS lookup
> traffic observed in our trace would have gone to a root or gTLD server,
> which would have increased the load on them by a factor of about five.
> Good NS-record caching is therefore critical to DNS scalability.

and second:

> Most of the benefit of caching [of A records] is achieved with TTL
> values of only a small number of minutes. This is because most cache
> hits are produced by single clients looking up the same server multiple
> times in quick succession [...]
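That second finding is easy to get a feel for with a toy simulation. This is a sketch, not the paper's trace-driven methodology: it replays a synthetic bursty access pattern (all parameters made up for illustration) against a simple per-name cache, and shows the hit rate saturating once the TTL is long enough to cover one client's burst of back-to-back lookups.

```python
import random

def hit_rate(ttl_seconds, trace):
    """Fraction of lookups answered from a simple per-name cache with the given TTL."""
    expires = {}  # name -> expiry time of the cached answer
    hits = 0
    for t, name in trace:
        if expires.get(name, -1.0) >= t:
            hits += 1
        else:
            expires[name] = t + ttl_seconds  # miss: fetch and cache
    return hits / len(trace)

def synthetic_trace(n_bursts=2000, n_names=5000, seed=1):
    """Bursty access pattern: a client resolves the same name a few times
    within seconds (think: one page fetch), then moves on."""
    rng = random.Random(seed)
    trace, t = [], 0.0
    for _ in range(n_bursts):
        name = f"host{rng.randrange(n_names)}.example"
        for _ in range(rng.randrange(2, 6)):   # 2-5 lookups in quick succession
            trace.append((t, name))
            t += rng.uniform(0.5, 5.0)         # seconds between lookups in a burst
        t += rng.uniform(30, 300)              # idle gap before the next burst
    return trace

trace = synthetic_trace()
for ttl in (0, 60, 300, 3600, 86400):
    print(f"TTL {ttl:>6}s -> hit rate {hit_rate(ttl, trace):.2f}")
```

With these (arbitrary) burst parameters, the hit rate jumps from zero to roughly its maximum somewhere between a 60-second and 300-second TTL, and raising the TTL to a full day buys almost nothing more -- consistent with the quoted conclusion.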
As most operational experience can confirm, operating a nameserver for joe-random-domain is utterly trivial -- we used to (primary) a couple thousand domains on a p90 with bind 4. As your own experience can confirm, running a root nameserver is considerably less trivial.

The paper confirms the need for good TTL and caching management to reduce the load on root nameservers, but once you're outside that sphere of ~100 critical servers, the hugely distributed and heavy-tailed nature of DNS lookups renders caching a bit less effective, except in those cases where client access patterns cause intense temporal correlations.

-Dave

--
work: [EMAIL PROTECTED]  me: [EMAIL PROTECTED]
MIT Laboratory for Computer Science  http://www.angio.net/
Re: that MIT paper again
i wrote:

> wrt the mit paper on why small ttl's are harmless, i recommend that
> y'all actually read it, the whole thing, plus some of the references,
> rather than assuming that the abstract is well supported by the body.
> http://nms.lcs.mit.edu/papers/dns-imw2001.html

here's what i've learned by watching nanog's reaction to this paper, and by re-reading the paper itself.

1. almost nobody has time to invest in reading this kind of paper.
2. almost everybody is willing to form a strong opinion regardless of that.
3. people from #2 use the paper they didn't read in #1 to justify an opinion.
4. folks who need academic credit will write strong self-consistent papers.
5. those papers do not have to be inclusive or objective to get published.
6. on the internet, many folks by nature think locally and act globally.
7. #6 includes manufacturers, operators, endusers, spammers, and researchers.
8. the confluence of bad science and disinterested operators is disheartening.
9. good actual policy must often fly in the face of accepted mantra.

we now return control of your television set to you.
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
On 23.07 22:30, Simon Waters wrote:

> The abstract doesn't mention that the TTL on NS records is found to be
> important for scalability of the DNS.

Sic! And it is the *child* TTL that counts for most implementations.
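A minimal model of why the child's TTL is the one that counts -- a sketch only, not any particular resolver's implementation: the NS RRset learned from the parent is a non-authoritative delegation, and a "child-centric" resolver lets the child zone's own authoritative NS RRset (often carrying a different TTL) displace it. The class and TTL values below are illustrative.

```python
# Toy resolver cache illustrating "child-centric" NS caching: the
# authoritative NS RRset served by the child zone displaces the
# non-authoritative delegation NS RRset learned from the parent,
# so the child's TTL is the one that governs re-query timing.

class NSCache:
    def __init__(self):
        self.ns = {}  # zone -> (ttl, source)

    def store(self, zone, ttl, authoritative):
        cached = self.ns.get(zone)
        # Prefer authoritative (child) data over delegation (parent) data,
        # regardless of arrival order.
        if cached is None or authoritative or cached[1] != "child":
            self.ns[zone] = (ttl, "child" if authoritative else "parent")

cache = NSCache()
cache.store("example.com", ttl=172800, authoritative=False)  # parent referral (2-day TTL)
cache.store("example.com", ttl=3600, authoritative=True)     # child's own NS RRset (1-hour TTL)
cache.store("example.com", ttl=172800, authoritative=False)  # later referral doesn't displace it
print(cache.ns["example.com"])  # the child's 1-hour TTL is what counts
```

So lowering the TTL on the delegation in the parent zone alone does little for such resolvers; the child zone's apex NS TTL governs how long the cached entry lives.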
that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net )
i'd said:

> wrt the mit paper on why small ttl's are harmless, i recommend that
> y'all actually read it, the whole thing, plus some of the references,
> rather than assuming that the abstract is well supported by the body.

someone asked me:

> Would you happen to have the URL for the MIT paper? I meant to keep it
> to read at a later time, but it seems I deleted the message.

http://nms.lcs.mit.edu/papers/dns-imw2001.html
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
| Date: Fri, 23 Jul 2004 17:01:54 +
| From: Paul Vixie [EMAIL PROTECTED]
| Subject: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net )
|
| wrt the mit paper on why small ttl's are harmless, i recommend that
| y'all actually read it, the whole thing, plus some of the references,
| rather than assuming that the abstract is well supported by the body.
| http://nms.lcs.mit.edu/papers/dns-imw2001.html

I think most people are probably way too busy. I'll comment, and Paul can tell me where I am wrong or incomplete ;)

I'm slightly concerned that the authors think web traffic is the big source of DNS. They may well be right (especially given one of the authors is talking about his own network), but my quick glance at the type of queries shouts to me that SMTP (and email-related traffic, RBLs, etc.) generates a disproportionate amount of wide-area DNS traffic byte for byte of data. I would think this is one that is pretty easy to settle for specific networks. In particular I see a lot of retries generated by email servers for UBE and virus dross (in our case for up to 5 days), when human surfers have famously given up the domain as dead after the first 8 seconds. Perhaps if most people preview HTML in emails, surfing and email access to novel URIs are one and the same.

They conclude that the great bulk of the benefit from sharing a DNS cache is obtained in the first 10 to 20 clients. Although they scale this only to 1000+ clients, maybe some NANOG members can comment if they have scaled DNS caches much bigger than this, but I suspect a lot of the scaling issues are driven by maintenance costs and reliability, since DNS doesn't generate much WAN traffic in comparison to HTTP for most people here (let's face it, the root/TLD operators are probably the only people who even think about bandwidth of DNS traffic).

They conclude the TTL on A records isn't so crucial.
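The shared-cache saturation they report is easy to get a feel for with a toy simulation -- a sketch with made-up, roughly-Zipf popularity parameters, not the paper's trace-driven method: with heavy-tailed name popularity, the aggregate hit rate climbs steeply over the first handful of clients sharing a cache and flattens after that.

```python
import random

def shared_hit_rate(n_clients, lookups_per_client=200, n_names=10000, seed=7):
    """Aggregate hit rate when n_clients share one cache. Name popularity is
    heavy-tailed (Pareto-distributed rank, roughly Zipf); the cache never
    expires entries, so this is an upper bound on the caching benefit."""
    rng = random.Random(seed + n_clients)
    seen, hits = set(), 0
    total = n_clients * lookups_per_client
    for _ in range(total):
        rank = int(min(rng.paretovariate(0.9), n_names))  # popular names get low ranks
        if rank in seen:
            hits += 1
        else:
            seen.add(rank)
    return hits / total

for n in (1, 5, 10, 20, 100, 1000):
    print(f"{n:>5} clients -> hit rate {shared_hit_rate(n):.2f}")
```

The popular names are already in the cache after the first few clients' lookups; the long tail of one-off names misses no matter how many clients you aggregate, which is why going from 20 clients to 1000 buys much less than going from 1 to 20.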
The abstract doesn't mention that the TTL on NS records is found to be important for scalability of the DNS. Probably the main point Paul wants us to note. Just because the DNS is insensitive to slight changes in A record TTL doesn't mean TTL doesn't matter on other records.

The paper leaves a lot of hanging questions about poor performance, the number of unanswered queries, and poor latency, which I'm sure can be pinned down to the generally poor state of the DNS (both forward and especially reverse), and a few bad applications. The big difference between the places/times studied suggests to me that how the DNS performs depends a lot on what mix of questions you ask it.

They suggest not passing on unqualified names would lose a lot of fluff (me, I still think big caches could zone transfer . and save both traffic and, more importantly for the end users, latency, but that goes further than their proposal). Remember resolvers do various interesting things with unqualified names depending on who coded them and when.

The paper doesn't pass any judgement on types of lookups, but obviously not all DNS lookups are equal from the end-user perspective. For example, reverse DNS from an HTTP server is typically done off the critical path (asynchronously), whereas the same reverse lookup may be in the critical path for deciding whether to accept an email message (not that most people regard email as that time critical). It would be nice to do a study classifying them along the lines of: DNS lookups you wait for, DNS lookups that slow things down, and DNS lookups that have to be done by Friday for the weekly statistics.

Some *nix vendor(s?) should make sure loghost is in /etc/hosts or not in /etc/syslog.conf by default, by the sound of it ;)

As regards rapid update by VeriSign -- bring it on -- I'm always embarrassed to tell clients they may have to wait up to 12 hours for a new website in this day and age.
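That critical-path distinction can be sketched in a few lines. This is illustrative only: `slow_reverse_lookup` is a stub standing in for a real PTR query, and the 0.2 s sleep is an arbitrary stand-in for DNS latency, not a measured value.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_reverse_lookup(ip):
    """Stand-in for a PTR query; real reverse DNS can take seconds or time out."""
    time.sleep(0.2)  # arbitrary stand-in for lookup latency
    return "host-" + ip.replace(".", "-") + ".example"

pool = ThreadPoolExecutor(max_workers=4)

def serve_http_request(ip):
    # Off the critical path: start the lookup for the access log,
    # answer the client immediately.
    log_name = pool.submit(slow_reverse_lookup, ip)
    return "200 OK", log_name

def accept_smtp_connection(ip):
    # On the critical path: the accept/reject decision waits for DNS.
    return "250 hello " + slow_reverse_lookup(ip)

start = time.monotonic()
status, log_future = serve_http_request("192.0.2.1")
http_elapsed = time.monotonic() - start

start = time.monotonic()
banner = accept_smtp_connection("192.0.2.2")
smtp_elapsed = time.monotonic() - start

print(f"HTTP answered in {http_elapsed*1000:.0f} ms; SMTP waited {smtp_elapsed*1000:.0f} ms")
print("access-log entry:", log_future.result())  # collected later, off the fast path
pool.shutdown()
```

Same lookup, same latency; only where it sits relative to the user-visible response determines whether anyone waits for it.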
And any error that gets made in the initial setup takes too long to fix. I don't want to be setting up a site at 3 PM Friday, and having to check it Monday morning to discover some typo means it is Tuesday before it works, when in a sane world one TTL + 5 minutes is long enough.

I think relying on accurate DNS information to distinguish spammers from genuine senders is at best shaky currently. The only people I can think would suffer from making it easier and quicker to create new domains would be people relying on something like SPF, but I think that just reveals issues with SPF, and the design flaws of SPF shouldn't influence how we should manage the DNS.
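The back-of-the-envelope arithmetic behind the complaint: the worst case before everyone sees a change is roughly the registry's publication interval (how long until the new data is served at all) plus the old record's TTL lingering in caches. A sketch with illustrative numbers only -- 12-hour batch publication versus 5-minute rapid update, and a 2-day TTL:

```python
from datetime import datetime, timedelta

def worst_case_visible(change_time, publication_interval, ttl):
    """Upper bound on when every resolver sees a change: wait for the
    registry's next batch publication, then for old cached data to expire."""
    return change_time + publication_interval + ttl

fix_pushed = datetime(2004, 7, 23, 15, 0)  # a Friday, 3 PM
ttl = timedelta(days=2)                    # e.g. a 2-day delegation TTL
for interval in (timedelta(hours=12), timedelta(minutes=5)):
    seen_by = worst_case_visible(fix_pushed, interval, ttl)
    print(f"publication every {interval}: everyone current by {seen_by:%A %H:%M}")
```

With these numbers, rapid update pulls the worst case from early Monday back to Sunday afternoon; for a brand-new delegation (nothing cached anywhere), the TTL term drops out and the publication interval is essentially the whole delay.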
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
On Fri, 23 Jul 2004 22:30:46 BST, Simon Waters [EMAIL PROTECTED] said:

> I think relying on accurate DNS information to distinguish spammers from
> genuine senders is at best shaky currently, the only people I can think
> would suffer with making it easier and quicker to create new domains
> would be people relying on something like SPF, but I think that just
> reveals issues with SPF, and the design flaws of SPF shouldn't influence
> how we should manage the DNS.

Ahh.. but if SPF (complete with issues and design flaws) is widely deployed, we may not have any choice about whether its issues and flaws dictate how the DNS is managed. Remember that we've seen this before -- RFC 2052 didn't specify a '_', RFC 2782 does. And we all know where BIND's delegation-only came from.