Re: That MIT paper
> i must have misspoken. when i asked "what if 20,000 sites decreased their
> cache utilization by 1% due to a general lowering of TTL's inspired by
> MIT's paper" i was wondering if anyone thought that the result would be a
> straight across-the-board increase in traffic at the root servers. there
> are theories that say yes. other theories say it'll be higher. others,
> lower. any study that fails to address these questions is worse than
> useless.

no. it may happen to be studying things other than the root servers. and that's ok and often useful

randy
Re: That MIT paper
> > At root and gTLD servers I assume DNS traffic occupies significantly
> > more than 3% of all traffic there. Still, a 1% increase remains 1%.
>
> Sure, but the ratio still plays out. ...

i must have misspoken. when i asked "what if 20,000 sites decreased their cache utilization by 1% due to a general lowering of TTL's inspired by MIT's paper" i was wondering if anyone thought that the result would be a straight across-the-board increase in traffic at the root servers. there are theories that say yes. other theories say it'll be higher. others, lower. any study that fails to address these questions is worse than useless.

--
Paul Vixie
Re: That MIT paper
On Thu, Aug 12, 2004 at 01:35:36PM +0200, Niels Bakker scribed:
>
> * [EMAIL PROTECTED] (David G. Andersen) [Thu 12 Aug 2004, 02:55 CEST]:
> > Global impact is greatest when the resulting load changes are
> > concentrated in one place. The most clear example of that is changes
> > that impact the root servers. When a 1% increase in total traffic
> > is instead spread among hundreds of thousands of different, relatively
> > unloaded DNS servers, the impact on any one DNS server is minimal.
> > And since we're talking about a protocol that variously occupies less than
> > 3% of all Internet traffic, the packet count / byte count impact is
> > negligible (unless it's concentrated, as happens at root and
> > gtld servers).
>
> This doesn't make sense to me. You're saying here that a 1% increase in
> average traffic is a 1% average increase in traffic. What's your point?
>
> if a load change is concentrated in one place how can the impact be
> global?

Because that point could be "critical infrastructure" (to abuse the buzzword). If a 1% increase in DNS traffic is 100,000 requests per second (this number is not indicative of anything, just an illustration), that could represent an extra request per second per nameserver -- or 7,000 more requests per second at the root. One of these is pretty trivial, and the other could be unpleasant.

> At root and gTLD servers I assume DNS traffic occupies significantly
> more than 3% of all traffic there. Still, a 1% increase remains 1%.

Sure, but the ratio still plays out. If your total traffic due to DNS is small, then even a large (percentage) increase in DNS traffic doesn't affect your overall traffic volume, though it might hurt your nameservers. If you're a root server, doubling the DNS traffic nearly doubles total traffic volume, so in addition to DNS-specific issues, you'll also start looking at full pipes.

-Dave

--
work: [EMAIL PROTECTED]   me: [EMAIL PROTECTED]
MIT Laboratory for Computer Science   http://www.angio.net/
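The arithmetic behind the "spread out versus concentrated" argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 100,000 requests per second comes from the message above (where it is itself just an illustration), while the server counts and the 95% DNS share are hypothetical values picked for the example, and the message's own "7,000" figure is not derived here.

```python
def extra_load_per_server(extra_qps: float, servers: int) -> float:
    """Extra queries/second each server sees if the increase is spread evenly."""
    return extra_qps / servers


def total_traffic_growth(dns_share: float, dns_growth: float) -> float:
    """Fractional growth in total traffic when only the DNS part grows.

    dns_share  -- fraction of a site's traffic that is DNS (0.03 means 3%)
    dns_growth -- fractional growth of the DNS part (1.0 means it doubles)
    """
    return dns_share * dns_growth


EXTRA_QPS = 100_000  # illustrative figure taken from the message above

# Hypothetical server counts, chosen only to show the contrast.
spread = extra_load_per_server(EXTRA_QPS, 100_000)    # many ordinary nameservers
concentrated = extra_load_per_server(EXTRA_QPS, 13)   # a handful of servers

print(f"spread out:   {spread:.0f} extra q/s per server")
print(f"concentrated: {concentrated:.0f} extra q/s per server")

# Hypothetical DNS shares: 3% for an ordinary site, 95% for a root-like site.
print(f"DNS doubles, 3% share:  total traffic +{total_traffic_growth(0.03, 1.0):.0%}")
print(f"DNS doubles, 95% share: total traffic +{total_traffic_growth(0.95, 1.0):.0%}")
```

The same absolute increase is a rounding error when divided among many servers and a real operational problem when it lands on a few, which is the point being made about root and gTLD servers.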
Re: That MIT paper
I was reminded about rfc1537. Been a long time since I read that, so a good reminder. But it only deals with SOA records. And it's 11 years old (closer to 12). The topic at hand was NS records. Any other guidance? -- William Allen Simpson Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
Re: That MIT paper
David,

* [EMAIL PROTECTED] (David G. Andersen) [Thu 12 Aug 2004, 02:55 CEST]:
> Global impact is greatest when the resulting load changes are
> concentrated in one place. The most clear example of that is changes
> that impact the root servers. When a 1% increase in total traffic
> is instead spread among hundreds of thousands of different, relatively
> unloaded DNS servers, the impact on any one DNS server is minimal.
> And since we're talking about a protocol that variously occupies less than
> 3% of all Internet traffic, the packet count / byte count impact is
> negligible (unless it's concentrated, as happens at root and
> gtld servers).

This doesn't make sense to me. You're saying here that a 1% increase in average traffic is a 1% average increase in traffic. What's your point?

if a load change is concentrated in one place how can the impact be global? How can a 1% load increase in one specific place have anything but minimal impact?

At root and gTLD servers I assume DNS traffic occupies significantly more than 3% of all traffic there. Still, a 1% increase remains 1%.

-- Niels.
Re: That MIT paper
Paul Vixie wrote:
>
> (what if a general decline in TTL's resulted from publication of That
> MIT Paper?)
>

It's an academic paper. The best antidote would be to publish a nicely researched reply paper.

Meanwhile, I'm probably one of those guilty of too large a reduction of TTLs. I remember when the example file had a TTL of 99 for NS.

What's the best practice?

Currently, we're using (dig result, criticism appreciated):

watervalley.net.        1d12h     IN SOA  ns2.watervalley.net. hshere.watervalley.net. (
                                          2004081002      ; serial
                                          4h33m20s        ; refresh
                                          10M             ; retry
                                          1d12h           ; expiry
                                          1H )            ; minimum
watervalley.net.        1H        IN MX   10 mail.watervalley.net.
watervalley.net.        1H        IN A    12.168.164.26
watervalley.net.        1H        IN NS   ns3.watervalley.net.
watervalley.net.        1H        IN NS   ns1.ispc.org.
watervalley.net.        1H        IN NS   ns2.ispc.org.
watervalley.net.        1H        IN NS   ns2.watervalley.net.
watervalley.net.        1H        IN NS   ns3.ispc.org.

;; ADDITIONAL SECTION:
mail.watervalley.net.   1H        IN A    12.168.164.3
ns1.ispc.org.           15h29m10s IN A    66.254.94.14
ns2.ispc.org.           15h29m10s IN A    199.125.85.129
ns2.watervalley.net.    1D        IN A    12.168.164.2
ns3.ispc.org.           15h29m10s IN A    12.168.164.102
ns3.watervalley.net.    1H        IN A    64.49.16.2

--
William Allen Simpson
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
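When comparing TTLs like the ones in the dig output above, it can help to convert the BIND-style shorthand into plain seconds. The following is a minimal sketch, not part of any DNS library: the function name `ttl_seconds` is made up for this example, and it assumes the usual case-insensitive w/d/h/m/s unit suffixes.

```python
import re

# Seconds per unit suffix; a bare number is treated as seconds.
_UNITS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([wdhms]?)", re.IGNORECASE)


def ttl_seconds(text: str) -> int:
    """Convert a TTL such as '1d12h', '10M' or '3600' to seconds."""
    pos, total = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:          # unexpected characters between tokens
            raise ValueError(f"bad TTL: {text!r}")
        number, unit = match.groups()
        total += int(number) * _UNITS[unit.lower() or "s"]
        pos = match.end()
    if not text or pos != len(text):      # empty input or trailing junk
        raise ValueError(f"bad TTL: {text!r}")
    return total


# TTL values that appear in the zone data quoted above.
for ttl in ("1d12h", "4h33m20s", "10M", "1H", "15h29m10s", "1D"):
    print(f"{ttl:>10} = {ttl_seconds(ttl):>6} s")
```

Seen in seconds, the 1H on the NS records is 3600, against 129600 for the SOA record's own TTL and 55750 for the ispc.org glue, which makes the gap that the question is about easier to see.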
Re: That MIT paper
there are many sites and isps like mit and kaist. there are few root servers. while i care about the root servers, i presume that they are run by competent folk and certainly they are measured to death (which is rather boring from the pov of most of us). i care about isp and user site measurements. i think the study by the mit crew, which i have read a number of times, was a real service to the community. randy
Re: That MIT paper
On Wed, Aug 11, 2004 at 04:49:18PM +, Paul Vixie scribed:
> what i meant by "act globally, think locally" in connection with That
> MIT Paper is that the caching effects seen at mit are at best
> representative of that part of mit's campus for that week, and that

Totally agreed. The paper was based upon two traces, one from MIT LCS, and one from KAIST in Korea. I think that the authors understood that they were only looking at two sites, but their numbers have a very interesting story to tell -- and I think that they're actually fairly generalizable. For instance, the rather poorly-behaving example from your f-root snapshot is rather consistent with one of the findings in the paper:

[Regarding root and gTLD server lookups] "...It is likely that many of these are automatically generated by incorrectly implemented or configured resolvers; for example, the most common error 'loopback' is unlikely to be entered by a user"

> even a variance of 1% in caching effectiveness at MIT that's due to
> generally high or low TTL's (on A, or MX, or any other kind of data)
> becomes a huge factor in f-root's load, since MIT's load is only one

But remember - the only TTLs that the paper was suggesting could be reduced were non-nameserver A records. You could drop those all to zero and not affect f-root's load one bit. In fairness, I think this is jumbled together with NS record caching in the paper, since most responses from the root/gTLD servers include both NS records and A records in an additional section.

Global impact is greatest when the resulting load changes are concentrated in one place. The most clear example of that is changes that impact the root servers. When a 1% increase in total traffic is instead spread among hundreds of thousands of different, relatively unloaded DNS servers, the impact on any one DNS server is minimal. And since we're talking about a protocol that variously occupies less than 3% of all Internet traffic, the packet count / byte count impact is negligible (unless it's concentrated, as happens at root and gtld servers).

The other questions you raise, such as:

> how much of the measured traffic was due to bad logic in
> caching/forwarding servers, or in clients? how
> will high and low ttl's affect bad logic that's known to be in wide
> deployment?

are equally important questions to ask, but .. there are only so many questions that a single paper can answer. This one provides valuable insight into client behavior and when and why DNS caching is effective. There have been other papers in the past (for instance, Danzig's 1992 study) that examined questions closer to those you pose. The results from those papers were useful in an entirely different way (namely, that almost all root server traffic was totally bogus because of client errors).

It's clear that from the perspective of a root name server operator, the latter questions are probably more important. But from the perspective of, say, an Akamai or a Yahoo (or joe-random dot com), the former insights are equally valuable.

-Dave

--
work: [EMAIL PROTECTED]   me: [EMAIL PROTECTED]
MIT Laboratory for Computer Science   http://www.angio.net/
Re: That MIT paper
what i meant by "act globally, think locally" in connection with That MIT Paper is that the caching effects seen at mit are at best representative of that part of mit's campus for that week, and that even a variance of 1% in caching effectiveness at MIT that's due to generally high or low TTL's (on A, or MX, or any other kind of data) becomes a huge factor in f-root's load, since MIT's load is only one drop in a larger ocean. see duane's paper, which is more of a "think globally, act locally" kind of thing.

how much of the measured traffic was due to bad logic in caching/forwarding servers, or in clients? how will high and low ttl's affect bad logic that's known to be in wide deployment? what if 20,000 enterprise networks the size of MIT all saw a 1% decrease in caching effectiveness due to generally low TTL's? (what if a general decline in TTL's resulted from publication of That MIT Paper?)

here's a snapshot of f-root's life. That MIT Paper not only fails to address it, and fails to take it into account, it fails to identify the global characteristic of the variables under study. caching performance is not simply a local issue. everyone connected to the internet acts globally. it is wildly foolish to think locally.

16:44:35.118922 208.139.64.98.12978 > 192.5.5.241.53: 16218 ? H.ROOT-SERVERS.NET. (36)
16:44:35.121171 208.139.64.98.12978 > 192.5.5.241.53: 10080 A6? H.ROOT-SERVERS.NET. (36)
16:44:35.124668 208.139.64.98.12978 > 192.5.5.241.53: 1902 ? C.ROOT-SERVERS.NET. (36)
16:44:35.127544 208.139.64.98.12978 > 192.5.5.241.53: 10098 ? G.ROOT-SERVERS.NET. (36)
16:44:35.130185 208.139.64.98.12978 > 192.5.5.241.53: 6010 A6? C.ROOT-SERVERS.NET. (36)
16:44:35.133828 208.139.64.98.12978 > 192.5.5.241.53: 1920 A6? G.ROOT-SERVERS.NET. (36)
16:44:35.136286 208.139.64.98.12978 > 192.5.5.241.53: 12169 ? F.ROOT-SERVERS.NET. (36)
16:44:35.139433 208.139.64.98.12978 > 192.5.5.241.53: 3988 A6? F.ROOT-SERVERS.NET. (36)
16:44:35.142324 208.139.64.98.12978 > 192.5.5.241.53: 10140 A6? B.ROOT-SERVERS.NET. (36)
16:44:35.145453 208.139.64.98.12978 > 192.5.5.241.53: 14244 ? B.ROOT-SERVERS.NET. (36)
16:44:35.149344 208.139.64.98.12978 > 192.5.5.241.53: 16297 A6? J.ROOT-SERVERS.NET. (36)
16:44:35.151674 208.139.64.98.12978 > 192.5.5.241.53: 1968 ? J.ROOT-SERVERS.NET. (36)
Re: That MIT paper
Hi,

> But, to my understanding a too short TTL will do harm to cache server
> performance esp. the amount of RR cached is so large that BIND have to
> wait for swapping I/O and re-fetching those timeout RR again.

I think you missed the main point of the report: it does not say that low TTLs are a good idea in general. What it does say is that the stability and performance of the DNS is mainly based on a rather high TTL for NS records, which distributes the query load among a larger number of servers and therefore avoids single points of failure and bottlenecks.

Compared to that, the overall performance and load impact of lowering the TTL for A records down to a few hundred seconds is not an issue, mostly because the large number of queries for A records from clients happen in very short intervals of time (just look at what your web browser is doing when you are surfing) and will therefore be answered from the local nameserver's cache after the first query anyway. The important thing here is that this nameserver does not have to go through the same chain of DNS servers again, a few hours or days later, to find the one that gives it the right answer, but instead can just ask that server directly using its cached NS record.

Parts of this I can also verify from my own experience. Although a nicely tuned cascade of nameservers might add some measurable performance to DNS resolution on the client side, when surfing, the most noticeable performance improvement is having a decent DNS server in your local LAN which you can reach within a few µs.

So in short:

LOW TTL A Records, will not affect stability and performance of DNS much.
LOW TTL NS Records, bad bad Idea.

Bye, Siggi.
Re: That MIT paper
Hi,

>The paper doesn't pass any judgement on types of lookups, but obviously
>not all DNS lookups are equal from the end user perspective.

In our observation, lookups for IP addresses make up 70% of our cache server load, MX lookups 14%, and PTR lookups only 5%. On the other hand, a coarse analysis of our network traffic shows that Web traffic accounts for only 8%, while streaming media accounts for most of the traffic. So the authors' conclusion may be correct, as viewing films online does not rely on DNS as much as browsing web pages does. But, to my understanding, too short a TTL will harm cache server performance, especially when the number of cached RRs is so large that BIND has to wait on swap I/O and then re-fetch the RRs that have timed out.

>"In our follow-up measurement study, [we found] that DNSBL related
> DNS lookups at CSAIL in February 2004 account for 14% of all DNS
> lookups. In comparison, DNSBL related traffic accounted for merely
> 0.4% of all DNS lookups at CSAIL in December 2000."

Is this work published or publicly available? Has any work been done on performance tuning of cache servers?

> 1. almost nobody has time to invest in reading this kind of paper.
> 2. almost everybody is willing to form a strong opinion regardless of that.
> 3. people from #2 use the paper they didn't read in #1 to justify an opinion.

People rely on their experience, but science tries to reach conclusions on the basis of analysis. Usually, the problems we meet are caused by people replacing scientific conclusions with their own experience.

Joe
Re: that MIT paper again
Regarding both Paul's message below and Simon Waters' earlier message on this topic...

Simon Waters scribed:
> I'm slightly concerned that the authors think web traffic is the big
> source of DNS, they may well be right (especially given one of the
> authors is talking about his own network), but my quick glance at the

Two things - first, the paper breaks down the DNS traffic by the protocol that generated it - see section III C, which notes "a small percentage of these lookups are related to reverse black-lists such as rbl.maps.vix.com" -- but remember that the study was published in 2001 based upon measurements made in January and December of 2000. RBL traffic wasn't nearly the proportion of DNS queries that it is today. As the person responsible for our group's spam filtering (one mailserver among many that were measured as a part of the study), we didn't start using spamassassin until late 2001, and I believe we were one of the more aggressive spam filtering groups in our lab. Also note that they found that about 20% of the TCP connections were FTP connections, mostly to/from mirror sites hosted in our lab.

Sendmail of five years ago also wasn't as aggressive about performing reverse verification of sender addresses.

I asked Jaeyeon about this (we share an office), and she noted that:

"In our follow-up measurement study, [we found] that DNSBL related DNS lookups at CSAIL in February 2004 account for 14% of all DNS lookups. In comparison, DNSBL related traffic accounted for merely 0.4% of all DNS lookups at CSAIL in December 2000."

Your question was right on the money for contemporary DNS data.

> The abstract doesn't mention that the TTL on NS records is found to be
> important for scalability of the DNS. Probably the main point Paul
> wants us to note. Just because the DNS is insensitive to slight
> changes in A record TTL doesn't mean TTL doesn't matter on other
> records.

This is a key observation, and seems like it's definitely missing from the abstract (alas, space constraints...). They're not talking about the NS records, and they're not talking about the associated A records for _nameservers_.

On Sat, Aug 07, 2004 at 04:55:00PM +, Paul Vixie scribed:
>
> here's what i've learned by watching nanog's reaction to this paper, and
> by re-reading the paper itself.
>
> 1. almost nobody has time to invest in reading this kind of paper.
> 2. almost everybody is willing to form a strong opinion regardless of that.
> 3. people from #2 use the paper they didn't read in #1 to justify an opinion.

:) human nature.

> 4. folks who need academic credit will write strong self-consistent papers.
> 5. those papers do not have to be inclusive or objective to get published.
> 6. on the internet, many folks by nature "think locally and act globally".
>
> 7. #6 includes manufacturers, operators, endusers, spammers, and researchers.
> 8. the confluence of bad science and disinterested operators is disheartening.
> 9. good "actual policy" must often fly in the face of "accepted mantra".

I'm not quite sure how to respond to this part (because I'm not quite sure what you meant...). It's possible that the data analyzed in the paper may not be representative of, say, commercial Internet traffic, but how is the objectivity in question?

The conclusions of the paper are actually pretty consistent with what informed intuition might suggest. First:

"If NS records had lower TTL values, essentially all of the DNS lookup traffic observed in our trace would have gone to a root or gTLD server, which would have increased the load on them by a factor of about five. Good NS-record caching is therefore critical to DNS scalability."

and second:

"Most of the benefit of caching [of A records] is achieved with TTL values of only a small number of minutes. This is because most cache hits are produced by single clients looking up the same server multiple times in quick succession [...]"

As most operational experience can confirm, operating a nameserver for joe-random-domain is utterly trivial -- we used to (primary) a couple thousand domains on a p90 with bind 4.. As your own experience can confirm, running a root nameserver is considerably less trivial. The paper confirms the need for good TTL and caching management to reduce the load on root nameservers, but once you're outside that sphere of ~100 critical servers, the hugely distributed and heavy-tailed nature of DNS lookups renders caching a bit less effective except in those cases where client access patterns cause intense temporal correlations.

-Dave

--
work: [EMAIL PROTECTED]   me: [EMAIL PROTECTED]
MIT Laboratory for Computer Science   http://www.angio.net/
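The second quoted finding (most of the A-record caching benefit comes from one client repeating a lookup in quick succession) can be illustrated with a toy TTL cache. This is a hedged sketch, not the paper's methodology: the trace is made up, the name `www.example.com` is a placeholder, and `cache_hit_rate` is a hypothetical helper written for this example.

```python
def cache_hit_rate(lookups, ttl):
    """Fraction of lookups answered from a cache with a fixed TTL.

    lookups -- iterable of (time_in_seconds, name), sorted by time
    ttl     -- seconds a fetched record stays usable
    """
    expires = {}              # name -> time at which the cached record expires
    hits = total = 0
    for now, name in lookups:
        total += 1
        if expires.get(name, float("-inf")) > now:
            hits += 1                      # still cached
        else:
            expires[name] = now + ttl      # miss: fetch and cache the record
    return hits / total if total else 0.0


# Made-up trace: one client looks up the same name five times in four
# seconds, then comes back an hour later and does the same again.
trace = [(t, "www.example.com") for t in (0, 1, 2, 3, 4)]
trace += [(t, "www.example.com") for t in (3600, 3601, 3602, 3603, 3604)]

for ttl in (0, 60, 300, 86400):
    print(f"TTL {ttl:>5}s -> hit rate {cache_hit_rate(trace, ttl):.0%}")
```

On this trace a one-minute TTL already captures 8 of the 10 lookups, and raising it to a full day only adds one more hit. Real traces differ, of course, and none of this applies to NS records, where the message above argues long TTLs matter.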
Re: that MIT paper again
i wrote:

> wrt the mit paper on why small ttl's are harmless, i recommend that
> y'all actually read it, the whole thing, plus some of the references,
> rather than assuming that the abstract is well supported by the body.
>
> http://nms.lcs.mit.edu/papers/dns-imw2001.html

here's what i've learned by watching nanog's reaction to this paper, and by re-reading the paper itself.

1. almost nobody has time to invest in reading this kind of paper.
2. almost everybody is willing to form a strong opinion regardless of that.
3. people from #2 use the paper they didn't read in #1 to justify an opinion.
4. folks who need academic credit will write strong self-consistent papers.
5. those papers do not have to be inclusive or objective to get published.
6. on the internet, many folks by nature "think locally and act globally".
7. #6 includes manufacturers, operators, endusers, spammers, and researchers.
8. the confluence of bad science and disinterested operators is disheartening.
9. good "actual policy" must often fly in the face of "accepted mantra".

we now return control of your television set to you.
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
On 23.07 22:30, Simon Waters wrote:
>
> The abstract doesn't mention that the TTL on NS records is found to be
> important for scalability of the DNS.

Sic! And it is the *child* TTL that counts for most implementations.
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
On Fri, 23 Jul 2004 22:30:46 BST, Simon Waters <[EMAIL PROTECTED]> said:

> I think relying on accurate DNS information to distinguish spammers from
> genuine senders is at best shaky currently, the only people I can think
> would suffer with making it easier and quicker to create new domains
> would be people relying on something like SPF, but I think that just
> reveals issues with SPF, and the design flaws of SPF shouldn't influence
> how we should manage the DNS.

Ahh.. but if SPF (complete with issues and design flaws) is widely deployed, we may not have any choice regarding whether its issues and flaws dictate the DNS management.

Remember that we've seen this before - RFC2052 didn't specify a '_', RFC2782 does. And we all know where BIND's "delegation-only" came from
Re: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net ) (longish)
| Date: Fri, 23 Jul 2004 17:01:54 +
| From: Paul Vixie <[EMAIL PROTECTED]>
| Subject: that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net )
|
|>>wrt the mit paper on why small ttl's are harmless, i recommend that
|>>y'all actually read it, the whole thing, plus some of the references,
|>>rather than assuming that the abstract is well supported by the body.
|
| http://nms.lcs.mit.edu/papers/dns-imw2001.html

I think most people are probably way too busy. I'll comment, and Paul can tell me where I am wrong or incomplete ;)

I'm slightly concerned that the authors think web traffic is the big source of DNS, they may well be right (especially given one of the authors is talking about his own network), but my quick glance at the type of queries shouts to me that SMTP (and email related traffic, RBL's, etc) generate a disproportionate amount of wide area DNS traffic byte for byte of data. I would think this is one that is pretty easy to settle for specific networks. In particular I see a lot of retries generated by email servers for UBE and virus dross (in our case for up to 5 days), when human surfers have famously given up the domain as dead after the first 8 seconds. Perhaps if most people preview HTML in emails, surfing and email access to novel URI are one and the same.

They conclude that the great bulk of benefit from sharing a DNS cache is obtained in the first 10 to 20 clients. Although they scale this only to 1000+ clients, maybe some NANOG members can comment if they have scaled DNS caches much bigger than this, but I suspect a lot of the scaling issues are driven by maintenance costs and reliability, since DNS doesn't generate much WAN traffic in comparison to HTTP for most people here (let's face it the root/tld owners are probably the only people who even think about bandwidth of DNS traffic).

They conclude the TTL on A records isn't so crucial.

The abstract doesn't mention that the TTL on NS records is found to be important for scalability of the DNS. Probably the main point Paul wants us to note. Just because the DNS is insensitive to slight changes in A record TTL doesn't mean TTL doesn't matter on other records.

The paper leaves a lot of hanging questions about "poor performance", the number of unanswered queries, and poor latency, which I'm sure can be pinned down to the generally poor state of the DNS (both forward and especially reverse), and a few bad applications.

The big difference between the places/times studied suggests to me that how the DNS performs depends a lot on what mix of questions you ask it.

They suggest not passing on unqualified names would lose a lot of fluff (me I still think big caches could zone transfer "." and save both traffic and, more importantly for the end users, latency, but that goes further than their proposal). Remember resolvers do various interesting things with unqualified names depending who coded them and when.

The paper doesn't pass any judgement on types of lookups, but obviously not all DNS lookups are equal from the end user perspective. For example reverse DNS from HTTP server is typically done off the critical path (asynchronously), whereas the same reverse lookup may be in the critical path for deciding whether to accept an email message (not that most people regard email as that time critical). Be nice to do a study classifying them along the lines of "DNS lookups you wait for", "DNS lookups that slow things down", "DNS lookups that have to be done by Friday for the weekly statistics".

Some *nix vendor(s?) should make sure loghost is in /etc/hosts or not in /etc/syslog.conf by default by the sound of it ;)

As regards rapid update by Verisign - bring it on - I'm always embarrassed to tell clients they may have to wait up to 12 hours for a new website in this day and age. And any error made in the initial setup takes too long to fix, I don't want to be setting up a site 3PM Friday, and having to check it Monday morning to discover some typo means it is Tuesday before it works, when in a sane world one TTL + 5 minutes is long enough.

I think relying on accurate DNS information to distinguish spammers from genuine senders is at best shaky currently, the only people I can think would suffer with making it easier and quicker to create new domains would be people relying on something like SPF, but I think that just reveals issues with SPF, and the design flaws of SPF shouldn't influence how we should manage the DNS.
that MIT paper again (Re: VeriSign's rapid DNS updates in .com/.net )
i'd said:

> > wrt the mit paper on why small ttl's are harmless, i recommend that
> > y'all actually read it, the whole thing, plus some of the references,
> > rather than assuming that the abstract is well supported by the body.

someone asked me:

> Would you happen to have the URL for the MIT paper? I meant to keep it
> to read at a later time, but it seems I deleted the message.

http://nms.lcs.mit.edu/papers/dns-imw2001.html