Remembered seeing this on Twitter from last week (https://twitter.com/bboreham/status/973871688495652865):
"PSA: In #Kubernetes use absolute DNS names, not relative, where possible - put a dot at the end of the name. Cuts DNS lookups by 5x. I.e. instead of "example.com" put "example.com.""

On Tue, 20 Mar 2018 at 11:44 Evan Jones <evan.jo...@bluecore.com> wrote:

> The downside that I am aware of is that you don't get the Kubernetes DNS magic, where names automatically point to your services. For the particular use case where I ran into this, it worked perfectly!
>
> I was also going to attempt to add an alias so we could eventually migrate to dnsPolicy: Host instead of the confusingly named Default, but it seemed challenging enough that I never got around to it.
>
> Evan
>
> On Tue, Mar 20, 2018 at 1:55 AM, <m...@percy.io> wrote:
>
>> On Thursday, October 5, 2017 at 1:29:28 PM UTC-7, Evan Jones wrote:
>>
>>> The sustained 1000 qps comes from an application making that many outbound connections. I agree that the application is very inefficient and shouldn't be doing a DNS lookup for every request it sends, but it's a Python program that uses urllib2.urlopen, so it creates a new connection each time. I suspect this isn't that unusual? This could be a server that hits an external service for every user request, for example. Given the activity on the GitHub issues I linked, it appears I'm not the only person to have run into this.
>>>
>>> Thanks for the response, though, since it answers my question: there are currently no plans to change how this works. Hopefully anyone else who hits this will find this email and solve it faster than I did.
>>>
>>> Finally, the fact that dnsPolicy: Default is *not* the default is also surprising. It should probably be called dnsPolicy: Host or something instead.
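The trailing-dot advice in the tweet works because of resolv.conf search-list expansion: Kubernetes pods typically get ndots:5 and several search domains, so a name with fewer than five dots is tried against every search domain before being tried literally, while an absolute name (trailing dot) is tried exactly once. A rough, simplified sketch of that expansion logic (the search domains below are illustrative, not from any real cluster):

```python
# Rough sketch of how a glibc-style resolver expands a name against the
# resolv.conf search list. Not the actual resolver code -- just the
# candidate-generation logic, to show why "example.com." is one query
# while "example.com" fans out to several.

def candidate_queries(name, search_domains, ndots=5):
    if name.endswith("."):
        # Absolute name: queried as-is, no search-list expansion.
        return [name]
    candidates = []
    if name.count(".") >= ndots:
        # "Enough" dots: the literal name is tried first.
        candidates.append(name + ".")
    candidates += [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") < ndots:
        # Too few dots: the literal name is only tried last.
        candidates.append(name + ".")
    return candidates

# Illustrative search list resembling a pod's resolv.conf.
search = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "c.my-project.internal",   # hypothetical provider search domain
]

print(candidate_queries("example.com.", search))  # 1 candidate
print(candidate_queries("example.com", search))   # 5 candidates
```

With four search domains, the relative form generates five lookups versus one for the absolute form, which matches the "5x" in the tweet (and each lookup may be doubled again for A + AAAA).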
>>> On Oct 5, 2017 13:54, "'Tim Hockin' via Kubernetes user discussion and Q&A" <kubernet...@googlegroups.com> wrote:
>>>
>>>> We had a proposal to avoid conntrack for DNS, but no real movement on it.
>>>>
>>>> We have flags to adjust the conntrack table size.
>>>>
>>>> The kernel has params for the timeouts, which users can tweak.
>>>>
>>>> Sustained 1000 QPS DNS seems artificial.
>>>>
>>>> On Thu, Oct 5, 2017 at 10:47 AM, Evan Jones <evan....@bluecore.com> wrote:
>>>>
>>>>> TL;DR: Kubernetes dnsPolicy: ClusterFirst can become a bottleneck with a high rate of outbound connections. It seems like the problem is filling the nf_conntrack table, causing client applications to fail DNS lookups. I resolved this problem by switching my application to dnsPolicy: Default, which provided much better performance for my application, which does not need cluster DNS.
>>>>>
>>>>> It seems like this is probably a "known" problem (see the issues below), but I can't tell: is there a solution being worked on for this?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Details:
>>>>>
>>>>> We were running a load generator and were surprised to find that the aggregate rate did not increase as we added more instances and nodes to our cluster (GKE 1.7.6-gke.1). Eventually the application started getting errors like "Name or service not known" at surprisingly low rates, around ~1000 requests/second. Switching the application to dnsPolicy: Default resolved the issue.
>>>>>
>>>>> I spent some time digging into this, and the problem is not the CPU utilization of kube-dns / dnsmasq itself. On my small cluster of ~10 n1-standard-1 instances, I can get about 80000 cached DNS queries/second.
>>>>> I *think* the issue is that when there are enough machines talking to this single DNS server, it fills the nf_conntrack table, causing packets to get dropped, which I believe ends up rate limiting the clients. dmesg on the node that is running kube-dns shows a constant stream of:
>>>>>
>>>>> [1124553.016331] nf_conntrack: table full, dropping packet
>>>>> [1124553.021680] nf_conntrack: table full, dropping packet
>>>>> [1124553.027024] nf_conntrack: table full, dropping packet
>>>>> [1124553.032807] nf_conntrack: table full, dropping packet
>>>>>
>>>>> It seems to me that this is a bottleneck for Kubernetes clusters, since by default all queries are directed to a small number of machines, which will then fill the connection tracking tables.
>>>>>
>>>>> Is there a planned solution to this bottleneck? I was very surprised that *DNS* would be my bottleneck on a Kubernetes cluster, and at shockingly low rates.
>>>>>
>>>>> Related GitHub issues
>>>>>
>>>>> The following GitHub issues may be related to this problem.
>>>>> They all have a bunch of discussion but no clear resolution:
>>>>>
>>>>> Run kube-dns on each node: https://github.com/kubernetes/kubernetes/issues/45363
>>>>> Run dnsmasq on each node; mentions conntrack: https://github.com/kubernetes/kubernetes/issues/32749
>>>>> kube-dns should be a DaemonSet / run on each node: https://github.com/kubernetes/kubernetes/issues/26707
>>>>> dnsmasq intermittent connection refused: https://github.com/kubernetes/kubernetes/issues/45976
>>>>> Intermittent DNS to external name: https://github.com/kubernetes/kubernetes/issues/47142
>>>>>
>>>>> kube-aws seems to already do something to run a local DNS resolver on each node? https://github.com/kubernetes-incubator/kube-aws/pull/792/
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-use...@googlegroups.com.
>>>>> To post to this group, send email to kubernet...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/kubernetes-users.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>
>> Evan,
>>
>> This post was very helpful. We've hit this exact same issue in our Kubernetes cluster, where we make a lot of outbound connections.
>>
>> Did you find any downsides with setting "dnsPolicy: Default", and did you end up sticking with that as the solution?
>>
>> Cheers,
>> Mike
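For anyone finding this thread later: the workaround discussed above is a single field in the pod spec. A minimal sketch (the pod and image names are placeholders); dnsPolicy: Default makes the pod inherit the node's name-resolution config instead of pointing at kube-dns, which sidesteps the conntrack bottleneck but means in-cluster service names will no longer resolve:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app              # placeholder name
spec:
  dnsPolicy: Default        # inherit the node's resolver config, bypassing kube-dns
  containers:
  - name: my-app
    image: my-app:latest    # placeholder image
```

Note the naming trap called out in the thread: ClusterFirst, not Default, is what pods get when dnsPolicy is unset.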