On Thu, Oct 5, 2017 at 1:29 PM, Evan Jones <evan.jo...@bluecore.com> wrote:
> The sustained 1000 qps comes from an application making that many outbound
> connections. I agree that the application is very inefficient and shouldn't
> be doing a DNS lookup for every request it sends, but it's a python program
> that uses urllib2.urlopen so it creates a new connection each time. I
> suspect this isn't that unusual? This could be a server that hits an
> external service for every user request, for example. Given the activity on
> the GitHub issues I linked, it appears I'm not the only person to have run
> into this.

You're certainly not the ONLY one, but it's not that common.
Regardless, the work to make this hurt less has not been done.
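In the meantime, a client issuing a lookup per request can shave most of that load off itself by caching results for a short TTL (or by reusing connections). A minimal sketch of the idea — the class and names here are illustrative, not an existing API:

```python
import socket
import time

class CachedResolver:
    """Cache name -> address results for a short TTL so a client making
    many outbound requests does not hit the DNS server every time."""

    def __init__(self, resolve=socket.gethostbyname, ttl=30.0):
        self._resolve = resolve      # underlying lookup function
        self._ttl = ttl              # seconds a cached answer stays fresh
        self._cache = {}             # name -> (address, expiry time)

    def lookup(self, name):
        now = time.monotonic()
        hit = self._cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]            # still fresh; no DNS query issued
        address = self._resolve(name)
        self._cache[name] = (address, now + self._ttl)
        return address
```

A short TTL keeps the cache roughly consistent with cluster DNS while collapsing 1000 qps of identical lookups down to a handful.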

> Thanks for the response though, since that answers my question: there are
> currently no plans to change how this works. Hopefully if anyone else hits
> this they might find this email so they can solve it faster than I did.

You can tweak the flags to mitigate it, I hope?
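Concretely, the knobs I mean are the kube-proxy conntrack flags and the kernel's conntrack sysctls on each node. A rough sketch — the values below are illustrative, not recommendations:

```
# kube-proxy flags (size the per-node conntrack table; values illustrative)
--conntrack-max-per-core=131072
--conntrack-min=524288

# Kernel sysctls on the node (e.g. in /etc/sysctl.d/), also illustrative;
# a short UDP timeout lets DNS entries expire out of the table quickly
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_udp_timeout = 10
```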

> Finally the fact that dnsPolicy: Default is *not* the default is also
> surprising. It should probably be called dnsPolicy: Host or something
> instead.

Yeah "Host" might have been better.  I would take PRs to add Host and
let it mean the same as "Default" and deprecate (but not remove)
"Default".

Tim


> On Oct 5, 2017 13:54, "'Tim Hockin' via Kubernetes user discussion and Q&A"
> <kubernetes-users@googlegroups.com> wrote:
>>
>> We had a proposal to avoid conntrack for DNS, but no real movement on it.
>>
>> We have flags to adjust the conntrack table size.
>>
>> Kernel has params to tweak timeouts, which users can tweak.
>>
>> Sustained 1000 QPS DNS seems artificial.
>>
>> On Thu, Oct 5, 2017 at 10:47 AM, Evan Jones <evan.jo...@bluecore.com>
>> wrote:
>> > TL;DR: Kubernetes dnsPolicy: ClusterFirst can become a bottleneck
>> > with a high rate of outbound connections. It seems like the problem
>> > is filling the nf_conntrack table, causing client applications to
>> > fail to do DNS lookups. I resolved this problem by switching my
>> > application to dnsPolicy: Default, which provided much better
>> > performance for my application that does not need cluster DNS.
>> >
>> > It seems like this is probably a "known" problem (see issues below),
>> > but I can't tell: Is there a solution being worked on for this?
>> >
>> > Thanks!
>> >
>> >
>> > Details:
>> >
>> > We were running a load generator, and were surprised to find that
>> > the aggregate rate did not increase as we added more instances and
>> > nodes to our cluster (GKE 1.7.6-gke.1). Eventually the application
>> > started getting errors like "Name or service not known" at
>> > surprisingly low rates, like ~1000 requests/second. Switching the
>> > application to dnsPolicy: Default resolved the issue.
>> >
>> > I spent some time digging into this, and the problem is not the CPU
>> > utilization of kube-dns / dnsmasq itself. On my small cluster of ~10
>> > n1-standard-1 instances, I can get about 80000 cached DNS
>> > queries/second. I *think* the issue is that when there are enough
>> > machines talking to this single DNS server, it fills the
>> > nf_conntrack table, causing packets to get dropped, which I believe
>> > ends up rate limiting the clients. dmesg on the node that is running
>> > kube-dns shows a constant stream of:
>> >
>> > [1124553.016331] nf_conntrack: table full, dropping packet
>> > [1124553.021680] nf_conntrack: table full, dropping packet
>> > [1124553.027024] nf_conntrack: table full, dropping packet
>> > [1124553.032807] nf_conntrack: table full, dropping packet
>> >
>> > It seems to me that this is a bottleneck for Kubernetes clusters,
>> > since by default all queries are directed to a small number of
>> > machines, which will then fill the connection tracking tables.
>> >
>> > Is there a planned solution to this bottleneck? I was very surprised
>> > that *DNS* would be my bottleneck on a Kubernetes cluster, and at
>> > shockingly low rates.
>> >
>> >
>> > Related GitHub issues
>> >
>> > The following GitHub issues may be related to this problem. They all
>> > have a bunch of discussion but no clear resolution:
>> >
>> > Run kube-dns on each node:
>> > https://github.com/kubernetes/kubernetes/issues/45363
>> > Run dnsmasq on each node; mentions conntrack:
>> > https://github.com/kubernetes/kubernetes/issues/32749
>> > kube-dns should be a daemonset / run on each node:
>> > https://github.com/kubernetes/kubernetes/issues/26707
>> >
>> > dnsmasq intermittent connection refused:
>> > https://github.com/kubernetes/kubernetes/issues/45976
>> > Intermittent DNS to external name:
>> > https://github.com/kubernetes/kubernetes/issues/47142
>> >
>> > kube-aws seems to already do something to run a local DNS resolver
>> > on each node? https://github.com/kubernetes-incubator/kube-aws/pull/792/
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "Kubernetes user discussion and Q&A" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to kubernetes-users+unsubscr...@googlegroups.com.
>> > To post to this group, send email to kubernetes-users@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/kubernetes-users.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>
