Did you check what the tracked connections actually were? We had to massively reduce the timeouts on UDP conntrack entries, and that got things under control. Also check whether your application is doing one DNS request per transaction / outgoing request; many standard libraries behave this way unless you take great care. A rough sketch of both checks is below.
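For example (a hedged sketch, run on an affected node; exact sysctl paths and values vary by kernel, and on managed nodes kube-proxy may rewrite some of them):

    # Check how full the tracking table is
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max

    # Break the tracked entries down by protocol; a DNS storm shows up as a
    # flood of UDP entries on port 53 (requires conntrack-tools)
    conntrack -L 2>/dev/null | awk '{print $1}' | sort | uniq -c

    # Shorten the UDP tracking timeouts so stale entries expire quickly;
    # persist via /etc/sysctl.d/ if this helps
    sysctl -w net.netfilter.nf_conntrack_udp_timeout=10
    sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=30

    # Watch DNS traffic while driving load; one query per application request
    # suggests the client is not caching or reusing lookups
    tcpdump -c 50 -ni any udp port 53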
/MR

On Wed, Mar 28, 2018, 17:57 Jonathan Tronson <jtron...@gmail.com> wrote:

> When the downstream service went south we rapidly went from ~25k to 500k
> entries in the table in less than a minute. I wouldn't think there's a
> reasonable value it could be set to that would prevent the entire node
> from being affected; TPS was so high that a higher number could have
> delayed the catastrophe a bit, but not prevented it.
>
> We also noticed that when this breakdown occurs, network traffic and CPU
> utilization on our DNS servers increase tremendously.
>
> On Mar 28, 2018, at 8:44 AM, Rodrigo Campos <rodrig...@gmail.com> wrote:
>
> Just curious, but why not change the conntrack limit?
>
> On Wednesday, March 28, 2018, <jtron...@gmail.com> wrote:
>
>> Is there anything similar to a network policy that limits x open
>> connections per pod?
>>
>> During a 100k TPS load test, a subset of pods had errors connecting to a
>> downstream service and we maxed out the nf_conntrack table (500k), which
>> affected the rest of the pods on each node that had this issue - which
>> happened to be 55% of the cluster.
>>
>> Besides handling this at the application level, I wanted to protect the
>> cluster as a whole so that no single deployment can affect the entire
>> cluster in this manner.
>>
>> Thanks for any help.
>>
>> -Jonathan
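For reference on the two questions quoted above, a hedged sketch (values are illustrative, and the pod IP in the last rule is hypothetical):

    # Raising the ceiling, as Rodrigo suggests: set it directly on the node
    sysctl -w net.netfilter.nf_conntrack_max=1048576

    # On Kubernetes nodes, kube-proxy sets nf_conntrack_max at startup, so
    # adjust its flags instead of fighting it (defaults shown):
    #   --conntrack-max-per-core=32768
    #   --conntrack-min=131072

    # There is no NetworkPolicy for per-pod connection caps, but iptables'
    # connlimit match can reject new TCP connections from a single pod IP
    # above a threshold (10.244.1.5 is a hypothetical pod address)
    iptables -I FORWARD -s 10.244.1.5/32 -p tcp --syn \
      -m connlimit --connlimit-above 1000 --connlimit-mask 32 -j REJECT

Note that a bigger table only buys headroom; if each failed transaction still creates a fresh UDP DNS entry, shortening the timeouts (or fixing the DNS behavior in the application) is what actually stops the table from filling.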