On 24/04/14 20:41, David Joslin wrote:
> Thanks for the reply, Simon.
>
> DNSSEC isn't enabled.
>
> I wonder if the pattern of the problem gives any clues...
>
> As I said, on a normal day with around 40-50 clients on the network there
> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
> When the problem occurred there were a little over 100 clients. Running top
> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
> couple of seconds before dropping back. Then it would start peaking at
> higher and higher levels before dropping back. Eventually, after running
> for maybe half an hour it would start peaking at over 90% and staying there
> for longer before dropping back. At this point dns requests would become
> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
> would stay there. Dns requests would time out and only restarting dnsmasq
> would fix the problem. The pattern would then start over again.
>
> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
> suddenly causes it to loop and hog the cpu until it's killed. It seems to
> gradually show more and more of the problem before it eventually hogs 100%
> cpu and has to be killed.
>
> If the problem was caused by dnsmasq being overloaded with requests, is it
> likely or possible that 50 clients could put very little load on it but 100
> clients could swamp it? Also, would the problem not show itself as soon as
> dnsmasq was restarted rather than showing the gradual increase in peak
> usage until it hits 100%?
Logs would help. The pattern doesn't look familiar, but if I had to guess,
I'd say that the problem is DHCP, not DNS. Every change to the DHCP lease
database causes the file storing it to be re-written, and I suspect that's
what's eating CPU, in disk wait.

The version of dnsmasq in use would be useful, and a copy of your config
(to me privately, if you prefer).

When dnsmasq is running at 100%, try running

strace -p <pid of dnsmasq process>

That will run forever, printing what syscalls are being made. You can
ctrl-c it after a short while, which will stop strace, but not dnsmasq.

Cheers,

Simon

>
> I hope this helps. Any thoughts on this pattern?
>
> Cheers
>
> David
>
>
> On 24 April 2014 12:41, Simon Kelley <[email protected]> wrote:
>
>> On 22/04/14 20:04, David Joslin wrote:
>>> Hi
>>>
>>> I have an Asus rt-n16 router running the Shibby version of the Tomato
>>> firmware which includes dnsmasq version 2.69test3. It's in use in a
>>> building that frequently has 50+ users on a wireless network and dnsmasq
>>> has performed extremely well with very little load on the router.
>>>
>>> However, we've recently run a couple of conferences in the building and
>>> the number of people using the wireless network has been just over 100.
>>> Several times there have been problems resolving addresses and when I've
>>> looked at the router dnsmasq has been using 100% cpu. Restarting dnsmasq
>>> temporarily fixes the problem but it occurs again maybe 20 minutes later.
>>>
>>> I've turned off logging, increased the cache-size and the maximum number
>>> of dhcp leases (anything I could see that might be a problem with more
>>> users) but this hasn't fixed the problem.
>>>
>>> I wondered if anyone has come across anything similar or has any
>>> suggestions?
>>>
>>
>> The first thing is to try and decide which of two possible scenarios are
>> happening.
>> The first is that you've triggered a bug in the code and
>> dnsmasq is looping somewhere without ever getting back to the select()
>> loop and doing actual work. The second is that it's getting so much work
>> that it's running out of CPU to do it.
>>
>> In the first case, dnsmasq will stop working entirely. Is that
>> consistent with "problems resolving addresses" or does it still
>> partially work? Turning off logging is probably counter-productive here;
>> the logs may have valuable clues.
>>
>>
>> In the second case, DNSSEC is something to worry about. Do you have that
>> turned on?
>>
>> Also, it's possible to arrive at configurations with DNS forwarding
>> loops where a DNS query gets sent upstream, but somehow ends up back
>> at the dnsmasq instance that originally forwarded it and then goes round
>> in circles. It's quite difficult to do this without at least two dnsmasq
>> instances, but it is possible.
>>
>> Finally, logging to a syslog daemon which does its own DNS lookups (to
>> label logs from remote hosts) can create a collapse: dnsmasq will log
>> several lines for each DNS query; if each of those lines generates a new
>> DNS query which has to be handled by dnsmasq, it all goes wrong very
>> quickly.
>>
>>
>> Cheers,
>>
>>
>> Simon.
>>
>>
>>
>> _______________________________________________
>> Dnsmasq-discuss mailing list
>> [email protected]
>> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
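
A possible follow-up to the strace suggestion in the reply above: raw strace output scrolls past quickly, so it can help to save it to a file (strace's -o option) and count which syscalls dominate. The script below is only a rough sketch, not anything from dnsmasq itself; the file name trace.txt, the script name count_syscalls.py and the function name count_syscalls are made up for illustration, and the line format it expects is the usual "name(args) = result" that strace prints. strace's own -c option produces a similar summary without any script.

```python
import re
import sys
from collections import Counter

# Matches the syscall name at the start of an strace line, allowing for an
# optional "[pid 123] " or "123 " prefix that strace adds in some modes.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Count how often each syscall name appears in strace output lines.

    Lines that do not start with a syscall (signal notices such as
    "--- SIGCHLD ... ---", exit markers, resumed-call fragments) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage:
    #   strace -p <pid of dnsmasq process> -o trace.txt   (ctrl-c after a while)
    #   python3 count_syscalls.py < trace.txt
    for name, n in count_syscalls(sys.stdin).most_common(10):
        print(f"{n:8d}  {name}")
```

If the lease-file guess is right, one would expect file-writing calls to feature heavily in the counts; if dnsmasq is stuck in a loop without reaching select(), the trace may show very few syscalls at all. Both readings are assumptions to check against the actual trace, not established diagnoses.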
