[Pdns-users] Cache Problems with upgrade to Recursor 3.3

Jeremy Utley Wed, 01 Dec 2010 10:41:41 -0800

Good afternoon,

We've been working on upgrading our recursors from pdns-recursor-3.1.7.1-1 to pdns-recursor-3.3-1, and have seen some oddities I wanted to ask the list about. First, a basic rundown of our environment:

Our existing production servers are running pdns-recursor-3.1.7.1-1 installed via RPMs downloaded from your website. The recursor itself is run within a Xen PV virtual machine on a CentOS 5.5 base. To ensure we utilize all 4 cores of the processors in those machines, 2 instances of the recursor are launched simultaneously, listening on different IP addresses, and we utilize the fork option. We have a total of 6 machines configured this way, behind a Foundry load balancer which handles sharing the load between them. This implementation has been in place for about a year with no issues. We also use Cacti graphs for collecting performance data, by extending SNMP with output from the rec_control command.

The new test server is pdns-recursor-3.3-1 installed via RPM downloaded from your website, and also running within a Xen PV virtual machine on a CentOS 5.5 base. Rather than launching multiple instances, we are launching 4 recursor threads (machines have 4 CPU cores). Most other settings are configured identically between old and new servers. This test server was added to the load balancer on Monday afternoon, taking a fraction of the traffic that would have gone to the 6 old machines.

The problem I'm seeing is that the caching does not seem to be working properly, which is causing a performance hit. To document this effect, the following graph images were taken a little while ago from our Cacti installation:


http://www.jutley.org/DNS

Looking at the 4th graph down, which is the cache statistics on the old version recursor, you will see that around 90% of all questions are cache hits, with around 10% as cache misses. And, looking at the third graph (showing how fast queries are answered), you'll see that over 90% of all queries are answered in less than 1 ms.

However, looking at the bottom graph, which is the cache statistics on the new recursor, the statistics are totally different. Only 1.1% of the total questions are cache hits, while 6.8% are cache misses, which to me makes no sense, since a question *HAS* to be either a cache hit or cache miss. And, looking at the 7th graph (answer speed on the new recursor version), most queries are taking more than 10 ms to answer.
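The accounting concern here can be sketched in a few lines of Python. This is only an illustration: the function name `cache_accounting` is hypothetical, and the counter values are made up to mirror the 1.1% / 6.8% split, not taken from the real servers. It expresses hits and misses as a share of all questions and reports whatever share is left unaccounted for.

```python
def cache_accounting(questions, hits, misses):
    """Return hit, miss, and unaccounted shares of all questions, in percent."""
    if questions <= 0:
        raise ValueError("questions must be positive")
    hit_pct = 100.0 * hits / questions
    miss_pct = 100.0 * misses / questions
    return {
        "hit_pct": hit_pct,
        "miss_pct": miss_pct,
        # Share of questions that is neither a counted hit nor a counted miss.
        "unaccounted_pct": 100.0 - hit_pct - miss_pct,
    }


# Made-up counters chosen only to reproduce the percentages described above.
shares = cache_accounting(questions=100_000, hits=1_100, misses=6_800)
print(shares)
```

If hits and misses really covered every question, the unaccounted share would be close to zero; with the figures above it is the large majority of questions.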

Just as additional info, the data collected by Cacti to generate these graphs comes from the following command:

/usr/bin/rec_control get questions cache-entries cache-hits cache-misses concurrent-queries resource-limits unauthorized-tcp unauthorized-udp spoof-prevents answers-slow client-parse-errors answers0-1 answers1-10 answers10-100 answers100-1000 qa-latency
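For anyone wanting to reproduce the graphs, here is a rough Python sketch of how that output could be turned into the answer-speed percentages. It assumes that rec_control prints one value per requested name, whitespace-separated and in the order requested; the helper names `parse_rec_control` and `latency_shares` are hypothetical, and the sample output is invented for illustration.

```python
# Statistic names, in the same order they are passed to "rec_control get".
STAT_NAMES = [
    "questions", "cache-entries", "cache-hits", "cache-misses",
    "concurrent-queries", "resource-limits", "unauthorized-tcp",
    "unauthorized-udp", "spoof-prevents", "answers-slow",
    "client-parse-errors", "answers0-1", "answers1-10",
    "answers10-100", "answers100-1000", "qa-latency",
]

# Counters that together describe how fast queries were answered.
LATENCY_BUCKETS = [
    "answers0-1", "answers1-10", "answers10-100",
    "answers100-1000", "answers-slow",
]


def parse_rec_control(output, names=STAT_NAMES):
    """Map each requested name to its value (assumes same order as requested)."""
    values = output.split()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(names, values)}


def latency_shares(stats):
    """Return each latency bucket as a percentage of all answered queries."""
    total = sum(stats[bucket] for bucket in LATENCY_BUCKETS)
    if total == 0:
        return {bucket: 0.0 for bucket in LATENCY_BUCKETS}
    return {bucket: 100.0 * stats[bucket] / total for bucket in LATENCY_BUCKETS}


# Invented sample output, one value per requested statistic.
sample = "1000 500 11 68 3 0 0 0 0 40 0 100 200 560 100 12345"
stats = parse_rec_control(sample)
print(latency_shares(stats))
```

Taking the output as a string keeps the parsing separate from however the command is actually run (SNMP extend, cron, or by hand).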


Am I misinterpreting this, or is there definitely something going on?

Thanks for your time,

Jeremy