Good afternoon,

We've been working on upgrading our recursors from pdns-recursor-3.1.7.1-1 to pdns-recursor-3.3-1, and have seen some oddities I wanted to ask the list about. First, a basic rundown of our environment:

Our existing production servers are running pdns-recursor-3.1.7.1-1, installed via RPMs downloaded from your website. The recursor itself runs within a Xen PV virtual machine on a CentOS 5.5 base. To ensure we utilize all 4 cores of the processors in those machines, 2 instances of the recursor are launched simultaneously, listening on different IP addresses, and we use the fork option. We have a total of 6 machines configured this way, behind a Foundry load balancer that shares the load between them. This implementation has been in place for about a year with no issues. We also use Cacti graphs to collect performance data, by extending SNMP with output from the rec_control command.

The new test server is pdns-recursor-3.3-1 installed via RPM downloaded from your website, and also running within a Xen PV virtual machine on a CentOS 5.5 base. Rather than launching multiple instances, we are launching 4 recursor threads (machines have 4 CPU cores). Most other settings are configured identically between old and new servers. This test server was added to the load balancer on Monday afternoon, taking a fraction of the traffic that would have gone to the 6 old machines.

The problem I'm seeing is that caching does not seem to be working properly, which is causing a performance hit. To document this effect, the following graph images were taken a little while ago from our Cacti installation:

http://www.jutley.org/DNS

Looking at the 4th graph down, which is the cache statistics on the old version recursor, you will see that around 90% of all questions are cache hits, with around 10% as cache misses. And, looking at the third graph (showing how fast queries are answered), you'll see that over 90% of all queries are answered in less than 1 ms.

However, looking at the bottom graph, which is the cache statistics on the new recursor, the statistics are totally different. Only 1.1% of the total questions are cache hits, while 6.8% are cache misses, which to me makes no sense, since a question *HAS* to be either a cache hit or cache miss. And, looking at the 7th graph (answer speed on the new recursor version), most queries are taking more than 10ms to answer.
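To make that arithmetic explicit, here is a small Python sketch (the function name `cache_shares` is hypothetical) that expresses hits and misses as percentages of questions, using counter deltas taken over the same polling interval. If every question really were exactly one hit or one miss, the "unaccounted" share would be close to zero; with the 1.1% and 6.8% figures above it works out to about 92.1%.

```python
def cache_shares(questions: int, hits: int, misses: int) -> dict[str, float]:
    """Express cache hits and misses as percentages of total questions.

    Inputs are assumed to be counter deltas over one polling interval
    (e.g. the difference between two consecutive rec_control readings).
    """
    if questions <= 0:
        raise ValueError("questions must be positive")
    hit_pct = 100.0 * hits / questions
    miss_pct = 100.0 * misses / questions
    # Under the assumption "every question is a hit or a miss", this is ~0.
    unaccounted_pct = 100.0 - hit_pct - miss_pct
    return {
        "hit_pct": hit_pct,
        "miss_pct": miss_pct,
        "unaccounted_pct": unaccounted_pct,
    }


if __name__ == "__main__":
    # Placeholder deltas chosen to match the percentages quoted above.
    print(cache_shares(questions=1000, hits=11, misses=68))
```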

Just as additional info, the data Cacti collects to generate these graphs comes from the following command:

/usr/bin/rec_control get questions cache-entries cache-hits cache-misses concurrent-queries resource-limits unauthorized-tcp unauthorized-udp spoof-prevents answers-slow client-parse-errors answers0-1 answers1-10 answers10-100 answers100-1000 qa-latency
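For anyone wanting to reproduce the collection step outside Cacti, a minimal Python sketch of pairing that output with metric names might look like the following. It assumes `rec_control get` prints one integer per line in the same order the metrics were requested; the helper names `parse_rec_control` and `collect` are hypothetical, and `collect` needs a running recursor.

```python
import subprocess

# Metric names, in the same order they are passed to rec_control above.
METRICS = [
    "questions", "cache-entries", "cache-hits", "cache-misses",
    "concurrent-queries", "resource-limits", "unauthorized-tcp",
    "unauthorized-udp", "spoof-prevents", "answers-slow",
    "client-parse-errors", "answers0-1", "answers1-10",
    "answers10-100", "answers100-1000", "qa-latency",
]


def parse_rec_control(output: str, metrics: list[str]) -> dict[str, int]:
    """Pair each requested metric with its value.

    Assumes one integer per non-empty line, in request order.
    """
    values = [line.strip() for line in output.splitlines() if line.strip()]
    if len(values) != len(metrics):
        raise ValueError(f"expected {len(metrics)} values, got {len(values)}")
    return {name: int(value) for name, value in zip(metrics, values)}


def collect(binary: str = "/usr/bin/rec_control") -> dict[str, int]:
    # Runs the same command Cacti uses and parses its stdout.
    result = subprocess.run(
        [binary, "get", *METRICS], capture_output=True, text=True, check=True
    )
    return parse_rec_control(result.stdout, METRICS)


if __name__ == "__main__":
    # Placeholder output for four metrics, not real measurements.
    sample = "1000\n500\n11\n68\n"
    print(parse_rec_control(sample, METRICS[:4]))
```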

Am I misinterpreting this, or is something definitely going on?

Thanks for your time,

Jeremy
_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
http://mailman.powerdns.com/mailman/listinfo/pdns-users
