On Jun 14, 2007, at 5:03 PM, Kris Kennaway wrote:
>> It's at least arguable that doing queries against a data set
>> including a bunch of repeats is "skewed" in a more realistic
>> fashion. :-)  A quick look at some of the data sources I have handy,
>> such as HTTP access logs or Squid proxy logs, suggests that (for
>> example) out of a database of 17+ million requests, there were only
>> 46,000 unique IPs involved.

> There were still lots of repeats, just some of them were repeated
> hundreds of thousands of times - I stripped about a dozen of those
> (googlebots, I'm looking at you ;-), leaving a distribution that was
> less biased to the top end.

Heh, yes, it's surprising how happy a webspider is to crawl around a heavily-interlinked site. :-)

Perhaps someone ought to add a:

  Crawl-delay: 600

...statement to http://www.freebsd.org/robots.txt...?
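For what it's worth, Crawl-delay only takes effect inside a User-agent record, and it's a non-standard extension; some crawlers honor it, but Googlebot reportedly does not.  The whole file might look something like this (the 600-second value is just the one floated above):

  User-agent: *
  Crawl-delay: 600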

>> You might find it interesting to compare doing queries against your
>> raw and filtered datasets, just to see what kind of difference you
>> get, if any.

> Cached queries perform much better, as you might expect.  As an
> estimate I was getting query rates exceeding 120,000 qps when serving
> entirely out of cache, and I don't think I reached the upper bound yet.

Sure, anything cached or anything the nameserver is authoritative for is going to be directly answerable without having to do an external recursive query.
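For anyone who wants to reproduce this sort of measurement, a dnsperf run would look something like the following (the server address and filenames here are made up; -s, -d, and -l are the server, datafile, and run-length flags from the dnsperf man page):

  dnsperf -s 10.0.0.1 -d queries.txt -l 60

...where queries.txt holds one "name type" pair per line, e.g.:

  www.freebsd.org A
  5.0.1.10.in-addr.arpa PTR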

>> What was the external network connectivity in terms of speed?  The
>> docs suggest you need something like 16 Mb/s up / 8 Mb/s down
>> connectivity in order to get up to 50K requests/sec....

> I wasn't seeing anything close to this, so I guess it depends on how
> much data is being returned by the queries (I was doing PTR lookups).
> I forget the exact numbers but it wasn't exceeding about 10 Mbit/s in
> either direction, which should have been well within link capacity.
> Also, the lock profiling data bears out the interpretation that it was
> BIND that was becoming saturated and not the hardware.

OK, thanks for the info.  Maybe I'll get a chance to run some numbers of my own, if I can free up some time from WWDC....
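In the meantime, a back-of-the-envelope check (assuming roughly 80-byte PTR queries and 150-byte responses on the wire, which is about right for UDP DNS):

  10 Mbit/s / (80 bytes * 8)  ~= 15,600 queries/sec inbound
  10 Mbit/s / (150 bytes * 8) ~= 8,300 responses/sec outbound

So ~10 Mbit/s each way corresponds to something on the order of 10K qps, which fits your reading that BIND, not the network, was the limiting factor.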

[ ... ]
>>> It would be interesting to test BIND performance when acting as an
>>> authoritative server, which probably has very different performance
>>> characteristics; the difficulty there is getting access to a
>>> suitably interesting and representative zone file and query data.

>> I suppose you could also set up a test nameserver which claims to be
>> authoritative for all of in-addr.arpa, and set up a bunch (65K?) of
>> /16 reverse zone files, and then test against real unmodified IPs,
>> but it would be easier to do something like this:
>>
>> Set up a nameserver which is authoritative for 1.10.in-addr.arpa
>> (i.e., the reverse zone for 10.1/16), and use a zonefile with the
>> $GENERATE directive to populate your PTR records:
>>
>> [ ...zonefile snipped for brevity... ]
>>
>> ...and then feed it a query database consisting of PTR lookups.  If
>> you wanted to, you could take your existing IP database, and glue the
>> last two octets of the real IPs onto 10.1 to produce a reasonable
>> assortment of IPs to perform a reverse lookup upon.

> I could construct something like this but I'd prefer a more
> "realistic" workload (i.e. an uneven distribution of queries against
> different subsets of the data).  I don't have a good idea what
> "realistic" means here, which makes it hard to construct one from
> scratch.  Fortunately I have an offer from someone for access to a
> real large zone file and a large sample of queries.

Ah, very good, then.

While I expect there to be quite a difference between recursive queries and authoritative/locally-answerable queries (after all, that seems to be why dnsperf and resperf were created as distinct programs), I'm not convinced there is much difference between doing reverse lookups for one set of IPs versus another, so long as those IPs are all in zones the server is authoritative for.
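For the record, the zonefile I had in mind above would look roughly like this (a sketch only: the SOA values and hostnames are placeholders, and you'd need one $GENERATE line per /24, most easily emitted by a script):

  $TTL 3600
  @  IN  SOA  ns.example.net. hostmaster.example.net. (
                 2007061401  ; serial
                 3600        ; refresh
                 900         ; retry
                 604800      ; expire
                 3600 )      ; negative-caching TTL
     IN  NS   ns.example.net.
  $GENERATE 0-255 $.0 PTR host-0-$.example.net.
  $GENERATE 0-255 $.1 PTR host-1-$.example.net.
  ; ...and so on through $.255, covering all of 10.1/16

With the origin at 1.10.in-addr.arpa, "$.0" expands to owner names like 5.0.1.10.in-addr.arpa, i.e. the PTR record for 10.1.0.5.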
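And gluing the last two octets of the real IPs onto 10.1 is a one-liner; assuming ips.txt holds one dotted-quad per line, this emits dnsperf-style "name type" queries:

  awk -F. '{ print $4 "." $3 ".1.10.in-addr.arpa PTR" }' ips.txt > queries.txt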

--
-Chuck


