On Sat, 17 Jun 2006, Danial Thom wrote:

At some point you're going to have to figure out that there's a reason that every time anyone other than you tests FreeBSD it completely pigs out. Squeezing out some extra bytes in netperf isn't "performance". Performance is everything that a system can do. If you're eating 10% more cpu to get a few more bytes in netperf, you haven't increased the performance of the system.

This test wasn't netperf, it was a 32-process web server and a 32-process client, doing sendfile on UFS-backed data files. It was definitely a potted benchmark, in that it omits some of the behaviors of web servers (dynamic content, significantly variable data set, etc.), but is intended to be more than a simple micro-benchmark involving two sockets and packet blasting. Specifically, it was intended to validate whether or not there were immediately observable changes in TCP behavior based on adjusting HZ under load. The answer was a qualified yes: there was a small but noticeable negative effect on high-load web serving in the test environment from reducing HZ, likely due to reduced timer accuracy. Specifically: simply frobbing HZ isn't a strategy that necessarily results in a performance improvement.
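
To make the timer accuracy point concrete, here is a minimal sketch of how a tick-based timer rounds a requested timeout. It assumes a simplified model in which timeouts fire only on tick boundaries and are rounded up to a whole number of ticks; it is not the actual kernel callout code, and the function names are made up for illustration.

```python
import math

def tick_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ."""
    return 1000.0 / hz

def rounded_timeout_ms(requested_ms, hz):
    """Timeout actually delivered under the simplified model:
    round up to a whole number of ticks, with a minimum of one tick."""
    ticks = max(1, math.ceil(requested_ms * hz / 1000.0))
    return ticks * tick_ms(hz)

if __name__ == "__main__":
    # Hypothetical 3 ms timeout at two HZ settings.
    for hz in (100, 1000):
        print(hz, tick_ms(hz), rounded_timeout_ms(3, hz))
```

Under this model, a 3 ms request is stretched to a full 10 ms tick at HZ=100 but is delivered as 3 ms at HZ=1000, which is the kind of coarsening that could plausibly affect TCP timers under load.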

You need to do things like run 2 benchmarks at once. What happens to the "performance" of one benchmark when you increase the "performance" of the other? Run a database benchmark while you're running a network benchmark, or while you're passing a controlled stream of traffic through the box.

The point of this exercise was to demonstrate the complexity of the issue of adjusting HZ, to suggest that simply changing the value in the absence of further evidence could have negative effects, and that we might want to investigate a more mature middle ground, such as a modified timer architecture. I'm sorry if that conclusion wasn't clear from my e-mail.

I'd also love to see the results of the exact same test with only 1 cpu enabled, to see how well you scale generally. I'm astounded that no-one ever seems to post 1 vs 2 cpu performance, which is the entire point of SMP.

Single CPU results were included in my e-mail. There are actually a couple of other variations of interest you want to measure in more general benchmarking exercises:

- Kernel compiled without any SMP support.  Specifically, without lock
  prefixes on atomic instructions.

- Kernel compiled with SMP support, but with use of additional CPUs disabled.

- Kernel compiled with SMP support, and with varying numbers of CPUs enabled.

The first two cases are important, because they help identify the difference between the general overhead of compiling in locked instructions (and related issues), and the overheads associated with contention, caches, inter-CPU IPI traffic, scheduling, etc. By failing to compare the top two cases, it might be easy to conclude that a performance difference is due to the additional cost of atomic instructions, whereas in reality it may be the result of a poor scheduling decision, or of data unnecessarily cache missing in both CPUs rather than one because processing of the data is split poorly over available CPUs.
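
As a rough illustration of how the three configurations separate these costs, the sketch below splits a set of throughput measurements into the overhead of building with SMP support and the scaling obtained from extra CPUs. The function name and the sample numbers are hypothetical, not measurements from this thread.

```python
def decompose(up_kernel, smp_kernel_1cpu, smp_kernel_ncpu, ncpu):
    """Inputs are throughputs (e.g. requests/sec, higher is better):
    up_kernel        - kernel built without SMP support
    smp_kernel_1cpu  - SMP kernel, additional CPUs disabled
    smp_kernel_ncpu  - SMP kernel with ncpu CPUs enabled
    """
    # Fraction lost just by compiling in SMP support (locked instructions etc.).
    smp_build_overhead = (up_kernel - smp_kernel_1cpu) / up_kernel
    # Speedup from enabling CPUs, measured against the same SMP kernel on one CPU.
    speedup = smp_kernel_ncpu / smp_kernel_1cpu
    # Fraction of ideal linear scaling actually achieved.
    efficiency = speedup / ncpu
    # Net gain over the non-SMP kernel, which is what a user ultimately sees.
    net_gain = smp_kernel_ncpu / up_kernel
    return {
        "smp_build_overhead": smp_build_overhead,
        "speedup": speedup,
        "efficiency": efficiency,
        "net_gain": net_gain,
    }

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    for name, value in decompose(1000.0, 950.0, 1615.0, 2).items():
        print(f"{name}: {value:.3f}")
```

Comparing only the non-SMP kernel against the N-CPU result would report the net gain alone, hiding how much of the shortfall comes from the SMP build itself and how much from contention, caching, or scheduling.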

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance