> In the end, though, benchmarks ignore one of the most important rules
> of software performance: "throughput" (i.e. the amount of processing
> that your system can do just prior to being overloaded) is almost never
> the most important consideration.  Other considerations such as
> flexibility, robustness, responsiveness and scalability are almost
> always more important.

Mmm, such statements really assume that there's a sensible meaning to
`almost always' when applied to the set of all programmers, whereas I
think a much more realistic assumption is that `there's lots of people out
there, all with different priorities', and that one should present things
in a way which lets people perform their own evaluations. (In cases where
I've reason to believe that how I code can simply, reliably and
significantly affect throughput, I care very much about it.) The problem
with language benchmarks is not that they `over-rate' the importance of
performance but that they assume per se that choice of language is a
single-variable (execution speed) optimization problem; there's no attempt
to measure the other items in your list, most especially flexibility. (I'm
assuming you mean flexibility of the programmer rewriting, retargeting,
refactoring and re-engineering existing code.) Of course, I don't have any
good ideas about how to measure these, particularly flexibility, in a
practically implementable and accurate way :-)

> I've thought for a while that what we need is more benchmarks like
> pseudoknot: Real tasks which real people want to do.  Computing
> Ackermann's function is all well and good, but when's the last time you
> actually needed to compute it in a real program?

I suspect there probably are things that make Ackermann's function a bad
`test-case' (eg, computationally simple and regular => good cache
utilisation after optimization which doesn't extrapolate?) but, for the
purposes I'd want to use benchmarks for, these deficiencies, compared to
things like BLAS performance -- which are things that _some_ real people
do all day -- probably don't affect the results all that much. Of more
concern to me is, when's the last time you actually got a well-specified
computational problem and a reasonable amount of time to write a carefully
crafted program to solve it (particularly when you had some reassurance
that the very specification of what to solve wouldn't change after the
first time you ran the code :-) )?
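
For concreteness, here is a minimal sketch of the usual two-argument
(Ackermann-Peter) definition, written in Python purely for illustration.
The `counter' argument is my own addition for tallying calls and is not
part of any benchmark suite. It shows what I mean by `computationally
simple and regular': the whole thing is three cases of integer arithmetic
and recursive calls, with no data structures and no I/O.

```python
import sys


def ackermann(m, n, counter=None):
    """Two-argument Ackermann-Peter function.

    `counter` is a hypothetical helper (a one-element list) used only to
    tally how many calls were made; pass None to skip counting.
    """
    if counter is not None:
        counter[0] += 1
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1, counter)
    return ackermann(m - 1, ackermann(m, n - 1, counter), counter)


if __name__ == "__main__":
    # The recursion gets deep quickly, so raise the limit for larger inputs.
    sys.setrecursionlimit(10_000)
    calls = [0]
    result = ackermann(2, 3, calls)
    print("ackermann(2, 3) =", result, "in", calls[0], "calls")
```

Nearly all the time goes into call overhead and small-integer arithmetic,
so a timing mostly tells you how well an implementation handles deep
recursion.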

> Off the top of my head, some "real" tasks which could be benchmarked
> include:
>       - MPEG video compression.

Here you really want to measure `glue-code' overhead (both
performance-wise and `human rewriting'-wise) of linking together core
processing elements, either written in low-level code (MMX, etc) or
available on DSP chips, in a way which would allow shifting components and
algorithms about. (I.e., in my opinion it's an informative task to have
`benchmark-type' data about because it's complicated, with many ways to
solve the problem, not because it's `real world'.)

