It doesn't matter whether they try to confirm the measurements for themselves 
or not - what matters is that they are provided with all the information 
required to do so.


I only have 5 years' experience publishing the measurements for the benchmarks 
game - and I've come across a handful of people who did try to confirm the 
measurements for themselves.

(The most interesting example compared a couple of language implementations on 
one particular task, but measured them at 2 dozen different input values. That 
nicely demonstrated that the same language implementation wasn't always faster 
across all the input values. The 3 different input values shown on the 
benchmarks game usually aren't enough to demonstrate that kind of thing.)


That's an interesting observation. I hadn't thought of that before, but it does make sense.

I was debating whether to post this, but I figured it couldn't hurt: the biggest problem I have with the benchmarks they use is that, at least from my perspective, they're not all very common algorithms. Some things I'd love to see are B-Trees (common in databases), encryption, compression, etc. Because they are so widely used, they would provide more useful comparisons. Even MapReduce would be good, since it's becoming very popular.

Taking it a step further, there need to be well-defined standard implementations and alternative implementations. The standard implementations would be straightforward designs that don't use any trickery, so that we can actually compare language implementations. The alternative ones would then show how you can make the implementations faster. I mention this because a buddy of mine submitted a C version of one benchmark but implemented his own thread pooling code. It was rejected, even though the C++ version used Boost, which, from what I'm told, also uses thread pooling. A standard implementation could be used to define whether things like thread pooling can or should be used. I'd argue not in this case, since not every language supports or requires it (e.g. Erlang).

Of course, these are just ideas that I'm not going to try to implement, as it would be too much work and I don't have the resources to do it right. Even then, how do we make it truly fair and accurate? Based on what I've seen in this thread, it's a pretty hard problem if even the input data can affect a language's performance.

Casey
