Hi Jason,

Sorry for the delayed response. Thanks for pointing out the darcs-benchmark
package. I had not seen that before and there may be some room for sharing
infrastructure. Parsing the runtime stats is pretty easy, but comparing
different runs, computing statistics, and generating tables are common
tasks that could be shared.
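To illustrate the kind of shared infrastructure meant here, a minimal
sketch of run comparison and table generation in plain Haskell (the
function names, benchmarks, and timings are made up for illustration and
are not from either package):

```haskell
import Text.Printf (printf)

-- Speedup of a new run relative to a baseline (>1 means faster).
speedup :: Double -> Double -> Double
speedup base new = base / new

-- One formatted table row: name, baseline time, new time, speedup.
formatRow :: (String, Double, Double) -> String
formatRow (name, base, new) =
  printf "%-12s %8.2f %8.2f %6.2fx" name base new (speedup base new)

main :: IO ()
main = do
  putStrLn (printf "%-12s %8s %8s %7s" "benchmark" "base" "new" "speedup")
  mapM_ (putStrLn . formatRow)
    [ ("bzlib",  12.4, 10.1)   -- hypothetical timings in seconds
    , ("funsat",  8.9,  9.2)
    ]
```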

On a related note, when I uploaded the fibon package, I put it in a new
"Benchmarking" category as opposed to the existing "Testing" category. In my
mind, testing is more for correctness and benchmarking is for performance. I
think it would be useful to include the other benchmarking packages
(darcs-benchmark, criterion) in that category.



--------------------------------------------------
From: "Jason Dagit" <da...@codersbase.com>
Sent: Tuesday, November 09, 2010 7:58 PM
To: "David Peixotto" <d...@rice.edu>
Cc: <hask...@haskell.org>; <haskell-cafe@haskell.org>
Subject: Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

On Tue, Nov 9, 2010 at 5:47 PM, David Peixotto <d...@rice.edu> wrote:


On Nov 9, 2010, at 3:45 PM, Jason Dagit wrote:

I have a few questions:
  * What differentiates fibon from criterion?  I see both use the
statistics package.


I think the two packages have different benchmarking targets.

Criterion allows you to easily test individual functions and gives some
help with benchmarking in the presence of lazy evaluation. If some code
does not run long enough to time reliably, it will run it multiple times
to get sensible timings. Criterion does a much more sophisticated
statistical analysis of the results, but I hope to incorporate that into
the Fibon analysis in the future.

Fibon is a more traditional benchmarking suite like SPEC or nofib. My
interest is in using it to test compiler optimizations. It benchmarks only
at the whole-program level by running an executable. It checks that the
program produces the correct output, can collect extra metrics generated
by the program, separates collecting results from analyzing results, and
generates tables directly comparing the results from different benchmark
runs.

  * Does it track memory statistics?  I glanced at the FAQ but didn't see
anything about it.


Yes, it can read memory statistics dumped by the GHC runtime. It has
built-in support for reading the stats dumped by `+RTS -t --machine-readable`,
which include things like bytes allocated and time spent in GC.
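Since the `--machine-readable` stats are emitted as a Haskell-readable
association list of name/value strings, a parser can be very small. A
sketch (the helper names are hypothetical, and the exact field names vary
between GHC versions):

```haskell
type Stats = [(String, String)]

-- The --machine-readable output is a Haskell-readable association
-- list, so `read` is enough to parse it.
parseStats :: String -> Stats
parseStats = read

-- "bytes allocated" is the field name used by the GHC runtime;
-- names may differ slightly across GHC versions.
bytesAllocated :: Stats -> Maybe Integer
bytesAllocated stats = read <$> lookup "bytes allocated" stats

main :: IO ()
main = do
  let sample = "[(\"bytes allocated\",\"36169392\"),(\"GC_cpu_seconds\",\"0.02\")]"
  print (bytesAllocated (parseStats sample))  -- prints: Just 36169392
```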


Oh, I see.  In that case, it's more similar to darcs-benchmark, except
that darcs-benchmark is tailored specifically to benchmarking darcs. Where
they overlap is in parsing the RTS statistics, running the whole program,
and producing tabular reports.  Darcs-benchmark adds to that an embedded
DSL for specifying operations to perform on the repository between
benchmarks (and translating those operations into runnable shell snippets).

I wonder if Fibon and darcs-benchmark could share common infrastructure
beyond the statistics package.  It sure sounds like it to me.  Perhaps
some collaboration is in order.


  * Are the numbers in the sample output seconds or milliseconds?  What
is the stddev (e.g., what does the distribution of run-times look like)?


I'm not sure which results you are referring to exactly (the numbers in
the announcement were lines of code). I picked benchmarks that all ran for
at least a second (and hopefully longer) with compiler optimizations
enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43
seconds, the mean time is 12.57 seconds, and the standard deviation is
14.56 seconds.
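For reference, these summary statistics take only a few lines of plain
Haskell to compute. A sketch (using the population standard deviation; the
message does not say which variant was reported, and the timings below are
invented):

```haskell
import Data.List (sort)

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Middle element of the sorted list, averaging the two middle
-- elements when the list has even length.
median :: [Double] -> Double
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where
    s   = sort xs
    n   = length s
    mid = n `div` 2

-- Population standard deviation.
stddev :: [Double] -> Double
stddev xs = sqrt (mean [(x - m) ** 2 | x <- xs])
  where m = mean xs

main :: IO ()
main = mapM_ print [mean ts, median ts, stddev ts]
  where ts = [1.2, 8.4, 3.3, 30.0]  -- made-up benchmark times (seconds)
```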


I probably read your email too fast, sorry.  Thanks for the clarification.

Thanks,
Jason

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
