Re: Benchmarking experiences: Cabal test vs compiling nofib/spectral/simple/Main.hs

2021-01-23 Thread Ben Gamari
Sebastian Graf writes:

> Hi Andreas,
>
> I similarly benchmark compiler performance by compiling Cabal, but only
> occasionally. I mostly trust ghc/alloc metrics in CI and check Cabal when I
> think there's something afoot and/or want to measure runtime, not only
> allocations.
>
I think this is a very reasonable strategy. When working explicitly on
compiler performance I generally default to the Cabal test as

 1. I find that the 20 or 90 seconds it takes (depending upon optimisation
level) is small relative to the time it took to actually find the issue
I am trying to fix (see the sketch after this list), and

 2. I want to be certain I am not sacrificing compiler performance in
one case in exchange for improvements elsewhere; the nofib tests are so
small that I find it hard to convince myself that they would reveal such
a trade-off.
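For concreteness, here is a minimal, untested sketch of how one might time
a single compile and pull out the compiler's own allocation figure, via
GHC's machine-readable RTS statistics. The file names are placeholders;
only the "+RTS -t<file> --machine-readable -RTS" mechanism and the
"bytes allocated" key are standard RTS features:

-- A minimal, untested sketch: compile one module and read the compiler's
-- own allocation figure from GHC's machine-readable RTS statistics.
import System.Process (readProcessWithExitCode)
import Data.Maybe (fromMaybe)

main :: IO ()
main = do
  -- "+RTS -t<file> --machine-readable -RTS" makes the ghc process write
  -- its RTS statistics to <file> as a Read-able association list.
  let args = [ "nofib/spectral/simple/Main.hs", "-O", "-fforce-recomp"
             , "+RTS", "-tghc-stats.txt", "--machine-readable", "-RTS" ]
  _ <- readProcessWithExitCode "ghc" args ""
  stats <- readFile "ghc-stats.txt"
  -- The first line of the file echoes the command line; the rest is the
  -- association list itself.
  let assoc = read (unlines (drop 1 (lines stats))) :: [(String, String)]
  putStrLn $ "bytes allocated:     "
          ++ fromMaybe "?" (lookup "bytes allocated" assoc)
  putStrLn $ "mutator cpu seconds: "
          ++ fromMaybe "?" (lookup "mutator_cpu_seconds" assoc)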

> I'm inclined to think that for my purposes (testing the impact of
> optimisations) the GHC codebase offers sufficient variety to turn up
> fundamental regressions, but maybe it makes sense to build some packages
> from head.hackage to detect regressions like
> https://gitlab.haskell.org/ghc/ghc/-/issues/19203 earlier. It's all a bit
> open-ended, and frankly I think I wouldn't get anything done if every one
> of my patches had to get to the bottom of all regressions and improvements
> on the entire head.hackage set. I somewhat trust that users will complain
> eventually and file a bug report and that our CI efforts mean that compiler
> performance will improve in the mean.
>
> Although it's probably more of a tooling problem: I simply don't know how
> to collect the compiler performance metrics for arbitrary cabal packages.
> If these metrics were collected as part of CI, maybe as a nightly or
> weekly job, it would be easier to get to the bottom of a regression before
> it manifests in a released GHC version. But it all depends on how easy that
> would be to set up and how many CI cycles it would burn, and I certainly
> don't feel like I'm in a position to answer either question.
>
We actually already do this in head.hackage: every GHC commit on
`master` runs `head.hackage` with -ddump-timings. The compiler metrics
that result are then dumped into a database, which can be queried via
Postgrest. IIRC, I described this in an email to ghc-devs a few months
ago.
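For anyone who wants to poke at the data: a Postgrest query is just an
HTTP GET with filters in the query string. A purely hypothetical sketch
follows (the host, table, and column names below are made up, not the real
database; only Postgrest's eq./select= query syntax is standard):

-- Hypothetical sketch: the endpoint and schema are placeholders.
-- Uses http-conduit's Network.HTTP.Simple.
import Network.HTTP.Simple (httpLBS, parseRequest, getResponseBody)
import qualified Data.ByteString.Lazy.Char8 as L8

main :: IO ()
main = do
  -- Postgrest filter syntax: column=eq.value, select=col1,col2
  req  <- parseRequest
    "https://perf.example.org/results?commit=eq.abc123&select=test,allocations"
  resp <- httpLBS req
  L8.putStrLn (getResponseBody resp)  -- JSON rows, one per matching metric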

Unfortunately, Ryan and I have thus far found it very difficult to
keep head.hackage and the associated infrastructure building reliably
enough to make this a useful long-term metric. I do hope we can do
better in the future; I suspect we will want to be better about marking
MRs that may break user code with ~"user facing", allowing us to ensure
that head.hackage is updated *before* the change makes it into `master`.

Cheers,

- Ben





Re: Benchmarking experiences: Cabal test vs compiling nofib/spectral/simple/Main.hs

2021-01-23 Thread Sebastian Graf
Hi Andreas,

I similarly benchmark compiler performance by compiling Cabal, but only
occasionally. I mostly trust ghc/alloc metrics in CI and check Cabal when I
think there's something afoot and/or want to measure runtime, not only
allocations.

I'm inclined to think that for my purposes (testing the impact of
optimisations) the GHC codebase offers sufficient variety to turn up
fundamental regressions, but maybe it makes sense to build some packages
from head.hackage to detect regressions like
https://gitlab.haskell.org/ghc/ghc/-/issues/19203 earlier. It's all a bit
open-ended, and frankly I think I wouldn't get anything done if every one
of my patches had to get to the bottom of all regressions and improvements
on the entire head.hackage set. I somewhat trust that users will complain
eventually and file a bug report and that our CI efforts mean that compiler
performance will improve in the mean.

Although it's probably more of a tooling problem: I simply don't know how
to collect the compiler performance metrics for arbitrary cabal packages.
If these metrics were collected as part of CI, maybe as a nightly or
weekly job, it would be easier to get to the bottom of a regression before
it manifests in a released GHC version. But it all depends on how easy that
would be to set up and how many CI cycles it would burn, and I certainly
don't feel like I'm in a position to answer either question.
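One low-tech approach that seems plausible (an untested sketch, and the
setup is an assumption on my part): have cabal pass -ddump-timings
-ddump-to-file to every package via a "package *" stanza in cabal.project,
then walk dist-newstyle and aggregate the alloc= figures from the
resulting .dump-timings files. The dump format is not a stable interface,
so the parsing below is best-effort:

-- Untested sketch. Assumes everything was built with
--   package *
--     ghc-options: -ddump-timings -ddump-to-file
-- in cabal.project, so per-module dumps end up under dist-newstyle.
import System.Directory (listDirectory, doesDirectoryExist)
import System.FilePath ((</>), takeExtension)
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- Recursively collect all files under a directory.
walk :: FilePath -> IO [FilePath]
walk dir = do
  entries <- map (dir </>) <$> listDirectory dir
  concat <$> mapM visit entries
  where
    visit p = do
      isDir <- doesDirectoryExist p
      if isDir then walk p else pure [p]

-- Every "alloc=<bytes>" figure in one dump file; lines look roughly like
-- "CodeGen [Main]: alloc=123456 time=12.3", but this is not a stable
-- format, so parse defensively.
allocsIn :: String -> [Integer]
allocsIn contents =
  [ read digits
  | w <- words contents
  , "alloc=" `isPrefixOf` w
  , let digits = takeWhile isDigit (drop (length "alloc=") w)
  , not (null digits) ]

main :: IO ()
main = do
  dumps <- filter ((== ".dump-timings") . takeExtension) <$> walk "dist-newstyle"
  total <- sum . concatMap allocsIn <$> mapM readFile dumps
  putStrLn $ "total compiler allocations: " ++ show total ++ " bytes"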

Cheers,
Sebastian


On Wed, 20 Jan 2021 at 15:28, Andreas Klebinger <klebinger.andr...@gmx.at> wrote:

> Hello Devs,
>
> When I started to work on GHC a few years back, the Wiki recommended
> using nofib/spectral/simple/Main.hs as a test case for compiler
> performance changes. I've been using this ever since.
>
> "Recently" the cabal-test (compiling cabal-the-library) has become sort
> of a default benchmark for GHC performance.
> I've used the Cabal test as well and it's probably a better test case
> than nofib/spectral/simple/Main.hs.
> I've started using both usually using spectral/simple to benchmark
> intermediate changes and then looking
> at the cabal test for the final patch at the end. So far I have rarely
> seen a large
> difference between using cabal or spectral/simple. Sometimes the
> magnitude of the effect was different
> between the two, but I've never seen one regress/improve while the other
> didn't.
>
> Since the topic came up recently in a discussion, I wonder whether others
> use similar means to quickly benchmark GHC changes, and what your
> experiences have been with how representative the simpler benchmarks are
> compared to the Cabal test.
>
> Cheers,
> Andreas


Benchmarking experiences: Cabal test vs compiling nofib/spectral/simple/Main.hs

2021-01-20 Thread Andreas Klebinger

Hello Devs,

When I started to work on GHC a few years back, the Wiki recommended
using nofib/spectral/simple/Main.hs as a test case for compiler
performance changes. I've been using this ever since.

"Recently" the cabal-test (compiling cabal-the-library) has become sort
of a default benchmark for GHC performance.
I've used the Cabal test as well and it's probably a better test case
than nofib/spectral/simple/Main.hs.
I've started using both usually using spectral/simple to benchmark
intermediate changes and then looking
at the cabal test for the final patch at the end. So far I have rarely
seen a large
difference between using cabal or spectral/simple. Sometimes the
magnitude of the effect was different
between the two, but I've never seen one regress/improve while the other
didn't.

Since the topic came up recently in a discussion, I wonder whether others
use similar means to quickly benchmark GHC changes, and what your
experiences have been with how representative the simpler benchmarks are
compared to the Cabal test.

Cheers,
Andreas