Re: Measuring performance of GHC

2016-12-05 Thread Moritz Angermann
Hi,

I see the following challenges here, which have been partially touched on
by the discussion of the mentioned proposal.

- The tests we are looking at might be quite time-intensive (lots of
  modules that take substantial time to compile).  Is it practical for
  people to run these locally via nofib to get *some* idea of the
  performance implications?  What is the acceptable threshold for the
  total execution time of a nofib run?

- One of the core issues I see in day-to-day programming (even though
  not necessarily with Haskell right now) is that the spare time I have
  to file bug reports, boil down performance regressions, etc., and file
  them with open-source projects is not paid for and hence minimal.
  So whenever the tools I use make it really easy to file a bug or a
  performance regression, or to fix something, with the least time spent,
  the chances of me being able to help out increase greatly.  This was one
  of the ideas behind using plain pull requests.
  E.g.: this code seems to be really slow, or has subjectively regressed in
  compilation time; I also feel confident I can legally share this code
  snippet.  So I just create a quick pull request with a short description,
  and then carry on with whatever pressing task I'm trying to solve right
  now.

- Making sure that measurements are reliable (e.g. running on a dedicated
  machine with no other applications interfering).  I assume Joachim has
  quite some experience here.  One comparatively noise-resistant metric is
  the compiler's own allocation count; a sketch follows below.
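
  A minimal sketch of collecting that number, assuming the format of
  GHC's machine-readable RTS statistics (the helper name and the parsing
  are my own, not an existing tool):

    import System.Process (readProcessWithExitCode)

    -- Compile one module and return the compiler's own "bytes allocated"
    -- figure, which is far more stable across runs than wall-clock time.
    -- Assumes a clean compile, so stderr holds only the RTS statistics.
    compilerAllocs :: FilePath -> IO Integer
    compilerAllocs modFile = do
      (_, _, err) <- readProcessWithExitCode "ghc"
        [ "-fforce-recomp", "-O", modFile
        , "+RTS", "-t", "--machine-readable", "-RTS" ] ""
      -- The RTS prints a Haskell-readable association list on stderr.
      case reads (dropWhile (/= '[') err) :: [([(String, String)], String)] of
        [(stats, _)] | Just n <- lookup "bytes allocated" stats ->
          return (read n)
        _ -> fail ("could not parse RTS statistics for " ++ modFile)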

Thanks.

Cheers,
 Moritz


> On Dec 6, 2016, at 9:44 AM, Ben Gamari  wrote:
> 
> Michal Terepeta  writes:
> 
>> Interesting! I must have missed this proposal.  It seems that it didn't meet
>> with much enthusiasm though (but it also proposes to have a completely
>> separate repo on GitHub).
>> 
>> Personally, I'd be happy with something more modest:
>> - A collection of modules/programs that are more representative of real
>>  Haskell programs and stress various aspects of the compiler.
>>  (this seems to be a weakness of nofib, where >90% of modules compile
>>  in less than 0.4s)
> 
> This would be great.
> 
>> - A way to compile all of those and do "before and after" comparisons
>>  easily. To measure the time, we should probably try to compile each
>>  module at least a few times. (it seems that this is not currently
>>  possible with `tests/perf/compiler` and
>>  nofib only compiles the programs once AFAICS)
>> 
>> Looking at the comments on the proposal from Moritz, most people would
>> prefer to extend/improve nofib or the `tests/perf/compiler` tests. So I
>> guess the main question is - what would be better:
>> - Extending nofib with modules that are compile-only (i.e., not
>>  runnable) and focused on stressing the compiler?
>> - Extending `tests/perf/compiler` with the ability to run all the tests
>>  and do easy "before and after" comparisons?
>> 
> I don't have a strong opinion on which of these would be better.
> However, I would point out that currently the tests/perf/compiler tests
> are extremely labor-intensive to maintain while doing relatively little
> to catch performance regressions. There are a few issues here:
> 
> * some tests aren't very reproducible between runs, meaning that
>   contributors sometimes don't catch regressions in their local
>   validations
> * many tests aren't very reproducible between platforms and all tests
>   are inconsistent between differing word sizes. This means that we end
>   up having many sets of expected performance numbers in the testsuite.
>   In practice nearly all of these except 64-bit Linux are out-of-date.
> * our window-based acceptance criterion for performance metrics doesn't
>   catch most regressions, which typically bump allocations by a couple
>   percent or less (whereas the acceptance thresholds range from 5% to
>   20%). This means that the testsuite fails to catch many deltas, only
>   failing when some unlucky person finally pushes the number over the
>   threshold.
> 
> Joachim and I discussed this issue a few months ago at Hac Phi; he had
> an interesting approach to tracking expected performance numbers which
> may both alleviate these issues and reduce the maintenance burden that
> the tests pose. I wrote down some terse notes in #12758.
> 
> Cheers,
> 
> - Ben



Re: Measuring performance of GHC

2016-12-05 Thread Ben Gamari
Michal Terepeta  writes:

> Interesting! I must have missed this proposal.  It seems that it didn't meet
> with much enthusiasm though (but it also proposes to have a completely
> separate repo on GitHub).
>
> Personally, I'd be happy with something more modest:
> - A collection of modules/programs that are more representative of real
>   Haskell programs and stress various aspects of the compiler.
>   (this seems to be a weakness of nofib, where >90% of modules compile
>   in less than 0.4s)

This would be great.

> - A way to compile all of those and do "before and after" comparisons
>   easily. To measure the time, we should probably try to compile each
>   module at least a few times. (it seems that this is not currently
>   possible with `tests/perf/compiler` and
>   nofib only compiles the programs once AFAICS)
>
> Looking at the comments on the proposal from Moritz, most people would
> prefer to extend/improve nofib or the `tests/perf/compiler` tests. So I
> guess the main question is - what would be better:
> - Extending nofib with modules that are compile-only (i.e., not
>   runnable) and focused on stressing the compiler?
> - Extending `tests/perf/compiler` with the ability to run all the tests
>   and do easy "before and after" comparisons?
>
I don't have a strong opinion on which of these would be better.
However, I would point out that currently the tests/perf/compiler tests
are extremely labor-intensive to maintain while doing relatively little
to catch performance regressions. There are a few issues here:

 * some tests aren't very reproducible between runs, meaning that
   contributors sometimes don't catch regressions in their local
   validations
 * many tests aren't very reproducible between platforms and all tests
   are inconsistent between differing word sizes. This means that we end
   up having many sets of expected performance numbers in the testsuite.
   In practice nearly all of these except 64-bit Linux are out-of-date.
 * our window-based acceptance criterion for performance metrics doesn't
   catch most regressions, which typically bump allocations by a couple
   percent or less (whereas the acceptance thresholds range from 5% to
   20%). This means that the testsuite fails to catch many deltas, only
   failing when some unlucky person finally pushes the number over the
   threshold.
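
To make the windowing issue concrete, here is a minimal sketch of such an
acceptance check (names are hypothetical, not the testsuite's actual code);
each commit can regress by just under the tolerance without ever failing,
so small regressions compound silently:

    -- Sketch of a window-based acceptance criterion.
    withinWindow :: Double   -- tolerance, e.g. 0.05 for 5%
                 -> Integer  -- expected allocations
                 -> Integer  -- measured allocations
                 -> Bool
    withinWindow tol expected actual =
      abs (fromInteger actual - fromInteger expected)
        <= tol * fromInteger expected

    -- A 4% regression passes a 5% window:
    --   withinWindow 0.05 (10 ^ 9) (104 * 10 ^ 7) == True
    -- Ten such regressions each pass, yet compound to roughly 48%.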

Joachim and I discussed this issue a few months ago at Hac Phi; he had
an interesting approach to tracking expected performance numbers which
may both alleviate these issues and reduce the maintenance burden that
the tests pose. I wrote down some terse notes in #12758.

Cheers,

- Ben




Re: Measuring performance of GHC

2016-12-05 Thread Ben Gamari
Michal Terepeta  writes:

> Hi everyone,
>
> I've been running nofib a few times recently to see the effect of some
> changes on compile time (not the runtime of the compiled program). And
> I've started wondering how representative nofib is when it comes to
> measuring compile time and compiler allocations? It seems that most of
> the nofib programs compile really quickly...
>
> Is there some collection of modules/libraries/applications that was put
> together for the purpose of benchmarking GHC itself and that I just
> haven't seen/found?
>
Sadly no; I've put out a number of calls for minimal programs (e.g.
small, fairly free-standing real-world applications) but the response
hasn't been terribly strong. I frankly can't blame people for not
wanting to take the time to strip out dependencies from their working
programs. Joachim and I have previously discussed the possibility of
manually collecting a set of popular Hackage libraries on a regular
basis for use in compiler performance characterization.
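
To sketch what such a collection step could look like (the package set and
versions below are purely illustrative, not an agreed-upon corpus):

    import System.Process (callProcess)

    -- Illustrative only: pin exact versions so the benchmark corpus
    -- cannot drift between measurement runs.
    corpus :: [String]
    corpus = ["aeson-1.0.2.1", "attoparsec-0.13.1.0", "lens-4.15.1"]

    -- `cabal get` unpacks a package's source from Hackage into ./pkg-version.
    fetchCorpus :: IO ()
    fetchCorpus = mapM_ (\pkg -> callProcess "cabal" ["get", pkg]) corpus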

Cheers,

- Ben





Re: Please don’t break travis

2016-12-05 Thread Matthew Pickering
I made #12930 to track this.

Matt

On Fri, Dec 2, 2016 at 11:22 PM, Joachim Breitner wrote:
> Hi,
>
> again, Travis has been failing to build master for a while. Unfortunately,
> only the authors of commits get mailed by Travis, so I did not notice it
> until now. But usually, when Travis reports a build failure, it is
> something actionable! If in doubt, contact me.
>
> The breakage at the moment occurs only with -DDEBUG on:
>
> Compile failed (exit code 1) errors were:
> ghc-stage2: panic! (the 'impossible' happened)
>   (GHC version 8.1.20161118 for x86_64-unknown-linux):
> No match in record selector is_iloc
>
> Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
>
>
> *** unexpected failure for rn017(normal)
> Compile failed (exit code 1) errors were:
> ghc-stage2: panic! (the 'impossible' happened)
>   (GHC version 8.1.20161118 for x86_64-unknown-linux):
> No match in record selector is_iloc
>
> Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
>
>
> *** unexpected failure for T7672(normal)
>
> And it started appearing, unless I am mistaken, with
>
> From: Matthew Pickering 
> Date: Fri, 18 Nov 2016 16:28:30 +
> Subject: [PATCH] Optimise whole module exports
>
> We directly build up the correct AvailInfos rather than generating
> lots of singleton instances and combining them with expensive calls to
> unionLists.
>
> There are two other small changes.
>
> * Pushed the nubAvails call into the explicit export list
>   branch as we construct them correctly and uniquely ourselves.
> * fix_faminst only needs to check the first element of the export
>   list as we maintain the (yucky) invariant that the parent is the
>   first thing in it.
>
> Reviewers: simonpj, austin, bgamari
>
> Reviewed By: simonpj, bgamari
>
> Subscribers: simonpj, thomie, niteria
>
> Differential Revision: https://phabricator.haskell.org/D2657
>
> Matthew, can you verify that this is a regression introduced here?
>
> Greetings,
> Joachim
>
> --
> Joachim “nomeata” Breitner
>   m...@joachim-breitner.de • https://www.joachim-breitner.de/
>   XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
>   Debian Developer: nome...@debian.org
>
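
For context: "No match in record selector" is the runtime error raised when
a record selector that exists for only some constructors of a type is
applied to a value built from a different constructor. A minimal
illustration of the failure mode (with hypothetical names, not GHC's
actual internals):

    -- The selector is_iloc is partial: it is defined only for Found.
    data ImpLoc = Found { is_iloc :: Int } | Unknown

    main :: IO ()
    main = print (is_iloc Unknown)
    -- At runtime: "No match in record selector is_iloc"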


Re: Measuring performance of GHC

2016-12-05 Thread Michal Terepeta
On Mon, Dec 5, 2016 at 12:00 PM Moritz Angermann wrote:

> Hi,
>
> I started the GHC Performance Regression Collection Proposal[1]
> (rendered at [2]) a while ago with the idea of having a trivially
> community-curated set of small[3] real-world examples with performance
> regressions. I might be at fault here for not describing this to the
> best of my abilities. Thus, if there is interest, and this sounds like
> a useful idea, maybe we should still pursue this proposal?
>
> Cheers,
>  moritz
>
> [1]: https://github.com/ghc-proposals/ghc-proposals/pull/26
> [2]:
> https://github.com/angerman/ghc-proposals/blob/prop/perf-regression/proposals/-perf-regression.rst
> [3]: for some definition of small
>

Interesting! I must have missed this proposal.  It seems that it didn't meet
with much enthusiasm though (but it also proposes to have a completely
separate repo on GitHub).

Personally, I'd be happy with something more modest:
- A collection of modules/programs that are more representative of real
  Haskell programs and stress various aspects of the compiler.
  (This seems to be a weakness of nofib, where >90% of modules compile in
  less than 0.4s.)
- A way to compile all of those and do "before and after" comparisons
  easily. To measure the time, we should probably try to compile each
  module at least a few times; a rough sketch of what I mean follows below.
  (It seems that this is not currently possible with `tests/perf/compiler`,
  and nofib only compiles the programs once AFAICS.)
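
To make that concrete, here is a sketch of the kind of driver I have in
mind (the helper is hypothetical, not an existing tool): compile each
module several times with the compiler under test and keep the minimum
wall-clock time, which damps scheduling noise a bit.

    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import System.Process (callProcess)

    -- Time `runs` compilations of one module with the given compiler
    -- binary, forcing recompilation each time; return the minimum
    -- wall-clock time in seconds.
    timeCompile :: FilePath -> Int -> FilePath -> IO Double
    timeCompile ghc runs modFile = minimum <$> mapM (const once) [1 .. runs]
      where
        once = do
          t0 <- getCurrentTime
          callProcess ghc ["-fforce-recomp", "-O", modFile]
          t1 <- getCurrentTime
          return (realToFrac (diffUTCTime t1 t0))

A "before and after" comparison is then just running this with the two
compiler binaries and diffing the numbers.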

Looking at the comments on the proposal from Moritz, most people would
prefer to extend/improve nofib or the `tests/perf/compiler` tests. So I
guess the main question is - what would be better:
- Extending nofib with modules that are compile-only (i.e., not runnable)
  and focused on stressing the compiler?
- Extending `tests/perf/compiler` with the ability to run all the tests
  and do easy "before and after" comparisons?

Personally, I'm slightly leaning towards `tests/perf/compiler`, since this
would allow sharing the same module as a test for `validate` and for
comparing the performance of the compiler before and after a change.

What do you think?

Thanks,
Michal


Re: Measuring performance of GHC

2016-12-05 Thread Moritz Angermann
Hi,

I started the GHC Performance Regression Collection Proposal[1] (rendered
at [2]) a while ago with the idea of having a trivially community-curated
set of small[3] real-world examples with performance regressions. I might
be at fault here for not describing this to the best of my abilities.
Thus, if there is interest, and this sounds like a useful idea, maybe we
should still pursue this proposal?

Cheers,
 moritz

[1]: https://github.com/ghc-proposals/ghc-proposals/pull/26
[2]: 
https://github.com/angerman/ghc-proposals/blob/prop/perf-regression/proposals/-perf-regression.rst
[3]: for some definition of small

> On Dec 5, 2016, at 6:31 PM, Simon Peyton Jones via ghc-devs wrote:
> 
> If not, maybe we should create something? IMHO it sounds reasonable to
> have separate benchmarks for:
> - Performance of GHC itself.
> - Performance of the code generated by GHC.
>
> I think that would be great, Michal.  We have a small and unrepresentative
> sample in testsuite/tests/perf/compiler
>
> Simon
>
> From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Michal Terepeta
> Sent: 04 December 2016 19:47
> To: ghc-devs
> Subject: Measuring performance of GHC
>
> Hi everyone,
>
> I've been running nofib a few times recently to see the effect of some
> changes on compile time (not the runtime of the compiled program). And
> I've started wondering how representative nofib is when it comes to
> measuring compile time and compiler allocations? It seems that most of
> the nofib programs compile really quickly...
>
> Is there some collection of modules/libraries/applications that was put
> together for the purpose of benchmarking GHC itself and that I just
> haven't seen/found?
>
> If not, maybe we should create something? IMHO it sounds reasonable to
> have separate benchmarks for:
> - Performance of GHC itself.
> - Performance of the code generated by GHC.
>
> Thanks,
> Michal



RE: Measuring performance of GHC

2016-12-05 Thread Simon Peyton Jones via ghc-devs
If not, maybe we should create something? IMHO it sounds reasonable to have
separate benchmarks for:
- Performance of GHC itself.
- Performance of the code generated by GHC.

I think that would be great, Michal.  We have a small and unrepresentative
sample in testsuite/tests/perf/compiler

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Michal Terepeta
Sent: 04 December 2016 19:47
To: ghc-devs 
Subject: Measuring performance of GHC

Hi everyone,

I've been running nofib a few times recently to see the effect of some changes
on compile time (not the runtime of the compiled program). And I've started
wondering how representative nofib is when it comes to measuring compile time
and compiler allocations? It seems that most of the nofib programs compile
really quickly...

Is there some collection of modules/libraries/applications that was put
together for the purpose of benchmarking GHC itself and that I just
haven't seen/found?

If not, maybe we should create something? IMHO it sounds reasonable to have
separate benchmarks for:
- Performance of GHC itself.
- Performance of the code generated by GHC.

Thanks,
Michal
