Re: Compile times: Average vs worst case runtime

2018-09-01 Thread Andreas Klebinger

I've looked further into this.

The bad news is that the edge/block ratio is likely not a useful indicator for compile time.

The good news is that I spent the last day and a half profiling and 
optimizing the code and found the major reason why performance degraded 
so quickly.
There was a pattern with complexity O(blocks * branches) which should 
now be O(blocks * log branches), with negligibly increased memory usage.
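
To illustrate the kind of change (only a sketch with made-up names, not 
the code from the patch): it boils down to replacing repeated linear 
scans over a block's successors with lookups in a map keyed on the 
target label.

import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-ins for the real CFG types (sketch, not the patch).
type Label  = Int
type Weight = Int

-- O(branches) per query: scan an association list of successors.
edgeWeightSlow :: [(Label, Weight)] -> Label -> Weight
edgeWeightSlow succs target = maybe 0 id (lookup target succs)

-- O(log branches) per query: the same data held in an IntMap.
edgeWeightFast :: IM.IntMap Weight -> Label -> Weight
edgeWeightFast succs target = IM.findWithDefault 0 target succs

-- Built once (O(branches * log branches)); each later lookup is
-- O(log branches), so one lookup per block costs roughly
-- O(blocks * log branches) overall instead of O(blocks * branches).
mkSuccMap :: [(Label, Weight)] -> IM.IntMap Weight
mkSuccMap = IM.fromList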


The worst case I know of (a function with 1500 guards compiled with -O0) 
is now only about 3% slower than with the old algorithm, which seems reasonable.


I will check how this affects compile times for regular code and rerun 
benchmarks to make sure I haven't introduced regressions.
But if it holds up we should at least enable it at -O1/-O2, and depending 
on how compile times look for regular code maybe at -O0 as well.


Cheers
Andreas




Andreas Klebinger wrote:


Simon Peyton Jones wrote:

Good work.

| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.

Not for specific modules, I hope; rather fall back when the number of 
cases becomes large, or something like that?  That'd be fine.

As it stands it's a regular flag, and hence per module (just like 
worker/wrapper or similar) and user controlled.
In the code example I have in my head it was a large case expression 
that gets desugared to a long chain of if/elseif/.../else.

So it's not strictly large case expressions, but usually it is.

I actually hadn't considered doing a "dynamic" fallback so far. That's 
a great idea!
If we are lucky we can use the ratio of edges to blocks, and fall back 
to the old algorithm if we are above a certain function size and ratio, 
with the threshold in turn depending on the optimization level.

But I'm not sure that's good design, as we then end up with cases where 
performance suddenly gets 5% worse because people add a constructor or 
the code gets slightly larger for some reason.


The only other potential worry is how tricky/understandable your patch 
is.  Hopefully it's not hard.

I hope so; the algorithm itself isn't that complicated compared to some 
of the stuff in GHC, and I tried to document it well.
But it's also far from being trivial code.


| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.

Well, if you can robustly fall back in edge cases, you'll never hit 
the blow-ups.  So then (1) would be fine, would it not?

I guess I will have to see how well the edge/block ratio correlates with 
compile-time blowups. If we can use that to rule out the really bad 
cases then (1) should be fine.

If not I will have to come back to the question.

Cheers
Andreas


Simon




| -Original Message-
| From: ghc-devs  On Behalf Of Andreas
| Klebinger
| Sent: 30 August 2018 18:08
| To: ghc-devs@haskell.org
| Subject: Compile times: Average vs worst case runtime
|
| Hello Devs,
|
| I developed a patch improving on GHC's code layout during GSoC:
| https://ghc.haskell.org/trac/ghc/ticket/15124
| The gains are pretty nice, with most library benchmarks showing
| improvements in the 1-2% range or more.
|
| Even compile times went down when comparing head against a stage2
| compiler built with, and using, my patch!
| This makes compilation of nofib overall faster by 0.5-1%, depending on
| the exact setup.
|
| Now the bad news:
| In some cases the algorithm has bad big-O performance; in practice this
| seems to be code with things like case expressions with 200+
| alternatives, where these cases also represent most of the code being
| compiled.
|
| The worst case I saw was a doubled compile time in a module which
| consisted only of a 500-deep if/else chain selecting a value.
|
| While there are some small optimizations possible to improve on this I
| don't expect to make these edge cases much faster overall.
| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.
|
| Which brings me to my main question: When should we use the new code
| layout?
| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.
|
| I would prefer 2, 3, 1, 4 in that order, but I wonder what the wider
| community is thinking.
|
| Cheers
| Andreas






Re: Compile times: Average vs worst case runtime

2018-08-30 Thread Andreas Klebinger


Simon Peyton Jones wrote:

Good work.

| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.

Not for specific modules, I hope; rather fall back when the number of 
cases becomes large, or something like that?  That'd be fine.

As it stands it's a regular flag, and hence per module (just like 
worker/wrapper or similar) and user controlled.
In the code example I have in my head it was a large case expression 
that gets desugared to a long chain of if/elseif/.../else.

So it's not strictly large case expressions, but usually it is.

I actually hadn't considered doing a "dynamic" fallback so far. That's 
a great idea!
If we are lucky we can use the ratio of edges to blocks, and fall back 
to the old algorithm if we are above a certain function size and ratio, 
with the threshold in turn depending on the optimization level.

But I'm not sure that's good design, as we then end up with cases where 
performance suddenly gets 5% worse because people add a constructor or 
the code gets slightly larger for some reason.
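
A rough sketch of what such a check might look like (made-up names and 
thresholds, not actual GHC code):

-- Sketch of a size/ratio based fallback. The thresholds are purely
-- illustrative; real ones would have to come from benchmark data.
data OptLevel = O0 | O1 | O2 deriving (Eq, Ord, Show)

useNewLayout :: OptLevel -> Int -> Int -> Bool
useNewLayout opt blocks edges =
    blocks <= sizeLimit || ratio <= ratioLimit
  where
    ratio :: Double
    ratio = fromIntegral edges / fromIntegral (max 1 blocks)

    -- Fall back to the old algorithm only when the function is both
    -- large and unusually branch-heavy, with limits per opt level.
    (sizeLimit, ratioLimit) = case opt of
      O0 -> (500,  3.0)
      O1 -> (2000, 4.0)
      O2 -> (5000, 5.0)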


The only other potential worry is how tricky/understandable your patch 
is.  Hopefully it's not hard.

I hope so; the algorithm itself isn't that complicated compared to some 
of the stuff in GHC, and I tried to document it well.
But it's also far from being trivial code.


| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.

Well, if you can robustly fall back in edge cases, you'll never hit 
the blow-ups.  So then (1) would be fine, would it not?

I guess I will have to see how well the edge/block ratio correlates with 
compile-time blowups. If we can use that to rule out the really bad 
cases then (1) should be fine.

If not I will have to come back to the question.

Cheers
Andreas


Simon




| -Original Message-
| From: ghc-devs  On Behalf Of Andreas
| Klebinger
| Sent: 30 August 2018 18:08
| To: ghc-devs@haskell.org
| Subject: Compile times: Average vs worst case runtime
|
| Hello Devs,
|
| I developed a patch improving on GHC's code layout during GSoC:
| https://ghc.haskell.org/trac/ghc/ticket/15124
| The gains are pretty nice, with most library benchmarks showing
| improvements in the 1-2% range or more.
|
| Even compile times went down when comparing head against a stage2
| compiler built with, and using, my patch!
| This makes compilation of nofib overall faster by 0.5-1%, depending on
| the exact setup.
|
| Now the bad news:
| In some cases the algorithm has bad big-O performance; in practice this
| seems to be code with things like case expressions with 200+
| alternatives, where these cases also represent most of the code being
| compiled.
|
| The worst case I saw was a doubled compile time in a module which
| consisted only of a 500-deep if/else chain selecting a value.
|
| While there are some small optimizations possible to improve on this I
| don't expect to make these edge cases much faster overall.
| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.
|
| Which brings me to my main question: When should we use the new code
| layout?
| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.
|
| I would prefer 2, 3, 1, 4 in that order, but I wonder what the wider
| community is thinking.
|
| Cheers
| Andreas




RE: Compile times: Average vs worst case runtime

2018-08-30 Thread Simon Peyton Jones via ghc-devs
Good work.

| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.

Not for specific modules, I hope; rather fall back when the number of cases 
becomes large, or something like that?  That'd be fine.

The only other potential worry is how tricky/understandable your patch is.  
Hopefully it's not hard.

| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.

Well, if you can robustly fall back in edge cases, you'll never hit the 
blow-ups.  So then (1) would be fine, would it not?

Simon




| -Original Message-
| From: ghc-devs  On Behalf Of Andreas
| Klebinger
| Sent: 30 August 2018 18:08
| To: ghc-devs@haskell.org
| Subject: Compile times: Average vs worst case runtime
| 
| Hello Devs,
| 
| I developed a patch improving on GHC's code layout during GSoC:
| https://ghc.haskell.org/trac/ghc/ticket/15124
| The gains are pretty nice, with most library benchmarks showing
| improvements in the 1-2% range or more.
|
| Even compile times went down when comparing head against a stage2
| compiler built with, and using, my patch!
| This makes compilation of nofib overall faster by 0.5-1%, depending on
| the exact setup.
|
| Now the bad news:
| In some cases the algorithm has bad big-O performance; in practice this
| seems to be code with things like case expressions with 200+
| alternatives, where these cases also represent most of the code being
| compiled.
|
| The worst case I saw was a doubled compile time in a module which
| consisted only of a 500-deep if/else chain selecting a value.
|
| While there are some small optimizations possible to improve on this I
| don't expect to make these edge cases much faster overall.
| However it will also be possible to fall back to the old behavior to
| avoid the large compile times for modules known to trigger this
| edge case.
|
| Which brings me to my main question: When should we use the new code
| layout?
| 1. Always, since on average both compile time and runtime are faster.
| 2. -O1/-O2, since users often use these when performance matters.
| 3. -O2, since users expect compilation times to blow up with it?
| 4. Never, as users experience compile time slowdowns when they hit these
| edge cases.
|
| I would prefer 2, 3, 1, 4 in that order, but I wonder what the wider
| community is thinking.
| 
| Cheers
| Andreas


Compile times: Average vs worst case runtime

2018-08-30 Thread Andreas Klebinger

Hello Devs,

I developed a patch improving on GHC's code layout during GSoC: 
https://ghc.haskell.org/trac/ghc/ticket/15124
The gains are pretty nice, with most library benchmarks showing 
improvements in the 1-2% range or more.


Even compile times went down when comparing head against a stage2 
compiler built with, and using, my patch!
This makes compilation of nofib overall faster by 0.5-1%, depending on 
the exact setup.


Now the bad news:
In some cases the algorithm has bad big-O performance; in practice this 
seems to be code with things like case expressions with 200+ 
alternatives, where these cases also represent most of the code being 
compiled.


The worst case I saw was a doubled compile time in a module which 
consisted only of a 500-deep if/else chain selecting a value.
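
To give a feel for the shape of code that triggers this, here is a toy 
version of such a chain (the real module was 500 alternatives deep):

-- A long guard chain like this desugars into one conditional branch per
-- test, so the layout pass sees a single huge function made almost
-- entirely of two-way branches.
classify :: Int -> Int
classify x
  | x == 0    = 10
  | x == 1    = 11
  | x == 2    = 12
  | x == 3    = 13
  -- ... imagine a few hundred more alternatives ...
  | otherwise = 0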

While there are some small optimizations possible to improve on this I 
don't expect to make these edge cases much faster overall.
However it will also be possible to fall back to the old behavior to 
avoid the large compile times for modules known to trigger this edge case.

Which brings me to my main question: When should we use the new code layout?
1. Always, since on average both compile time and runtime are faster.
2. -O1/-O2, since users often use these when performance matters.
3. -O2, since users expect compilation times to blow up with it?
4. Never, as users experience compile time slowdowns when they hit these 
edge cases.


I would prefer 2, 3, 1, 4 in that order, but I wonder what the wider 
community is thinking.


Cheers
Andreas
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs