Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-12-19 Thread Wilco Dijkstra
Hi,

>> I've noticed that your patch caused a regression:
>> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
>> "internal loop alignment added" 1

I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007

Cheers,
Wilco



Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Richard Earnshaw (lists)

On 16/10/2019 13:13, Wilco Dijkstra wrote:

Hi Christophe,


I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1


That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
         adds    r0, r0, #1
         b       .L27
.L6:
         ldr     r4, [r2, #12]
         adds    r0, r0, #4
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
.L27:
         ldr     r4, [r2, #12]
         cmp     ip, r0
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         bne     .L6
         pop     {r4, pc}

It seems minor changes in scheduling allows blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on blocks
with different profile stats - particularly on loops where it inhibits 
scheduling of
the loop itself.

Cheers,
Wilco



So what's your proposed solution?  Leaving the test failing is not an 
option.


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Wilco Dijkstra
Hi Christophe,

> I've noticed that your patch caused a regression:
> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
> "internal loop alignment added" 1

That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
        adds    r0, r0, #1
        b       .L27
.L6:
        ldr     r4, [r2, #12]
        adds    r0, r0, #4
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
.L27:
        ldr     r4, [r2, #12]
        cmp     ip, r0
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        bne     .L6
        pop     {r4, pc}

It seems minor changes in scheduling allows blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on blocks
with different profile stats - particularly on loops where it inhibits 
scheduling of
the loop itself.

Cheers,
Wilco

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-15 Thread Christophe Lyon
On Sat, 12 Oct 2019 at 02:52, Ramana Radhakrishnan
 wrote:
>
> On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra  wrote:
> >
> > Hi Ramana,
> >
> > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > > spread the range across some v7-a CPUs as well ? While they aren't that 
> > > popular today I
> > > would suggest you look at them because the defaults for v7-a are still to 
> > > use the
> > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used 
> > > in places given
> > > the availability of hardware.
> >
> > The results are practically identical to Cortex-A53 and A57 - there is a 
> > huge codesize win
> > across the board on SPEC2006, there isn't a single benchmark that is larger 
> > (ie. more
> > spilling).
> >
> > > I'd be happy to move this forward if you could show if there is no 
> > > *increase* in spills
> > > for the same range of benchmarks that you are doing for the Cortex-A8 and 
> > > Cortex-A9
> > > schedulers.
> >
> > There certainly isn't. I don't think results like these could be any more 
> > one-sided, it's a
> > significant win for every single benchmark, both for codesize and 
> > performance!
> >
>
> Ok go ahead - please be sensitive to testsuite regressions.
>

Hi Wilco,

I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1

Christophe


when the compiler is configured --with-mode thumb (or forcing -mthumb
when running the tests)
> Ramana
>
>
> > What isn't clear is whether something has gone horribly wrong in the 
> > scheduler which
> > could be fixed/reverted, but as it is right now I can't see it being useful 
> > at all. This means
> > we should also reevaluate whether pressure scheduling now hurts AArch64 too.
> >
> > Cheers,
> > Wilco


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 10:42 PM Wilco Dijkstra  wrote:
>
> Hi,
>
>  > the defaults for v7-a are still to use the
>  > Cortex-A8 scheduler
>
> I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old 
> now so
> way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate 
> scheduler
> that performs surprisingly well on both in-order and out-of-order 
> implementations.

On armv8-a we do use cortex-a53 as the default scheduler in the AArch32 backend.

regards
Ramana

>
> Cheers,
> Wilco


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > spread the range across some v7-a CPUs as well ? While they aren't that 
> > popular today I
> > would suggest you look at them because the defaults for v7-a are still to 
> > use the
> > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in 
> > places given
> > the availability of hardware.
>
> The results are practically identical to Cortex-A53 and A57 - there is a huge 
> codesize win
> across the board on SPEC2006, there isn't a single benchmark that is larger 
> (ie. more
> spilling).
>
> > I'd be happy to move this forward if you could show if there is no 
> > *increase* in spills
> > for the same range of benchmarks that you are doing for the Cortex-A8 and 
> > Cortex-A9
> > schedulers.
>
> There certainly isn't. I don't think results like these could be any more 
> one-sided, it's a
> significant win for every single benchmark, both for codesize and performance!
>

Ok go ahead - please be sensitive to testsuite regressions.

Ramana


> What isn't clear is whether something has gone horribly wrong in the 
> scheduler which
> could be fixed/reverted, but as it is right now I can't see it being useful 
> at all. This means
> we should also reevaluate whether pressure scheduling now hurts AArch64 too.
>
> Cheers,
> Wilco


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi,

 > the defaults for v7-a are still to use the
 > Cortex-A8 scheduler

I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old 
now so
way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate 
scheduler
that performs surprisingly well on both in-order and out-of-order 
implementations.

Cheers,
Wilco

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, 

> Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> spread the range across some v7-a CPUs as well ? While they aren't that 
> popular today I
> would suggest you look at them because the defaults for v7-a are still to use 
> the
> Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in 
> places given
> the availability of hardware.

The results are practically identical to Cortex-A53 and A57 - there is a huge 
codesize win
across the board on SPEC2006, there isn't a single benchmark that is larger 
(ie. more
spilling).

> I'd be happy to move this forward if you could show if there is no *increase* 
> in spills
> for the same range of benchmarks that you are doing for the Cortex-A8 and 
> Cortex-A9
> schedulers.

There certainly isn't. I don't think results like these could be any more 
one-sided, it's a
significant win for every single benchmark, both for codesize and performance!

What isn't clear is whether something has gone horribly wrong in the scheduler 
which
could be fixed/reverted, but as it is right now I can't see it being useful at 
all. This means
we should also reevaluate whether pressure scheduling now hurts AArch64 too.

Cheers,
Wilco

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-10 Thread Ramana Radhakrishnan
On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra  wrote:
>
> Hi all,
>
>  >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>  >> On 30/07/2019 10:08, Christophe Lyon wrote:
>
>  >>> Hi Wilco,
>  >>>
>  >>> Do you know which benchmarks were used when this was checked-in?
>  >>> It isn't clear from
>  >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>  >>
>  >> It was from my time in Linaro and thus would have been a famous embedded
>  >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and
>  >> Cortex-A15. In addition to this I would like to see what the impact of
>  >> this is on something like Cortex-A53 as the issue rates are likely to be
>  >> different on the schedulers causing different behaviour.
>
> Obviously there are differences between various schedulers, but the general
> issue is that register pressure is increased many times beyond the spilling 
> limit
> (a few cases I looked at had a pressure well over 120 when there are only 14
> integer registers - this causes panic spilling in the register allocator).
>
> In fact the spilling overhead between the 2 algorithms is almost identical on
> Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
> model used. It seems more related to the scheduler being too aggressive
> and not caring about register pressure at all (for example lifting a load 100
> instructions before its use so it must be spilled).

In those days it would have been the Cortex-A8, Cortex-A9 schedulers
and the Cortex-A15
schedulers and IIRC the benchmarking would have been mostly on a
Cortex-A9 board or on some
Cortex-A15 boards we had (long gone now) inside Arm.

Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
spread the range
across some v7-a CPUs as well ? While they aren't that popular today I
would suggest
you look at them because the defaults for v7-a are still to use the
Cortex-A8 scheduler and
the Cortex-A9 scheduler might well also get used in places given the
availability of hardware.


>
>  >> I don't have all the notes today for that - maybe you can look into the
>  >> linaro wiki.
>  >>
>  >> I am concerned about taking this patch in without some more data across
>  >> a variety of cores.
>  >>
>  >
>  > My concern is the original patch
>  > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
>  > any real detail as to the reasons for the choice of the second algorithm
>  > over the first.
>  >
>  > - It's not clear what the win was
>  > - It's not clear what outliers there were and whether they were 
> significant.
>  >
>  > And finally, it's not clear if, 7 years later, this is still the best
>  > choice.
>  >
>  > If the second algorithm really is better, why is no other target using
>  > it by default?
>  >
>  > I think we need a bit more information (both ways).  In particular I'm
>  > concerned not just by the overall benchmark average, but also the amount
>  > of variance across the benchmarks.  I think the default needs to avoid
>  > significant outliers if at all possible, even if it is marginally less
>  > good on the average.
>
> The results clearly show that algorithm 1 works best on Arm today - I haven't
> seen a single benchmark where algorithm 2 results in less spilling. We could
> tune algorithm 2 so it switches back to algorithm 1 when register pressure is
> high or a basic block is large. However until it is fixed, the evidence is 
> that
> algorithm 1 is the best choice for current cores.

I'd be happy to move this forward if you could show if there is no
*increase* in spills
for the same range of benchmarks that you are doing for the Cortex-A8
and Cortex-A9
schedulers.

Sorry about the time it has taken. I've been a bit otherwise occupied recently.

regards
Ramana

>
> Wilco


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-10 Thread Wilco Dijkstra
ping

   Currently the Arm backend selects the alternative sched pressure algorithm.
   The issue is that this doesn't take register pressure into account, and so
   it causes significant additional spilling on Arm where there are only 14
   allocatable registers.  SPEC2006 shows significant codesize reduction
   with the default pressure algorithm, so switch back to that.  PR77308 shows
   ~800 fewer instructions.
   
   SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
   patches. Overall SPEC codesize is 1.1% smaller.
   
   Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
   
   ChangeLog:
   2019-07-29  Wilco Dijkstra  
   
   * config/arm/arm.c (arm_option_override): Don't override sched
   pressure algorithm.
   
   --
   
   diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
   index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
   --- a/gcc/config/arm/arm.c
   +++ b/gcc/config/arm/arm.c
   @@ -3575,11 +3575,6 @@ arm_option_override (void)
  if (use_neon_for_64bits == 1)
 prefer_neon_for_64bits = true;

   -  /* Use the alternative scheduling-pressure algorithm by default.  */
   -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
SCHED_PRESSURE_MODEL,
   -global_options.x_param_values,
   -global_options_set.x_param_values);
   -
  /* Look through ready list and all of queue for instructions
 relevant for L2 auto-prefetcher.  */
  int param_sched_autopref_queue_depth;


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-09-09 Thread Wilco Dijkstra
   
 
ping
 
 
  
    
  Currently the Arm backend selects the alternative sched pressure algorithm.
   The issue is that this doesn't take register pressure into account, and so
   it causes significant additional spilling on Arm where there are only 14
   allocatable registers.  SPEC2006 shows significant codesize reduction
   with the default pressure algorithm, so switch back to that.  PR77308 shows
   ~800 fewer instructions.
   
   SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
   patches. Overall SPEC codesize is 1.1% smaller.
   
   Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
   
   ChangeLog:
   2019-07-29  Wilco Dijkstra  
   
   * config/arm/arm.c (arm_option_override): Don't override sched
   pressure algorithm.
   
   --
   
   diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
   index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
   --- a/gcc/config/arm/arm.c
   +++ b/gcc/config/arm/arm.c
   @@ -3575,11 +3575,6 @@ arm_option_override (void)
  if (use_neon_for_64bits == 1)
     prefer_neon_for_64bits = true;
    
   -  /* Use the alternative scheduling-pressure algorithm by default.  */
   -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
SCHED_PRESSURE_MODEL,
   -    global_options.x_param_values,
   -    global_options_set.x_param_values);
   -
  /* Look through ready list and all of queue for instructions
     relevant for L2 auto-prefetcher.  */
  int param_sched_autopref_queue_depth;
   
   

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-09-02 Thread Wilco Dijkstra
ping


 
   
 Currently the Arm backend selects the alternative sched pressure algorithm.
  The issue is that this doesn't take register pressure into account, and so
  it causes significant additional spilling on Arm where there are only 14
  allocatable registers.  SPEC2006 shows significant codesize reduction
  with the default pressure algorithm, so switch back to that.  PR77308 shows
  ~800 fewer instructions.
  
  SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
  patches. Overall SPEC codesize is 1.1% smaller.
  
  Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
  
  ChangeLog:
  2019-07-29  Wilco Dijkstra  
  
  * config/arm/arm.c (arm_option_override): Don't override sched
  pressure algorithm.
  
  --
  
  diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
  index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
  --- a/gcc/config/arm/arm.c
  +++ b/gcc/config/arm/arm.c
  @@ -3575,11 +3575,6 @@ arm_option_override (void)
     if (use_neon_for_64bits == 1)
    prefer_neon_for_64bits = true;
   
  -  /* Use the alternative scheduling-pressure algorithm by default.  */
  -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
SCHED_PRESSURE_MODEL,
  -    global_options.x_param_values,
  -    global_options_set.x_param_values);
  -
     /* Look through ready list and all of queue for instructions
    relevant for L2 auto-prefetcher.  */
     int param_sched_autopref_queue_depth;
  
  

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-08-19 Thread Wilco Dijkstra


ping


  
Currently the Arm backend selects the alternative sched pressure algorithm.
 The issue is that this doesn't take register pressure into account, and so
 it causes significant additional spilling on Arm where there are only 14
 allocatable registers.  SPEC2006 shows significant codesize reduction
 with the default pressure algorithm, so switch back to that.  PR77308 shows
 ~800 fewer instructions.
 
 SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
 patches. Overall SPEC codesize is 1.1% smaller.
 
 Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
 
 ChangeLog:
 2019-07-29  Wilco Dijkstra  
 
     * config/arm/arm.c (arm_option_override): Don't override sched
     pressure algorithm.
 
 --
 
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -3575,11 +3575,6 @@ arm_option_override (void)
    if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;
  
 -  /* Use the alternative scheduling-pressure algorithm by default.  */
 -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
 -    global_options.x_param_values,
 -    global_options_set.x_param_values);
 -
    /* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
    int param_sched_autopref_queue_depth;
 
 

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Wilco Dijkstra
Hi all,
 
 >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
 >> On 30/07/2019 10:08, Christophe Lyon wrote:

 >>> Hi Wilco,
 >>>
 >>> Do you know which benchmarks were used when this was checked-in?
 >>> It isn't clear from 
 >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
 >> 
 >> It was from my time in Linaro and thus would have been a famous embedded 
 >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
 >> Cortex-A15. In addition to this I would like to see what the impact of 
 >> this is on something like Cortex-A53 as the issue rates are likely to be 
 >> different on the schedulers causing different behaviour.

Obviously there are differences between various schedulers, but the general
issue is that register pressure is increased many times beyond the spilling 
limit
(a few cases I looked at had a pressure well over 120 when there are only 14
integer registers - this causes panic spilling in the register allocator).

In fact the spilling overhead between the 2 algorithms is almost identical on
Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
model used. It seems more related to the scheduler being too aggressive
and not caring about register pressure at all (for example lifting a load 100
instructions before its use so it must be spilled).

 >> I don't have all the notes today for that - maybe you can look into the 
 >> linaro wiki.
 >> 
 >> I am concerned about taking this patch in without some more data across 
 >> a variety of cores.
 >> 
 >
 > My concern is the original patch 
 > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
 > any real detail as to the reasons for the choice of the second algorithm 
 > over the first.
 > 
 > - It's not clear what the win was
 > - It's not clear what outliers there were and whether they were significant.
 >
 > And finally, it's not clear if, 7 years later, this is still the best 
 > choice.
 > 
 > If the second algorithm really is better, why is no other target using 
 > it by default?
 >
 > I think we need a bit more information (both ways).  In particular I'm 
 > concerned not just by the overall benchmark average, but also the amount 
 > of variance across the benchmarks.  I think the default needs to avoid 
 > significant outliers if at all possible, even if it is marginally less 
 > good on the average.
 
The results clearly show that algorithm 1 works best on Arm today - I haven't
seen a single benchmark where algorithm 2 results in less spilling. We could
tune algorithm 2 so it switches back to algorithm 1 when register pressure is
high or a basic block is large. However until it is fixed, the evidence is that
algorithm 1 is the best choice for current cores.

Wilco

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Richard Earnshaw (lists)




On 30/07/2019 10:31, Ramana Radhakrishnan wrote:

On 30/07/2019 10:08, Christophe Lyon wrote:
On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  
wrote:


Currently the Arm backend selects the alternative sched pressure 
algorithm.
The issue is that this doesn't take register pressure into account, 
and so

it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 
shows

~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.



Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from 
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html


It was from my time in Linaro and thus would have been a famous embedded 
benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
Cortex-A15. In addition to this I would like to see what the impact of 
this is on something like Cortex-A53 as the issue rates are likely to be 
different on the schedulers causing different behaviour.



I don't have all the notes today for that - maybe you can look into the 
linaro wiki.


I am concerned about taking this patch in without some more data across 
a variety of cores.




My concern is the original patch 
(https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
any real detail as to the reasons for the choice of the second algorithm 
over the first.


- It's not clear what the win was
- It's not clear what outliers there were and whether they were significant.

And finally, it's not clear if, 7 years later, this is still the best 
choice.


If the second algorithm really is better, why is no other target using 
it by default?


I think we need a bit more information (both ways).  In particular I'm 
concerned not just by the overall benchmark average, but also the amount 
of variance across the benchmarks.  I think the default needs to avoid 
significant outliers if at all possible, even if it is marginally less 
good on the average.


R.


Thanks,
Ramana




Thanks,

Christophe


Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

 * config/arm/arm.c (arm_option_override): Don't override sched
 pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
    if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;

-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
SCHED_PRESSURE_MODEL,

-    global_options.x_param_values,
-    global_options_set.x_param_values);
-
    /* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
    int param_sched_autopref_queue_depth;





Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Ramana Radhakrishnan

On 30/07/2019 10:08, Christophe Lyon wrote:

On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  wrote:


Currently the Arm backend selects the alternative sched pressure algorithm.
The issue is that this doesn't take register pressure into account, and so
it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 shows
~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.



Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html


It was from my time in Linaro and thus would have been a famous embedded 
benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
Cortex-A15. In addition to this I would like to see what the impact of 
this is on something like Cortex-A53 as the issue rates are likely to be 
different on the schedulers causing different behaviour.



I don't have all the notes today for that - maybe you can look into the 
linaro wiki.


I am concerned about taking this patch in without some more data across 
a variety of cores.


Thanks,
Ramana




Thanks,

Christophe


Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

 * config/arm/arm.c (arm_option_override): Don't override sched
 pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;

-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-global_options.x_param_values,
-global_options_set.x_param_values);
-
/* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
int param_sched_autopref_queue_depth;





Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Christophe Lyon
On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  wrote:
>
> Currently the Arm backend selects the alternative sched pressure algorithm.
> The issue is that this doesn't take register pressure into account, and so
> it causes significant additional spilling on Arm where there are only 14
> allocatable registers.  SPEC2006 shows significant codesize reduction
> with the default pressure algorithm, so switch back to that.  PR77308 shows
> ~800 fewer instructions.
>
> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
> patches. Overall SPEC codesize is 1.1% smaller.
>

Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html

Thanks,

Christophe

> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
>
> ChangeLog:
> 2019-07-29  Wilco Dijkstra  
>
> * config/arm/arm.c (arm_option_override): Don't override sched
> pressure algorithm.
>
> --
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3575,11 +3575,6 @@ arm_option_override (void)
>if (use_neon_for_64bits == 1)
>   prefer_neon_for_64bits = true;
>
> -  /* Use the alternative scheduling-pressure algorithm by default.  */
> -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
> SCHED_PRESSURE_MODEL,
> -global_options.x_param_values,
> -global_options_set.x_param_values);
> -
>/* Look through ready list and all of queue for instructions
>   relevant for L2 auto-prefetcher.  */
>int param_sched_autopref_queue_depth;
>


[PATCH][ARM] Switch to default sched pressure algorithm

2019-07-29 Thread Wilco Dijkstra
Currently the Arm backend selects the alternative sched pressure algorithm.
The issue is that this doesn't take register pressure into account, and so
it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 shows
~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
   if (use_neon_for_64bits == 1)
  prefer_neon_for_64bits = true;
 
-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-global_options.x_param_values,
-global_options_set.x_param_values);
-
   /* Look through ready list and all of queue for instructions
  relevant for L2 auto-prefetcher.  */
   int param_sched_autopref_queue_depth;