Re: [PATCH][ARM] Switch to default sched pressure algorithm
Hi, >> I've noticed that your patch caused a regression: >> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments >> "internal loop alignment added" 1 I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007 Cheers, Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On 16/10/2019 13:13, Wilco Dijkstra wrote: Hi Christophe, I've noticed that your patch caused a regression: FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments "internal loop alignment added" 1 That's just a testism - it only tests for loop alignment and doesn't consider the possibility of the loop being jumped into like this: .L17: adds r0, r0, #1 b .L27 .L6: ldr r4, [r2, #12] adds r0, r0, #4 ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] .L27: ldr r4, [r2, #12] cmp ip, r0 ldr lr, [r1] str lr, [r3, r4, lsl #2] bne .L6 pop {r4, pc} It seems minor changes in scheduling allows blocks to be commoned or not. The underlying issue is that commoning like this should not be allowed on blocks with different profile stats - particularly on loops where it inhibits scheduling of the loop itself. Cheers, Wilco So what's your proposed solution? Leaving the test failing is not an option.
Re: [PATCH][ARM] Switch to default sched pressure algorithm
Hi Christophe, > I've noticed that your patch caused a regression: > FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments > "internal loop alignment added" 1 That's just a testism - it only tests for loop alignment and doesn't consider the possibility of the loop being jumped into like this: .L17: adds r0, r0, #1 b .L27 .L6: ldr r4, [r2, #12] adds r0, r0, #4 ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] .L27: ldr r4, [r2, #12] cmp ip, r0 ldr lr, [r1] str lr, [r3, r4, lsl #2] bne .L6 pop {r4, pc} It seems minor changes in scheduling allows blocks to be commoned or not. The underlying issue is that commoning like this should not be allowed on blocks with different profile stats - particularly on loops where it inhibits scheduling of the loop itself. Cheers, Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On Sat, 12 Oct 2019 at 02:52, Ramana Radhakrishnan wrote: > > On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra wrote: > > > > Hi Ramana, > > > > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > > > spread the range across some v7-a CPUs as well ? While they aren't that > > > popular today I > > > would suggest you look at them because the defaults for v7-a are still to > > > use the > > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used > > > in places given > > > the availability of hardware. > > > > The results are practically identical to Cortex-A53 and A57 - there is a > > huge codesize win > > across the board on SPEC2006, there isn't a single benchmark that is larger > > (ie. more > > spilling). > > > > > I'd be happy to move this forward if you could show if there is no > > > *increase* in spills > > > for the same range of benchmarks that you are doing for the Cortex-A8 and > > > Cortex-A9 > > > schedulers. > > > > There certainly isn't. I don't think results like these could be any more > > one-sided, it's a > > significant win for every single benchmark, both for codesize and > > performance! > > > > Ok go ahead - please be sensitive to testsuite regressions. > Hi Wilco, I've noticed that your patch caused a regression: FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments "internal loop alignment added" 1 Christophe when the compiler is configured --with-mode thumb (or forcing -mthumb when running the tests) > Ramana > > > > What isn't clear is whether something has gone horribly wrong in the > > scheduler which > > could be fixed/reverted, but as it is right now I can't see it being useful > > at all. This means > > we should also reevaluate whether pressure scheduling now hurts AArch64 too. > > > > Cheers, > > Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On Fri, Oct 11, 2019 at 10:42 PM Wilco Dijkstra wrote: > > Hi, > > > the defaults for v7-a are still to use the > > Cortex-A8 scheduler > > I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old > now so > way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate > scheduler > that performs surprisingly well on both in-order and out-of-order > implementations. On armv8-a we do use cortex-a53 as the default scheduler in the AArch32 backend. regards Ramana > > Cheers, > Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra wrote: > > Hi Ramana, > > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > > spread the range across some v7-a CPUs as well ? While they aren't that > > popular today I > > would suggest you look at them because the defaults for v7-a are still to > > use the > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in > > places given > > the availability of hardware. > > The results are practically identical to Cortex-A53 and A57 - there is a huge > codesize win > across the board on SPEC2006, there isn't a single benchmark that is larger > (ie. more > spilling). > > > I'd be happy to move this forward if you could show if there is no > > *increase* in spills > > for the same range of benchmarks that you are doing for the Cortex-A8 and > > Cortex-A9 > > schedulers. > > There certainly isn't. I don't think results like these could be any more > one-sided, it's a > significant win for every single benchmark, both for codesize and performance! > Ok go ahead - please be sensitive to testsuite regressions. Ramana > What isn't clear is whether something has gone horribly wrong in the > scheduler which > could be fixed/reverted, but as it is right now I can't see it being useful > at all. This means > we should also reevaluate whether pressure scheduling now hurts AArch64 too. > > Cheers, > Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
Hi, > the defaults for v7-a are still to use the > Cortex-A8 scheduler I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler that performs surprisingly well on both in-order and out-of-order implementations. Cheers, Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
Hi Ramana, > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > spread the range across some v7-a CPUs as well ? While they aren't that > popular today I > would suggest you look at them because the defaults for v7-a are still to use > the > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in > places given > the availability of hardware. The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more spilling). > I'd be happy to move this forward if you could show if there is no *increase* > in spills > for the same range of benchmarks that you are doing for the Cortex-A8 and > Cortex-A9 > schedulers. There certainly isn't. I don't think results like these could be any more one-sided, it's a significant win for every single benchmark, both for codesize and performance! What isn't clear is whether something has gone horribly wrong in the scheduler which could be fixed/reverted, but as it is right now I can't see it being useful at all. This means we should also reevaluate whether pressure scheduling now hurts AArch64 too. Cheers, Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra wrote: > > Hi all, > > >On 30/07/2019 10:31, Ramana Radhakrishnan wrote: > >> On 30/07/2019 10:08, Christophe Lyon wrote: > > >>> Hi Wilco, > >>> > >>> Do you know which benchmarks were used when this was checked-in? > >>> It isn't clear from > >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html > >> > >> It was from my time in Linaro and thus would have been a famous embedded > >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and > >> Cortex-A15. In addition to this I would like to see what the impact of > >> this is on something like Cortex-A53 as the issue rates are likely to be > >> different on the schedulers causing different behaviour. > > Obviously there are differences between various schedulers, but the general > issue is that register pressure is increased many times beyond the spilling > limit > (a few cases I looked at had a pressure well over 120 when there are only 14 > integer registers - this causes panic spilling in the register allocator). > > In fact the spilling overhead between the 2 algorithms is almost identical on > Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline > model used. It seems more related to the scheduler being too aggressive > and not caring about register pressure at all (for example lifting a load 100 > instructions before its use so it must be spilled). In those days it would have been the Cortex-A8, Cortex-A9 schedulers and the Cortex-A15 schedulers and IIRC the benchmarking would have been mostly on a Cortex-A9 board or on some Cortex-A15 boards we had (long gone now) inside Arm. Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to spread the range across some v7-a CPUs as well ? While they aren't that popular today I would suggest you look at them because the defaults for v7-a are still to use the Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given the availability of hardware. > > >> I don't have all the notes today for that - maybe you can look into the > >> linaro wiki. > >> > >> I am concerned about taking this patch in without some more data across > >> a variety of cores. > >> > > > > My concern is the original patch > > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in > > any real detail as to the reasons for the choice of the second algorithm > > over the first. > > > > - It's not clear what the win was > > - It's not clear what outliers there were and whether they were > significant. > > > > And finally, it's not clear if, 7 years later, this is still the best > > choice. > > > > If the second algorithm really is better, why is no other target using > > it by default? > > > > I think we need a bit more information (both ways). In particular I'm > > concerned not just by the overall benchmark average, but also the amount > > of variance across the benchmarks. I think the default needs to avoid > > significant outliers if at all possible, even if it is marginally less > > good on the average. > > The results clearly show that algorithm 1 works best on Arm today - I haven't > seen a single benchmark where algorithm 2 results in less spilling. We could > tune algorithm 2 so it switches back to algorithm 1 when register pressure is > high or a basic block is large. However until it is fixed, the evidence is > that > algorithm 1 is the best choice for current cores. I'd be happy to move this forward if you could show if there is no *increase* in spills for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9 schedulers. Sorry about the time it has taken. I've been a bit otherwise occupied recently. regards Ramana > > Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, -global_options.x_param_values, -global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
Hi all, >On 30/07/2019 10:31, Ramana Radhakrishnan wrote: >> On 30/07/2019 10:08, Christophe Lyon wrote: >>> Hi Wilco, >>> >>> Do you know which benchmarks were used when this was checked-in? >>> It isn't clear from >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html >> >> It was from my time in Linaro and thus would have been a famous embedded >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and >> Cortex-A15. In addition to this I would like to see what the impact of >> this is on something like Cortex-A53 as the issue rates are likely to be >> different on the schedulers causing different behaviour. Obviously there are differences between various schedulers, but the general issue is that register pressure is increased many times beyond the spilling limit (a few cases I looked at had a pressure well over 120 when there are only 14 integer registers - this causes panic spilling in the register allocator). In fact the spilling overhead between the 2 algorithms is almost identical on Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline model used. It seems more related to the scheduler being too aggressive and not caring about register pressure at all (for example lifting a load 100 instructions before its use so it must be spilled). >> I don't have all the notes today for that - maybe you can look into the >> linaro wiki. >> >> I am concerned about taking this patch in without some more data across >> a variety of cores. >> > > My concern is the original patch > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in > any real detail as to the reasons for the choice of the second algorithm > over the first. > > - It's not clear what the win was > - It's not clear what outliers there were and whether they were significant. > > And finally, it's not clear if, 7 years later, this is still the best > choice. > > If the second algorithm really is better, why is no other target using > it by default? > > I think we need a bit more information (both ways). In particular I'm > concerned not just by the overall benchmark average, but also the amount > of variance across the benchmarks. I think the default needs to avoid > significant outliers if at all possible, even if it is marginally less > good on the average. The results clearly show that algorithm 1 works best on Arm today - I haven't seen a single benchmark where algorithm 2 results in less spilling. We could tune algorithm 2 so it switches back to algorithm 1 when register pressure is high or a basic block is large. However until it is fixed, the evidence is that algorithm 1 is the best choice for current cores. Wilco
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On 30/07/2019 10:31, Ramana Radhakrishnan wrote: On 30/07/2019 10:08, Christophe Lyon wrote: On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra wrote: Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Hi Wilco, Do you know which benchmarks were used when this was checked-in? It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html It was from my time in Linaro and thus would have been a famous embedded benchmark, coremark , spec2000 - all tested probably on cortex-a9 and Cortex-A15. In addition to this I would like to see what the impact of this is on something like Cortex-A53 as the issue rates are likely to be different on the schedulers causing different behaviour. I don't have all the notes today for that - maybe you can look into the linaro wiki. I am concerned about taking this patch in without some more data across a variety of cores. My concern is the original patch (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in any real detail as to the reasons for the choice of the second algorithm over the first. - It's not clear what the win was - It's not clear what outliers there were and whether they were significant. And finally, it's not clear if, 7 years later, this is still the best choice. If the second algorithm really is better, why is no other target using it by default? I think we need a bit more information (both ways). In particular I'm concerned not just by the overall benchmark average, but also the amount of variance across the benchmarks. I think the default needs to avoid significant outliers if at all possible, even if it is marginally less good on the average. R. Thanks, Ramana Thanks, Christophe Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On 30/07/2019 10:08, Christophe Lyon wrote: On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra wrote: Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Hi Wilco, Do you know which benchmarks were used when this was checked-in? It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html It was from my time in Linaro and thus would have been a famous embedded benchmark, coremark , spec2000 - all tested probably on cortex-a9 and Cortex-A15. In addition to this I would like to see what the impact of this is on something like Cortex-A53 as the issue rates are likely to be different on the schedulers causing different behaviour. I don't have all the notes today for that - maybe you can look into the linaro wiki. I am concerned about taking this patch in without some more data across a variety of cores. Thanks, Ramana Thanks, Christophe Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, -global_options.x_param_values, -global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;
Re: [PATCH][ARM] Switch to default sched pressure algorithm
On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra wrote: > > Currently the Arm backend selects the alternative sched pressure algorithm. > The issue is that this doesn't take register pressure into account, and so > it causes significant additional spilling on Arm where there are only 14 > allocatable registers. SPEC2006 shows significant codesize reduction > with the default pressure algorithm, so switch back to that. PR77308 shows > ~800 fewer instructions. > > SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode > patches. Overall SPEC codesize is 1.1% smaller. > Hi Wilco, Do you know which benchmarks were used when this was checked-in? It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html Thanks, Christophe > Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 > > ChangeLog: > 2019-07-29 Wilco Dijkstra > > * config/arm/arm.c (arm_option_override): Don't override sched > pressure algorithm. > > -- > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index > 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 > 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -3575,11 +3575,6 @@ arm_option_override (void) >if (use_neon_for_64bits == 1) > prefer_neon_for_64bits = true; > > - /* Use the alternative scheduling-pressure algorithm by default. */ > - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, > SCHED_PRESSURE_MODEL, > -global_options.x_param_values, > -global_options_set.x_param_values); > - >/* Look through ready list and all of queue for instructions > relevant for L2 auto-prefetcher. */ >int param_sched_autopref_queue_depth; >
[PATCH][ARM] Switch to default sched pressure algorithm
Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, -global_options.x_param_values, -global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth;