Tweak ALAP calculation in SCHED_PRESSURE_MODEL
This patch fixes a flaw in the relationship between the way that SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses them in model_order_p. A comment in model_order_p says: /* Combine the length of the longest path of satisfied true dependencies that leads to each instruction (depth) with the length of the longest path of any dependencies that leads from the instruction (alap). Prefer instructions with the greatest combined length. If the combined lengths are equal, prefer instructions with the greatest depth. The idea is that, if we have a set S of "equal" instructions that each have ALAP value X, and we pick one such instruction I, any true-dependent successors of I that have ALAP value X - 1 should be preferred over S. This encourages the schedule to be "narrow" rather than "wide". However, if I is a low-priority instruction that we decided to schedule because of its model_classify_pressure, and if there is a set of higher-priority instructions T, the aforementioned successors of I should not have the edge over T. */ The expectation was that scheduling an instruction X would give a greater priority to the highest-priority successor instructions Y than X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1, giving an equal combined height, but with the greater depth winning as a tie-breaker. But this doesn't work if the alap value was ultimately determined by an anti-dependence. This is particularly bad when --param max-pending-list-length kicks in, since we then start adding fake anti-dependencies in order to keep the list length down. These fake dependencies tend to be on the critical path. The attached patch avoids that by making the alap calculation only look at true dependencies. This shouldn't be too bad, since we use INSN_PRIORITY as the final tie-breaker than that does take anti-dependencies into account. This reduces the number of spills in the hot function from 436.cactusADM by 14% on aarch64 at -O3 (and the number of instructions in general). SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). Thanks to Wilco for the benchmarking. Bootstrapped and tested on aarch64-none-linux-gnu. Is this ok for trunk? Thanks, Kyrill 2018-11-08 Richard Sandiford gcc/ * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's ALAP if the dependence is a true dependence. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 1fdc9df9fb26f23758ec8326cec91eecc4c917c1..01825de440c2e818eceab5ab7411b20b05ee54f1 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -3504,8 +3504,10 @@ model_analyze_insns (void) FOR_EACH_DEP (iter, SD_LIST_FORW, sd_it, dep) { con = MODEL_INSN_INFO (DEP_CON (dep)); - if (con->insn && insn->alap < con->alap + 1) - insn->alap = con->alap + 1; + unsigned int min_alap + = con->alap + (DEP_TYPE (dep) == REG_DEP_TRUE); + if (con->insn && insn->alap < min_alap) + insn->alap = min_alap; } insn->old_queue = QUEUE_INDEX (iter);
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/8/18 5:10 AM, Kyrill Tkachov wrote: > > > This patch fixes a flaw in the relationship between the way that > SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses > them in model_order_p. A comment in model_order_p says: > > /* Combine the length of the longest path of satisfied true dependencies > that leads to each instruction (depth) with the length of the longest > path of any dependencies that leads from the instruction (alap). > Prefer instructions with the greatest combined length. If the > combined > lengths are equal, prefer instructions with the greatest depth. > > The idea is that, if we have a set S of "equal" instructions that each > have ALAP value X, and we pick one such instruction I, any > true-dependent > successors of I that have ALAP value X - 1 should be preferred over S. > This encourages the schedule to be "narrow" rather than "wide". > However, if I is a low-priority instruction that we decided to > schedule because of its model_classify_pressure, and if there > is a set of higher-priority instructions T, the aforementioned > successors of I should not have the edge over T. */ > > The expectation was that scheduling an instruction X would give a > greater priority to the highest-priority successor instructions Y than > X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1, > giving an equal combined height, but with the greater depth winning as > a tie-breaker. But this doesn't work if the alap value was ultimately > determined by an anti-dependence. > > This is particularly bad when --param max-pending-list-length kicks in, > since we then start adding fake anti-dependencies in order to keep the > list length down. These fake dependencies tend to be on the critical > path. > > The attached patch avoids that by making the alap calculation only > look at true dependencies. This shouldn't be too bad, since we use > INSN_PRIORITY as the final tie-breaker than that does take > anti-dependencies into account. > > This reduces the number of spills in the hot function from 436.cactusADM > by 14% on aarch64 at -O3 (and the number of instructions in general). > SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). > Thanks to Wilco for the benchmarking. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Is this ok for trunk? > > Thanks, > Kyrill > > 2018-11-08 Richard Sandiford > > gcc/ > * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's > ALAP if the dependence is a true dependence. So at the least the documentation of the ALAP field would need to be updated as well as the comment you referenced (the "any dependencies"). But more importantly, it seems like blindly ignoring anti dependencies is just a hack that happens to work. I wonder if we could somehow mark the fake dependencies we add, and avoid bumping the ALAP when we encounter those fake dependencies. It probably wouldn't be a bad idea to look at the default for MAX_PENDING_LIST_LENGTH. Based on the current default value and the comments in the code that value could well have been tuned 25 or more years ago! How often are we falling over that during a bootstrap and during spec builds? Jeff
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
Hi Jeff, On 10/11/18 00:04, Jeff Law wrote: On 11/8/18 5:10 AM, Kyrill Tkachov wrote: This patch fixes a flaw in the relationship between the way that SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses them in model_order_p. A comment in model_order_p says: /* Combine the length of the longest path of satisfied true dependencies that leads to each instruction (depth) with the length of the longest path of any dependencies that leads from the instruction (alap). Prefer instructions with the greatest combined length. If the combined lengths are equal, prefer instructions with the greatest depth. The idea is that, if we have a set S of "equal" instructions that each have ALAP value X, and we pick one such instruction I, any true-dependent successors of I that have ALAP value X - 1 should be preferred over S. This encourages the schedule to be "narrow" rather than "wide". However, if I is a low-priority instruction that we decided to schedule because of its model_classify_pressure, and if there is a set of higher-priority instructions T, the aforementioned successors of I should not have the edge over T. */ The expectation was that scheduling an instruction X would give a greater priority to the highest-priority successor instructions Y than X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1, giving an equal combined height, but with the greater depth winning as a tie-breaker. But this doesn't work if the alap value was ultimately determined by an anti-dependence. This is particularly bad when --param max-pending-list-length kicks in, since we then start adding fake anti-dependencies in order to keep the list length down. These fake dependencies tend to be on the critical path. The attached patch avoids that by making the alap calculation only look at true dependencies. This shouldn't be too bad, since we use INSN_PRIORITY as the final tie-breaker than that does take anti-dependencies into account. This reduces the number of spills in the hot function from 436.cactusADM by 14% on aarch64 at -O3 (and the number of instructions in general). SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). Thanks to Wilco for the benchmarking. Bootstrapped and tested on aarch64-none-linux-gnu. Is this ok for trunk? Thanks, Kyrill 2018-11-08 Richard Sandiford gcc/ * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's ALAP if the dependence is a true dependence. So at the least the documentation of the ALAP field would need to be updated as well as the comment you referenced (the "any dependencies"). Ah, I can easily update the patch for that. But more importantly, it seems like blindly ignoring anti dependencies is just a hack that happens to work. I wonder if we could somehow mark the fake dependencies we add, and avoid bumping the ALAP when we encounter those fake dependencies. I did experiment with this. I added a new property to dep_t to mark it as "artificial", that is created in the parts of sched-deps.c that add dependencies when MAX_PENDING_LIST_LENGTH is exceeded. Then ALAP is bumped only when the dependency is not artificial in this way. This didn't help much on the testcase we were looking at (the hot function in cactus from SPEC2006). The code size increase and number of spills decreased by only 6 (out of ~800) whereas with Richards' patch it improved much more (~140 decrease, with a corresponding improvement in stack usage and code size). Richard did suggest that anti-dependencies are already taken into account in the INSN_PRIORITY tie-breaker, so perhaps that is a better scheme indeed? It probably wouldn't be a bad idea to look at the default for MAX_PENDING_LIST_LENGTH. Based on the current default value and the comments in the code that value could well have been tuned 25 or more years ago! Probably. I see that s390 and spu increase that param in their backends to much larger values than the default I played around with increasing it on aarch64. It improved things somewhat, but Richard's patch still gave superior results. Thanks, Kyrill How often are we falling over that during a bootstrap and during spec builds? Jeff
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/16/18 5:35 AM, Kyrill Tkachov wrote: > >> But more importantly, it seems like blindly ignoring anti dependencies >> is just a hack that happens to work. I wonder if we could somehow mark >> the fake dependencies we add, and avoid bumping the ALAP when we >> encounter those fake dependencies. > > I did experiment with this. I added a new property to dep_t > to mark it as "artificial", that is created in the parts of sched-deps.c > that add dependencies when MAX_PENDING_LIST_LENGTH is exceeded. > > Then ALAP is bumped only when the dependency is not artificial in this way. > This didn't help much on the testcase we were looking at (the hot > function in cactus from SPEC2006). > > The code size increase and number of spills decreased by only 6 (out of > ~800) whereas with Richards' > patch it improved much more (~140 decrease, with a corresponding > improvement in stack usage and code size). > > Richard did suggest that anti-dependencies are already taken into > account in the INSN_PRIORITY tie-breaker, > so perhaps that is a better scheme indeed? ISTM that your experiment indicates that it's not the artificial dependencies that are the problem. It's the real anti-dependencies that are mucking things up. That's fine, it just means I don't think we want/need to do anything special for the artificial dependencies. We certainly use INSN_PRIORITY as one of the (many) tie breakers in the rank_for_schedule routine. BUt I don't know if that's a better place or not. I trust Richard, so if he thinks the patch is the best approach, then let's go with it after that trivial comment fix I mentioned in my previous message. > >> >> It probably wouldn't be a bad idea to look at the default for >> MAX_PENDING_LIST_LENGTH. Based on the current default value and the >> comments in the code that value could well have been tuned 25 or more >> years ago! > > Probably. I see that s390 and spu increase that param in their backends > to much larger values than the default > I played around with increasing it on aarch64. It improved things > somewhat, but Richard's patch still gave superior results. ACK. Thanks for testing. If you want to adjust, that would seem fine as a follow-up. jeff
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
Hi Jeff, On 16/11/18 17:08, Jeff Law wrote: On 11/16/18 5:35 AM, Kyrill Tkachov wrote: But more importantly, it seems like blindly ignoring anti dependencies is just a hack that happens to work. I wonder if we could somehow mark the fake dependencies we add, and avoid bumping the ALAP when we encounter those fake dependencies. I did experiment with this. I added a new property to dep_t to mark it as "artificial", that is created in the parts of sched-deps.c that add dependencies when MAX_PENDING_LIST_LENGTH is exceeded. Then ALAP is bumped only when the dependency is not artificial in this way. This didn't help much on the testcase we were looking at (the hot function in cactus from SPEC2006). The code size increase and number of spills decreased by only 6 (out of ~800) whereas with Richards' patch it improved much more (~140 decrease, with a corresponding improvement in stack usage and code size). Richard did suggest that anti-dependencies are already taken into account in the INSN_PRIORITY tie-breaker, so perhaps that is a better scheme indeed? ISTM that your experiment indicates that it's not the artificial dependencies that are the problem. It's the real anti-dependencies that are mucking things up. That's fine, it just means I don't think we want/need to do anything special for the artificial dependencies. I must apologise. Since I sent this out earlier I found a bug in my implementation of the above experiment which meant I wasn't marking the dependencies properly in all cases :( With that fixed, the approach removes ~100 spills which is much better than what I reported initially however still not as good as Richards' patch (removed ~140 spills). I've kicked off a SPEC2006 benchmarking run to see if it has any any effect. We certainly use INSN_PRIORITY as one of the (many) tie breakers in the rank_for_schedule routine. BUt I don't know if that's a better place or not. I trust Richard, so if he thinks the patch is the best approach, then let's go with it after that trivial comment fix I mentioned in my previous message. I'll respin Richard's patch with the comment updates and resend that, unless the benchmark run above shows something interesting. It probably wouldn't be a bad idea to look at the default for MAX_PENDING_LIST_LENGTH. Based on the current default value and the comments in the code that value could well have been tuned 25 or more years ago! Probably. I see that s390 and spu increase that param in their backends to much larger values than the default I played around with increasing it on aarch64. It improved things somewhat, but Richard's patch still gave superior results. ACK. Thanks for testing. If you want to adjust, that would seem fine as a follow-up. I'd need to benchmark such a change, but thanks. MAX_PENDING_LIST_LENGTH only exists to limit compile-time, right? Thanks, and sorry for the confusion. Kyrill jeff
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/16/18 10:21 AM, Kyrill Tkachov wrote: > It probably wouldn't be a bad idea to look at the default for MAX_PENDING_LIST_LENGTH. Based on the current default value and the comments in the code that value could well have been tuned 25 or more years ago! >>> Probably. I see that s390 and spu increase that param in their backends >>> to much larger values than the default >>> I played around with increasing it on aarch64. It improved things >>> somewhat, but Richard's patch still gave superior results. >> ACK. Thanks for testing. If you want to adjust, that would seem fine >> as a follow-up. > > I'd need to benchmark such a change, but thanks. > > MAX_PENDING_LIST_LENGTH only exists to limit compile-time, right? I believe so. jeff
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/8/18 6:10 AM, Kyrill Tkachov wrote: > The attached patch avoids that by making the alap calculation only > look at true dependencies. This shouldn't be too bad, since we use > INSN_PRIORITY as the final tie-breaker than that does take > anti-dependencies into account. > > This reduces the number of spills in the hot function from 436.cactusADM > by 14% on aarch64 at -O3 (and the number of instructions in general). > SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). > Thanks to Wilco for the benchmarking. I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm seeing a 5% degradation. Looking at the generated asm for function bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads and stores generated and an extra 100 bytes allocated on the stack. -Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 16/11/18 18:19, Pat Haugen wrote: On 11/8/18 6:10 AM, Kyrill Tkachov wrote: > The attached patch avoids that by making the alap calculation only > look at true dependencies. This shouldn't be too bad, since we use > INSN_PRIORITY as the final tie-breaker than that does take > anti-dependencies into account. > > This reduces the number of spills in the hot function from 436.cactusADM > by 14% on aarch64 at -O3 (and the number of instructions in general). > SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). > Thanks to Wilco for the benchmarking. I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm seeing a 5% degradation. Looking at the generated asm for function bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads and stores generated and an extra 100 bytes allocated on the stack. Thanks for trying it out! Given that, I'll try to polish up the more targeted approach that filters out only the fake dependencies, which should give a more targeted approach. Thanks, Kyrill -Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 16/11/18 18:19, Pat Haugen wrote: On 11/8/18 6:10 AM, Kyrill Tkachov wrote: The attached patch avoids that by making the alap calculation only look at true dependencies. This shouldn't be too bad, since we use INSN_PRIORITY as the final tie-breaker than that does take anti-dependencies into account. This reduces the number of spills in the hot function from 436.cactusADM by 14% on aarch64 at -O3 (and the number of instructions in general). SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). Thanks to Wilco for the benchmarking. I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm seeing a 5% degradation. Looking at the generated asm for function bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads and stores generated and an extra 100 bytes allocated on the stack. -Pat This is a follow-up from https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html This version introduces an "artificial" property of the dependencies produced in sched-deps.c that is recorded when they are created due to MAX_PENDING_LIST_LENGTH and they are thus ignored in the model_analyze_insns ALAP calculation. This approach gives most of the benefits of the original patch [1] on aarch64. I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu with -O3 and found that the initial version proposed did indeed increase the instruction count and stack space. This version gives a small improvement on powerpc in terms of instruction count (number of st* instructions stays the same), so I'm hoping this version addresses Pat's concerns. Pat, could you please try this version out if you've got the chance? In terms of implementation, it required extending the various add_dependency functions/helpers to take an extra argument marking the dependency as "artificial", which is what most of the patch diff does. It is otherwise a fairly simple patch. Bootstrapped and tested on aarch64-none-linux-gnu. Is this ok for trunk if the PPC performance results look ok? [1] https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01514.html 2018-11-19 Richard Sandiford Kyrylo Tkachov * haifa-sched.c (model_analyze_insns): Avoid counting artificial anti-dependencies. * sched-int.h (struct _dep): Add artificial field. (DEP_ARTIFICIAL): Define accessor. (DEP_ANTI_ARTIFICIAL): Define. (DEP_POSTPONED): Adjust definition. (add_dependence): Add default bool argument to prototype. * sched-deps.c (init_dep_1): Initialize artificial field. (add_dependence_1): Add default bool parameter. Handle in definition. (add_dependence_list): Likewise. (add_dependence_list_and_free): Likewise. (flush_pending_lists): Likewise. (haifa_note_dep): Handle DEP_ANTI_ARTIFICIAL in ds. (sched_analyze_1): Mark new dependencies created as part of handling MAX_PENDING_LIST_LENGTH limit as artificial. (sched_analyze_2): Likewise. (sched_analyze_insn): Likewise. (deps_analyze_insn): Likewise. (dump_ds): Handle DEP_ANTI_ARTIFICIAL. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 2c84ce3814357b30e1aaed57f1de3034d99afd57..c1787d01c5f4765a63986efe04c59748792e4932 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -3504,8 +3504,13 @@ model_analyze_insns (void) FOR_EACH_DEP (iter, SD_LIST_FORW, sd_it, dep) { con = MODEL_INSN_INFO (DEP_CON (dep)); - if (con->insn && insn->alap < con->alap + 1) - insn->alap = con->alap + 1; + /* Consider all dependencies except those introduced artificially + by the scheduler as part of adhering to the + MAX_PENDING_LIST_LENGTH limit. */ + unsigned int min_alap + = con->alap + !DEP_ARTIFICIAL (dep); + if (con->insn && insn->alap < min_alap) + insn->alap = min_alap; } insn->old_queue = QUEUE_INDEX (iter); diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index f89f28269fd5ecf96688ed255d07b6976d2180c4..a83b779086099e9a1a0690b480230f4d7ad3e0ba 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -100,6 +100,7 @@ init_dep_1 (dep_t dep, rtx_insn *pro, rtx_insn *con, enum reg_note type, ds_t ds DEP_COST (dep) = UNKNOWN_DEP_COST; DEP_NONREG (dep) = 0; DEP_MULTIPLE (dep) = 0; + DEP_ARTIFICIAL (dep) = 0; DEP_REPLACE (dep) = NULL; } @@ -472,16 +473,18 @@ static int cache_size; static bool mark_as_hard; static int deps_may_trap_p (const_rtx); -static void add_dependence_1 (rtx_insn *, rtx_insn *, enum reg_note); +static void add_dependence_1 (rtx_insn *, rtx_insn *, enum reg_note, + bool = false); static void add_dependence_list (rtx_insn *, rtx_insn_list *, int, - enum reg_note, bool); + enum reg_note, bool, bool = false); static void add_dependence_list_and_free (struct deps_desc *, rtx_insn *, rtx_insn_list **, int, enum reg_note, - bool); + bool, bo
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 11:54 AM, Kyrill Tkachov wrote: > On 16/11/18 18:19, Pat Haugen wrote: >> On 11/8/18 6:10 AM, Kyrill Tkachov wrote: >>> The attached patch avoids that by making the alap calculation only >>> look at true dependencies. This shouldn't be too bad, since we use >>> INSN_PRIORITY as the final tie-breaker than that does take >>> anti-dependencies into account. >>> >>> This reduces the number of spills in the hot function from 436.cactusADM >>> by 14% on aarch64 at -O3 (and the number of instructions in general). >>> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall). >>> Thanks to Wilco for the benchmarking. >> I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL >> algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm >> seeing a 5% degradation. Looking at the generated asm for function >> bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads >> and stores generated and an extra 100 bytes allocated on the stack. >> >> -Pat >> > > This is a follow-up from > https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html > This version introduces an "artificial" property of the dependencies produced > in > sched-deps.c that is recorded when they are created due to > MAX_PENDING_LIST_LENGTH > and they are thus ignored in the model_analyze_insns ALAP calculation. > > This approach gives most of the benefits of the original patch [1] on aarch64. > I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on > powerpc64le-unknown-linux-gnu > with -O3 and found that the initial version proposed did indeed increase the > instruction count > and stack space. This version gives a small improvement on powerpc in terms > of instruction count > (number of st* instructions stays the same), so I'm hoping this version > addresses Pat's concerns. > Pat, could you please try this version out if you've got the chance? > I tried the new verison on cactusADM, it's showing a 2% degradation. I've kicked off a full CPU2006 run just to see if any others are affected. -Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 2:30 PM, Pat Haugen wrote: >> This is a follow-up from >> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html >> This version introduces an "artificial" property of the dependencies >> produced in >> sched-deps.c that is recorded when they are created due to >> MAX_PENDING_LIST_LENGTH >> and they are thus ignored in the model_analyze_insns ALAP calculation. >> >> This approach gives most of the benefits of the original patch [1] on >> aarch64. >> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on >> powerpc64le-unknown-linux-gnu >> with -O3 and found that the initial version proposed did indeed increase the >> instruction count >> and stack space. This version gives a small improvement on powerpc in terms >> of instruction count >> (number of st* instructions stays the same), so I'm hoping this version >> addresses Pat's concerns. >> Pat, could you please try this version out if you've got the chance? >> > I tried the new verison on cactusADM, it's showing a 2% degradation. I've > kicked off a full CPU2006 run just to see if any others are affected. The other benchmarks were neutral. So the only benchmark showing a change is the 2% degradation on cactusADM. Comparing the generated .s files for bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns and still the 1% increase in store insns. -Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 20/11/18 16:48, Pat Haugen wrote: On 11/19/18 2:30 PM, Pat Haugen wrote: This is a follow-up from https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html This version introduces an "artificial" property of the dependencies produced in sched-deps.c that is recorded when they are created due to MAX_PENDING_LIST_LENGTH and they are thus ignored in the model_analyze_insns ALAP calculation. This approach gives most of the benefits of the original patch [1] on aarch64. I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu with -O3 and found that the initial version proposed did indeed increase the instruction count and stack space. This version gives a small improvement on powerpc in terms of instruction count (number of st* instructions stays the same), so I'm hoping this version addresses Pat's concerns. Pat, could you please try this version out if you've got the chance? I tried the new verison on cactusADM, it's showing a 2% degradation. I've kicked off a full CPU2006 run just to see if any others are affected. The other benchmarks were neutral. So the only benchmark showing a change is the 2% degradation on cactusADM. Comparing the generated .s files for bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns and still the 1% increase in store insns. Sigh :( What options are you compiling with? I tried a powerpc64le compiler with plain -O3 and saw got a slight improvement (by manual expection) Thanks, Kyrill -Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/20/18 10:53 AM, Kyrill Tkachov wrote: > On 20/11/18 16:48, Pat Haugen wrote: >> On 11/19/18 2:30 PM, Pat Haugen wrote: This is a follow-up from https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html This version introduces an "artificial" property of the dependencies produced in sched-deps.c that is recorded when they are created due to MAX_PENDING_LIST_LENGTH and they are thus ignored in the model_analyze_insns ALAP calculation. This approach gives most of the benefits of the original patch [1] on aarch64. I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu with -O3 and found that the initial version proposed did indeed increase the instruction count and stack space. This version gives a small improvement on powerpc in terms of instruction count (number of st* instructions stays the same), so I'm hoping this version addresses Pat's concerns. Pat, could you please try this version out if you've got the chance? >>> I tried the new verison on cactusADM, it's showing a 2% degradation. I've >>> kicked off a full CPU2006 run just to see if any others are affected. >> The other benchmarks were neutral. So the only benchmark showing a change is >> the 2% degradation on cactusADM. Comparing the generated .s files for >> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns >> and still the 1% increase in store insns. > > Sigh :( > What options are you compiling with? I tried a powerpc64le compiler with > plain -O3 and saw got a slight improvement (by manual expection) I was using the following: -O3 -mcpu=power8 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=all. When I run with just -O3 -mcpu=power8 I see just under a 1% degradation. -Pat