Re: [PATCH] Run pass_sink_code once more before store_merging

2021-09-02 Thread Martin Jambor
Hi,

On Tue, May 18 2021, Xionghu Luo via Gcc-patches wrote:
>

[...]

> From 7fcc6ca9ef3b6acbfbcbd3da4be1d1c0eef4be80 Mon Sep 17 00:00:00 2001
> From: Xiong Hu Luo 
> Date: Mon, 17 May 2021 20:46:15 -0500
> Subject: [PATCH] Run pass_sink_code once more before store_merging
>
> The GIMPLE sink code pass runs quite early, so later GIMPLE
> optimization passes may expose new sinking opportunities.  This patch
> runs the sink code pass once more before store_merging.  For detailed
> discussion, please refer to:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html
>
> Tested SPEC2017 performance on P8LE: 544.nab_r improves by 2.43%,
> with no big changes in the other cases; the GEOMEAN improves slightly,
> by 0.25%.
>
> gcc/ChangeLog:
>
>   * passes.def: Add sink_code before store_merging.
>   * tree-ssa-sink.c (pass_sink_code::clone): New.


Unfortunately, this seems to have caused PR 102178
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178)

Sorry about the bad news,

Martin



Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-19 Thread Bernd Edlinger
On 5/18/21 12:34 PM, Richard Biener wrote:
> On Tue, 18 May 2021, Xionghu Luo wrote:
> 
>> Hi,
>>
>> On 2021/5/18 15:02, Richard Biener wrote:
>>> Can you, for the new gcc.dg/tree-ssa/ssa-sink-18.c testcase, add
>>> a comment explaining what operations we expect to sink?  The testcase
>>> is likely somewhat fragile in the exact number of sinkings
>>> (can you check on some other target and maybe P8BE with -m32 for
>>> example?), so for future adjustments it would be nice to easily
>>> figure out what we expect to happen.
>>>
>>> OK with that change.
>>
>> Thanks a lot for the reminder! ssa-sink-18.c generates different code
>> for -m32 and -m64 because of different type size conversions and ivopts
>> selection.  Since -m32 and -m64 can't co-exist in one test file, shall
>> I restrict it to -m64 only, or check target lp64/target ilp32?
>> I've verified that this case shows the same behavior on X86, AArch64 and
>> Power for both -m32 and -m64.
>>
>> -m32:
>>[local count: 75120046]:
>>   # len_155 = PHI 
>>   len_182 = len_155 + 1;
>>   _35 = (unsigned int) ip_249;
>>   _36 = _35 + len_182;
>>   _380 = (uint8_t *) _36;
>>   if (maxlen_179 > len_182)
>> goto ; [94.50%]
>>   else
>> goto ; [5.50%]
>> ...
>>
>> Sinking _329 = (uint8_t *) _36;
>>  from bb 29 to bb 86
>> Sinking _36 = _35 + len_182;
>>  from bb 29 to bb 86
>> Sinking _35 = (unsigned int) ip_249;
>>  from bb 29 to bb 86
>>
>> Pass statistics of "sink": 
>> Sunk statements: 3
>>
>>
>> -m64:
>>[local count: 75120046]:
>>   # ivtmp.23_34 = PHI 
>>   _38 = (unsigned int) ivtmp.23_34;
>>   len_161 = _38 + 4294967295;
>>   _434 = (unsigned long) ip_254;
>>   _433 = ivtmp.23_34 + _434;
>>   _438 = (uint8_t *) _433;
>>   if (_38 < maxlen_187)
>> goto ; [94.50%]
>>   else
>> goto ; [5.50%]
>> ...
>>
>> Sinking _367 = (uint8_t *) _320;
>>  from bb 31 to bb 90
>> Sinking _320 = _321 + ivtmp.25_326;
>>  from bb 31 to bb 90
>> Sinking _321 = (unsigned long) ip_229;
>>  from bb 31 to bb 90
>> Sinking len_158 = _322 + 4294967295;
>>  from bb 31 to bb 33
>>
>> Pass statistics of "sink": 
>> Sunk statements: 4
>>
>>
>> Regarding the comments part:
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
>> index 52b9a74b65f..5147f7b85cd 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
>> @@ -193,16 +193,17 @@ compute_on_bytes (uint8_t *in_data, int in_len, 
>> uint8_t *out_data, int out_len)
>>return op - out_data;
>>   }
>> + /* For this case, pass sink2 sinks statements from hot loop header to loop
>> +exits after gimple loop optimizations, which generates instructions 
>> executed
>> +each iteration in loop, but the results are used outside of loop:
>> +With -m64,
>> +"Sinking _367 = (uint8_t *) _320;
>> + from bb 31 to bb 90
>> + Sinking _320 = _321 + ivtmp.25_326;
>> + from bb 31 to bb 90
>> + Sinking _321 = (unsigned long) ip_229;
>> + from bb 31 to bb 90
>> + Sinking len_158 = _322 + 4294967295;
>> +from bb 31 to bb 33"  */
>>
>> - /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" } } */
>> + /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" { 
>> target lp64 } } } */
>> + /* { dg-final { scan-tree-dump-times "Sunk statements: 3" 1 "sink2" { 
>> target ilp32 } } } */
> 
> Yes, that looks good.
> 
> Thanks,
> Richard.
> 

Hi,

I've noticed the test case gcc.dg/tree-ssa/ssa-sink-3.c was accidentally
committed as an empty file, and therefore fails:

FAIL: gcc.dg/tree-ssa/ssa-sink-3.c (test for excess errors)

I've committed as obvious the following fix (which restores the test case and
Xionghu Luo's intended change).


Thanks
Bernd.
From 51cfa55431c38f3c29c7b7287ad8a2da5c06 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Wed, 19 May 2021 09:51:44 +0200
Subject: [PATCH] Fix commit mistake in testcase gcc.dg/tree-ssa/ssa-sink-3.c

The test case was accidentally changed to an empty file.

2021-05-19  Bernd Edlinger  

	* gcc.dg/tree-ssa/ssa-sink-3.c: Fix test case.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
index e69de29..ad88ccc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+extern void foo(int a);
+int
+main (int argc)
+{
+  int a;
+  a = argc + 1;
+  if (argc + 3)
+    {
+      foo (a);
+    }
+}
+/* We should sink the a = argc + 1 calculation into the if branch  */
+/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink1" } } */
-- 
1.9.1
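
For readers unfamiliar with what the restored test checks, the following is a hand-written C sketch of the effect of sinking on the ssa-sink-3.c shape.  It is only an illustration: the real pass works on GIMPLE, not on C source, and the names unsunk, sunk, same_behavior, check_sink_sketch and the recording foo stub are made up for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for the external foo () of ssa-sink-3.c: it only
   records its argument so the two variants can be compared.  */
int foo_calls;
int foo_last_arg;

void
foo (int a)
{
  foo_calls++;
  foo_last_arg = a;
}

/* Shape of the test case as written: a is computed on every path,
   although only the if branch uses it.  */
void
unsunk (int argc)
{
  int a;
  a = argc + 1;
  if (argc + 3)
    {
      foo (a);
    }
}

/* What sinking conceptually produces: the computation has moved into
   the only block that uses its result.  */
void
sunk (int argc)
{
  if (argc + 3)
    {
      int a = argc + 1;
      foo (a);
    }
}

/* Returns 1 if both variants call foo the same number of times with the
   same argument for ARGC.  */
int
same_behavior (int argc)
{
  int calls, arg;

  foo_calls = 0;
  foo_last_arg = 0;
  unsunk (argc);
  calls = foo_calls;
  arg = foo_last_arg;

  foo_calls = 0;
  foo_last_arg = 0;
  sunk (argc);
  return calls == foo_calls && arg == foo_last_arg;
}

/* Small self-check covering a taken and a not-taken branch.  */
void
check_sink_sketch (void)
{
  assert (same_behavior (1));    /* argc + 3 != 0: foo is called */
  assert (same_behavior (-3));   /* argc + 3 == 0: foo is not called */
}
```

The point of the sketch is only that the path which skips foo no longer pays for the addition, which is the single sunk statement the dg-final line counts.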



Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-18 Thread Richard Biener
On Tue, 18 May 2021, Xionghu Luo wrote:

> Hi,
> 
> On 2021/5/18 15:02, Richard Biener wrote:
> > Can you, for the new gcc.dg/tree-ssa/ssa-sink-18.c testcase, add
> > a comment explaining what operations we expect to sink?  The testcase
> > is likely somewhat fragile in the exact number of sinkings
> > (can you check on some other target and maybe P8BE with -m32 for
> > example?), so for future adjustments it would be nice to easily
> > figure out what we expect to happen.
> > 
> > OK with that change.
> 
> Thanks a lot for the reminder! ssa-sink-18.c generates different code
> for -m32 and -m64 because of different type size conversions and ivopts
> selection.  Since -m32 and -m64 can't co-exist in one test file, shall
> I restrict it to -m64 only, or check target lp64/target ilp32?
> I've verified that this case shows the same behavior on X86, AArch64 and
> Power for both -m32 and -m64.
> 
> -m32:
>[local count: 75120046]:
>   # len_155 = PHI 
>   len_182 = len_155 + 1;
>   _35 = (unsigned int) ip_249;
>   _36 = _35 + len_182;
>   _380 = (uint8_t *) _36;
>   if (maxlen_179 > len_182)
> goto ; [94.50%]
>   else
> goto ; [5.50%]
> ...
> 
> Sinking _329 = (uint8_t *) _36;
>  from bb 29 to bb 86
> Sinking _36 = _35 + len_182;
>  from bb 29 to bb 86
> Sinking _35 = (unsigned int) ip_249;
>  from bb 29 to bb 86
> 
> Pass statistics of "sink": 
> Sunk statements: 3
> 
> 
> -m64:
>[local count: 75120046]:
>   # ivtmp.23_34 = PHI 
>   _38 = (unsigned int) ivtmp.23_34;
>   len_161 = _38 + 4294967295;
>   _434 = (unsigned long) ip_254;
>   _433 = ivtmp.23_34 + _434;
>   _438 = (uint8_t *) _433;
>   if (_38 < maxlen_187)
> goto ; [94.50%]
>   else
> goto ; [5.50%]
> ...
> 
> Sinking _367 = (uint8_t *) _320;
>  from bb 31 to bb 90
> Sinking _320 = _321 + ivtmp.25_326;
>  from bb 31 to bb 90
> Sinking _321 = (unsigned long) ip_229;
>  from bb 31 to bb 90
> Sinking len_158 = _322 + 4294967295;
>  from bb 31 to bb 33
> 
> Pass statistics of "sink": 
> Sunk statements: 4
> 
> 
> Regarding the comments part:
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
> index 52b9a74b65f..5147f7b85cd 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
> @@ -193,16 +193,17 @@ compute_on_bytes (uint8_t *in_data, int in_len, uint8_t 
> *out_data, int out_len)
>return op - out_data;
>   }
> + /* For this case, pass sink2 sinks statements from hot loop header to loop
> +exits after gimple loop optimizations, which generates instructions 
> executed
> +each iteration in loop, but the results are used outside of loop:
> +With -m64,
> +"Sinking _367 = (uint8_t *) _320;
> + from bb 31 to bb 90
> + Sinking _320 = _321 + ivtmp.25_326;
> + from bb 31 to bb 90
> + Sinking _321 = (unsigned long) ip_229;
> + from bb 31 to bb 90
> + Sinking len_158 = _322 + 4294967295;
> +from bb 31 to bb 33"  */
> 
> - /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" } } */
> + /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" { 
> target lp64 } } } */
> + /* { dg-final { scan-tree-dump-times "Sunk statements: 3" 1 "sink2" { 
> target ilp32 } } } */

Yes, that looks good.

Thanks,
Richard.


Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-18 Thread Xionghu Luo via Gcc-patches
Hi,

On 2021/5/18 15:02, Richard Biener wrote:
> Can you, for the new gcc.dg/tree-ssa/ssa-sink-18.c testcase, add
> a comment explaining what operations we expect to sink?  The testcase
> is likely somewhat fragile in the exact number of sinkings
> (can you check on some other target and maybe P8BE with -m32 for
> example?), so for future adjustments it would be nice to easily
> figure out what we expect to happen.
> 
> OK with that change.

Thanks a lot for the reminder! ssa-sink-18.c generates different code
for -m32 and -m64 because of different type size conversions and ivopts
selection.  Since -m32 and -m64 can't co-exist in one test file, shall
I restrict it to -m64 only, or check target lp64/target ilp32?
I've verified that this case shows the same behavior on X86, AArch64 and
Power for both -m32 and -m64.

-m32:
   [local count: 75120046]:
  # len_155 = PHI 
  len_182 = len_155 + 1;
  _35 = (unsigned int) ip_249;
  _36 = _35 + len_182;
  _380 = (uint8_t *) _36;
  if (maxlen_179 > len_182)
goto ; [94.50%]
  else
goto ; [5.50%]
...

Sinking _329 = (uint8_t *) _36;
 from bb 29 to bb 86
Sinking _36 = _35 + len_182;
 from bb 29 to bb 86
Sinking _35 = (unsigned int) ip_249;
 from bb 29 to bb 86

Pass statistics of "sink": 
Sunk statements: 3


-m64:
   [local count: 75120046]:
  # ivtmp.23_34 = PHI 
  _38 = (unsigned int) ivtmp.23_34;
  len_161 = _38 + 4294967295;
  _434 = (unsigned long) ip_254;
  _433 = ivtmp.23_34 + _434;
  _438 = (uint8_t *) _433;
  if (_38 < maxlen_187)
goto ; [94.50%]
  else
goto ; [5.50%]
...

Sinking _367 = (uint8_t *) _320;
 from bb 31 to bb 90
Sinking _320 = _321 + ivtmp.25_326;
 from bb 31 to bb 90
Sinking _321 = (unsigned long) ip_229;
 from bb 31 to bb 90
Sinking len_158 = _322 + 4294967295;
 from bb 31 to bb 33

Pass statistics of "sink": 
Sunk statements: 4
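
To make the dependency chain in the -m32 dump above easier to follow, here is a toy model in C of why the three statements can only move as a group.  It is a sketch under assumptions, not the algorithm in tree-ssa-sink.c: struct toy_stmt, toy_sink_block, m32_header and check_toy_sink are hypothetical names, and the model assumes the pointer result is needed only after the loop exit, as the "from bb 29 to bb 86" lines suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one statement in the loop header.  All names here are
   hypothetical; this is not the data structure tree-ssa-sink.c uses.  */
struct toy_stmt
{
  const char *text;   /* statement as printed in the dump */
  int op[2];          /* indices of in-block defining statements, -1 if none */
  bool used_in_loop;  /* result is needed by something that stays in the loop */
};

/* Walk the block backwards so that every user is visited before the
   statement defining its operand.  A statement can sink only if nothing
   in the loop needs it and all of its in-block users were already sunk.
   Sets SUNK[i] for each statement and returns the number sunk.  */
int
toy_sink_block (const struct toy_stmt *stmts, int n, bool *sunk)
{
  int count = 0;

  for (int i = n - 1; i >= 0; i--)
    {
      bool ok = !stmts[i].used_in_loop;

      for (int j = i + 1; ok && j < n; j++)
        if ((stmts[j].op[0] == i || stmts[j].op[1] == i) && !sunk[j])
          ok = false;

      sunk[i] = ok;
      if (ok)
        count++;
    }
  return count;
}

/* The -m32 loop header, reduced to its dependencies.  */
const struct toy_stmt m32_header[] = {
  { "len_182 = len_155 + 1",       { -1, -1 }, true  }, /* feeds the exit test */
  { "_35 = (unsigned int) ip_249", { -1, -1 }, false },
  { "_36 = _35 + len_182",         {  1,  0 }, false },
  { "_380 = (uint8_t *) _36",      {  2, -1 }, false },
};

/* Self-check: the cast, the add and the pointer conversion sink together,
   while the induction update stays in the loop.  */
void
check_toy_sink (void)
{
  bool sunk[4];

  assert (toy_sink_block (m32_header, 4, sunk) == 3);
  assert (!sunk[0] && sunk[1] && sunk[2] && sunk[3]);
}
```

In this model the count of 3 matches the "Sunk statements: 3" line for -m32; the extra statement sunk with -m64 (len_158, moved to bb 33) is not modelled here.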


Regarding the comments part:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
index 52b9a74b65f..5147f7b85cd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
@@ -193,16 +193,17 @@ compute_on_bytes (uint8_t *in_data, int in_len, uint8_t *out_data, int out_len)
   return op - out_data;
  }
+ /* For this case, pass sink2 sinks statements from hot loop header to loop
+exits after gimple loop optimizations, which generates instructions executed
+each iteration in loop, but the results are used outside of loop:
+With -m64,
+"Sinking _367 = (uint8_t *) _320;
+ from bb 31 to bb 90
+ Sinking _320 = _321 + ivtmp.25_326;
+ from bb 31 to bb 90
+ Sinking _321 = (unsigned long) ip_229;
+ from bb 31 to bb 90
+ Sinking len_158 = _322 + 4294967295;
+from bb 31 to bb 33"  */

- /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" } } */
+ /* { dg-final { scan-tree-dump-times "Sunk statements: 4" 1 "sink2" { target lp64 } } } */
+ /* { dg-final { scan-tree-dump-times "Sunk statements: 3" 1 "sink2" { target ilp32 } } } */
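
As a source-level picture of what the comment in the diff describes, the following C sketch shows an address computation that is executed on every iteration but whose result is only read after the loop, next to a hand-sunk version.  This is a hypothetical reduction, not code from ssa-sink-18.c; match_end_unsunk, match_end_sunk and check_match_end are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the pattern: END is recomputed on every
   iteration, but only its final value is read after the loop.  */
const uint8_t *
match_end_unsunk (const uint8_t *ip, const uint8_t *ref, unsigned maxlen)
{
  unsigned len = 0;
  const uint8_t *end = ip;

  while (len < maxlen && ip[len] == ref[len])
    {
      len++;
      end = ip + len;   /* executed each iteration */
    }
  return end;
}

/* Conceptual result after sinking: the address computation runs once,
   on the loop exit path.  */
const uint8_t *
match_end_sunk (const uint8_t *ip, const uint8_t *ref, unsigned maxlen)
{
  unsigned len = 0;

  while (len < maxlen && ip[len] == ref[len])
    len++;
  return ip + len;      /* executed once, after the loop */
}

/* Self-check: both variants return the same pointer.  */
void
check_match_end (void)
{
  const uint8_t a[] = { 1, 2, 3, 4 };
  const uint8_t b[] = { 1, 2, 9, 4 };

  assert (match_end_unsunk (a, b, 4) == a + 2);
  assert (match_end_sunk (a, b, 4) == a + 2);
  assert (match_end_unsunk (a, a, 4) == match_end_sunk (a, a, 4));
}
```

Whether the pointer arithmetic is done in unsigned int or unsigned long at the GIMPLE level depends on the data model, which is one reason the expected count differs between ilp32 and lp64.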

-- 
Thanks,
Xionghu


Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-18 Thread Richard Biener
On Tue, 18 May 2021, Xionghu Luo wrote:

> Hi,
> 
> On 2021/5/17 16:11, Richard Biener wrote:
> > On Fri, 14 May 2021, Xionghu Luo wrote:
> > 
> >> Hi Richi,
> >>
> >> On 2021/4/21 19:54, Richard Biener wrote:
> >>> On Tue, 20 Apr 2021, Xionghu Luo wrote:
> >>>
> 
> 
>  On 2021/4/15 19:34, Richard Biener wrote:
> > On Thu, 15 Apr 2021, Xionghu Luo wrote:
> >
> >> Thanks,
> >>
> >> On 2021/4/14 14:41, Richard Biener wrote:
>  "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
>  but it moves #538 first, then #235; there is a strong dependency here.
>  It doesn't seem to work like the LCM framework, which could solve all of
>  them and do the delete-insert in one iteration.
> >>> So my question was whether we want to do both within the LCM store
> >>> sinking framework.  The LCM dataflow is also used by RTL PRE which
> >>> handles both loads and non-loads so in principle it should be able
> >>> to handle stores and non-stores for the sinking case (PRE on the
> >>> reverse CFG).
> >>>
> >>> A global dataflow is more powerful than any local ad-hoc method.
> >>
> >> My biggest concern is whether the LCM DF framework could support
> >> sinking *multiple* reverse-dependent non-store instructions together
> >> with *one* call of LCM DF.  If this is not supported, we need to run
> >> LCM multiple times until nothing changes, which would obviously be time
> >> consuming (unless compile time is not important here).
> >
> > As said it is used for PRE and there it most definitely can do that.
> 
>  I did some investigation about PRE and attached a case to show how it
>  works.  It is quite like store-motion, and there is actually an rtl-hoist
>  pass in gcse.c which only works for code size.  All of them
>  leverage the LCM framework to move instructions upward or downward.
> 
>  PRE and rtl-hoist move instructions upward, so they analyze/hash the
>  SOURCE exprs and call pre_edge_lcm; store-motion and rtl-sink move
>  instructions downward, so they analyze/hash the DEST exprs and call
>  pre_edge_rev_lcm.  All four problems are converted to the LCM DF problem,
>  with four n_basic_blocks * m_exprs matrices (antic, transp, avail, kill)
>  as input and two outputs saying where to insert/delete.
> 
>  PRE scans each instruction and hashes the SRC into a table without
>  *checking the relationship between instructions*.  For the case attached,
>  BB 37, BB 38 and BB 41 all contain the SOURCE expr "r262:DI+r139:DI", but
>  BB 37 and BB 41 save it to index 106 while BB 38 saves it to index 110.
>  After this pass finishes, "r262:DI+r139:DI" in BB 41 is replaced with
>  "r194:DI=r452:DI", exprs are inserted into BB 75~BB 80 to create full
>  redundancies from partial redundancies, and finally the instruction in
>  BB 37 is updated.
> >>>
> >>> I'm not familiar with the actual PRE code but reading the toplevel comment
> >>> it seems that indeed it can only handle expressions contained in a single
> >>> insn unless a REG_EQUAL note provides a short-hand for the larger one.
> >>>
> >>> That of course means it would need to mark things as not transparent
> >>> for correctness where they'd be if moved together.  Now, nothing
> >>> prevents you changing the granularity of what you feed LCM.
> >>>
> >>> So originally we arrived at looking into LCM because there's already
> >>> a (store) sinking pass on RTL (using LCM) so adding another (loop-special)
> >>> one didn't look like the best obvious solution.
> >>>
> >>> That said, LCM would work for single-instruction expressions.
> >>> Alternatively a greedy algorithm like you prototyped could be used.
> >>> Another pass to look at would be RTL invariant motion which seems to
> >>> compute some kind of dependency graph - not sure if that would be
> >>> adaptable for the reverse CFG problem.
> >>>
> >>
> >> Actually my RTL sinking pass patch is borrowed from RTL loop invariant
> >> motion.  It is quite limited, since it only moves instructions from the
> >> loop header to loop exits, though it could be refined with various
> >> algorithms.  Compared to the initial method of running the gimple sink
> >> pass once more, it seems much more complicated and limited without any
> >> obvious performance benefit.  Shall we go back to the original gimple
> >> sink2 pass, since we are in stage1 now?
> > 
> > OK, so while there might be new sinking opportunities exposed during
> > RTL expansion and early RTL opts we can consider adding another sink pass
> > on GIMPLE.  Since it's basically a scheduling optimization placement
> > shouldn't matter much but I suppose we should run it before store
> > merging, so anywhere between cd_dce and that.
> > 
> > Richard.
> > 
> 
> Attached the patch as discussed, put it before store_merging is fine.
> Regression tested pass on P8LE, O

Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-17 Thread Xionghu Luo via Gcc-patches

Hi,

On 2021/5/17 16:11, Richard Biener wrote:

On Fri, 14 May 2021, Xionghu Luo wrote:


Hi Richi,

On 2021/4/21 19:54, Richard Biener wrote:

On Tue, 20 Apr 2021, Xionghu Luo wrote:




On 2021/4/15 19:34, Richard Biener wrote:

On Thu, 15 Apr 2021, Xionghu Luo wrote:


Thanks,

On 2021/4/14 14:41, Richard Biener wrote:

"#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
but it moves #538 first, then #235; there is a strong dependency here.
It doesn't seem to work like the LCM framework, which could solve all of
them and do the delete-insert in one iteration.

So my question was whether we want to do both within the LCM store
sinking framework.  The LCM dataflow is also used by RTL PRE which
handles both loads and non-loads so in principle it should be able
to handle stores and non-stores for the sinking case (PRE on the
reverse CFG).

A global dataflow is more powerful than any local ad-hoc method.


My biggest concern is whether the LCM DF framework could support sinking
*multiple* reverse-dependent non-store instructions together with *one*
call of LCM DF.  If this is not supported, we need to run LCM multiple
times until nothing changes, which would obviously be time consuming (unless
compile time is not important here).


As said it is used for PRE and there it most definitely can do that.


I did some investigation about PRE and attached a case to show how it
works.  It is quite like store-motion, and there is actually an rtl-hoist
pass in gcse.c which only works for code size.  All of them
leverage the LCM framework to move instructions upward or downward.

PRE and rtl-hoist move instructions upward, so they analyze/hash the SOURCE
exprs and call pre_edge_lcm; store-motion and rtl-sink move instructions
downward, so they analyze/hash the DEST exprs and call pre_edge_rev_lcm.
All four problems are converted to the LCM DF problem, with four
n_basic_blocks * m_exprs matrices (antic, transp, avail, kill) as input
and two outputs saying where to insert/delete.

PRE scans each instruction and hashes the SRC into a table without *checking
the relationship between instructions*.  For the case attached, BB 37, BB 38
and BB 41 all contain the SOURCE expr "r262:DI+r139:DI", but BB 37 and BB 41
save it to index 106 while BB 38 saves it to index 110.  After this pass
finishes, "r262:DI+r139:DI" in BB 41 is replaced with "r194:DI=r452:DI",
exprs are inserted into BB 75~BB 80 to create full redundancies from partial
redundancies, and finally the instruction in BB 37 is updated.
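
The step of turning partial redundancies into full ones can be pictured at source level with a small diamond.  The sketch below is an illustration only and does not use the RTL case attached to the mail; pre_before, pre_after and check_pre_sketch are hypothetical names, and real PRE operates on RTL expressions and pseudo registers rather than on C variables.

```c
#include <assert.h>

/* Before PRE: x + y is computed on the "then" path and again after the
   join, so the second computation is redundant only on that path.  */
int
pre_before (int x, int y, int cond)
{
  int t = 0;

  if (cond)
    t = x + y;          /* first computation */
  return t + (x + y);   /* partially redundant */
}

/* After PRE (conceptually): an insertion on the "else" path makes the
   expression available on every path, so the computation after the join
   becomes fully redundant and is replaced by a copy of the temporary.  */
int
pre_after (int x, int y, int cond)
{
  int t = 0;
  int tmp;

  if (cond)
    {
      tmp = x + y;
      t = tmp;
    }
  else
    tmp = x + y;        /* inserted computation */
  return t + tmp;       /* reuses tmp */
}

/* Self-check: both paths through the diamond give the same result.  */
void
check_pre_sketch (void)
{
  assert (pre_before (2, 3, 1) == pre_after (2, 3, 1));
  assert (pre_before (2, 3, 0) == pre_after (2, 3, 0));
}
```

Sinking is the mirror image of this: instead of inserting a computation on predecessor edges so a later one can be deleted, the computation is deleted early and inserted on the successor edges that actually need it.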


I'm not familiar with the actual PRE code but reading the toplevel comment
it seems that indeed it can only handle expressions contained in a single
insn unless a REG_EQUAL note provides a short-hand for the larger one.

That of course means it would need to mark things as not transparent
for correctness where they'd be if moved together.  Now, nothing
prevents you changing the granularity of what you feed LCM.

So originally we arrived at looking into LCM because there's already
a (store) sinking pass on RTL (using LCM) so adding another (loop-special)
one didn't look like the best obvious solution.

That said, LCM would work for single-instruction expressions.
Alternatively a greedy algorithm like you prototyped could be used.
Another pass to look at would be RTL invariant motion which seems to
compute some kind of dependency graph - not sure if that would be
adaptable for the reverse CFG problem.



Actually my RTL sinking pass patch is borrowed from RTL loop invariant
motion.  It is quite limited, since it only moves instructions from the loop
header to loop exits, though it could be refined with various algorithms.
Compared to the initial method of running the gimple sink pass once more,
it seems much more complicated and limited without any obvious performance
benefit.  Shall we go back to the original gimple sink2 pass, since
we are in stage1 now?


OK, so while there might be new sinking opportunities exposed during
RTL expansion and early RTL opts we can consider adding another sink pass
on GIMPLE.  Since it's basically a scheduling optimization placement
shouldn't matter much but I suppose we should run it before store
merging, so anywhere between cd_dce and that.

Richard.



Attached is the patch as discussed; putting it before store_merging is fine.
Regression testing passed on P8LE.  OK for trunk? :)


Thanks,
Xionghu
From 7fcc6ca9ef3b6acbfbcbd3da4be1d1c0eef4be80 Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo 
Date: Mon, 17 May 2021 20:46:15 -0500
Subject: [PATCH] Run pass_sink_code once more before store_merging

The GIMPLE sink code pass runs quite early, so later GIMPLE
optimization passes may expose new sinking opportunities.  This patch
runs the sink code pass once more before store_merging.  For detailed
discussion, please refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html

Tested SPEC2017 performance on P8LE: 544.nab_r improves by 2.43%,
with no big changes in the other cases; the GEOMEAN improves slightly,
by 0.25%.

gcc/ChangeLog: