> On Tue, 26 Oct 2021, Xionghu Luo wrote:
> 
> > 
> > 
> > On 2021/10/21 18:55, Richard Biener wrote:
> > > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > > 
> > >>
> > >>
> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > >>>
> > >>>
> > >>> On 2021/9/23 20:17, Richard Biener wrote:
> > >>>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>> On 2021/8/11 17:16, Richard Biener wrote:
> > >>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> > >>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 2021/8/10 22:47, Richard Biener wrote:
> > >>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> > >>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> loop split condition is moved between loop1 and loop2, the 
> > >>>>>>>>>>> split bb's
> > >>>>>>>>>>> count and probability should also be duplicated instead of 
> > >>>>>>>>>>> (100% vs
> > >>>>>>>>>>> INV),
> > >>>>>>>>>>> secondly, the original loop1 and loop2 count need be 
> > >>>>>>>>>>> propotional from
> > >>>>>>>>>>> the
> > >>>>>>>>>>> original loop.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> > >>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> > >>>>>>>>>>> ...
> > >>>>>>>>>>>       int prephitmp_16;
> > >>>>>>>>>>>       int prephitmp_25;
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 2> [local count: 118111600]:
> > >>>>>>>>>>>       if (n_7(D) > 0)
> > >>>>>>>>>>>         goto <bb 4>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 3>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 3> [local count: 118111600]:
> > >>>>>>>>>>>       return;
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 4> [local count: 105119324]:
> > >>>>>>>>>>>       pretmp_3 = ga;
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 5> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 5> [local count: 315357973]:
> > >>>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> > >>>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> > >>>>>>>>>>>       if (prephitmp_12 != 0)
> > >>>>>>>>>>>         goto <bb 6>; [33.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 7>; [67.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 6> [local count: 315357972]:
> > >>>>>>>>>>> +  <bb 6> [local count: 104068130]:
> > >>>>>>>>>>>       _2 = do_something ();
> > >>>>>>>>>>>       ga = _2;
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 7> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 7> [local count: 315357973]:
> > >>>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> > >>>>>>>>>>>       i_10 = inc (i_13);
> > >>>>>>>>>>>       if (n_7(D) > i_10)
> > >>>>>>>>>>>         goto <bb 21>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 11>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 11> [local count: 105119324]:
> > >>>>>>>>>>>       goto <bb 3>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 21> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 21> [local count: 280668596]:
> > >>>>>>>>>>>       if (prephitmp_12 != 0)
> > >>>>>>>>>>> -    goto <bb 20>; [100.00%]
> > >>>>>>>>>>> +    goto <bb 20>; [33.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>> -    goto <bb 19>; [INV]
> > >>>>>>>>>>> +    goto <bb 19>; [67.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 20> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 20> [local count: 280668596]:
> > >>>>>>>>>>>       goto <bb 5>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 19> [count: 0]:
> > >>>>>>>>>>> +  <bb 19> [local count: 70429947]:
> > >>>>>>>>>>>       # i_23 = PHI <i_10(21)>
> > >>>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 12> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 12> [local count: 640272252]:
> > >>>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> > >>>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> > >>>>>>>>>>>       i_22 = inc (i_15);
> > >>>>>>>>>>>       if (n_7(D) > i_22)
> > >>>>>>>>>>>         goto <bb 16>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 11>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 16> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 16> [local count: 569842305]:
> > >>>>>>>>>>>       goto <bb 12>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>     }
> > >>>>>>>>>>>
> > >>>>>>>>>>> gcc/ChangeLog:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect 
> > >>>>>>>>>>> probability.
> > >>>>>>>>>>>  (do_split_loop_on_cond): Likewise.
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> > >>>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c 
> > >>>>>>>>>>> b/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> > >>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> > >>>>>>>>>>>      basic_block cond_bb;
> > >>>>>>>>>
> > >>>>>>>>>           if (!initial_true)
> > >>>>>>>>> -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > >>>>>>>>> +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > >>>>>>>>> +
> > >>>>>>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > >>>>>>>>> +                    ? EDGE_SUCC (bbs[i], 0)
> > >>>>>>>>> +                    : EDGE_SUCC (bbs[i], 1);
> > >>>>>>>>>
> > >>>>>>>>>>>     
> > >>>>>>>>>>>         class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > >>>>>>>>>>> -                                          
> > >>>>>>>>>>> profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -                                          
> > >>>>>>>>>>> profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -                                          
> > >>>>>>>>>>> profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -                                          
> > >>>>>>>>>>> profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +                                          
> > >>>>>>>>>>> true_edge->probability,
> > >>>>>>>>>>> +
> > >>>>>>>>>>> true_edge->probability.invert (),
> > >>>>>>>>>>> +                                          
> > >>>>>>>>>>> true_edge->probability,
> > >>>>>>>>>>> +
> > >>>>>>>>>>> true_edge->probability.invert (),
> > >>>>>>>>>>>             true);
> > >>>>>>>>>>
> > >>>>>>>>>> there is no 'true_edge' variable at this point.
> > >>>>>>>>>
> > >>>>>>>>> Sorry, missed the above hunk when split the patch.
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>      gcc_assert (loop2);
> > >>>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop 
> > >>>>>>>>>>> *loop1,
> > >>>>>>>>>>> edge invar_branch)
> > >>>>>>>>>>>       initialize_original_copy_tables ();
> > >>>>>>>>>>>     
> > >>>>>>>>>>>       struct loop *loop2 = loop_version (loop1, 
> > >>>>>>>>>>> boolean_true_node,
> > >>>>>>>>>>> NULL,
> > >>>>>>>>>>> -                                    
> > >>>>>>>>>>> profile_probability::always (),
> > >>>>>>>>>>> -                                    profile_probability::never 
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -                                    
> > >>>>>>>>>>> profile_probability::always (),
> > >>>>>>>>>>> -                                    
> > >>>>>>>>>>> profile_probability::always (),
> > >>>>>>>>>>> +                                    
> > >>>>>>>>>>> invar_branch->probability.invert
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +                                    invar_branch->probability,
> > >>>>>>>>>>> +                                    
> > >>>>>>>>>>> invar_branch->probability.invert
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +                                    invar_branch->probability,
> > >>>>>>>>>>>                                      true);
> > >>>>>>>>>>>       if (!loop2)
> > >>>>>>>>>>>         {
> > >>>>>>>>>>
> > >>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond 
> > >>>>>>>>>> only.
> > >>>>>>>>>
> > >>>>>>>>> split_loop faces similar issue though it sets the two branches to 
> > >>>>>>>>> 100% vs
> > >>>>>>>>> 100%
> > >>>>>>>>> and no scaling which seems also incorrect.
> > >>>>>>>>>
> > >>>>>>>>>> Since loop versioning inserts a condition with the passed 
> > >>>>>>>>>> probabilities
> > >>>>>>>>>> but in this case a 'boolean_true_node' condition the then and 
> > >>>>>>>>>> else
> > >>>>>>>>>> probabilities passed look correct.  It's just the scaling 
> > >>>>>>>>>> arguments
> > >>>>>>>>>> that look wrong?  This loop_version call should get a comment as 
> > >>>>>>>>>> to
> > >>>>>>>>>> why we are passing probabilities the way we do.
> > >>>>>>>>>
> > >>>>>>>>> This optimization is transforming from:
> > >>>>>>>>>
> > >>>>>>>>> for (i = 0; i < n; i = inc (i))
> > >>>>>>>>>       {
> > >>>>>>>>>         if (ga)
> > >>>>>>>>>           ga = do_something ();
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> to:
> > >>>>>>>>>
> > >>>>>>>>>     for (i = 0; i < x; i = inc (i))
> > >>>>>>>>> {
> > >>>>>>>>>       if (true)
> > >>>>>>>>>           ga = do_something ();
> > >>>>>>>>>           if (!ga)
> > >>>>>>>>>             break;
> > >>>>>>>>> }
> > >>>>>>>>>     for (; i < n; i = inc (i))
> > >>>>>>>>> {
> > >>>>>>>>>       if (false)
> > >>>>>>>>>            ga = do_something ();
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> `boolean_true_node` is passed in to make the first loop's 
> > >>>>>>>>> condition
> > >>>>>>>>> statement to be 'true', after returning from loop_version, there 
> > >>>>>>>>> is a
> > >>>>>>>>> piece of code forcing the condition in second loop to be false,
> > >>>>>>>>> and the original condition is moved from *in loop* to *exit edge*
> > >>>>>>>>> between loop1 and loop2.
> > >>>>>>>>
> > >>>>>>>> Yes, one complication is that we use loop_version but do not retain
> > >>>>>>>> the CFG it creates.  Something like the vectorizers
> > >>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> > >>>>>>>> but then that code doesn't do any scaling yet.  But then
> > >>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I 
> > >>>>>>>> suppose
> > >>>>>>>> we could write a variant that simply doesn't mangle the CFG
> > >>>>>>>> with a new condition switching between both loops but keeps them
> > >>>>>>>> executed after each other ...
> > >>>>>>>>
> > >>>>>>>> As said, this adds to the confusion and some awkwardness.
> > >>>>>>>
> > >>>>>>> Then loop_version in loop split requires two types of variant, one
> > >>>>>>> is to insert condition to loop preheader for 'split_loops' usage,
> > >>>>>>> another is to insert condition to loop exit for 
> > >>>>>>> 'do_split_loop_on_cond'
> > >>>>>>> usage, it needs one extra function to encapsulate these cfg 
> > >>>>>>> alterations
> > >>>>>>> from outside to inside.
> > >>>>>>>
> > >>>>>>> unswitching only execute one loop as it only moves the invariant 
> > >>>>>>> condition
> > >>>>>>> to first loop's pre-header.  While 'split_loops' and
> > >>>>>>> 'do_split_loop_on_cond'
> > >>>>>>> may execute both loops no matter the condition is moved to the 
> > >>>>>>> first loop's
> > >>>>>>> preheader or exit.
> > >>>>>>>
> > >>>>>>> The condition stmt in loop unswitching is invariant, but it is 
> > >>>>>>> variant
> > >>>>>>> in loop splitting, that's why loop unswitching execute only one loop
> > >>>>>>> but loop splitting executes both loops.
> > >>>>>>>
> > >>>>>>> Seems we need two condition arguments for loop_version, one for 
> > >>>>>>> connecting
> > >>>>>>> loop1 preheader to loop2 preheader, another one for connecting 
> > >>>>>>> loop1's exit
> > >>>>>>> to loop2's header?  Then it will be more generic for both 
> > >>>>>>> unswitching pass
> > >>>>>>> and splitting pass.  Looks a bit complicated and conflicted with
> > >>>>>>> loop_version's
> > >>>>>>> comments:
> > >>>>>>>
> > >>>>>>> /* Main entry point for Loop Versioning transformation.
> > >>>>>>>
> > >>>>>>>     This transformation given a condition and a loop, creates
> > >>>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> And this only works for loop split usage, those many other places
> > >>>>>>> doesn't use loop_version like this?
> > >>>>>>
> > >>>>>> Yes, as said if you don't want the above CFG then you probably
> > >>>>>> shouldn't use loop_version but instead use its building blocks
> > >>>>>> (and some refactoring of loop_version can make that easier).
> > >>>>>>
> > >>>>>> I think splitting wants
> > >>>>>>
> > >>>>>>    loop_copy1
> > >>>>>>    if (condition)
> > >>>>>>      loop_copy2
> > >>>>>>
> > >>>>>> IMHO it would be good to split 'loopify' into the actual 
> > >>>>>> loopification
> > >>>>>> and the scaling.  Basically make the part of loop_version that
> > >>>>>> copies the loop on the header edge and creates a loop structure for
> > >>>>>> it separate.
> > >>>>>>
> > >>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> > >>>>>> (copy_loop_before).
> > >>>>>>
> > >>>>>
> > >>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> > >>>>> copying loops with single exit, it would cause many ICEs in it even
> > >>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> > >>>>> is NULL returning from single_exit (loop).).  Seems loop_version is
> > >>>>> more flexible for loop split.
> > >>>>
> > >>>> Hmm, sure - loop_version does not need to do anything special with
> > >>>> exits since control flow either enters the original or the loop copy.
> > >>>>
> > >>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> > >>>> control flow that enters _both_ loops, so it needs to have
> > >>>> an edge from the first loop exit to the second loop entry.
> > >>>>
> > >>>> One could extend this to a region copy, copying eventual CFG merges
> > >>>> of exits and specifying which exit of a SEME region is supposed
> > >>>> to be connected to the original region entry.
> > >>>>
> > >>>> After all that's what loop splitting needs in the end - though not
> > >>>> sure what exactly it does with more than one exit.
> > >>>
> > >>> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> > >>> the recently added split_loop_on_cond could process multiple exits loop.
> > >>>
> > >>> For example, do some changes to the loop-cond-split-1.c,
> > >>>
> > >>> int ga;
> > >>> extern int a;
> > >>> extern int b;
> > >>> extern int c;
> > >>>
> > >>> void test1 (int n) {
> > >>>   int i;
> > >>>   for (i = 0; i < n; i = inc (i)). {
> > >>>       if (a+3 > 0)
> > >>>        break;
> > >>>
> > >>>       if (ga)
> > >>>         ga = do_something ();
> > >>>
> > >>>       for (int j = 0; j < n; j++)
> > >>>         {
> > >>>            b+=5;
> > >>>            if (b > c) break;
> > >>>         }
> > >>>     }
> > >>> }
> > >>>
> > >>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> > >>> I am not sure whether it is valuable to do semi-invariant loop split to 
> > >>> such
> > >>> cases with multiple exits, but obviously the split_loop_on_cond is a 
> > >>> special
> > >>> case from split_loop both duplicate loop to 
> > >>>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> > >>> The only difference is condition1 is true for split_loop_on_cond.
> > >>>
> > >>>>
> > >>>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> > >>>> which doesn't actually require a single exit but a single "important"
> > >>>> exit.  That might match how you treat multiple exits.
> > >>>
> > >>> gimple_duplicate_sese_region doesn't handle subloops and latches.  
> > >>> Finally,
> > >>> I found that loop_version is still much better
> > >>> than slpeel_tree_duplicate_loop_to_edge_cfg and 
> > >>> gimple_duplicate_sese_region
> > >>> since it could handle all cases like multiple exits/subloops, etc.  I 
> > >>> did some
> > >>> refactor to the code to introduce loop_version2 to create duplicated 
> > >>> loops
> > >>> with two input conditions as attached patch, is this reasonable enough?
> > >>>
> > >>> I also tried to copy the code in loop_version out of it to don't call 
> > >>> loop_version
> > >>> in loop_version2, but it seems useless with many duplicate code and NOT 
> > >>> get rid
> > >>> of creating "if (condition1) {loop_copy1}" at first?
> > >>>
> > >>>
> > >>
> > >> The previous attached patch 
> > >> 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> > >> had issues when connecting two loops, reason is split_loop connect_loops 
> > >> at *exit* edge,
> > >> do_split_loop_on_cond connect_loops at latch_edge.  So they have 
> > >> different CFGs even
> > >> both with two conditions :(.  It seems not so that useful to also define 
> > >> another new function
> > >> 'connect_loops_at_latch' given the limited usage in loop split only?  
> > >> And the loop_version2
> > >> also looks not so general again ...
> > > 
> > > All I wanted to say is that none of the current high-level APIs we have
> > > are a 1:1 match for loop splitting and that they in fact might do
> > > stuff that's unnecessary or counter-productive.  If inventing a new API
> > > doesn't sound appealing then maybe refactoring an existing
> > > (maybe loop_version) to expose re-usable pieces would make more sense.
> > > 
> > > For example loop_version currently does lv_adjust_loop_entry_edge
> > > before it loopifys the copy inserted on the header.  I wonder if
> > > the condition generation can be done later and thus we can
> > > have three pieces - 1) duplicating the loop on the entry edge,
> > > 2) adjusting the CFG to insert a condition branching to either loop,
> > > 3) from loopify extract the scale_loop_frequencies bits (in fact
> > > loopify is only used from within cfgloopmanip.c)
> > > 
> > > That said, it shouldn't be a requirement to do all this to fix the
> > > profile for loop splitting it's just that the current situation
> > > does not help understanding of how the adjustment works.
> > 
> > The attached patch 0002-Refactor-loop_version.patch is to refactor 
> > loop_version
> > according to your above comments.
> > 
> > loopify is moved before condition generation, scale_loop_frequencies is
> > extracted out of loopify. 
> 
> I think that's a good improvement (and if loopify is unused after the
> patch it should be removed together with it).  The loop copying part
> could then be a new clone_loop_to_header_edge function, or loop_copy
> (rather than loop_version).
> 
> The existing duplicate_loop_to_header_edge function is named badly
> since it does duplicate_loop_body_to_header_edge ...
> 
> > 0001-Fix-loop-split-incorrect-count-and-probability.patch is still the 
> > initial
> > version that fixes the probability and frequency issues in loop split, just 
> > a
> > repeat, both passed regression test on P8LE, OK for master?
> 
> That's still the original patch, I don't see any of the comments addressed
> and I do not have enough knowledge to approve the patch.
> 
> Sorry here, but I really hoped that Honza would eventually chime in.
Sorry, I got bit lost in the thread. What patch I should look at?
Honza
> 
> Richard.

Reply via email to