On Fri, Nov 17, 2023 at 10:04 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2023/11/15 17:43, Alexander Monakov wrote:
> >
> > On Wed, 15 Nov 2023, Kewen.Lin wrote:
> >
> >>>> And I suppose it would be OK to do that.  Empty BBs are usually removed 
> >>>> by
> >>>> CFG cleanup so the situation should only happen in rare corner cases 
> >>>> where
> >>>> the fix would be to actually run CFG cleanup ...
> >>>
> >>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> >>> may be a preferable compromise for sched-rgn as well.
> >>
> >> Inspired by this discussion, I tested the attached patch 1 which is to run
> >> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> >> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
> >
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
>
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.
>
> >
> >> Then I assumed some of the current uses of no_real_insns_p won't encounter
> >> empty blocks any more, so made a patch 2 with some explicit assertions, but
> >> unfortunately I got ICEs during bootstrapping happens in function
> >> compute_priorities.  I'm going to investigate it further and post more
> >> findings, but just heads-up to ensure if this is on the right track.
> >
> > I suspect this may be caused by invoking cleanup_cfg too late.
>
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
>   1. empty function body
>   2. block holding a label for return.
>   3. block without any successor.
>   4. block which becomes empty after scheduling some other block.
>   5. block which looks mergeable with its always successor but left.
>   ...
>
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same.  For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> a reduced test case is:
>
> #include <istream>
> namespace std {
> template <>
> basic_istream<char> &
> basic_istream<char>::getline (char_type *, streamsize, char_type) __try
> {
>   __streambuf_type *a = rdbuf ();
>   a->sgetc ();
>   while (traits_type::eq_int_type)
>     ;
> }
> __catch (...) {}
> } // namespace std
>
> slim RTL:
>
>     1: NOTE_INSN_DELETED
>     7: NOTE_INSN_BASIC_BLOCK 2
> ...
>    15: r137:CCUNS=cmp(r135:DI,r136:DI)
>       REG_DEAD r136:DI
>       REG_DEAD r135:DI
>    16: pc={(ltu(r137:CCUNS,0))?L24:pc}
>       REG_DEAD r137:CCUNS
>       REG_BR_PROB 966367644
>    17: NOTE_INSN_BASIC_BLOCK 3
>    18: r138:DI=[r122:DI]
>    19: r139:DI=[r138:DI+0x48]
>       REG_DEAD r138:DI
>    20: %3:DI=r122:DI
>       REG_DEAD r122:DI
>    21: %12:DI=r139:DI
>    22: ctr:DI=r139:DI
>       REG_DEAD r139:DI
>    23: %3:DI=call [ctr:DI] argc:0
>       REG_DEAD ctr:DI
>       REG_DEAD %12:DI
>       REG_UNUSED %3:DI
>       REG_EH_REGION 0x1
>       REG_CALL_DECL (nil)
>    24: L24:
>    25: NOTE_INSN_BASIC_BLOCK 4
>    51: L51:
>    48: NOTE_INSN_BASIC_BLOCK 5
>    47: NOTE_INSN_DELETED
>    52: pc=L51
>    53: barrier
>    39: L39:
>    41: NOTE_INSN_BASIC_BLOCK 6
>    32: %3:DI=call [`__cxa_begin_catch'] argc:0
>       REG_UNUSED %3:DI
>       REG_CALL_DECL `__cxa_begin_catch'
>       REG_EH_REGION 0
>    33: call [`__cxa_end_catch'] argc:0
>       REG_CALL_DECL `__cxa_end_catch'
>    34: barrier
>
> ;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 
> 1.0000), maybe hot
> ;;  prev block 3, next block 5, flags: (REACHABLE, RTL)
> ;;  pred:       3 [always]  count:1063111 (estimated locally, freq 0.1000) 
> (FALLTHRU)
> ;;              2 [90.0% (guessed)]  count:9567997 (estimated locally, freq 
> 0.9000)
> ;;  succ:       5 [always]  count:10631108 (estimated locally, freq 1.0000) 
> (FALLTHRU)
> ;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 
> 101.0000), maybe hot
> ;;  prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED)
> ;;  pred:       4 [always]  count:10631108 (estimated locally, freq 1.0000) 
> (FALLTHRU)
> ;;              5 [always]  count:1073741824 (estimated locally, freq 
> 101.0000)
> ;;  succ:       5 [always]  count:1073741824 (estimated locally, freq 
> 101.0000)
>
> It looks bb 4 can be merged with bb 5, a miss-opt?  There can be some other 
> cases
> similar to this, either it's miss-opt or not, it seems to affect our 
> assumption.
>
> With the limited findings above so far, I wonder if we still want to go with 
> this
> direction that running cleanup_cfg first and make the assumption that there 
> is no
> such empty block which can cause debug and non-debug state inconsistency?
> IMHO it seems risky to make it.  Thoughts?

Running CFG cleanup shouldn't be the fix to remove such blocks but instead
scheduling shouldn't special case empty blocks as they usually do not appear.

Richard.

>
> BR,
> Kewen

Reply via email to