Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Steven Bosscher
On Sat, May 11, 2013 at 4:38 PM, Teresa Johnson  wrote:
>   /* If we are partitioning hot/cold basic blocks, we don't want to
>  mess up unconditional or indirect jumps that cross between hot
>  and cold sections.
>
>  Basic block partitioning may result in some jumps that appear to
>  be optimizable (or blocks that appear to be mergeable), but which really
>  must be left untouched (they are required to make it safely across
>  partition boundaries).  See the comments at the top of
>  bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> And at least a bunch of these are when we are in cfglayout mode.
>
> But let me locate a reproducer so we can make sure it isn't due to
> some issue with my patch.

It sounds like the issue here is that the partitioning code insists on
having an explicit jump for a section switch even for single_succ_p
blocks i.e. unconditional jump, while normally in cfglayout mode there
are no unconditional jumps.

If that is the problem, then the proper solution is to not have the
explicit jump. Just forward the jump, set the EDGE_CROSSING flag on
the crossing edge, and add a REG_CROSSING_JUMP note on the branching
insn. (FWIW, these REG_CROSSING_JUMP notes are also an aberration --
we have the CFG edges with all the information in them! But oh
well...) If the edge cannot fall through when going out of cfglayout
mode after bbro (and crossing edges can't fall through) then an extra
"forwarder jump" block will be inserted automatically. This may
require some hacking to invert branch conditions for branching insns
such that the branch goes to the other section and that the fallthru
path stays in the same section, but that's something that should be
relatively easy to do in cfgcleanup.

Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Teresa Johnson
On Sat, May 11, 2013 at 4:19 AM, Jan Hubicka  wrote:
>> >
>> > BTW2: We badly need to figure out a way to create test cases for FDO... :-(
>>
>> Yes. I had tried testing awhile back with the gcc regression tests and
>> enabling -freorder-blocks-and-partition, but none of the issues I was
>> having with larger benchmarks fired. I think there just aren't enough
>> (or large/complex enough?) FDO tests in gcc.dg/tree-prof and elsewhere
>> to trigger this. I was able to trigger many of the issues when
>> compiling cpu2006 with fdo and partitioning enabled, but it will take
>> some work to cut them down.
>
> Yep, we do not have that many testcases in tree-prof and modifying i.e.
> gcc.c-torture/execute to run with -fprofile-generate/-fprofile-use by default
> is probly bit of overkill. Having easy way to do that optionally may be
> interesting though.
>
> Once -freorder-blocks-and-partition actually works, we should enable it by
> default with -fprofile-generate (I recall I was trying to do that once, but
> I am not sure what was outcome back then and why it did not happen).
> That should get it tested with profiledbootstrap, too.

I don't remember if I tried enabling splitting with a
profiledbootstrap - that sounds like a great stress test. I can try
enabling it locally with my patch and running that as a test.

Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Ciao!
>> > Steven
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Teresa Johnson
On Sat, May 11, 2013 at 4:44 AM, Steven Bosscher  wrote:
> On Sat, May 11, 2013 at 5:21 AM, Teresa Johnson wrote:
>> Here there was a block that happened to be laid out at the very start
>> of the cold section (it was jumped to from elsewhere, not reached via
>> fall through from its layout predecessor). Thus it was preceded by a
>> switch section note, which was put into the bb header when we entered
>> cfglayout mode for compgoto. The note ended up in the middle of the
>> block when we did some block combining with its cfg predecessor (not
>> the block that preceded it in the layout chain, which was the last hot
>> block in the reorder chain).
>
> Yikes. So we also need an INSN_NOTE verifier to make sure that kind of
> non-sense doesn't happen? More reason to make the note go away :-)
>
> (See my recent patch that added note_outside_basic_block_p for other
> examples of NOTEs ending up where they don't belong. It's one of those
> most-people-want-it-but-nobody-does-it tasks: To create some frame
> work for NOTE_INSN_* notes that is safe and sane...).
>
>
>>> Please make the note go away completely before pass_free_cfg, and earn
>>> greater admiration than Zeus. The note always was wrong, and now
>>> you've shown it's also a problem.
>
> Many, many thanks!

Some encouraging news: I removed the earlier calls to
insert_section_boundary_note and added a call from
rest_of_pass_free_cfg, and I also expanded that routine to do similar
sanity checking as in verify_hot_cold_block_grouping to ensure we only
switch sections once. And cpu2006 built fine with profile feedback and
splitting enabled. I am running them now, but all the previous
problems I chased down were compile-time not run-time, thankfully, so
that is a very good sign. I'll do some more testing with internal
benchmarks.

>
>>> Right, I think it's good that you've centralized this code. But it
>>> seems to me that we should make sure that the hot blocks and cold
>>> blocks are still grouped (i.e. there is only one basic block B such
>>> that BB_PARTITION(B) != BB_PARTITION(B->next_bb). That is something
>>> your code doesn't handle, AFAICT. It's just one thing that's difficult
>>> to maintain, probably there are others. It's also something the
>>> partitioning verifier should check, i.e. that the basic blocks in hot
>>> and cold partitions are properly grouped.
>>
>> Actually, there is already code that verifies this at the end of bbro
>> (verify_hot_cold_block_grouping()). Before bb reordering it doesn't
>> make sense to check this.
>>
>> And AFAICT, after bbro the only place we go into and out of cfglayout
>> mode is compgoto, which duplicates blocks along edges only if they
>> don't cross a partition boundary, and lays out the duplicated block
>> adjacent to the original. I haven't seen any places where this is
>> violated, probably as a result. But it wouldn't be a bad idea to call
>> verify_hot_cold_block_grouping again during the flow verification code
>> once we detect/flag that bbro is complete.
>
> I'd just make it part of verify_flow_info and only let it run if your
> new flag on crtl is set.

Yes, although the new flag is not sufficient since it just indicates
that there was partitioning done and not that we are done with bb
reordering. I don't see a way to detect that we are done with that
pass, so it looks like I will need to add another flag to the rtl_data
struct.

>
> You shouldn't count on compgoto being the last and only pass to go
> into/out-of cfglayout mode. The compiler changes all the time, passes
> get shuffled around from time to time, and target-specific passes can
> be inserted anywhere. I have patches in my queue to lengthen the life
> of the CFG beyond pass_machine_reorg (many machine reorgs are CFG
> aware already anyway) and they should be allowed to go into/out-of
> cfglayout mode also.

Yep, I agree it will be better to add the checking code to catch issues early.

>
>
>
 +  /* Invoke the cleanup again once we are out of cfg layout mode
 + after committing the final bb layout above. This enables removal
 + of forwarding blocks across the hot/cold section boundary when
 + splitting is enabled that were necessary while in cfg layout
 + mode.  */
 +  if (crtl->has_bb_partition)
 +cleanup_cfg (CLEANUP_EXPENSIVE);
>>>
>>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>>> you need this for?
>>
>> This was a performance fix.
>>
>> There is code in try_forward_edges, called from try_optimize_cfg that
>> we call from cleanup_cfg, typically in cfglayout mode, that will not
>> eliminate forwarding blocks when either the given block "b" or its
>> successor block ends with a region-crossing jump. The comments
>> indicate that these need to be left in to ensure we don't fall through
>> across section boundaries, which makes sense. The issue here was that
>> I saw the blocks in the hot partition ending in conditional branches,
>> which had a fall-through to anothe

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Jan Hubicka
> On Sat, May 11, 2013 at 1:19 PM, Jan Hubicka wrote:
> > Once -freorder-blocks-and-partition actually works, we should enable it by
> > default with -fprofile-generate (I recall I was trying to do that once, but
> > I am not sure what was outcome back then and why it did not happen).
> > That should get it tested with profiledbootstrap, too.
> 
> I don't think -freorder-blocks-and-partition ever was stable enough to
> work with profiledbootstrap. From day one, it was fragile and not well
> covered in regression testing. I hope the verifiers will make life a
> bit more bearable, and that the fixes from Teresa will allow us to
> enable -freorder-blocks-and-partition with -fprofile-generate.

Yep, I hoped to slowly chase the bugs away but always got scared by
implementation details
> 
> Has anyone ever investigated the effects of
> -freorder-blocks-and-partition vs. the function splitting part if
> flag_partial_inlining (ipa-split.c)?

ipa-split really does splitting just in very special cases where partial
inlining seems possible and feasible. Plus one really split function into
two making it impossible to mix local vars. So it is not a replacement for
partitioning...

I was considering more aggressive outlininning of cold parts in ipa-split,
but did not get it implemented yet.

Honza
> 
> Ciao!
> Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Steven Bosscher
On Sat, May 11, 2013 at 1:19 PM, Jan Hubicka wrote:
> Once -freorder-blocks-and-partition actually works, we should enable it by
> default with -fprofile-generate (I recall I was trying to do that once, but
> I am not sure what was outcome back then and why it did not happen).
> That should get it tested with profiledbootstrap, too.

I don't think -freorder-blocks-and-partition ever was stable enough to
work with profiledbootstrap. From day one, it was fragile and not well
covered in regression testing. I hope the verifiers will make life a
bit more bearable, and that the fixes from Teresa will allow us to
enable -freorder-blocks-and-partition with -fprofile-generate.

Has anyone ever investigated the effects of
-freorder-blocks-and-partition vs. the function splitting part if
flag_partial_inlining (ipa-split.c)?

Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Steven Bosscher
On Sat, May 11, 2013 at 5:21 AM, Teresa Johnson wrote:
> Here there was a block that happened to be laid out at the very start
> of the cold section (it was jumped to from elsewhere, not reached via
> fall through from its layout predecessor). Thus it was preceded by a
> switch section note, which was put into the bb header when we entered
> cfglayout mode for compgoto. The note ended up in the middle of the
> block when we did some block combining with its cfg predecessor (not
> the block that preceded it in the layout chain, which was the last hot
> block in the reorder chain).

Yikes. So we also need an INSN_NOTE verifier to make sure that kind of
non-sense doesn't happen? More reason to make the note go away :-)

(See my recent patch that added note_outside_basic_block_p for other
examples of NOTEs ending up where they don't belong. It's one of those
most-people-want-it-but-nobody-does-it tasks: To create some frame
work for NOTE_INSN_* notes that is safe and sane...).


>> Please make the note go away completely before pass_free_cfg, and earn
>> greater admiration than Zeus. The note always was wrong, and now
>> you've shown it's also a problem.

Many, many thanks!

>> Right, I think it's good that you've centralized this code. But it
>> seems to me that we should make sure that the hot blocks and cold
>> blocks are still grouped (i.e. there is only one basic block B such
>> that BB_PARTITION(B) != BB_PARTITION(B->next_bb). That is something
>> your code doesn't handle, AFAICT. It's just one thing that's difficult
>> to maintain, probably there are others. It's also something the
>> partitioning verifier should check, i.e. that the basic blocks in hot
>> and cold partitions are properly grouped.
>
> Actually, there is already code that verifies this at the end of bbro
> (verify_hot_cold_block_grouping()). Before bb reordering it doesn't
> make sense to check this.
>
> And AFAICT, after bbro the only place we go into and out of cfglayout
> mode is compgoto, which duplicates blocks along edges only if they
> don't cross a partition boundary, and lays out the duplicated block
> adjacent to the original. I haven't seen any places where this is
> violated, probably as a result. But it wouldn't be a bad idea to call
> verify_hot_cold_block_grouping again during the flow verification code
> once we detect/flag that bbro is complete.

I'd just make it part of verify_flow_info and only let it run if your
new flag on crtl is set.

You shouldn't count on compgoto being the last and only pass to go
into/out-of cfglayout mode. The compiler changes all the time, passes
get shuffled around from time to time, and target-specific passes can
be inserted anywhere. I have patches in my queue to lengthen the life
of the CFG beyond pass_machine_reorg (many machine reorgs are CFG
aware already anyway) and they should be allowed to go into/out-of
cfglayout mode also.



>>> +  /* Invoke the cleanup again once we are out of cfg layout mode
>>> + after committing the final bb layout above. This enables removal
>>> + of forwarding blocks across the hot/cold section boundary when
>>> + splitting is enabled that were necessary while in cfg layout
>>> + mode.  */
>>> +  if (crtl->has_bb_partition)
>>> +cleanup_cfg (CLEANUP_EXPENSIVE);
>>
>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>> you need this for?
>
> This was a performance fix.
>
> There is code in try_forward_edges, called from try_optimize_cfg that
> we call from cleanup_cfg, typically in cfglayout mode, that will not
> eliminate forwarding blocks when either the given block "b" or its
> successor block ends with a region-crossing jump. The comments
> indicate that these need to be left in to ensure we don't fall through
> across section boundaries, which makes sense. The issue here was that
> I saw the blocks in the hot partition ending in conditional branches,
> which had a fall-through to another hot section block, and the
> conditional jump led to yet another block in the hot section that
> simply contained an unconditional jump to a cold section block. So in
> this case when try_forward_edges was called with the block with the
> conditional branch, when we look at its successor (the forwarding
> block), we can't eliminate it since it ends in a region crossing
> branch. I guess the concern is that if the conditional branch sense
> was reversed in cfglayout mode we would end up falling through to a
> different region. But once we leave cfglayout mode that should not
> occur. So I loosened up the checks on the successor block so that it
> is ok if it ends in a region crossing branch when we are in cfgrtl
> mode (and added this call). That way, these forwarding blocks are
> eliminated and we are able to have a region crossing conditional jump
> directly to the cold section block, without the intervening forwarding
> block.

This sounds a bit scary to me. After bbro there shouldn't be any
changes to the basic block layout any

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-11 Thread Jan Hubicka
> >
> > BTW2: We badly need to figure out a way to create test cases for FDO... :-(
> 
> Yes. I had tried testing awhile back with the gcc regression tests and
> enabling -freorder-blocks-and-partition, but none of the issues I was
> having with larger benchmarks fired. I think there just aren't enough
> (or large/complex enough?) FDO tests in gcc.dg/tree-prof and elsewhere
> to trigger this. I was able to trigger many of the issues when
> compiling cpu2006 with fdo and partitioning enabled, but it will take
> some work to cut them down.

Yep, we do not have that many testcases in tree-prof and modifying i.e.
gcc.c-torture/execute to run with -fprofile-generate/-fprofile-use by default
is probly bit of overkill. Having easy way to do that optionally may be
interesting though.

Once -freorder-blocks-and-partition actually works, we should enable it by
default with -fprofile-generate (I recall I was trying to do that once, but
I am not sure what was outcome back then and why it did not happen).
That should get it tested with profiledbootstrap, too.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Ciao!
> > Steven
> 
> 
> 
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Teresa Johnson
On Fri, May 10, 2013 at 2:00 PM, Steven Bosscher  wrote:
> On Fri, May 10, 2013 at 5:54 PM, Teresa Johnson wrote:
>> The main issue I had here, and why I made this change, is that we go
>> in and out of cfglayout mode several times after bb partitioning and
>> then out_of_cfglayout. The problem was that when we subsequently went
>> in and out of cfglayout mode, the switch text section notes that had
>> been inserted by bbpart were getting messed up (they were moved into
>> the bb header when we enter cfglayout mode and then not being
>> transferred to the correct location upon exit).
>>
>> I investigated trying
>> to keep those in sync, but it is really difficult/impossible to do
>> during cfglayout mode when they are in the header. So I simply strip
>> them out completely on entry to cfglayout mode, and if there were any
>> there on entry, this change ensures that they are restored in the
>> appropriate location upon exit. I'm not sure what is a good
>> alternative?
>
> The problem is that the note exists at all. I'd like to see the note
> go away completely eventually (when we have a CFG all the way through
> pass_final). As a stop-gap, we should not emit the note until
> pass_free_cfg. Up to that point the basic blocks tell what partition
> they're in, and all hot block and cold blocks should be in a sequence.
> So during pass_free_cfg, just walk the basic blocks chain until
> there's a partition change, and emit the note between those two
> blocks.

Ok, let me take a stab at that.

>
>
>> I triggered the same error in 445.gobmk once I applied the
>> thread_prologue_and_epilogue_insns fixes. This is an assert in the
>> dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXTS_SECTION
>> note not being preceeded by a barrier:
>
> The problem here is that fixup_reorder_chain should force a jump for
> basic blocks in one partition falling through to bb->next. That should
> be dealt with in can_fallthru, which should return false if
> BB_PARTITION (target) != BB_PARTITION (src).

That wasn't the issue in this case.

Here there was a block that happened to be laid out at the very start
of the cold section (it was jumped to from elsewhere, not reached via
fall through from its layout predecessor). Thus it was preceded by a
switch section note, which was put into the bb header when we entered
cfglayout mode for compgoto. The note ended up in the middle of the
block when we did some block combining with its cfg predecessor (not
the block that preceded it in the layout chain, which was the last hot
block in the reorder chain).

So the issue here wasn't making sure there is no fall-through across
section boundaries. There was already code to prevent this, but my
patch contains a few fixes to make sure that is always happening.

>
>
>> The correct solution in my opinion is to strip out the SWITCH note
>> every time we enter cfglayout mode after bbro, and then invoke
>> insert_section_boundary_note when leaving cfglayout (if one was found
>> on entry to that cfglayout mode) to reapply it.
>
> Please make the note go away completely before pass_free_cfg, and earn
> greater admiration than Zeus. The note always was wrong, and now
> you've shown it's also a problem.

Ok, I will try. =)

>
>
>>> * Fixup redirected edges that did not cross partitions before but
>>> apparently do after redirection. This is not supposed to happen in the
>>> first place, so fixing up any of this just papers over an error
>>> elsewhere in the compiler.
>>
>> Looking back through the earlier email exchanges from last fall I
>> found some discussion on this where I had found a couple places
>> causing this to happen. Two were in
>> thread_prologue_and_epilogue_insns: when we duplicated tail blocks or
>> created a new block to hold a simple return, and redirected some of
>> the edges to the new copy. In some cases this caused edges that were
>> previously region crossing to become non-region crossing, and in other
>> cases the reverse happened. I think there are a couple of other cases
>> mentioned in the emails too, but I would need to do some more digging
>> to find them all.
>>
>> There were a few places in the code that tried to detect this type of
>> issue, and fix them up, but they weren't consistent and there were
>> other places that had the same issue. I've centralized all that
>> handling in this patch (fixup_partitions and fixup_partition_crossing,
>> called from a few places where we redirect edges) so that it is more
>> consistent and comprehensive.
>
> Right, I think it's good that you've centralized this code. But it
> seems to me that we should make sure that the hot blocks and cold
> blocks are still grouped (i.e. there is only one basic block B such
> that BB_PARTITION(B) != BB_PARTITION(B->next_bb). That is something
> your code doesn't handle, AFAICT. It's just one thing that's difficult
> to maintain, probably there are others. It's also something the
> partitioning verifier should check, i.e. that the basic blocks in h

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Steven Bosscher
On Fri, May 10, 2013 at 11:10 PM, Jan Hubicka  wrote:
>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>> you need this for?
>>
>> BTW2: We badly need to figure out a way to create test cases for FDO... :-(
>
> We have gcc.dg/tree-prof and friends.  What do you need to add?

Something to more easily reproduce bugs and derive test cases from them.

Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Jan Hubicka
> There shouldn't be any forwarder blocks in cfg layout mode. What did
> you need this for?
> 
> BTW2: We badly need to figure out a way to create test cases for FDO... :-(

We have gcc.dg/tree-prof and friends.  What do you need to add?

Honza


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Steven Bosscher
On Fri, May 10, 2013 at 5:54 PM, Teresa Johnson wrote:
> The main issue I had here, and why I made this change, is that we go
> in and out of cfglayout mode several times after bb partitioning and
> then out_of_cfglayout. The problem was that when we subsequently went
> in and out of cfglayout mode, the switch text section notes that had
> been inserted by bbpart were getting messed up (they were moved into
> the bb header when we enter cfglayout mode and then not being
> transferred to the correct location upon exit).
>
> I investigated trying
> to keep those in sync, but it is really difficult/impossible to do
> during cfglayout mode when they are in the header. So I simply strip
> them out completely on entry to cfglayout mode, and if there were any
> there on entry, this change ensures that they are restored in the
> appropriate location upon exit. I'm not sure what is a good
> alternative?

The problem is that the note exists at all. I'd like to see the note
go away completely eventually (when we have a CFG all the way through
pass_final). As a stop-gap, we should not emit the note until
pass_free_cfg. Up to that point the basic blocks tell what partition
they're in, and all hot block and cold blocks should be in a sequence.
So during pass_free_cfg, just walk the basic blocks chain until
there's a partition change, and emit the note between those two
blocks.


> I triggered the same error in 445.gobmk once I applied the
> thread_prologue_and_epilogue_insns fixes. This is an assert in the
> dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXTS_SECTION
> note not being preceeded by a barrier:

The problem here is that fixup_reorder_chain should force a jump for
basic blocks in one partition falling through to bb->next. That should
be dealt with in can_fallthru, which should return false if
BB_PARTITION (target) != BB_PARTITION (src).


> The correct solution in my opinion is to strip out the SWITCH note
> every time we enter cfglayout mode after bbro, and then invoke
> insert_section_boundary_note when leaving cfglayout (if one was found
> on entry to that cfglayout mode) to reapply it.

Please make the note go away completely before pass_free_cfg, and earn
greater admiration than Zeus. The note always was wrong, and now
you've shown it's also a problem.


>> * Fixup redirected edges that did not cross partitions before but
>> apparently do after redirection. This is not supposed to happen in the
>> first place, so fixing up any of this just papers over an error
>> elsewhere in the compiler.
>
> Looking back through the earlier email exchanges from last fall I
> found some discussion on this where I had found a couple places
> causing this to happen. Two were in
> thread_prologue_and_epilogue_insns: when we duplicated tail blocks or
> created a new block to hold a simple return, and redirected some of
> the edges to the new copy. In some cases this caused edges that were
> previously region crossing to become non-region crossing, and in other
> cases the reverse happened. I think there are a couple of other cases
> mentioned in the emails too, but I would need to do some more digging
> to find them all.
>
> There were a few places in the code that tried to detect this type of
> issue, and fix them up, but they weren't consistent and there were
> other places that had the same issue. I've centralized all that
> handling in this patch (fixup_partitions and fixup_partition_crossing,
> called from a few places where we redirect edges) so that it is more
> consistent and comprehensive.

Right, I think it's good that you've centralized this code. But it
seems to me that we should make sure that the hot blocks and cold
blocks are still grouped (i.e. there is only one basic block B such
that BB_PARTITION(B) != BB_PARTITION(B->next_bb). That is something
your code doesn't handle, AFAICT. It's just one thing that's difficult
to maintain, probably there are others. It's also something the
partitioning verifier should check, i.e. that the basic blocks in hot
and cold partitions are properly grouped.


BTW1: I don't understand this comment:

> +  /* Invoke the cleanup again once we are out of cfg layout mode
> + after committing the final bb layout above. This enables removal
> + of forwarding blocks across the hot/cold section boundary when
> + splitting is enabled that were necessary while in cfg layout
> + mode.  */
> +  if (crtl->has_bb_partition)
> +cleanup_cfg (CLEANUP_EXPENSIVE);

There shouldn't be any forwarder blocks in cfg layout mode. What did
you need this for?

BTW2: We badly need to figure out a way to create test cases for FDO... :-(

Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Teresa Johnson
On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher  wrote:
> On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
>> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>
>>
>> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
>> like it makes sense to call this from gimple, for instance.
>
> It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
> are meaningless in cfglayout mode.

Actually, the change above ensures we can call this routine when *not*
in CFGLAYOUT mode. It was previously only called when we were in
CFGLAYOUT mode (while in bbpart).

(This relates to a conversation we had on an earlier version of the
patch back in the fall, where we discussed this issue of bbpart adding
barriers while in cfglayout mode:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02996.html).

>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> + insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>
> We're practically always in cfglayout mode, but oh well...

Right, not always, which is why I needed to make this change.


On Fri, May 10, 2013 at 5:05 AM, Steven Bosscher  wrote:
> On Fri, May 10, 2013 at 12:57 AM, Xinliang David Li wrote:
>> On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher wrote:
>
>>> This patch mixes up things badly from the point of
>>> what-depends-on-what, the whole approach looks wrong to me.
>>
>>
>> Do you mean the 'source file dependency' or 'logical dependency'?
>>
>> If the former, the code can be easily refactored to remove the
>> dependency. I don't see how the latter can be avoided as bb-partition
>> etc does change cfg states and leads to different actions in cfg
>> layout finalize.
>
> I mean logical dependency. cfglayout is just a representation of the
> CFG, and bb-partition is a code transformation. By making
> fixup_reorder_chain emit the note, you' ve put part of the
> transformation into out-of-cfglayout which is just bogus. You also
> don't put GCSE or loop unrolling in out-of-cfglayout, and this change
> is IMHO in the same category: mixing transformations into internal
> representations. That may be a short-term fix but it is a long-term
> maintenance/cleanups nightmare.

The main issue I had here, and why I made this change, is that we go
in and out of cfglayout mode several times after bb partitioning and
then out_of_cfglayout. The problem was that when we subsequently went
in and out of cfglayout mode, the switch text section notes that had
been inserted by bbpart were getting messed up (they were moved into
the bb header when we enter cfglayout mode and then not being
transferred to the correct location upon exit). I investigated trying
to keep those in sync, but it is really difficult/impossible to do
during cfglayout mode when they are in the header. So I simply strip
them out completely on entry to cfglayout mode, and if there were any
there on entry, this change ensures that they are restored in the
appropriate location upon exit. I'm not sure what is a good
alternative?

I found a more detailed description of why I made this change in our
email exchange from the fall:

--
I triggered the same error in 445.gobmk once I applied the
thread_prologue_and_epilogue_insns fixes. This is an assert in the
dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXTS_SECTION
note not being preceeded by a barrier:

gcc -c -o engine/utils.o -DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I..
-I../include -I./include   -fprofile-use
-freorder-blocks-and-partition -freorder-blocks -ffunction-sections
 -O2  engine/utils.c

engine/utils.c: In function ‘visible_along_edge’:
engine/utils.c:274:1: internal compiler error: in create_pseudo_cfg,
at dwarf2cfi.c:2742
 }
 ^

In this case the switch section note was inside a BB. What I found was
that this was due to several phases going into and back out of
cfglayout mode again. In this case it was the compgotos phase. There
aren't any computed gotos, but this change occurs during
cfg_layout_initialize (in try_optimize_cfg called via cleanup_cfg).
Here it merged two (non-contiguous) blocks that had a
single-successor/single-predecessor relationship. However, the source
block was previously on the section boundary and had a SWITCH note
prior. This note is put into the header of the bb when we go into
cfglayout mode, and ended up inside the new merged block, which was in
any case not on the new border between the hot and cold sections.

The correct solution in my opinion is to strip out the SWITCH note
every time we enter cfglayout mode after bbro, and then invoke
insert_section_boundary_note when leaving c

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Teresa Johnson
On Fri, May 10, 2013 at 4:52 AM, Jan Hubicka  wrote:
>> On 05/07/13 23:13, Teresa Johnson wrote:
>> >--
>> >Revised patch that fixes failures encountered when enabling
>> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
>> >
>> >This includes new verification code to ensure no cold blocks dominate hot
>> >blocks contributed by Steven Bosscher.
>> Seems like a reasonable verification; presumably if we have a cold
>> block dominating a hot block, then the block/edge frequencies are
>> badly broken.  Ah, just saw the comments for the other case where
>> this happens.  cold entry, but hot loop inside pushing over the
>> barrier. Arguably given a cold block in the dominator graph, all its
>> children should have their frequences scaled down to avoid that situation.
>
> Yep, also note that sanity checking anything about frequencies is really hard.
> There are very many places in compiler that necesarilly need to invalidate
> frequencies in weird ways (at least short of rebuilding the whole profile
> from probabilities again).

Yes, as noted in the comments this was in part due to several places
where counts/frequencies were not kept in sync. Rather than try to fix
all of these, or do any scaling of frequencies, the partitioning code
now just enforces that the partitioning is sane w.r.t. the given
counts. This is done during bb partitioning. The sanity checking
routine was also useful for finding places where optimization passes
were splitting edges and causing hot blocks previously reached by both
hot and cold blocks to become dominated by cold blocks (see comments
in commit_edge_insertions in my patch), and making sure they got fixed
up.

But there is the issue of what we should do in the case of an
infrequent but non-zero entry (marked cold by maybe_hot_count_p
because its count is less than the number of training runs) that leads
to a hot loop. The code I added to the partitioning routine
(find_rarely_executed_basic_blocks_and_crossing_edges) will cause the
entry to also be placed in the hot partition. I would argue this is
the desired behavior - if the routine contains code that is very hot
for, say, 1/2 its training runs, the entry and hot loop (and
everything on the path in between) should be in the hot partition.

>
>> I can't really comment on the cfglayout and related stuff -- it was
>> added at a time when I wasn't doing much with GCC and thus I don't
>> know much about it.
>
> I think I can take a look at the cfglayout stuff. Splitting the patch would 
> be great.

Thanks, that would be great. I can split the patch first.

>
> Honza
>>
>> However, I like the changes to record if we've done partitioning and
>> checking those instead of flag_reorder_blocks_and_partition.  That's
>> simple enough that I'd support pulling it out as a separate patch
>> and installing immediately if that can be done so without major
>> headaches.

Ok, thanks, will do.

>>
>> I think we could do something similar with the code to verify the
>> idom of a hot block is also hot.  Though looking at the
>> implementation I wonder if it could be simplified by walking the
>> dominator tree?  I can't look at it real closely tonight though.

Looking at this code again, I agree with you. It looks like it is
going to walk cold bb's more than once and O(n^2) in the worst case. I
will fix this (there are a couple places in the patch that do a walk
to ensure that this is not violated).

>>
>> Could you pull those two logical hunks of work out into individual
>> patches.

Will do. The only complication with splitting out the dominance
checking stuff is that there are a number of changes in the patch to
ensure that we don't violate this (hot block can't be dominated by
cold block). I am not sure it makes sense or will be easy to split all
of these out. I think what I will do is try to pull the big related
chunks of them out to a separate patch (the new verification code, the
code to prevent this in the partitioning routine, and
fixup_partitions), but there are going to be a few places in the other
patch that do some fixups related to this (e.g. in rtl_split_edge)
that I would like to leave in the larger correctness patch.

Thanks,
Teresa

>>
>> jeff



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Teresa Johnson
On Thu, May 9, 2013 at 2:42 PM, Diego Novillo  wrote:
> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>
>> Somehow Rietveld didn't upload the patch properly. I've attached the
>> patch to this email instead. Here is the description:
>
>
> Rietveld has turned out to be far less useful that I had hoped.  If you are
> running ubuntu precise, the upload script is having some bad interaction
> with the server, which makes it to constantly reject your password.
>
> I do not recommend using Rietveld anymore.  I don't really have the cycles
> to invest in fixing the various usability warts we've found. Sorry.

Thanks for the note. The main reason I have tried to keep using
Rietveld is that it sends out the patch inline in the email with the
formatting preserved. I have found that cut-n-paste into a gmail
window messes up the spacing. Do you know of a good way to work around
this issue?

>
>
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>
>
> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
> like it makes sense to call this from gimple, for instance.

This is called only from bb-reorder and cfgrtl, so we should only be
in IR_RTL, I can add an assert to this effect.

More on this change when I respond to Steven's comments.

>
>
>> + several different possibilities. One is that there are edge weight
>> insanities
>> + due to optimization phases that do not properly update basic block
>> profile
>> + counts. The second is that the entry of the function may not be hot,
>> because
>> + it is entered fewer times than the number of profile training runs,
>> but there
>> + is a loop inside the function that causes blocks within the function
>> to be
>> + above the threshold for hotness.  */
>> +  if (cold_bb_count)
>> +{
>> +  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +
>
>
> Move this out into its own function?

Will do.

>
>> +  if (dom_calculated_here)
>> +calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +  /* Keep examining hot bbs until we have either checked them all, or
>> + re-marked all cold bbs hot.  */
>> +  while (! bbs_in_hot_partition.is_empty ()
>> + && cold_bb_count)
>> +{
>> +  basic_block dom_bb;
>> +
>> +  bb = bbs_in_hot_partition.pop ();
>> +  dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> +
>> +  /* If bb's immediate dominator is also hot then it is ok.  */
>> +  if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> +continue;
>> +
>> +  /* We have a hot bb with an immediate dominator that is cold.
>> + The dominator needs to be re-marked to hot.  */
>
>
> s/to hot/hot/

ok. Actually, I think s/to hot/as hot/ might sound better.

>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 198686)
>> +++ cfgrtl.c(working copy)
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>
>
> You may need to modify Makefile.in to declare this new dependency.
>
>> +/* Called when edge E has been redirected to a new destination,
>> +   in order to update the region crossing flag on the edge and
>> +   jump.  */
>> +
>> +static void
>> +fixup_partition_crossing (edge e, basic_block target)
>> +{
>> +  rtx note;
>> +
>> +  gcc_assert (e->dest == target);
>
>
> Then, why not just take a single argument E?

Good idea, will do.

>
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>
>
> This is hard to parse.

Ok, how about:

/* Ensure edges to bb reflect its new partition assignment with the appropriate
   region-crossing flag setting.  */

>
>> +  /* Delete any blocks that became unreachable and weren't
>> + already cleaned up, for example during edge forwarding
>> + and convert_jumps_to_returns. This will expose more
>> + opportunities for fixing the partition boundaries here.
>> + Also, the calculation of the dominance graph during verification
>> + will assert if there are unreachable nodes.  */
>> +  delete_unreachable_blocks ();
>
>
> Why not just schedule a CFG cleanup as a prerequisite to this pass?

Which pass? This is called right after we try to optimize the cfg
during cleanup_cfg, which is invoked numerous places. try_optimize_cfg
performs a number of cfg optimizations, some of which can create
unreachable blocks. I found it was much easier to clean this up in one
pass at the end rather that try to detect and fix this up
increment

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Steven Bosscher
On Fri, May 10, 2013 at 12:57 AM, Xinliang David Li wrote:
> On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher wrote:

>> This patch mixes up things badly from the point of
>> what-depends-on-what, the whole approach looks wrong to me.
>
>
> Do you mean the 'source file dependency' or 'logical dependency'?
>
> If the former, the code can be easily refactored to remove the
> dependency. I don't see how the latter can be avoided as bb-partition
> etc does change cfg states and leads to different actions in cfg
> layout finalize.

I mean logical dependency. cfglayout is just a representation of the
CFG, and bb-partition is a code transformation. By making
fixup_reorder_chain emit the note, you' ve put part of the
transformation into out-of-cfglayout which is just bogus. You also
don't put GCSE or loop unrolling in out-of-cfglayout, and this change
is IMHO in the same category: mixing transformations into internal
representations. That may be a short-term fix but it is a long-term
maintenance/cleanups nightmare.

Although I've only skimmed the patch, I have noted several issues with it:

* 3 different changes put into a single patch: the
crtl->has_bb_partition change (which looks good to me), the
verification stuff, and various fixes. The patch should be submitted
in 3 parts to make/testing review easier.

* Emitting barriers in cfglayout mode. That's non-sense.

* Fixup redirected edges that did not cross partitions before but
apparently do after redirection. This is not supposed to happen in the
first place, so fixing up any of this just papers over an error
elsewhere in the compiler.

* The fixup_reorder_chain changes I've mentioned above.


Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-10 Thread Jan Hubicka
> On 05/07/13 23:13, Teresa Johnson wrote:
> >--
> >Revised patch that fixes failures encountered when enabling
> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
> >
> >This includes new verification code to ensure no cold blocks dominate hot
> >blocks contributed by Steven Bosscher.
> Seems like a reasonable verification; presumably if we have a cold
> block dominating a hot block, then the block/edge frequencies are
> badly broken.  Ah, just saw the comments for the other case where
> this happens.  cold entry, but hot loop inside pushing over the
> barrier. Arguably given a cold block in the dominator graph, all its

Yep, also note that sanity checking anything about frequencies is really hard.
There are very many places in compiler that necesarilly need to invalidate
frequencies in weird ways (at least short of rebuilding the whole profile
from probabilities again).

> I can't really comment on the cfglayout and related stuff -- it was
> added at a time when I wasn't doing much with GCC and thus I don't
> know much about it.

I think I can take a look at the cfglayout stuff. Splitting the patch would be 
great.

Honza
> 
> However, I like the changes to record if we've done partitioning and
> checking those instead of flag_reorder_blocks_and_partition.  That's
> simple enough that I'd support pulling it out as a separate patch
> and installing immediately if that can be done so without major
> headaches.
> 
> I think we could do something similar with the code to verify the
> idom of a hot block is also hot.  Though looking at the
> implementation I wonder if it could be simplified by walking the
> dominator tree?  I can't look at it real closely tonight though.
> 
> Could you pull those two logical hunks of work out into individual
> patches.
> 
> jeff


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-09 Thread Jeff Law

On 05/07/13 23:13, Teresa Johnson wrote:

--
Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code to ensure no cold blocks dominate hot
blocks contributed by Steven Bosscher.
Seems like a reasonable verification; presumably if we have a cold block 
dominating a hot block, then the block/edge frequencies are badly 
broken.  Ah, just saw the comments for the other case where this 
happens.  cold entry, but hot loop inside pushing over the barrier. 
Arguably given a cold block in the dominator graph, all its children 
should have their frequences scaled down to avoid that situation.



Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever the flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
--

Ok for trunk?
I can't really comment on the cfglayout and related stuff -- it was 
added at a time when I wasn't doing much with GCC and thus I don't know 
much about it.


However, I like the changes to record if we've done partitioning and 
checking those instead of flag_reorder_blocks_and_partition.  That's 
simple enough that I'd support pulling it out as a separate patch and 
installing immediately if that can be done so without major headaches.


I think we could do something similar with the code to verify the idom 
of a hot block is also hot.  Though looking at the implementation I 
wonder if it could be simplified by walking the dominator tree?  I can't 
look at it real closely tonight though.


Could you pull those two logical hunks of work out into individual 
patches.


jeff



Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-09 Thread Xinliang David Li
On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher  wrote:
> On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
>> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>
>>
>> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
>> like it makes sense to call this from gimple, for instance.
>
> It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
> are meaningless in cfglayout mode.
>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> + insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>
> We're practically always in cfglayout mode, but oh well...
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>  emit_barrier_after (BB_END (jump_block));
>
>
>>> Index: cfgrtl.c
>>> ===
>>> --- cfgrtl.c(revision 198686)
>>> +++ cfgrtl.c(working copy)
>>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree.h"
>>>  #include "hard-reg-set.h"
>>>  #include "basic-block.h"
>>> +#include "bb-reorder.h"
>>
>>
>> You may need to modify Makefile.in to declare this new dependency.
>
> Eh, no. cfgrtl should not depend on bb-reorder.
>
> And cfglayout should not depend on basic block partitioning, either,
> so this change:
>
>>  /* Finalize the changes: reorder insn list according to the sequence 
>> specified
>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> +   section notes.  */
>>
>>  void
>> -cfg_layout_finalize (void)
>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>
> is Just Wrong (tm).
>
> This patch mixes up things badly from the point of
> what-depends-on-what, the whole approach looks wrong to me.


Do you mean the 'source file dependency' or 'logical dependency'?

If the former, the code can be easily refactored to remove the
dependency. I don't see how the latter can be avoided as bb-partition
etc does change cfg states and leads to different actions in cfg
layout finalize.

thanks,

David




>
> Ciao!
> Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-09 Thread Steven Bosscher
On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
> On 2013-05-08 01:13 , Teresa Johnson wrote:
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>
>
> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
> like it makes sense to call this from gimple, for instance.

It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
are meaningless in cfglayout mode.

-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+ insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);

We're practically always in cfglayout mode, but oh well...
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
 emit_barrier_after (BB_END (jump_block));


>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 198686)
>> +++ cfgrtl.c(working copy)
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>
>
> You may need to modify Makefile.in to declare this new dependency.

Eh, no. cfgrtl should not depend on bb-reorder.

And cfglayout should not depend on basic block partitioning, either,
so this change:

>  /* Finalize the changes: reorder insn list according to the sequence 
> specified
> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> +   by aux pointers, enter compensation code, rebuild scope forest. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> +   to fixup_reorder_chain so that it can insert the proper switch text
> +   section notes.  */
>
>  void
> -cfg_layout_finalize (void)
> +cfg_layout_finalize (bool finalize_reorder_blocks)

is Just Wrong (tm).

This patch mixes up things badly from the point of
what-depends-on-what, the whole approach looks wrong to me.

Ciao!
Steven


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-09 Thread Diego Novillo

On 2013-05-08 01:13 , Teresa Johnson wrote:

Somehow Rietveld didn't upload the patch properly. I've attached the
patch to this email instead. Here is the description:


Rietveld has turned out to be far less useful that I had hoped.  If you 
are running ubuntu precise, the upload script is having some bad 
interaction with the server, which makes it to constantly reject your 
password.


I do not recommend using Rietveld anymore.  I don't really have the 
cycles to invest in fixing the various usability warts we've found. Sorry.




-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);


What if the current IR is not RTL?  Should we fail here?  It doesn't 
seem like it makes sense to call this from gimple, for instance.



+ several different possibilities. One is that there are edge 
weight insanities
+ due to optimization phases that do not properly update basic 
block profile
+ counts. The second is that the entry of the function may not be 
hot, because
+ it is entered fewer times than the number of profile training 
runs, but there
+ is a loop inside the function that causes blocks within the 
function to be

+ above the threshold for hotness.  */
+  if (cold_bb_count)
+{
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+


Move this out into its own function?


+  if (dom_calculated_here)
+calculate_dominance_info (CDI_DOMINATORS);
+
+  /* Keep examining hot bbs until we have either checked them all, or
+ re-marked all cold bbs hot.  */
+  while (! bbs_in_hot_partition.is_empty ()
+ && cold_bb_count)
+{
+  basic_block dom_bb;
+
+  bb = bbs_in_hot_partition.pop ();
+  dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+  /* If bb's immediate dominator is also hot then it is ok.  */
+  if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+continue;
+
+  /* We have a hot bb with an immediate dominator that is cold.
+ The dominator needs to be re-marked to hot.  */


s/to hot/hot/


Index: cfgrtl.c
===
--- cfgrtl.c(revision 198686)
+++ cfgrtl.c(working copy)
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"


You may need to modify Makefile.in to declare this new dependency.


+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);


Then, why not just take a single argument E?


+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */


This is hard to parse.


+  /* Delete any blocks that became unreachable and weren't
+ already cleaned up, for example during edge forwarding
+ and convert_jumps_to_returns. This will expose more
+ opportunities for fixing the partition boundaries here.
+ Also, the calculation of the dominance graph during verification
+ will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();


Why not just schedule a CFG cleanup as a prerequisite to this pass?


A minor formatting nit.  References to locals and function arguments 
should be done in capitals.



Diego.


Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-05-07 Thread Teresa Johnson


binyG3NHBftCR.bin
Description: Binary data


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-02-05 Thread Teresa Johnson
Thanks for the confirmation that the -g issue is orthogonal. I did
start to try to address it but got pulled away by some other things
for awhile. I'll see if I can take another stab at it.

In the meantime, could one of the global maintainers take a look at
the patch? I don't want it to get too stale, and without these fixes I
am unable to get -freorder-blocks-and-partition to work at all.

Thanks!
Teresa

On Thu, Jan 31, 2013 at 6:18 AM, Christophe Lyon
 wrote:
> Hello,
>
> Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)
>
>
>
> On 6 December 2012 20:26, Teresa Johnson  wrote:
>>
>>
>>
>> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
>>  wrote:
>>>
>>> I have updated my trunk checkout, and I can confirm that eval.c now
>>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>>
>> good
>>
>>>
>>>
>>> Now, when looking at the whole Spec2k results:
>>> - vpr passes now (used to fail)
>>
>>
>> good
>>
>>>
>>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>>> with the same error from gas:
>>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>>> {.text section}
>>> - gap still does not build (same error as above)
>>>
>>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>>
>> Finally had a chance to get back to this. I was able to reproduce the
>> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
>> However, I am also getting the same failure with a pristine copy of trunk.
>> Can you confirm whether you were seeing any of these failures without my
>> patches, because I believe they are probably a limitation with function
>> splitting and debug info that is orthogonal to my patch.
>>
> Yes I confirm that I see these failures without your patch too; and
> both -freorder-blocks-and-partition and -g are present in my
> command-line.
> And now gap's integer.c fails to compile with a similar error message too.
>
>>>
>>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>>> fma3d.
>>
>>
>> I'm not seeing this on x86_64, unfortunately, so it might require some
>> follow-on work to triage and fix.
>>
>> I'll look into the gas failure, but if someone could review this patch in
>> the meantime given that it does improve things considerably (at least
>> without -g), that would be great.
>>
> Indeed.
>
>> Thanks,
>> Teresa
>>
>
> Thanks
> Christophe



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2013-01-31 Thread Christophe Lyon
Hello,

Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)



On 6 December 2012 20:26, Teresa Johnson  wrote:
>
>
>
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
>  wrote:
>>
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>
>
> good
>
>>
>>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>
>
> good
>
>>
>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>
>
> Finally had a chance to get back to this. I was able to reproduce the
> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
> However, I am also getting the same failure with a pristine copy of trunk.
> Can you confirm whether you were seeing any of these failures without my
> patches, because I believe they are probably a limitation with function
> splitting and debug info that is orthogonal to my patch.
>
Yes I confirm that I see these failures without your patch too; and
both -freorder-blocks-and-partition and -g are present in my
command-line.
And now gap's integer.c fails to compile with a similar error message too.

>>
>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>> fma3d.
>
>
> I'm not seeing this on x86_64, unfortunately, so it might require some
> follow-on work to triage and fix.
>
> I'll look into the gas failure, but if someone could review this patch in
> the meantime given that it does improve things considerably (at least
> without -g), that would be great.
>
Indeed.

> Thanks,
> Teresa
>

Thanks
Christophe


Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-26 Thread Jack Howarth
On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
> Are you sure you have all my changes applied? I applied the 4 patches
> attached to PR55121 into my trunk checkout that has my fixes, and to a
> pristine trunk checkout. I configured and built both for
> --target=arm-none-linux-gnueabi, and built using your options, .i file
> and gcda file. I can reproduce the failure using the pristine trunk
> with your patches but not with my fixed trunk + your patches. (I just
> updated to head to pickup recent changes and get the same result. The
> vec changes required some manual changes to the patch, which I will
> resend shortly.)

Teresa,
Your mailer seems to have corrupted the posted patch with stray
=3D characters and line breaks. Can you repost a copy as an attachment
to the list?
 Jack

> 
> Without my fixes:
> 
> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
> eval.c: In function ‘Ge’:
> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>  }
>  ^
> 0x622f71 df_compact_blocks()
> ../../gcc_trunk_3/gcc/df-core.c:1560
> 0x5cfcb5 compact_blocks()
> ../../gcc_trunk_3/gcc/cfg.c:162
> 0xc9dce0 reorder_basic_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
> 0xc9dce0 rest_of_handle_reorder_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> 
> 
> With my fixes:
> 
> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
> 
> 
> Thanks,
> Teresa
> 
> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>  wrote:
> > Hi,
> >
> > I have tested your patch on Spec2000 on ARM, and I can still see
> > several failures caused by:
> > "error: fallthru edge crosses section boundary", including the case
> > described in PR55121.
> >
> > On 26 November 2012 16:55, Teresa Johnson  wrote:
> >> Ping.
> >> Teresa
> >>
> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson  
> >> wrote:
> >>> Revised patch that fixes failures encountered when enabling
> >>> -freorder-blocks-and-partition, including the failure reported in PR 
> >>> 53743.
> >>>
> >>> This includes new verification code to ensure no cold blocks dominate hot
> >>> blocks contributed by Steven Bosscher.
> >>>
> >>> I attempted to make the handling of partition updates through the 
> >>> optimization
> >>> passes much more consistent, removing a number of partial fixes in the 
> >>> code
> >>> stream in the process. The code to fixup partitions (including the 
> >>> BB_PARTITION
> >>> assignement, region crossing jump notes, and switch text section notes) is
> >>> now handled in a few centralized locations. For example, inside
> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that 
> >>> callers
> >>> don't need to attempt the fixup themselves.
> >>>
> >>> For optimization passes that make adjustments to the cfg while in cfg 
> >>> layout
> >>> mode that are not easy to fix up incrementally, the new routine
> >>> fixup_partitions handles the cleanup globally. This does require 
> >>> calculation
> >>> of the dominance relation, however, as far as I can tell the routines 
> >>> which
> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> >>> are invoked typicall

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-26 Thread Teresa Johnson
Are you sure you have all my changes applied? I applied the 4 patches
attached to PR55121 into my trunk checkout that has my fixes, and to a
pristine trunk checkout. I configured and built both for
--target=arm-none-linux-gnueabi, and built using your options, .i file
and gcda file. I can reproduce the failure using the pristine trunk
with your patches but not with my fixed trunk + your patches. (I just
updated to head to pickup recent changes and get the same result. The
vec changes required some manual changes to the patch, which I will
resend shortly.)

Without my fixes:

$ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
eval.c: In function ‘Ge’:
eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
 }
 ^
0x622f71 df_compact_blocks()
../../gcc_trunk_3/gcc/df-core.c:1560
0x5cfcb5 compact_blocks()
../../gcc_trunk_3/gcc/cfg.c:162
0xc9dce0 reorder_basic_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2154
0xc9dce0 rest_of_handle_reorder_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2219
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


With my fixes:

$ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3


Thanks,
Teresa

On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
 wrote:
> Hi,
>
> I have tested your patch on Spec2000 on ARM, and I can still see
> several failures caused by:
> "error: fallthru edge crosses section boundary", including the case
> described in PR55121.
>
> On 26 November 2012 16:55, Teresa Johnson  wrote:
>> Ping.
>> Teresa
>>
>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson  
>> wrote:
>>> Revised patch that fixes failures encountered when enabling
>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>
>>> This includes new verification code to ensure no cold blocks dominate hot
>>> blocks contributed by Steven Bosscher.
>>>
>>> I attempted to make the handling of partition updates through the 
>>> optimization
>>> passes much more consistent, removing a number of partial fixes in the code
>>> stream in the process. The code to fixup partitions (including the 
>>> BB_PARTITION
>>> assignement, region crossing jump notes, and switch text section notes) is
>>> now handled in a few centralized locations. For example, inside
>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that 
>>> callers
>>> don't need to attempt the fixup themselves.
>>>
>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> mode that are not easy to fix up incrementally, the new routine
>>> fixup_partitions handles the cleanup globally. This does require calculation
>>> of the dominance relation, however, as far as I can tell the routines which
>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> are invoked typically once (or a small number of times in the case of
>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> -ftime-report output for some large fdo compilations and saw only minimal
>>> increases in the dominance computation times, which were only a tiny percent
>>> of the overall compile time.
>>>
>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> any partitioning was actually performed, so that optimizations which were
>>> cons

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-26 Thread Christophe Lyon
Hi,

I have tested your patch on Spec2000 on ARM, and I can still see
several failures caused by:
"error: fallthru edge crosses section boundary", including the case
described in PR55121.

On 26 November 2012 16:55, Teresa Johnson  wrote:
> Ping.
> Teresa
>
> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson  wrote:
>> Revised patch that fixes failures encountered when enabling
>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>
>> This includes new verification code to ensure no cold blocks dominate hot
>> blocks contributed by Steven Bosscher.
>>
>> I attempted to make the handling of partition updates through the 
>> optimization
>> passes much more consistent, removing a number of partial fixes in the code
>> stream in the process. The code to fixup partitions (including the 
>> BB_PARTITION
>> assignement, region crossing jump notes, and switch text section notes) is
>> now handled in a few centralized locations. For example, inside
>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that 
>> callers
>> don't need to attempt the fixup themselves.
>>
>> For optimization passes that make adjustments to the cfg while in cfg layout
>> mode that are not easy to fix up incrementally, the new routine
>> fixup_partitions handles the cleanup globally. This does require calculation
>> of the dominance relation, however, as far as I can tell the routines which
>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> are invoked typically once (or a small number of times in the case of
>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> -ftime-report output for some large fdo compilations and saw only minimal
>> increases in the dominance computation times, which were only a tiny percent
>> of the overall compile time.
>>
>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> any partitioning was actually performed, so that optimizations which were
>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> conservative for functions where no partitions were formed (e.g. they are
>> completely hot).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with 
>> SPEC2006 int
>> benchmarks and internal google benchmarks using profile feedback and
>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2012-11-14  Teresa Johnson  
>> Steven Bosscher  
>>
>> * cfghooks.h (cfg_layout_finalize): New parameter.
>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>> parameter.
>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>> as this is now done by redirect_edge_and_branch_force.
>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb 
>> after
>> barriers, new cfg_layout_finalize parameter, and don't store exit
>> predecessor BB until after it is potentially split.
>> * function.h (struct rtl_data): New flag has_bb_partition.
>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>> any blocks in function actually partitioned.
>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>> up partitioning.
>> * bb-reorder.c (connect_traces): Only look for partitions and skip
>> block copying if any blocks in function actually partitioned.
>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>> that no cold blocks dominate a hot block.
>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>> as this is now done by force_nonfallthru_and_redirect.
>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>> already be marked with region crossing note.
>> (reorder_basic_blocks): Only need to verify partitions if any
>> blocks in function actually partitioned.
>> (insert_section_boundary_note): Only need to insert note if any
>> blocks in function actually partitioned.
>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>> parameter, and remove call to insert_section_boundary_note as this
>> is now called via cfg_layout_finalize/fixup_reorder_chain.
>> (duplicate_computed_gotos): New cfg_layout_finalize
>> parameter.
>> (partition_hot_cold_basic_blocks): Set flag indicating function
>> has bb partitions.
>> * bb-reorder.h: Declare insert_section_boundary_note and
>> emit_barrier_after_bb, which are no longer static.
>> * basic-block.h: Declare new function fixup_partitions.
>> * cfgrtl.c (t

Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-26 Thread Teresa Johnson
Ping.
Teresa

On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson  wrote:
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code to ensure no cold blocks dominate hot
> blocks contributed by Steven Bosscher.
>
> I attempted to make the handling of partition updates through the optimization
> passes much more consistent, removing a number of partial fixes in the code
> stream in the process. The code to fixup partitions (including the 
> BB_PARTITION
> assignement, region crossing jump notes, and switch text section notes) is
> now handled in a few centralized locations. For example, inside
> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that 
> callers
> don't need to attempt the fixup themselves.
>
> For optimization passes that make adjustments to the cfg while in cfg layout
> mode that are not easy to fix up incrementally, the new routine
> fixup_partitions handles the cleanup globally. This does require calculation
> of the dominance relation, however, as far as I can tell the routines which
> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> are invoked typically once (or a small number of times in the case of
> try_optimize_cfg) per optimization pass. Additionally, I compared the
> -ftime-report output for some large fdo compilations and saw only minimal
> increases in the dominance computation times, which were only a tiny percent
> of the overall compile time.
>
> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever the flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with 
> SPEC2006 int
> benchmarks and internal google benchmarks using profile feedback and
> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2012-11-14  Teresa Johnson  
> Steven Bosscher  
>
> * cfghooks.h (cfg_layout_finalize): New parameter.
> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
> parameter.
> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
> as this is now done by redirect_edge_and_branch_force.
> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
> barriers, new cfg_layout_finalize parameter, and don't store exit
> predecessor BB until after it is potentially split.
> * function.h (struct rtl_data): New flag has_bb_partition.
> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
> any blocks in function actually partitioned.
> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
> up partitioning.
> * bb-reorder.c (connect_traces): Only look for partitions and skip
> block copying if any blocks in function actually partitioned.
> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
> that no cold blocks dominate a hot block.
> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
> as this is now done by force_nonfallthru_and_redirect.
> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
> already be marked with region crossing note.
> (reorder_basic_blocks): Only need to verify partitions if any
> blocks in function actually partitioned.
> (insert_section_boundary_note): Only need to insert note if any
> blocks in function actually partitioned.
> (rest_of_handle_reorder_blocks): New cfg_layout_finalize
> parameter, and remove call to insert_section_boundary_note as this
> is now called via cfg_layout_finalize/fixup_reorder_chain.
> (duplicate_computed_gotos): New cfg_layout_finalize
> parameter.
> (partition_hot_cold_basic_blocks): Set flag indicating function
> has bb partitions.
> * bb-reorder.h: Declare insert_section_boundary_note and
> emit_barrier_after_bb, which are no longer static.
> * basic-block.h: Declare new function fixup_partitions.
> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
> check for region crossing note.
> (fixup_partition_crossing): New function.
> (fixup_bb_partition): Ditto.
> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> (force_nonfallthru_and_redirect): Fixup partition boundaries,
> r

Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-15 Thread Teresa Johnson
Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code to ensure no cold blocks dominate hot
blocks contributed by Steven Bosscher.

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fixup partitions (including the BB_PARTITION
assignement, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations. For example, inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation, however, as far as I can tell the routines which
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are invoked typically once (or a small number of times in the case of
try_optimize_cfg) per optimization pass. Additionally, I compared the
-ftime-report output for some large fdo compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percent
of the overall compile time.

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever the flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 
int
benchmarks and internal google benchmarks using profile feedback and
-freorder-blocks-and-partition to get more coverage. Ok for trunk?

Thanks,
Teresa

2012-11-14  Teresa Johnson  
Steven Bosscher  

* cfghooks.h (cfg_layout_finalize): New parameter.
* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
parameter.
* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
as this is now done by redirect_edge_and_branch_force.
* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
barriers, new cfg_layout_finalize parameter, and don't store exit
predecessor BB until after it is potentially split.
* function.h (struct rtl_data): New flag has_bb_partition.
* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
any blocks in function actually partitioned.
(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
up partitioning.
* bb-reorder.c (connect_traces): Only look for partitions and skip
block copying if any blocks in function actually partitioned.
(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
(find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
that no cold blocks dominate a hot block.
(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
as this is now done by force_nonfallthru_and_redirect.
(add_reg_crossing_jump_notes): Handle the fact that some jumps may
already be marked with region crossing note.
(reorder_basic_blocks): Only need to verify partitions if any
blocks in function actually partitioned.
(insert_section_boundary_note): Only need to insert note if any
blocks in function actually partitioned.
(rest_of_handle_reorder_blocks): New cfg_layout_finalize
parameter, and remove call to insert_section_boundary_note as this
is now called via cfg_layout_finalize/fixup_reorder_chain.
(duplicate_computed_gotos): New cfg_layout_finalize
parameter.
(partition_hot_cold_basic_blocks): Set flag indicating function
has bb partitions.
* bb-reorder.h: Declare insert_section_boundary_note and
emit_barrier_after_bb, which are no longer static.
* basic-block.h: Declare new function fixup_partitions.
* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
check for region crossing note.
(fixup_partition_crossing): New function.
(fixup_bb_partition): Ditto.
(rtl_redirect_edge_and_branch): Fixup partition boundaries.
(force_nonfallthru_and_redirect): Fixup partition boundaries,
remove old code that tried to do this. Emit barrier correctly
when we are in cfglayout mode.
(rtl_split_edge): Correctly fixup partition boundaries.
(commit_one_edge_insertion): Remove old code that tried to
fixu

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-03 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 10:48 AM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===
>> --- bb-reorder.c(revision 192692)
>> +++ bb-reorder.c(working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>> first_partition = BB_PARTITION (bb);
>>if (BB_PARTITION (bb) != first_partition)
>> {
>> +  /* There should be a barrier between text sections. */
>> +  emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
>   /* Now add jumps and labels as needed to match the blocks new
>  outgoing edges.  */
>
> Why isn't it doing that for you?

I triggered the same error in 445.gobmk once I applied the
thread_prologue_and_epilogue_insns fixes. This is an assert in the
dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXTS_SECTION
note not being preceeded by a barrier:

gcc -c -o engine/utils.o -DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I..
-I../include -I./include   -fprofile-use
-freorder-blocks-and-partition -freorder-blocks -ffunction-sections
 -O2  engine/utils.c

engine/utils.c: In function ‘visible_along_edge’:
engine/utils.c:274:1: internal compiler error: in create_pseudo_cfg,
at dwarf2cfi.c:2742
 }
 ^

In this case the switch section note was inside a BB. What I found was
that this was due to several phases going into and back out of
cfglayout mode again. In this case it was the compgotos phase. There
aren't any computed gotos, but this change occurs during
cfg_layout_initialize (in try_optimize_cfg called via cleanup_cfg).
Here it merged two (non-contiguous) blocks that had a
single-successor/single-predecessor relationship. However, the source
block was previously on the section boundary and had a SWITCH note
prior. This note is put into the header of the bb when we go into
cfglayout mode, and ended up inside the new merged block, which was in
any case not on the new border between the hot and cold sections.

The correct solution in my opinion is to strip out the SWITCH note
every time we enter cfglayout mode after bbro, and then invoke
insert_section_boundary_note when leaving cfglayout (if one was found
on entry to that cfglayout mode) to reapply it. This fixed not only
the 445.gobmk failure, but I also found that I no longer need the
above change to insert a barrier when inserting the SWITCH note
(although now that I think about it more, it must have been the
prologue/epilogue code fix that addressed that).

In any case, the 445.gobmk code also showed another bug, that would
have been caught by your new patch to verify that cold sections never
dominate hot sections. In this case, the func entry block was marked
cold and we switch to hot code part way through the function. The
reason is that the entry bb count was 2 which is < than the # training
runs which is 8. But later on in the code there is a loop which brings
those bb counts above 8, and so they are marked hot. Obviously this
doesn't make sense. The fix I plan to implement to the bbpart code to
ensure no cold blocks dominate hot bbs should address this, but a more
sophisticated algorithm for marking blocks hot vs cold would be better
(either via the structural method you were working on, or by doing
this on traces as part of bbro as David suggested).

My plan is to add in your domination check patch, implement the
domination fixes in the bbpart algorithm, do a bunch more testing and
send the whole shebang out for review.

Thanks,
Teresa


>
> BTW, something else I noted in cfgrtl.c:
> NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
> duplicate_insn_chain, so the following is necessary for robustness:
>
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 191819)
> +++ cfgrtl.c(working copy)
> @@ -3615,7 +3615,6 @@
>   break;
>
> case NOTE_INSN_EPILOGUE_BEG:
> -   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>   emit_note_copy (insn);
>   break;
>
>
> There can be only one! One note to rule them all! etc.
>
>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 192692)
>> +++ cfgrtl.c(working copy)
>> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>
> My dislike for this whole scheme just continues to grow... How can

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-03 Thread Teresa Johnson
On Thu, Nov 1, 2012 at 2:26 PM, Teresa Johnson  wrote:
> On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson  wrote:
>> On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher  
>> wrote:
>>> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
 Sure, I will give this a try after your verification patch tests
 complete. Does this mean that the patch you posted above to
 force_nonfallthru_and_redirect is no longer needed either? I'll see if
 I can avoid the need for some of my fixes, although I believe at least
 the function.c one will still be needed. I'll check.
>>>
>>> The force_nonfallthru change is still necessary, because
>>> force_nonfallthru should be almost a no-op in cfglayout mode. The
>>> whole concept of a fallthru edge doesn't exist in cfglayout mode: any
>>> single_succ edge is a fallthru edge until the order of the basic
>>> blocks has been determined and the insn chain is re-linked (cfglayout
>>> mode originally was developed for bb-reorder, to move blocks around
>>> more easily). So the correct patch would actually be:
>>>
>>> Index: cfgrtl.c
>>> ===
>>> --- cfgrtl.c(revision 193046)
>>> +++ cfgrtl.c(working copy)
>>> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>>>cfg_layout_split_edge,
>>>rtl_make_forwarder_block,
>>>NULL, /* tidy_fallthru_edge */
>>> -  rtl_force_nonfallthru,
>>> +  NULL, /* force_nonfallthru */
>>>rtl_block_ends_with_call_p,
>>>rtl_block_ends_with_condjump_p,
>>>rtl_flow_call_edges_add,
>>>
>>> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
>>> hooks, they are cfgrtl-only.)
>>>
>>> But obviously that won't work because
>>> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
>>> cfglayout mode. That is a bug. The call to force_nonfallthru results
>>> in a "dangling" barrier:
>>>
>>> cfgrtl.c:1523  emit_barrier_after (BB_END (jump_block));
>>>
>>> In cfglayout mode, barriers don't exist in the insns chain, and they
>>> don't have to because every edge is a fallthru edge. If there are
>>> barriers before cfglayout mode, they are either removed or linked in
>>> the basic block footer, and fixup_reorder_chain restores or inserts
>>> barriers where necessary to drop out of cfglayout mode. This
>>> emit_barrier_after call hangs a barrier after BB_END but not in the
>>> footer, and I'm pretty sure the result will be that the barrier is
>>> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
>>> inserting a barrier should be done in cfglayout mode.
>>>
>>> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
>>> cfglayout pass. It doesn't work without cfglayout but it's doing
>>> things that are only correct in the cfgrtl world and Very Wrong Indeed
>>> in cfglayout-land.
>>>
>>>
 Regarding your earlier question about why we needed to add the
 barrier, I need to dig up the details again but essentially I found
 that the barriers were being added by bbpart, but bbro was reordering
 things and the block that ended up at the border between the hot and
 cold section didn't necessarily have a barrier on it because it was
 not previously at the region boundary.
>>>
>>> That doesn't sound right. bbpart doesn't actually re-order the basic
>>> blocks, it only marks the blocks with the partition they will be
>>> assigned to. Whatever ends up at the border between the two partitions
>>> is not relevant: the hot section cannot end in a fall-through edge to
>>> the cold section (verify_flow_info even checks for that, see "fallthru
>>> edge crosses section boundary (bb %i)") so it must end in some
>>> explicit jump. Such jumps are always followed by a barrier. The only
>>> reason I can think of why there might be a missing barrier, is because
>>> fixup_reorder_chain has a bug and forgets to insert the barrier in
>>> some cases (and I suspect this may be the case for return patterns, or
>>> the a.m. issue of a dropper barrier).
>>>
>>> I would like to work on debugging this, but it's hard without test cases...
>>
>> I'm working on trying to reproduce some of these failures in a test
>> case I can share. So far, I have only been able to reproduce the
>> failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still
>> working on getting a smaller/shareable test case for the other 2
>> issues.
>>
>> The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I
>> had fixed with my original changes to cfgrtl.c. Need to understand why
>> there is a reg crossing note between 2 bbs in the same partition.
>
> Interestingly, this turned out to be the same root cause as the
> verify_flow_info failures below. It is fixed by the same fix to
> thread_prologue_and_epilogue_insns. When the code below created the
> copy_bb and put it in e->src's partition, it made it insufficient for
> the merge blocks routine to check if the two bbs were in the same
> partition, because th

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-01 Thread Steven Bosscher
On Thu, Nov 1, 2012 at 10:26 PM, Teresa Johnson wrote:
> I'll do some testing of the fix below, but do you have any comments on
> the correctness or the potential issue I raised (see my note just
> below the patch)?

Sorry, I don't know the pro- and epilogue threading code well enough
to be of any help to you...

> Do you recommend pursuing the move of the bb partition phase until
> later, after we leave cfglayout mode? I need to revisit to see if my
> prologue/epilogue fix below also addresses the issue I saw when I
> tried moving it.

I think it should not be moved, cfglayout mode is the natural mode for
this kind of CFG transformations. But someone will have to tackle the
force_nonfallthru issue.

Ciao!
Steven


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-01 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 10:48 AM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===
>> --- bb-reorder.c(revision 192692)
>> +++ bb-reorder.c(working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>> first_partition = BB_PARTITION (bb);
>>if (BB_PARTITION (bb) != first_partition)
>> {
>> +  /* There should be a barrier between text sections. */
>> +  emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
>   /* Now add jumps and labels as needed to match the blocks new
>  outgoing edges.  */
>
> Why isn't it doing that for you?
>
> BTW, something else I noted in cfgrtl.c:
> NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
> duplicate_insn_chain, so the following is necessary for robustness:
>
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 191819)
> +++ cfgrtl.c(working copy)
> @@ -3615,7 +3615,6 @@
>   break;
>
> case NOTE_INSN_EPILOGUE_BEG:
> -   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>   emit_note_copy (insn);
>   break;

Shouldn't the patch be:

@@ -3630,10 +3631,11 @@ duplicate_insn_chain (rtx from, rtx to)
case NOTE_INSN_FUNCTION_BEG:
  /* There is always just single entry to function.  */
case NOTE_INSN_BASIC_BLOCK:
+  /* We should only switch text sections once.  */
+   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
  break;

case NOTE_INSN_EPILOGUE_BEG:
-   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
  emit_note_copy (insn);
  break;


i.e. move the NOTE above to where we will ignore it. Otherwise, we
would fall into the default case which is listed as unreachable.

Teresa

>
>
> There can be only one! One note to rule them all! etc.
>
>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 192692)
>> +++ cfgrtl.c(working copy)
>> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>
> My dislike for this whole scheme just continues to grow... How can
> there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)?
> That is a bug. We should not need the notes here.
>
> As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be
> the canonical way to check whether two blocks are in the same
> partition, and the EDGE_CROSSING flag should be set iff an edge
> crosses from one section to another. The REG_CROSSING_JUMP note should
> only be used to see if a JUMP_INSN may jump to another section,
> without having to check all successor edges.
>
> Any place where we have to check the BB_PARTITION or
> edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in
> the partitioning updating.
>
> Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so
> that slim RTL dumping breaks. I need this patchlet to make things
> work:
> Index: sched-vis.c
> ===
> --- sched-vis.c (revision 191819)
> +++ sched-vis.c (working copy)
> @@ -553,6 +553,11 @@
>  {
>char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];
>
> +  if (! x)
> +{
> +  sprintf (buf, "(nil)");
> +  return;
> +}
>switch (GET_CODE (x))
>  {
>  case SET:
>
> Ciao!
> Steven



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-01 Thread Teresa Johnson
On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson  wrote:
> On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher  
> wrote:
>> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
>>> Sure, I will give this a try after your verification patch tests
>>> complete. Does this mean that the patch you posted above to
>>> force_nonfallthru_and_redirect is no longer needed either? I'll see if
>>> I can avoid the need for some of my fixes, although I believe at least
>>> the function.c one will still be needed. I'll check.
>>
>> The force_nonfallthru change is still necessary, because
>> force_nonfallthru should be almost a no-op in cfglayout mode. The
>> whole concept of a fallthru edge doesn't exist in cfglayout mode: any
>> single_succ edge is a fallthru edge until the order of the basic
>> blocks has been determined and the insn chain is re-linked (cfglayout
>> mode originally was developed for bb-reorder, to move blocks around
>> more easily). So the correct patch would actually be:
>>
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 193046)
>> +++ cfgrtl.c(working copy)
>> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>>cfg_layout_split_edge,
>>rtl_make_forwarder_block,
>>NULL, /* tidy_fallthru_edge */
>> -  rtl_force_nonfallthru,
>> +  NULL, /* force_nonfallthru */
>>rtl_block_ends_with_call_p,
>>rtl_block_ends_with_condjump_p,
>>rtl_flow_call_edges_add,
>>
>> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
>> hooks, they are cfgrtl-only.)
>>
>> But obviously that won't work because
>> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
>> cfglayout mode. That is a bug. The call to force_nonfallthru results
>> in a "dangling" barrier:
>>
>> cfgrtl.c:1523  emit_barrier_after (BB_END (jump_block));
>>
>> In cfglayout mode, barriers don't exist in the insns chain, and they
>> don't have to because every edge is a fallthru edge. If there are
>> barriers before cfglayout mode, they are either removed or linked in
>> the basic block footer, and fixup_reorder_chain restores or inserts
>> barriers where necessary to drop out of cfglayout mode. This
>> emit_barrier_after call hangs a barrier after BB_END but not in the
>> footer, and I'm pretty sure the result will be that the barrier is
>> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
>> inserting a barrier should be done in cfglayout mode.
>>
>> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
>> cfglayout pass. It doesn't work without cfglayout but it's doing
>> things that are only correct in the cfgrtl world and Very Wrong Indeed
>> in cfglayout-land.
>>
>>
>>> Regarding your earlier question about why we needed to add the
>>> barrier, I need to dig up the details again but essentially I found
>>> that the barriers were being added by bbpart, but bbro was reordering
>>> things and the block that ended up at the border between the hot and
>>> cold section didn't necessarily have a barrier on it because it was
>>> not previously at the region boundary.
>>
>> That doesn't sound right. bbpart doesn't actually re-order the basic
>> blocks, it only marks the blocks with the partition they will be
>> assigned to. Whatever ends up at the border between the two partitions
>> is not relevant: the hot section cannot end in a fall-through edge to
>> the cold section (verify_flow_info even checks for that, see "fallthru
>> edge crosses section boundary (bb %i)") so it must end in some
>> explicit jump. Such jumps are always followed by a barrier. The only
>> reason I can think of why there might be a missing barrier, is because
>> fixup_reorder_chain has a bug and forgets to insert the barrier in
>> some cases (and I suspect this may be the case for return patterns, or
>> the a.m. issue of a dropper barrier).
>>
>> I would like to work on debugging this, but it's hard without test cases...
>
> I'm working on trying to reproduce some of these failures in a test
> case I can share. So far, I have only been able to reproduce the
> failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still
> working on getting a smaller/shareable test case for the other 2
> issues.
>
> The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I
> had fixed with my original changes to cfgrtl.c. Need to understand why
> there is a reg crossing note between 2 bbs in the same partition.

Interestingly, this turned out to be the same root cause as the
verify_flow_info failures below. It is fixed by the same fix to
thread_prologue_and_epilogue_insns. When the code below created the
copy_bb and put it in e->src's partition, it made it insufficient for
the merge blocks routine to check if the two bbs were in the same
partition, because they were in the same partition but separated by
the region crossing jump.

I'll do some testing of the fix below, but do you have any comments on
the correctn

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-01 Thread Teresa Johnson
On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
>> Sure, I will give this a try after your verification patch tests
>> complete. Does this mean that the patch you posted above to
>> force_nonfallthru_and_redirect is no longer needed either? I'll see if
>> I can avoid the need for some of my fixes, although I believe at least
>> the function.c one will still be needed. I'll check.
>
> The force_nonfallthru change is still necessary, because
> force_nonfallthru should be almost a no-op in cfglayout mode. The
> whole concept of a fallthru edge doesn't exist in cfglayout mode: any
> single_succ edge is a fallthru edge until the order of the basic
> blocks has been determined and the insn chain is re-linked (cfglayout
> mode originally was developed for bb-reorder, to move blocks around
> more easily). So the correct patch would actually be:
>
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 193046)
> +++ cfgrtl.c(working copy)
> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>cfg_layout_split_edge,
>rtl_make_forwarder_block,
>NULL, /* tidy_fallthru_edge */
> -  rtl_force_nonfallthru,
> +  NULL, /* force_nonfallthru */
>rtl_block_ends_with_call_p,
>rtl_block_ends_with_condjump_p,
>rtl_flow_call_edges_add,
>
> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
> hooks, they are cfgrtl-only.)
>
> But obviously that won't work because
> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
> cfglayout mode. That is a bug. The call to force_nonfallthru results
> in a "dangling" barrier:
>
> cfgrtl.c:1523  emit_barrier_after (BB_END (jump_block));
>
> In cfglayout mode, barriers don't exist in the insns chain, and they
> don't have to because every edge is a fallthru edge. If there are
> barriers before cfglayout mode, they are either removed or linked in
> the basic block footer, and fixup_reorder_chain restores or inserts
> barriers where necessary to drop out of cfglayout mode. This
> emit_barrier_after call hangs a barrier after BB_END but not in the
> footer, and I'm pretty sure the result will be that the barrier is
> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
> inserting a barrier should be done in cfglayout mode.
>
> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
> cfglayout pass. It doesn't work without cfglayout but it's doing
> things that are only correct in the cfgrtl world and Very Wrong Indeed
> in cfglayout-land.
>
>
>> Regarding your earlier question about why we needed to add the
>> barrier, I need to dig up the details again but essentially I found
>> that the barriers were being added by bbpart, but bbro was reordering
>> things and the block that ended up at the border between the hot and
>> cold section didn't necessarily have a barrier on it because it was
>> not previously at the region boundary.
>
> That doesn't sound right. bbpart doesn't actually re-order the basic
> blocks, it only marks the blocks with the partition they will be
> assigned to. Whatever ends up at the border between the two partitions
> is not relevant: the hot section cannot end in a fall-through edge to
> the cold section (verify_flow_info even checks for that, see "fallthru
> edge crosses section boundary (bb %i)") so it must end in some
> explicit jump. Such jumps are always followed by a barrier. The only
> reason I can think of why there might be a missing barrier, is because
> fixup_reorder_chain has a bug and forgets to insert the barrier in
> some cases (and I suspect this may be the case for return patterns, or
> the a.m. issue of a dropper barrier).
>
> I would like to work on debugging this, but it's hard without test cases...

I'm working on trying to reproduce some of these failures in a test
case I can share. So far, I have only been able to reproduce the
failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still
working on getting a smaller/shareable test case for the other 2
issues.

The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I
had fixed with my original changes to cfgrtl.c. Need to understand why
there is a reg crossing note between 2 bbs in the same partition.

In the hmmer test case I also hit a failures in rtl_verify_flow_info
and rtl_verify_flow_info_1:

gcc  -c -o sre_math.o -DSPEC_CPU -D
NDEBUG-fprofile-use -freorder-blocks-and-partition   -O2
 sre_math.c
sre_math.c: In function ‘Gammln’:
sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section
 }
 ^
sre_math.c:161:1: error: missing barrier after block 6
sre_math.c:161:1: internal compiler error: verify_flow_info failed


This was due to some code in thread_prologue_and_epilogue_insns that
duplicated tail blocks:

if (e)
  {
copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src))

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-11-01 Thread Christophe Lyon
> I would like to work on debugging this, but it's hard without test cases...
Maybe the files I attached to my PR55121 could help you in this respect?

Your "sanity checking" patching does complain with these input files.


Christophe.


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-31 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
> Sure, I will give this a try after your verification patch tests
> complete. Does this mean that the patch you posted above to
> force_nonfallthru_and_redirect is no longer needed either? I'll see if
> I can avoid the need for some of my fixes, although I believe at least
> the function.c one will still be needed. I'll check.

The force_nonfallthru change is still necessary, because
force_nonfallthru should be almost a no-op in cfglayout mode. The
whole concept of a fallthru edge doesn't exist in cfglayout mode: any
single_succ edge is a fallthru edge until the order of the basic
blocks has been determined and the insn chain is re-linked (cfglayout
mode originally was developed for bb-reorder, to move blocks around
more easily). So the correct patch would actually be:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 193046)
+++ cfgrtl.c(working copy)
@@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   cfg_layout_split_edge,
   rtl_make_forwarder_block,
   NULL, /* tidy_fallthru_edge */
-  rtl_force_nonfallthru,
+  NULL, /* force_nonfallthru */
   rtl_block_ends_with_call_p,
   rtl_block_ends_with_condjump_p,
   rtl_flow_call_edges_add,

(Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
hooks, they are cfgrtl-only.)

But obviously that won't work because
bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
cfglayout mode. That is a bug. The call to force_nonfallthru results
in a "dangling" barrier:

cfgrtl.c:1523  emit_barrier_after (BB_END (jump_block));

In cfglayout mode, barriers don't exist in the insns chain, and they
don't have to because every edge is a fallthru edge. If there are
barriers before cfglayout mode, they are either removed or linked in
the basic block footer, and fixup_reorder_chain restores or inserts
barriers where necessary to drop out of cfglayout mode. This
emit_barrier_after call hangs a barrier after BB_END but not in the
footer, and I'm pretty sure the result will be that the barrier is
lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
inserting a barrier should be done in cfglayout mode.

So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
cfglayout pass. It doesn't work without cfglayout but it's doing
things that are only correct in the cfgrtl world and Very Wrong Indeed
in cfglayout-land.


> Regarding your earlier question about why we needed to add the
> barrier, I need to dig up the details again but essentially I found
> that the barriers were being added by bbpart, but bbro was reordering
> things and the block that ended up at the border between the hot and
> cold section didn't necessarily have a barrier on it because it was
> not previously at the region boundary.

That doesn't sound right. bbpart doesn't actually re-order the basic
blocks, it only marks the blocks with the partition they will be
assigned to. Whatever ends up at the border between the two partitions
is not relevant: the hot section cannot end in a fall-through edge to
the cold section (verify_flow_info even checks for that, see "fallthru
edge crosses section boundary (bb %i)") so it must end in some
explicit jump. Such jumps are always followed by a barrier. The only
reason I can think of why there might be a missing barrier, is because
fixup_reorder_chain has a bug and forgets to insert the barrier in
some cases (and I suspect this may be the case for return patterns, or
the a.m. issue of a dropper barrier).

I would like to work on debugging this, but it's hard without test cases...

Ciao!
Steven


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 2:33 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
>> Hello Teresa,
>>
>> Could you try this patch for me also? It moves bbpart outside the part
>> of the passes pipeline that works in cfglayout mode.
>
> where's the "unsend" button if you need it...
>
> So, to complete the mail...
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode. It looks like
> when someone (/me looks the other way) changed the compiler to work
> like that, he/she forgot about updating this pass...
>
> Instead, the pass should run just before register allocation, as late
> as possible so that any funny CFG modifications have taken place. A
> possible down-side is that profile info may have degenerated a bit
> further, but I don't think that's a serious concern because the
> partitioning is actually quite stupid: Just stuff all blocks with
> count==0 into the cold section. Updating 0-counts is easy enough that
> I think GCC should get that right everywhere.
>
> It'd be nice to figure out if there are less stupid^Wsimplistic
> heuristics to decide what should go into the cold partition, for GCC
> 4.9...
>
> Ciao!
> Steven
>
>
> * passes.c (init_optimization_passes): Move
> pass_partition_blocks just before RA.
>
> Index: passes.c
> ===
> --- passes.c(revision 192995)
> +++ passes.c(working copy)
> @@ -1595,7 +1595,6 @@ init_optimization_passes (void)
>NEXT_PASS (pass_ud_rtl_dce);
>NEXT_PASS (pass_combine);
>NEXT_PASS (pass_if_after_combine);
> -  NEXT_PASS (pass_partition_blocks);
>NEXT_PASS (pass_regmove);
>NEXT_PASS (pass_outof_cfg_layout_mode);
>NEXT_PASS (pass_split_all_insns);
> @@ -1606,6 +1605,7 @@ init_optimization_passes (void)
>NEXT_PASS (pass_match_asm_constraints);
>NEXT_PASS (pass_sms);
>NEXT_PASS (pass_sched);
> +  NEXT_PASS (pass_partition_blocks);
>NEXT_PASS (pass_ira);
>NEXT_PASS (pass_reload);
>NEXT_PASS (pass_postreload);


This doesn't quite work for me. I got an error because bbpart has
PROP_cfglayout listed in its required properties. I tried removing
that, but then verify_flow_info gives an error about a missing barrier
after a bb with no fall through edges.

Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 2:33 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
>> Hello Teresa,
>>
>> Could you try this patch for me also? It moves bbpart outside the part
>> of the passes pipeline that works in cfglayout mode.
>
> where's the "unsend" button if you need it...
>
> So, to complete the mail...
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode. It looks like
> when someone (/me looks the other way) changed the compiler to work
> like that, he/she forgot about updating this pass...

Sure, I will give this a try after your verification patch tests
complete. Does this mean that the patch you posted above to
force_nonfallthru_and_redirect is no longer needed either? I'll see if
I can avoid the need for some of my fixes, although I believe at least
the function.c one will still be needed. I'll check.

Regarding your earlier question about why we needed to add the
barrier, I need to dig up the details again but essentially I found
that the barriers were being added by bbpart, but bbro was reordering
things and the block that ended up at the border between the hot and
cold section didn't necessarily have a barrier on it because it was
not previously at the region boundary.


>
> Instead, the pass should run just before register allocation, as late
> as possible so that any funny CFG modifications have taken place. A
> possible down-side is that profile info may have degenerated a bit
> further, but I don't think that's a serious concern because the
> partitioning is actually quite stupid: Just stuff all blocks with
> count==0 into the cold section. Updating 0-counts is easy enough that
> I think GCC should get that right everywhere.
>
> It'd be nice to figure out if there are less stupid^Wsimplistic
> heuristics to decide what should go into the cold partition, for GCC
> 4.9...

Yep, that is one of my goals in looking at this - I am hoping to use
the new fdo summary info I added to tune this into a more robust
decision based on the counter values in the working set summary.

Thanks for the help!
Teresa

>
> Ciao!
> Steven
>
>
> * passes.c (init_optimization_passes): Move
> pass_partition_blocks just before RA.
>
> Index: passes.c
> ===
> --- passes.c(revision 192995)
> +++ passes.c(working copy)
> @@ -1595,7 +1595,6 @@ init_optimization_passes (void)
>NEXT_PASS (pass_ud_rtl_dce);
>NEXT_PASS (pass_combine);
>NEXT_PASS (pass_if_after_combine);
> -  NEXT_PASS (pass_partition_blocks);
>NEXT_PASS (pass_regmove);
>NEXT_PASS (pass_outof_cfg_layout_mode);
>NEXT_PASS (pass_split_all_insns);
> @@ -1606,6 +1605,7 @@ init_optimization_passes (void)
>NEXT_PASS (pass_match_asm_constraints);
>NEXT_PASS (pass_sms);
>NEXT_PASS (pass_sched);
> +  NEXT_PASS (pass_partition_blocks);
>NEXT_PASS (pass_ira);
>NEXT_PASS (pass_reload);
>NEXT_PASS (pass_postreload);



--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
> Hello Teresa,
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode.

where's the "unsend" button if you need it...

So, to complete the mail...

Could you try this patch for me also? It moves bbpart outside the part
of the passes pipeline that works in cfglayout mode. It looks like
when someone (/me looks the other way) changed the compiler to work
like that, he/she forgot about updating this pass...

Instead, the pass should run just before register allocation, as late
as possible so that any funny CFG modifications have taken place. A
possible down-side is that profile info may have degenerated a bit
further, but I don't think that's a serious concern because the
partitioning is actually quite stupid: Just stuff all blocks with
count==0 into the cold section. Updating 0-counts is easy enough that
I think GCC should get that right everywhere.

It'd be nice to figure out if there are less stupid^Wsimplistic
heuristics to decide what should go into the cold partition, for GCC
4.9...

Ciao!
Steven


* passes.c (init_optimization_passes): Move
pass_partition_blocks just before RA.

Index: passes.c
===
--- passes.c(revision 192995)
+++ passes.c(working copy)
@@ -1595,7 +1595,6 @@ init_optimization_passes (void)
   NEXT_PASS (pass_ud_rtl_dce);
   NEXT_PASS (pass_combine);
   NEXT_PASS (pass_if_after_combine);
-  NEXT_PASS (pass_partition_blocks);
   NEXT_PASS (pass_regmove);
   NEXT_PASS (pass_outof_cfg_layout_mode);
   NEXT_PASS (pass_split_all_insns);
@@ -1606,6 +1605,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_match_asm_constraints);
   NEXT_PASS (pass_sms);
   NEXT_PASS (pass_sched);
+  NEXT_PASS (pass_partition_blocks);
   NEXT_PASS (pass_ira);
   NEXT_PASS (pass_reload);
   NEXT_PASS (pass_postreload);


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
Hello Teresa,

Could you try this patch for me also? It moves bbpart outside the part
of the passes pipeline that works in cfglayout mode.


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:48 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===
>> --- bb-reorder.c(revision 192692)
>> +++ bb-reorder.c(working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>> first_partition = BB_PARTITION (bb);
>>if (BB_PARTITION (bb) != first_partition)
>> {
>> +  /* There should be a barrier between text sections. */
>> +  emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
>   /* Now add jumps and labels as needed to match the blocks new
>  outgoing edges.  */
>
> Why isn't it doing that for you?

Maybe it's because fix_up_fall_thru_edges calls force_nonfallthru,
which is incorrectly inserting JUMP_INSNs and BARRIERs in cfglayout
mode. I'm going to test this patch:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 192889)
+++ cfgrtl.c(working copy)
@@ -1511,16 +1511,17 @@ force_nonfallthru_and_redirect (edge e,
 #endif
}
   set_return_jump_label (BB_END (jump_block));
+  emit_barrier_after (BB_END (jump_block));
 }
-  else
+  else if (current_ir_type () == IR_RTL_CFGRTL)
 {
   rtx label = block_label (target);
   emit_jump_insn_after_setloc (gen_jump (label), BB_END (jump_block), loc);
   JUMP_LABEL (BB_END (jump_block)) = label;
   LABEL_NUSES (label)++;
+  emit_barrier_after (BB_END (jump_block));
 }

-  emit_barrier_after (BB_END (jump_block));
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
> Index: bb-reorder.c
> ===
> --- bb-reorder.c(revision 192692)
> +++ bb-reorder.c(working copy)
> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
> first_partition = BB_PARTITION (bb);
>if (BB_PARTITION (bb) != first_partition)
> {
> +  /* There should be a barrier between text sections. */
> +  emit_barrier_after (BB_END (bb->prev_bb));

So why isn't there one? There can't be a fall-through edge from one
section to the other, so cfgrtl.c:fixup_reorder_chain should have
added a barrier here already in the code under the comment:

  /* Now add jumps and labels as needed to match the blocks new
 outgoing edges.  */

Why isn't it doing that for you?

BTW, something else I noted in cfgrtl.c:
NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
duplicate_insn_chain, so the following is necessary for robustness:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 191819)
+++ cfgrtl.c(working copy)
@@ -3615,7 +3615,6 @@
  break;

case NOTE_INSN_EPILOGUE_BEG:
-   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
  emit_note_copy (insn);
  break;


There can be only one! One note to rule them all! etc.


> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 192692)
> +++ cfgrtl.c(working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;

My dislike for this whole scheme just continues to grow... How can
there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)?
That is a bug. We should not need the notes here.

As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be
the canonical way to check whether two blocks are in the same
partition, and the EDGE_CROSSING flag should be set iff an edge
crosses from one section to another. The REG_CROSSING_JUMP note should
only be used to see if a JUMP_INSN may jump to another section,
without having to check all successor edges.

Any place where we have to check the BB_PARTITION or
edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in
the partitioning updating.

Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so
that slim RTL dumping breaks. I need this patchlet to make things
work:
Index: sched-vis.c
===
--- sched-vis.c (revision 191819)
+++ sched-vis.c (working copy)
@@ -553,6 +553,11 @@
 {
   char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];

+  if (! x)
+{
+  sprintf (buf, "(nil)");
+  return;
+}
   switch (GET_CODE (x))
 {
 case SET:

Ciao!
Steven


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 8:49 AM, Matthew Gretton-Dann wrote:
> On 30 October 2012 05:20, Teresa Johnson wrote:
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 192692)
>> +++ cfgrtl.c(working copy)
>> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>>
>>/* Protect the loop latches.  */
>> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>>
>>/* Protect the loop latches.  */
>
> As this if() condition seems to be the canonical way to detect being
> in a different partition should it be moved out into a query function,
> and all of cfgrtl.c updated to use it?

Not just in cfgrtl.c but for example also in ifcvt.c (which currently
only tests for notes, that's broken).

Ciao!
Steven


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Matthew Gretton-Dann
On 30 October 2012 05:20, Teresa Johnson  wrote:
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 192692)
> +++ cfgrtl.c(working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */
> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */

As this if() condition seems to be the canonical way to detect being
in a different partition should it be moved out into a query function,
and all of cfgrtl.c updated to use it?

[Note I am not a maintainer and so can't approve/reject your patch].

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


[PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-29 Thread Teresa Johnson
This patch fixes three different failures I encountered while trying to use
-freorder-blocks-and-partition, including the failure reported in PR 53743.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2012-10-29  Teresa Johnson  

PR optimization/53743
* function.c (thread_prologue_and_epilogue_insns): Don't
store exit predecessor BB until after it is potentially split.
* bb-reorder.c (insert_section_boundary_note): Ensure that
a barrier exists before a switch section node, as this is expected
by later passes (e.g. dwarf CFI code).
* cfgrtl.c (rtl_can_merge_blocks): Use the same condition looking
for region-crossing jumps as in try_redirect_by_replacing_jump,
which may be called while merging blocks.
(cfg_layout_can_merge_blocks_p): Ditto.

Index: function.c
===
--- function.c  (revision 192692)
+++ function.c  (working copy)
@@ -6517,7 +6517,7 @@ epilogue_done:
   basic_block simple_return_block_cold = NULL;
   edge pending_edge_hot = NULL;
   edge pending_edge_cold = NULL;
-  basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+  basic_block exit_pred;
   int i;
 
   gcc_assert (entry_edge != orig_entry_edge);
@@ -6545,6 +6545,12 @@ epilogue_done:
else
  pending_edge_cold = e;
  }
+  
+  /* Save a pointer to the exit's predecessor BB for use in
+ inserting new BBs at the end of the function. Do this
+ after the call to split_block above which may split
+ the original exit pred.  */
+  exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
   FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
{
Index: bb-reorder.c
===
--- bb-reorder.c(revision 192692)
+++ bb-reorder.c(working copy)
@@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
first_partition = BB_PARTITION (bb);
   if (BB_PARTITION (bb) != first_partition)
{
+  /* There should be a barrier between text sections. */
+  emit_barrier_after (BB_END (bb->prev_bb));
  new_note = emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS,
   BB_HEAD (bb));
  /* ??? This kind of note always lives between basic blocks,
Index: cfgrtl.c
===
--- cfgrtl.c(revision 192692)
+++ cfgrtl.c(working copy)
@@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
  partition boundaries).  See  the comments at the top of
  bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (BB_PARTITION (a) != BB_PARTITION (b))
+  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
+  || BB_PARTITION (a) != BB_PARTITION (b))
 return false;
 
   /* Protect the loop latches.  */
@@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
  partition boundaries).  See  the comments at the top of
  bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (BB_PARTITION (a) != BB_PARTITION (b))
+  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
+  || BB_PARTITION (a) != BB_PARTITION (b))
 return false;
 
   /* Protect the loop latches.  */

--
This patch is available for review at http://codereview.appspot.com/6823047