Sure, I'll include an example of the performance gain from an ifcombine
merge in the bug report and CC you and Andrew Pinski on it. In the
meantime, I have extracted the example into the demo below, and I have a
question about it:

// Way.h
struct waymapt
{
  int fillnum;
  int num;
};
typedef waymapt* waymappt;

class wayobj
{
public:
    int bound;
    int *maparp;
    waymappt waymap;
    int makebound2(int fillnum, int iters);
    void create(int size);
};


// WayInit.cpp
#include <cstring>
#include "Way.h"

void wayobj::create(int size)
{
  // Value-initialize so the demo does not read indeterminate memory.
  maparp=new int[size]();
  waymap=new waymapt[size]();
}


// Way.cpp
#include "Way.h"
int wayobj::makebound2(int fillnum, int iters)
{
  for (int i = 0; i < iters; i++)
  {
    if (waymap[i].fillnum!=fillnum)
      if (maparp[i]!=0)
        bound++;
  }
  return bound;
}


// main.cpp
#include <cstdio>
#include "Way.h"

#define SIZE 16
int main() {
  wayobj *woj = new wayobj();  // value-initialize so bound starts at 0
  woj->create(SIZE);
  woj->makebound2(10, SIZE);

  printf("bound == %d\n", woj->bound);
  return 0;
}

Both maparp and waymap are arrays owned by wayobj, and wayobj::create
allocates the same number of elements for each. In makebound2, the test
"if (maparp[i] != 0)" is nested inside "if (waymap[i].fillnum != fillnum)".
So if the load in the outer condition does not trap, the load of maparp[i]
in the inner condition will not trap either. Is there a way to prove that
the inner branch cannot trap from this relationship between the outer and
inner branches? If so, the ifcombine pass could merge this nested branch.
In SPEC CPU2006's 473.astar, this nested branch is a hotspot with poor
prediction accuracy, so merging it gives a very significant performance
improvement. icc merges this nested branch, but gcc doesn't.
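
As a minimal sketch of the transformation being asked about, the two
functions below show the nested form and a hand-merged form side by side.
This is written by hand at the source level and is not ifcombine output;
the names count_nested and count_merged are hypothetical, and the merged
form is only valid under the assumption that maparp[i] can be loaded
safely whenever waymap[i] can.

```cpp
struct waymapt
{
  int fillnum;
  int num;
};

// Nested form, as in makebound2: maparp[i] is loaded only when the
// outer comparison succeeds, so there are two branches per iteration.
int count_nested(const waymapt *waymap, const int *maparp,
                 int fillnum, int iters)
{
  int bound = 0;
  for (int i = 0; i < iters; i++)
  {
    if (waymap[i].fillnum != fillnum)
      if (maparp[i] != 0)
        bound++;
  }
  return bound;
}

// Hand-merged form: both loads happen unconditionally and the two
// comparison results are combined with a bitwise AND. This speculates
// the load of maparp[i], which is only safe if it cannot trap
// (assumption: both arrays have at least iters elements).
int count_merged(const waymapt *waymap, const int *maparp,
                 int fillnum, int iters)
{
  int bound = 0;
  for (int i = 0; i < iters; i++)
    bound += (waymap[i].fillnum != fillnum) & (maparp[i] != 0);
  return bound;
}
```

Both functions return the same count for any input where the two arrays
hold at least iters elements; the difference is only in whether the
second load is guarded by the first comparison.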

Richard Biener <rguent...@suse.de> wrote on Mon, Jul 21, 2025 at 15:02:

> On Sat, 19 Jul 2025, Andrew Pinski wrote:
>
> > On Sat, Jul 19, 2025 at 8:41 PM ywgrit via Gcc <gcc@gcc.gnu.org> wrote:
> > >
> > > I've tested merging for nested branches on icc, and it seems that
> > > icc does a branch merge for code that might trap, making a more
> > > aggressive optimization.
> >
> > So it is not exactly it might trap but rather it is part of a bigger
> > struct and it is an adjacent location.
> > We do some of this already in phiopt see hoist_adjacent_loads in
> > tree-ssa-phiopt.cc which handles a similar but different case.
> > Added here originally
> > (https://inbox.sourceware.org/gcc-patches/1336055636.22269.16.camel@gnopaine/).
> > Seems like a similar code could be done for ifcombine. I thought I saw
> > some improvements dealing with load handling that happened for GCC 15.
>
> Not for this particular case I think, though we could implement
> this as
>
> *(unsigned long *)(&waymap[i].fillnum) != (unsigned long)fillnum << 32
>
> this is what the enhancements were about.  Can you open a bugreport
> for tracking that ifcombine might benefit from a non-trap enhancement
> like phiopt has for adjacent load hoisting?
>
> Thanks,
> Richard.
>
> >
> > Thanks,
> > Andrew
> >
> >
> >
> > > Way_.cpp
> > > struct waymapt
> > > {
> > >   int fillnum;
> > >   int num;
> > > };
> > > typedef waymapt* waymappt;
> > >
> > > class wayobj
> > > {
> > > public:
> > >     int boundl;
> > >     waymappt waymap;
> > >     int makebound2(int fillnum, int iters);
> > > };
> > >
> > > int wayobj::makebound2(int fillnum, int iters)
> > > {
> > >   for (int i = 0; i < iters; i++)
> > >   {
> > >     if (waymap[i].fillnum!=fillnum)
> > >       if (waymap[i].num!=0)
> > >         boundl++;
> > >   }
> > >   return boundl;
> > > }
> > >
> > > compile commandline
> > > icpc -c -o Way_.o -g -O3 Way_.cpp
> > >
> > > The instructions generated
> > > cmp    (%r11,%r9,1),%esi
> > > setne  %bpl
> > > xor    %ecx,%ecx
> > > cmpl   $0x0,0x4(%r11,%r9,1)
> > > setne  %cl
> > > and    %ecx,%ebp
> > > cmp    $0x1,%ebp
> > > jne    49 <_ZN6wayobj10makebound2Eii+0x49>
> > >
> > > ywgrit <yw987194...@gmail.com> wrote on Sun, Jul 20, 2025 at 11:11:
> > >
> > > > Can we add a -merge-branch option to merge branch bbs when the
> > > > programmer can ensure that the inner branch bb will not trap?
> > > > Also, the current ifcombine pass can only merge very simple nested
> > > > branches, and if statements usually generate multiple gimple
> > > > statements, so a lot of merge opportunities are lost. For example,
> > > > the hotspot function in SPEC CPU2006's 473.astar contains two
> > > > nested branches. We ran an experiment (gcc-12.3.0, Linux 5.15.0,
> > > > Intel Core i7-10750H): compiling the nested branches of the hotspot
> > > > function into one branch instruction instead of two gave a 30%
> > > > improvement in performance.
> > > > If there are indirect accesses in the if statement, the branch
> > > > prediction is probably not accurate, so I think it's important to
> > > > maximize the chances of merging, e.g. by adding a -merge-branch
> > > > option as described above.
> > > >
> > > > Richard Biener <rguent...@suse.de> wrote on Fri, Jul 18, 2025 at 22:37:
> > > >
> > > >> On Fri, 18 Jul 2025, ywgrit wrote:
> > > >>
> > > >> > For now, the ifcombine pass can combine simple nested comparison
> > > >> > branches, e.g.
> > > >> > if (a != b)
> > > >> >   if (c == d)
> > > >> > These cond bbs must contain only the conditional, which is too
> > > >> > strict.
> > > >> >
> > > >> > We often meet code like this:
> > > >> > if (a != b)
> > > >> >   if (m[index] == k[index])
> > > >> > m and k are arrays, so the 2nd branch belongs to a bb that has
> > > >> > mem_ref gimples, and these stmts could trap. So these stmts won't
> > > >> > pass the bb_no_side_effects_p check, the branches can't be merged,
> > > >> > and performance gains are lost. Is there a way to merge these
> > > >> > branch bbs?
> > > >> > I think there are a great many such nested branches, and the
> > > >> > prediction accuracy of such nested branches is probably not very
> > > >> > high, so merging them would yield a large performance gain.
> > > >>
> > > >> Without actual data I do not believe such general claim.  But the
> > > >> issue is that we cannot speculate the loads from m[index] or
> > > >> k[index] when they might trap, so there is no way to merge the
> > > >> branches.
> > > >>
> > > >> Intel APX introduces conditional moves that hide traps, so with that
> > > >> you could do
> > > >>
> > > >>  flag = a != b;
> > > >>  cmov<flag> m[index], reg1
> > > >>  cmov<flag> k[index], reg2
> > > >>  if (flag && reg1 == reg2)
> > > >>
> > > >> but there is no way to do this in ifcombine on GIMPLE.  It would
> > > >> also be slower in case if (a != b) is well predicted and mostly
> > > >> false.
> > > >>
> > > >> Richard.
> > > >>
> > > >
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
