[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #11 from Jakub Jelinek  ---
Author: jakub
Date: Tue Dec 19 16:43:04 2017
New Revision: 255829

URL: https://gcc.gnu.org/viewcvs?rev=255829=gcc=rev
Log:
PR middle-end/81914
* predict.c (zero_one_minusone): New function.
(apply_return_prediction): Avoid return prediction for functions
returning only -1, 0 and 1 values, unless they only return -1 and 0
or 0 and 1.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/predict.c

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-16 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #10 from Marc Glisse  ---
For the particular case of <=> (-1, 0 or 1), I've seen code like (a>b)-(a

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Priority|P3  |P2

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #9 from Daniel Fruzynski  ---
In the meantime I found another case when gcc 7 inserts lots of jumps. I am not
sure if your extra test cases covers it too:

#include 

int test(int data1[9][9], int data2[9][9])
{
  uint64_t b1 = 0, b2 = 0;
  for (int n = 0; n < 9; ++n)
  {
for (int k = 0; k < 9; ++k)
{
  int a = data1[n][k] * 9 + data2[n][k];
  (a < 64 ? b1 : b2) |= 1 << (a & 63);
}
  }
  return __builtin_popcount(b1) + __builtin_popcount(b2);
}

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #8 from Jakub Jelinek  ---
Created attachment 42856
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42856=edit
gcc8-pr81914.patch

Untested patch that handles all the cases in the #c7 testcase.  Though it will
not handle some comparison functions in GCC (e.g. those which sometimes just
return a - b or similar).
If it doesn't look like a completely dumb approach, we probably should test it
on SPEC (unfortunately I don't have a reasonable setup for that right now).

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #7 from Jakub Jelinek  ---
Larger testcase with more cases.  Some of them, e.g. in f3, are predicted
roughly reasonably, but the > 95% predictions are just wrong in these cases.

int
f1 (long long a, long long b)
{
  return a < b ? -1 : a > b;
}

int
f2 (int a, int b)
{
  if (a < b)
return -1;
  if (a > b)
return 1;
  return 0;
}

int
f3 (int *a, int *b)
{
  if (a[0] != b[0])
return a[0] > b[0] ? 1 : -1;
  if (a[1] != b[1])
return a[1] > b[1] ? 1 : -1;
  return 0;
}

int
f4 (int *a, int *b)
{
  if (a[0] > b[0])
return -1;
  if (a[0] < b[0])
return 1;

  if (a[1] > b[1])
return -1;
  if (a[1] < b[1])
return 1;

  if (a[2] > b[2])
return -1;
  if (a[2] < b[2])
return 1;
  return 0;
}

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #6 from Jakub Jelinek  ---
Another testcase, which has even higher prediction of not returning -1:
int
cmp (int a, int b)
{
  if (a < b)
return -1;
  if (a > b)
return 1;
  return 0;
}

In the #c0 case, we have in the IL:
   :
  if (a_3(D) >= b_4(D))
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = a_3(D) > b_4(D);
  iftmp.0_6 = (int) _1;

   :
  # iftmp.0_2 = PHI 
  return iftmp.0_2;
while in this one:
  :
  if (a_2(D) < b_3(D))
goto ;
  else
goto ;

  :
  if (a_2(D) > b_3(D))
goto ;
  else
goto ;

  :

  :
  # _1 = PHI <-1(2), 1(3), 0(4)>
  return _1;
In any case, I think it is worth recognizing these patterns.
We should also consider other examples, e.g. from gcc itself:
static int
oecount_cmp (const void *p1, const void *p2)
{
  const oecount *c1 = (const oecount *)p1;
  const oecount *c2 = (const oecount *)p2;
  if (c1->cnt != c2->cnt)
return c1->cnt > c2->cnt ? 1 : -1;
  else
/* If counts are identical, use unique IDs to stabilize qsort.  */
return c1->id > c2->id ? 1 : -1;
}
static int
tm_alias_pair_cmp (const void *x, const void *y)
{
  const tm_alias_pair *p1 = (const tm_alias_pair *) x;
  const tm_alias_pair *p2 = (const tm_alias_pair *) y;
  if (p1->uid < p2->uid)
return -1;
  if (p1->uid > p2->uid)
return 1;
  return 0;
}
int
type_warning_cmp (const void *p1, const void *p2)
{
  const odr_type_warn_count *t1 = (const odr_type_warn_count *)p1;
  const odr_type_warn_count *t2 = (const odr_type_warn_count *)p2;

  if (t1->dyn_count < t2->dyn_count)
   return 1;
  if (t1->dyn_count > t2->dyn_count)
   return -1;
  return t2->count - t1->count;
}
static int
stack_var_cmp (const void *a, const void *b)
{
  size_t ia = *(const size_t *)a;
  size_t ib = *(const size_t *)b;
  unsigned int aligna = stack_vars[ia].alignb;
  unsigned int alignb = stack_vars[ib].alignb;
  HOST_WIDE_INT sizea = stack_vars[ia].size;
  HOST_WIDE_INT sizeb = stack_vars[ib].size;
  tree decla = stack_vars[ia].decl;
  tree declb = stack_vars[ib].decl;
  bool largea, largeb;
  unsigned int uida, uidb;

  /* Primary compare on "large" alignment.  Large comes first.  */
  largea = (aligna * BITS_PER_UNIT > MAX_SUPPORTED_STACK_ALIGNMENT);
  largeb = (alignb * BITS_PER_UNIT > MAX_SUPPORTED_STACK_ALIGNMENT);
  if (largea != largeb)
return (int)largeb - (int)largea;

  /* Secondary compare on size, decreasing  */
  if (sizea > sizeb)
return -1;
  if (sizea < sizeb)
return 1;

  /* Tertiary compare on true alignment, decreasing.  */
  if (aligna < alignb)
return -1;
  if (aligna > alignb)
return 1;

  /* Final compare on ID for sort stability, increasing.
 Two SSA names are compared by their version, SSA names come before
 non-SSA names, and two normal decls are compared by their DECL_UID.  */
  if (TREE_CODE (decla) == SSA_NAME)
{
  if (TREE_CODE (declb) == SSA_NAME)
uida = SSA_NAME_VERSION (decla), uidb = SSA_NAME_VERSION (declb);
  else
return -1;
}
  else if (TREE_CODE (declb) == SSA_NAME)
return 1;
  else
uida = DECL_UID (decla), uidb = DECL_UID (declb);
  if (uida < uidb)
return 1;
  if (uida > uidb)
return -1;
  return 0;
}

etc.  Perhaps catching everything is too hard for heuristics, but at least
figuring a pattern of these <=> comparisons, or consecutive comparisons like:
  if (uida < uidb)
return 1;
  if (uida > uidb)
return -1;
is highly desirable.

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #5 from Jakub Jelinek  ---
Oops, it is most likely the PRED_NEGATIVE_RETURN stuff instead.

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #4 from Jakub Jelinek  ---
Predictions for bb 2
  DS theory heuristics: 98.0%
  combined heuristics: 98.0%
  negative return heuristics of edge 2->4: 2.0%
Predictions for bb 3
1 edges in bb 3 predicted to even probabilities
Predictions for bb 4
1 edges in bb 4 predicted to even probabilities
cmp (long long int a, long long int b)
{
  _Bool _1;
  int iftmp.0_2;
  int iftmp.0_6;

   [local count: 1073741825]:
  if (a_3(D) >= b_4(D))
goto ; [98.00%]
  else
goto ; [2.00%]

Honza, I think it would be worth to recognize idioms like this (<=> operator)
and don't apply Dempster-Shaffer in that case, but rather predict either equal
probabilities for <, == and > (like before), or perhaps even somewhat smaller
probability for the == case and slightly higher for < or >.

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-08-21 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

--- Comment #3 from Daniel Fruzynski  ---
Yes, branchless version is faster. Here are results for code compiled with gcc
4.8.5:

BenchmarkTime   CPU Iterations
--
BM_memcmp6 ns  6 ns  111001949
BM_int64cmp  4 ns  4 ns  183761752

And here for gcc 7.1.0:

BenchmarkTime   CPU Iterations
--
BM_memcmp6 ns  6 ns  113198940
BM_int64cmp  8 ns  8 ns   82036754

Note: Code tested accepted values passed via pointer instead of value, to
better compare it with memcmp function. Functions were comparing random values,
generated before tests and stored in arrays. srand was called with constant
value to get repeatable results.

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-08-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
   Target Milestone|--- |7.3

--- Comment #2 from Richard Biener  ---
But are you sure the branchless variant is faster?

[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version

2017-08-21 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-08-21
 CC||mpolacek at gcc dot gnu.org
  Component|c   |middle-end
Summary|gcc 7.1 generates branch|[7/8 Regression] gcc 7.1
   |for code which was  |generates branch for code
   |branchless in earlier gcc   |which was branchless in
   |version |earlier gcc version
 Ever confirmed|0   |1

--- Comment #1 from Marek Polacek  ---
Started with r237185.