[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #11 from Jakub Jelinek --- Author: jakub Date: Tue Dec 19 16:43:04 2017 New Revision: 255829 URL: https://gcc.gnu.org/viewcvs?rev=255829=gcc=rev Log: PR middle-end/81914 * predict.c (zero_one_minusone): New function. (apply_return_prediction): Avoid return prediction for functions returning only -1, 0 and 1 values, unless they only return -1 and 0 or 0 and 1. Modified: trunk/gcc/ChangeLog trunk/gcc/predict.c
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #10 from Marc Glisse --- For the particular case of <=> (-1, 0 or 1), I've seen code like (a>b)-(a
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Priority|P3 |P2
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #9 from Daniel Fruzynski--- In the meantime I found another case when gcc 7 inserts lots of jumps. I am not sure if your extra test cases covers it too: #include int test(int data1[9][9], int data2[9][9]) { uint64_t b1 = 0, b2 = 0; for (int n = 0; n < 9; ++n) { for (int k = 0; k < 9; ++k) { int a = data1[n][k] * 9 + data2[n][k]; (a < 64 ? b1 : b2) |= 1 << (a & 63); } } return __builtin_popcount(b1) + __builtin_popcount(b2); }
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #8 from Jakub Jelinek --- Created attachment 42856 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42856=edit gcc8-pr81914.patch Untested patch that handles all the cases in the #c7 testcase. Though it will not handle some comparison functions in GCC (e.g. those which sometimes just return a - b or similar). If it doesn't look like a completely dumb approach, we probably should test it on SPEC (unfortunately I don't have a reasonable setup for that right now).
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #7 from Jakub Jelinek --- Larger testcase with more cases. Some of them, e.g. in f3, are predicted roughly reasonably, but the > 95% predictions are just wrong in these cases. int f1 (long long a, long long b) { return a < b ? -1 : a > b; } int f2 (int a, int b) { if (a < b) return -1; if (a > b) return 1; return 0; } int f3 (int *a, int *b) { if (a[0] != b[0]) return a[0] > b[0] ? 1 : -1; if (a[1] != b[1]) return a[1] > b[1] ? 1 : -1; return 0; } int f4 (int *a, int *b) { if (a[0] > b[0]) return -1; if (a[0] < b[0]) return 1; if (a[1] > b[1]) return -1; if (a[1] < b[1]) return 1; if (a[2] > b[2]) return -1; if (a[2] < b[2]) return 1; return 0; }
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #6 from Jakub Jelinek --- Another testcase, which has even higher prediction of not returning -1: int cmp (int a, int b) { if (a < b) return -1; if (a > b) return 1; return 0; } In the #c0 case, we have in the IL: : if (a_3(D) >= b_4(D)) goto ; [INV] else goto ; [INV] : _1 = a_3(D) > b_4(D); iftmp.0_6 = (int) _1; : # iftmp.0_2 = PHIreturn iftmp.0_2; while in this one: : if (a_2(D) < b_3(D)) goto ; else goto ; : if (a_2(D) > b_3(D)) goto ; else goto ; : : # _1 = PHI <-1(2), 1(3), 0(4)> return _1; In any case, I think it is worth recognizing these patterns. We should also consider other examples, e.g. from gcc itself: static int oecount_cmp (const void *p1, const void *p2) { const oecount *c1 = (const oecount *)p1; const oecount *c2 = (const oecount *)p2; if (c1->cnt != c2->cnt) return c1->cnt > c2->cnt ? 1 : -1; else /* If counts are identical, use unique IDs to stabilize qsort. */ return c1->id > c2->id ? 1 : -1; } static int tm_alias_pair_cmp (const void *x, const void *y) { const tm_alias_pair *p1 = (const tm_alias_pair *) x; const tm_alias_pair *p2 = (const tm_alias_pair *) y; if (p1->uid < p2->uid) return -1; if (p1->uid > p2->uid) return 1; return 0; } int type_warning_cmp (const void *p1, const void *p2) { const odr_type_warn_count *t1 = (const odr_type_warn_count *)p1; const odr_type_warn_count *t2 = (const odr_type_warn_count *)p2; if (t1->dyn_count < t2->dyn_count) return 1; if (t1->dyn_count > t2->dyn_count) return -1; return t2->count - t1->count; } static int stack_var_cmp (const void *a, const void *b) { size_t ia = *(const size_t *)a; size_t ib = *(const size_t *)b; unsigned int aligna = stack_vars[ia].alignb; unsigned int alignb = stack_vars[ib].alignb; HOST_WIDE_INT sizea = stack_vars[ia].size; HOST_WIDE_INT sizeb = stack_vars[ib].size; tree decla = stack_vars[ia].decl; tree declb = stack_vars[ib].decl; bool largea, largeb; unsigned int uida, uidb; /* Primary compare on "large" alignment. Large comes first. */ largea = (aligna * BITS_PER_UNIT > MAX_SUPPORTED_STACK_ALIGNMENT); largeb = (alignb * BITS_PER_UNIT > MAX_SUPPORTED_STACK_ALIGNMENT); if (largea != largeb) return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ if (sizea > sizeb) return -1; if (sizea < sizeb) return 1; /* Tertiary compare on true alignment, decreasing. */ if (aligna < alignb) return -1; if (aligna > alignb) return 1; /* Final compare on ID for sort stability, increasing. Two SSA names are compared by their version, SSA names come before non-SSA names, and two normal decls are compared by their DECL_UID. */ if (TREE_CODE (decla) == SSA_NAME) { if (TREE_CODE (declb) == SSA_NAME) uida = SSA_NAME_VERSION (decla), uidb = SSA_NAME_VERSION (declb); else return -1; } else if (TREE_CODE (declb) == SSA_NAME) return 1; else uida = DECL_UID (decla), uidb = DECL_UID (declb); if (uida < uidb) return 1; if (uida > uidb) return -1; return 0; } etc. Perhaps catching everything is too hard for heuristics, but at least figuring a pattern of these <=> comparisons, or consecutive comparisons like: if (uida < uidb) return 1; if (uida > uidb) return -1; is highly desirable.
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #5 from Jakub Jelinek --- Oops, it is most likely the PRED_NEGATIVE_RETURN stuff instead.
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #4 from Jakub Jelinek --- Predictions for bb 2 DS theory heuristics: 98.0% combined heuristics: 98.0% negative return heuristics of edge 2->4: 2.0% Predictions for bb 3 1 edges in bb 3 predicted to even probabilities Predictions for bb 4 1 edges in bb 4 predicted to even probabilities cmp (long long int a, long long int b) { _Bool _1; int iftmp.0_2; int iftmp.0_6; [local count: 1073741825]: if (a_3(D) >= b_4(D)) goto ; [98.00%] else goto ; [2.00%] Honza, I think it would be worth to recognize idioms like this (<=> operator) and don't apply Dempster-Shaffer in that case, but rather predict either equal probabilities for <, == and > (like before), or perhaps even somewhat smaller probability for the == case and slightly higher for < or >.
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 --- Comment #3 from Daniel Fruzynski--- Yes, branchless version is faster. Here are results for code compiled with gcc 4.8.5: BenchmarkTime CPU Iterations -- BM_memcmp6 ns 6 ns 111001949 BM_int64cmp 4 ns 4 ns 183761752 And here for gcc 7.1.0: BenchmarkTime CPU Iterations -- BM_memcmp6 ns 6 ns 113198940 BM_int64cmp 8 ns 8 ns 82036754 Note: Code tested accepted values passed via pointer instead of value, to better compare it with memcmp function. Functions were comparing random values, generated before tests and stored in arrays. srand was called with constant value to get repeatable results.
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 Richard Biener changed: What|Removed |Added Target||x86_64-*-*, i?86-*-* Target Milestone|--- |7.3 --- Comment #2 from Richard Biener --- But are you sure the branchless variant is faster?
[Bug middle-end/81914] [7/8 Regression] gcc 7.1 generates branch for code which was branchless in earlier gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914 Marek Polacek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-08-21 CC||mpolacek at gcc dot gnu.org Component|c |middle-end Summary|gcc 7.1 generates branch|[7/8 Regression] gcc 7.1 |for code which was |generates branch for code |branchless in earlier gcc |which was branchless in |version |earlier gcc version Ever confirmed|0 |1 --- Comment #1 from Marek Polacek --- Started with r237185.