[Bug tree-optimization/103523] [11 Regression] vectorizable_induction generating code for modes without checking support for them by r11-7861-ge4180ab2f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton changed:

What       |Removed  |Added
Status     |REOPENED |RESOLVED
Resolution |---      |FIXED

--- Comment #11 from Joel Hutton ---
Fixed on 11 and trunk.
[Bug bootstrap/103688] [11 regression] build error after r11-9380
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103688

Joel Hutton changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #3 from Joel Hutton ---
Fixed on 11.
[Bug tree-optimization/103523] [11/12 Regression] vectorizable_induction generating code for modes without checking support for them by r11-7861-ge4180ab2f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #8 from Joel Hutton ---
Fixed on trunk.
[Bug tree-optimization/103523] [11/12 Regression] SVE float auto-vect float format expand failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton changed:

What       |Removed |Added
Status     |NEW     |ASSIGNED

--- Comment #4 from Joel Hutton ---
Introduced 26th March by e4180ab2f.
[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

Joel Hutton changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #6 from Joel Hutton ---
Fixed on trunk.
[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

--- Comment #4 from Joel Hutton ---
It seems it is vectorizing a 'MASK_STORE' into a 'SCATTER_STORE' when it should be using a 'MASK_SCATTER_STORE'. Currently it chooses between IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE based on the 'using_partial_vectors' field: loop_masks is only non-null when the loop is fully masked, so when it is not, the plain IFN_SCATTER_STORE is emitted and final_mask is dropped.

From vectorizable_store in tree-vect-stmts.c:

    vec_loop_masks *loop_masks
      = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
         ? &LOOP_VINFO_MASKS (loop_vinfo)
         : NULL);
    vec_loop_lens *loop_lens
      = (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
         ? &LOOP_VINFO_LENS (loop_vinfo)
         : NULL);

From tree-vectorizer.h:

    #define LOOP_VINFO_FULLY_MASKED_P(L)        \
      (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L)   \
       && !LOOP_VINFO_MASKS (L).is_empty ())

Later in vectorizable_store:

    if (memory_access_type == VMAT_GATHER_SCATTER)
      {
        tree scale = size_int (gs_info.scale);
        gcall *call;
        if (loop_masks)
          call = gimple_build_call_internal
                   (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset,
                    scale, vec_oprnd, final_mask);
        else
          call = gimple_build_call_internal
                   (IFN_SCATTER_STORE, 4, dataref_ptr, vec_offset,
                    scale, vec_oprnd);
        gimple_call_set_nothrow (call, true);
        vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
        new_stmt = call;
        break;
      }

    if (i > 0)
      /* Bump the vector pointer.  */
      dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
                                     gsi, stmt_info, bump);

    if (slp)
      vec_oprnd = vec_oprnds[i];
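A minimal sketch of the selection rule described above, reduced to two booleans. The enum and function names are hypothetical stand-ins (not GCC's internal_fn values), and the "any mask" rule is only one possible way to take the statement's own mask into account; it is not presented as the committed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IFN_SCATTER_STORE / IFN_MASK_SCATTER_STORE.  */
enum scatter_ifn { SCATTER_STORE, MASK_SCATTER_STORE };

/* Rule as in the excerpt: only loop masking (loop_masks != NULL) selects
   the masked form; the .MASK_STORE's own mask is never consulted.  */
static enum scatter_ifn
choose_by_loop_masks (bool loop_fully_masked, bool stmt_has_mask)
{
  (void) stmt_has_mask;
  return loop_fully_masked ? MASK_SCATTER_STORE : SCATTER_STORE;
}

/* Assumed alternative: select the masked form whenever any mask applies,
   whether it comes from loop masking or from the original .MASK_STORE.  */
static enum scatter_ifn
choose_by_any_mask (bool loop_fully_masked, bool stmt_has_mask)
{
  return (loop_fully_masked || stmt_has_mask) ? MASK_SCATTER_STORE
                                              : SCATTER_STORE;
}

/* Usage: the problematic case is a masked store in a loop that is not
   fully masked.  */
static void
check_scatter_choice (void)
{
  assert (choose_by_loop_masks (false, true) == SCATTER_STORE); /* mask lost */
  assert (choose_by_any_mask (false, true) == MASK_SCATTER_STORE);
  assert (choose_by_any_mask (false, false) == SCATTER_STORE);
  assert (choose_by_loop_masks (true, true) == MASK_SCATTER_STORE);
}
```

Keying on the presence of a mask rather than on how the loop is vectorized keeps the conditional semantics of the original .MASK_STORE independent of whether partial vectors are in use.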
[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

--- Comment #3 from Joel Hutton ---
It seems the 'vect' pass is vectorizing a 'MASK_STORE' to a 'SCATTER_STORE' and ignoring the mask.

    testcase.c:7:29: note: vect_is_simple_use: operand 0, type of def: constant
    testcase.c:7:29: note: created new init_stmt: vect_cst__65 = { 0, 0, 0, 0 };
    testcase.c:7:29: note: add new stmt: mask__60.11_66 = vect_cst__64 == vect_cst__65;
    testcase.c:7:29: note: -->vectorizing statement: _61 = &a[_10];
    testcase.c:7:29: note: -->vectorizing statement: .MASK_STORE (_61, 128B, _60, _31);
    testcase.c:7:29: note: transform statement.
    testcase.c:7:29: note: vect_is_simple_use: operand l_23(D) == 0, type of def: internal
    testcase.c:7:29: note: vect_is_simple_use: vectype vector(4)
    testcase.c:7:29: note: vect_is_simple_use: operand (long intD.9) j_24(D), type of def: internal
    testcase.c:7:29: note: vect_is_simple_use: vectype vector(4) long int
    Applying pattern match.pd:139, generic-match.c:24056
    Applying pattern match.pd:139, generic-match.c:24056
    testcase.c:7:29: note: transform store. ncopies = 1
    testcase.c:7:29: note: vect_get_vec_defs_for_operand: _31
    testcase.c:7:29: note: vect_is_simple_use: operand (long intD.9) j_24(D), type of def: internal
    testcase.c:7:29: note: def_stmt = _31 = (long int) j_24(D);
    testcase.c:7:29: note: vect_get_vec_defs_for_operand: _60
    testcase.c:7:29: note: vect_is_simple_use: operand l_23(D) == 0, type of def: internal
    testcase.c:7:29: note: def_stmt = _60 = l_23(D) == 0;
    testcase.c:7:29: note: create integer_type-pointer variable to type: long int vectorizing a pointer ref: MEM[(long int *)&a]
    Applying pattern match.pd:139, generic-match.c:27580
    testcase.c:7:29: note: created &a
    testcase.c:7:29: note: add new stmt: .SCATTER_STORE (vectp_a.12_67, { 0, 32, 64, 96 }, 1, vect__31.10_63);

before:

    [local count: 858993458]:
    # i_39 = PHI
    # ivtmp_16 = PHI
    _1 = (int) i_39;
    _2 = (long int) j_24(D);
    _45 = l_23(D) == 0;
    _46 = &a[_1];
    .MASK_STORE (_46, 128B, _45, _2);
    i.0_3 = (unsigned short) i_39;
    _4 = i.0_3 + 4;
    i_26 = (short int) _4;
    ivtmp_7 = ivtmp_16 - 1;
    if (ivtmp_7 != 0)
      goto ; [75.00%]
    else
      goto ; [25.00%]

    [local count: 644245087]:
    goto ; [100.00%]

    [local count: 214748368]:
    h_22 = h_33 + 1;
    if (_14 > h_22)
      goto ; [89.00%]
    else
      goto ; [11.00%]

after:

    # PT = null { D.3608 } (nonlocal)
    # ALIGN = 32, MISALIGN = 0
    # vectp_a.12_67 = PHI <&aD.3608(21), vectp_a.12_68(26)>
    # ivtmp_70 = PHI <0(21), ivtmp_71(26)>
    _10 = (intD.7) i_9;
    vect__31.10_63 = (vector(4) long intD.9) vect_cst__62;
    _31 = (long intD.9) j_24(D);
    mask__60.11_66 = vect_cst__64 == vect_cst__65;
    _60 = l_23(D) == 0;
    # PT = null { D.3608 } (nonlocal)
    _61 = &aD.3608[_10];
    # .MEM_69 = VDEF <.MEM_13>
    # USE = anything
    # CLB = anything
    .SCATTER_STORE (vectp_a.12_67, { 0, 32, 64, 96 }, 1, vect__31.10_63);
    # RANGE [0, 14] NONZERO 12
    i.0_27 = (unsigned short) i_9;
    # RANGE [4, 18] NONZERO 28
    _28 = i.0_27 + 4;
    # RANGE [4, 18] NONZERO 28
    i_29 = (short intD.18) _28;
    ivtmp_30 = ivtmp_11 - 1;
    # PT = null { D.3608 } (nonlocal)
    # ALIGN = 32, MISALIGN = 0
    vectp_a.12_68 = vectp_a.12_67 + 128;
    ivtmp_71 = ivtmp_70 + 1;
    if (ivtmp_71 < 1)
      goto ; [0.00%]
    else
      goto ; [100.00%]
    ;; succ: 26 [never (adjusted)] count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    ;;       40 [always (adjusted)] count:214748371 (estimated locally) (FALSE_VALUE,EXECUTABLE)

    ;; basic block 26, loop depth 4, count 0 (estimated locally)
    ;; prev block 22, next block 40, flags: (NEW, VISITED)
    ;; pred: 22 [never (adjusted)] count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    goto ; [100.00%]
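To make "ignoring the mask" concrete, here is a scalar model of the two internal functions as they appear in the dump. It assumes that lane i is stored at byte address base + offsets[i] * scale, that the vector has 4 lanes, and that the mask is the loop-invariant _60 = l_23(D) == 0; the model_* and check_* functions are hypothetical helpers, not GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LANES 4 /* assumption: matches the vector(4) types in the dump */

/* Scalar model of .SCATTER_STORE (base, offsets, scale, values):
   every lane is stored at byte address base + offsets[i] * scale.  */
static void
model_scatter_store (void *base, const long offsets[LANES], long scale,
                     const long values[LANES])
{
  for (int i = 0; i < LANES; i++)
    memcpy ((char *) base + offsets[i] * scale, &values[i], sizeof values[i]);
}

/* Scalar model of .MASK_SCATTER_STORE: as above, but lane i is stored
   only when mask[i] is true.  */
static void
model_mask_scatter_store (void *base, const long offsets[LANES], long scale,
                          const long values[LANES], const bool mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (mask[i])
      memcpy ((char *) base + offsets[i] * scale, &values[i], sizeof values[i]);
}

/* Usage with the dump's offsets { 0, 32, 64, 96 } and scale 1: when
   l != 0 the mask (l == 0) is all-false, so the masked store must leave
   the array untouched, while the unmasked store writes j anyway.  */
static void
check_scatter_models (long l, long j)
{
  long a[32] = { 0 };  /* large enough for byte offset 96 plus one long */
  const long offsets[LANES] = { 0, 32, 64, 96 };
  const long values[LANES] = { j, j, j, j };
  bool mask[LANES];
  for (int i = 0; i < LANES; i++)
    mask[i] = (l == 0);

  model_mask_scatter_store (a, offsets, 1, values, mask);
  if (l != 0)
    for (int i = 0; i < 32; i++)
      assert (a[i] == 0);

  model_scatter_store (a, offsets, 1, values);
  long stored;
  memcpy (&stored, (char *) a + 32, sizeof stored);
  assert (stored == j);
}
```

In the "after" dump only the unmasked form is emitted, which corresponds to calling model_scatter_store even when every lane of the mask is false.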
[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

Joel Hutton changed:

What             |Removed     |Added
Status           |UNCONFIRMED |ASSIGNED
Ever confirmed   |0           |1
Last reconfirmed |            |2021-02-25

--- Comment #2 from Joel Hutton ---
Bisect shows that 46c705e70e078f6a1920d92e49042125d5e18495 is the first bad commit, and it is the offending commit:

    commit 46c705e70e078f6a1920d92e49042125d5e18495
    Author: Richard Sandiford
    Date:   Wed Nov 11 11:42:46 2020 +

        aarch64: Support SVE comparisons for unpacked integers

        This patch adds support for comparing unpacked SVE integer vectors,
        such as byte elements stored in the bottom bytes of halfword
        containers. It also adds support for selects between unpacked SVE
        vectors (both integer and floating-point), since selects and compares
        are closely tied via the vcond optab interface.
[Bug target/98196] [11 Regression] aarch64: Wrong code at -O3 -march=armv8.2-a+sve -msve-vector-bits=256 -fvect-cost-model=unlimited
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98196

--- Comment #5 from Joel Hutton ---
This appears to have been fixed on trunk by:

    commit 0411210fddbd3ec27c8dc1183f40f662712a2232
    Author: Richard Sandiford
    Date:   Thu Dec 31 16:10:47 2020 +
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 98772, which changed state.

Bug 98772 Summary: Widening patterns causing missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED
[Bug tree-optimization/98772] Widening patterns causing missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

Joel Hutton changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #6 from Joel Hutton ---
Fixed on trunk.
[Bug tree-optimization/98772] Widening patterns causing missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

--- Comment #2 from Joel Hutton ---
Yes, it is aarch64; I have updated the field.
[Bug tree-optimization/98772] New: Widening patterns causing missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

           Bug ID: 98772
          Summary: Widening patterns causing missed vectorization
          Product: gcc
          Version: unknown
           Status: UNCONFIRMED
         Severity: enhancement
         Priority: P3
        Component: tree-optimization
         Assignee: unassigned at gcc dot gnu.org
         Reporter: joelh at gcc dot gnu.org
 Target Milestone: ---

Disabling the widening patterns (widening_mult, widening_plus, widening_minus) allows some testcases to be vectorized better. Currently mixed scalar and vector code is produced, because the patterns are recognized and substituted but vectorization then fails with 'no optab'. When they are recognized, a 16 bytes -> 16 shorts widening is presumed to use a pair of 8 bytes -> 8 shorts instructions, so the datatypes chosen in 'vectorizable_conversion' are 'vectype_in' = 8 bytes and 'vectype_out' = 8 shorts. This causes scalar code to be emitted where these patterns were recognized.

For the following testcase, compiled with gcc -O3:

    #include <stdint.h>

    extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict pix2)
    {
        for( int y = 0; y < 4; y++ )
        {
            for( int x = 0; x < 4; x++ )
                d[x + y*4] = pix1[x] * pix2[x];
            pix1 += 16;
            pix2 += 16;
        }
    }

the following output is seen, with 8 of the 16 elements processed using scalar instructions and the other 8 using vector instructions:
    wdiff:
    .LFB0:
            .cfi_startproc
            ldrb    w3, [x1, 32]
            ldrb    w6, [x2, 32]
            ldrb    w8, [x1, 33]
            ldrb    w5, [x2, 33]
            ldrb    w4, [x1, 34]
            mul     w3, w3, w6
            ldrb    w7, [x1, 35]
            fmov    s0, w3
            ldrb    w3, [x2, 34]
            mul     w8, w8, w5
            ldrb    w9, [x2, 35]
            ldrb    w6, [x2, 48]
            ldrb    w5, [x1, 49]
            ins     v0.h[1], w8
            mul     w3, w4, w3
            mul     w7, w7, w9
            ldrb    w4, [x1, 48]
            ldrb    w8, [x2, 49]
            ldrb    w9, [x2, 50]
            ins     v0.h[2], w3
            ldrb    w3, [x1, 51]
            mul     w6, w6, w4
            ldrb    w4, [x1, 50]
            mul     w5, w5, w8
            ldrb    w8, [x2, 51]
            ldr     d2, [x1]
            ins     v0.h[3], w7
            ldr     d1, [x2]
            mul     w4, w4, w9
            ldr     d4, [x1, 16]
            ldr     d3, [x2, 16]
            mul     w1, w3, w8
            ins     v0.h[4], w6
            zip1    v2.2s, v2.2s, v4.2s
            zip1    v1.2s, v1.2s, v3.2s
            ins     v0.h[5], w5
            umull   v1.8h, v1.8b, v2.8b
            ins     v0.h[6], w4
            ins     v0.h[7], w1
            stp     q1, q0, [x0]
            ret

If the widening multiply pattern is disabled, e.g.:

    -  { vect_recog_widen_mult_pattern, "widen_mult" },
    +  //{ vect_recog_widen_mult_pattern, "widen_mult" },

in tree-vect-patterns.c, then the same testcase is able to process all 16 elements using vector instructions:

    wdiff:
    .LFB0:
            .cfi_startproc
            ldr     b3, [x1, 33]
            ldr     b2, [x2, 33]
            ldr     b1, [x1, 32]
            ldr     b0, [x2, 32]
            ldr     b5, [x1, 34]
            ins     v1.b[1], v3.b[0]
            ldr     b4, [x2, 34]
            ins     v0.b[1], v2.b[0]
            ldr     b3, [x1, 35]
            ldr     b2, [x2, 35]
            ldr     b19, [x1, 48]
            ins     v1.b[2], v5.b[0]
            ldr     b17, [x2, 48]
            ins     v0.b[2], v4.b[0]
            ldr     b18, [x1, 49]
            ldr     b16, [x2, 49]
            ldr     b7, [x1, 50]
            ins     v1.b[3], v3.b[0]
            ldr     b6, [x2, 50]
            ins     v0.b[3], v2.b[0]
            ldr     b5, [x1, 51]
            ldr     b4, [x2, 51]
            ldr     d3, [x1]
            ins     v1.b[4], v19.b[0]
            ldr     d2, [x2]
            ins     v0.b[4], v17.b[0]
            ldr     d19, [x1, 16]
            ldr     d17, [x2, 16]
            ins     v1.b[5], v18.b[0]
            zip1    v3.2s, v3.2s, v19.2s
            ins     v0.b[5], v16.b[0]
            zip1    v2.2s, v2.2s, v17.2s
            ins     v1.b[6], v7.b[0]
            umull   v2.8h, v2.8b, v3.8b
            ins     v0.b[6], v6.b[0]
            ins     v1.b[7], v5.b[0]
            ins     v0.b[7], v4.b[0]
            umull   v0.8h, v0.8b, v1.8b
            stp     q2, q0, [x0]
            ret
            .cfi_endproc

Note the use of 2 umull instructions. The same can be seen for widening plus and widening minus.
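The "16 bytes -> 16 shorts as a pair of 8 bytes -> 8 shorts" split can be checked with a scalar model. Here umull8 is a hypothetical helper that mimics one 8-lane unsigned widening multiply (the umull instructions above), and wdiff_ref is the loop from the report; this only illustrates the lane arithmetic and makes no claim about how the vectorizer picks its vectypes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The scalar loop from the report.  Products above 32767 are converted to
   int16_t in an implementation-defined way (GCC wraps modulo 2^16).  */
static void
wdiff_ref (int16_t d[16], const uint8_t *pix1, const uint8_t *pix2)
{
  for (int y = 0; y < 4; y++)
    {
      for (int x = 0; x < 4; x++)
        d[x + y * 4] = pix1[x] * pix2[x];
      pix1 += 16;
      pix2 += 16;
    }
}

/* Hypothetical model of one 8-lane widening multiply:
   8 x uint8_t * 8 x uint8_t -> 8 x uint16_t, as UMULL Vd.8H, Vn.8B, Vm.8B.  */
static void
umull8 (uint16_t out[8], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) (a[i] * b[i]);
}

/* Pack the 4 bytes read from each of the 4 rows (stride 16) into 16
   contiguous bytes, then cover them with two 8-lane widening multiplies.  */
static void
wdiff_two_umull (uint16_t d[16], const uint8_t *pix1, const uint8_t *pix2)
{
  uint8_t a[16], b[16];
  for (int y = 0; y < 4; y++)
    {
      memcpy (a + 4 * y, pix1 + 16 * y, 4);
      memcpy (b + 4 * y, pix2 + 16 * y, 4);
    }
  umull8 (d, a, b);             /* rows 0 and 1 */
  umull8 (d + 8, a + 8, b + 8); /* rows 2 and 3 */
}

/* Usage: both versions agree bit-for-bit on a sample input.  */
static void
check_wdiff (void)
{
  uint8_t p1[64], p2[64];
  for (int i = 0; i < 64; i++)
    {
      p1[i] = (uint8_t) (i * 7 + 3);
      p2[i] = (uint8_t) (255 - i * 5);
    }
  int16_t ref[16];
  uint16_t split[16];
  wdiff_ref (ref, p1, p2);
  wdiff_two_umull (split, p1, p2);
  for (int i = 0; i < 16; i++)
    assert ((uint16_t) ref[i] == split[i]);
}
```

Two 8-lane operations are enough for all 16 results, which matches the second listing; in the first listing only one umull is emitted and the remaining 8 lanes fall back to scalar mul.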
It appears to be due to the way vectype_in is chosen in vectorizable_conversion. At tree-vect-stmts.c:4626, vect_is_simple_use fills the &vectype1_in parameter, which in turn fills vectype_in. During SLP vectorization, vect_is_simple_use uses the SLP tree's vectype (tree-vect-stmts.c):

    if (slp_node)
      {
        slp_tree child = SLP_TREE_CHILDREN (slp_node)[operand];
        *slp_def = child;
        *vectype = SLP_TREE_VECTYPE (child);
        if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
          {
            *op = gimple_get_lhs (SLP_TREE_REPRESENTATIVE (child)->stmt);
            return vect_is_simple_use (*op, vinfo, dt, def_stmt_info_out);
          }

for 'v
[Bug tree-optimization/98133] [11 Regression] ICE in vectorizable_conversion, at tree-vect-stmts.c:4690
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98133

Joel Hutton changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #4 from Joel Hutton ---
Fixed on trunk; this appears to be a duplicate of PR97929.
[Bug tree-optimization/98133] [11 Regression] ICE in vectorizable_conversion, at tree-vect-stmts.c:4690
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98133

Joel Hutton changed:

What |Removed |Added
CC   |        |joelh at gcc dot gnu.org

--- Comment #3 from Joel Hutton ---
This is fixed on trunk with r11-5903-gf5b902a9af9d1cce6c540c7f71e02e22e45c23ef; I believe it is a duplicate of PR97929.
[Bug tree-optimization/97929] [11 Regression] ICE: in exact_div, at poly-int.h:2219 (vect_get_num_vectors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97929

Joel Hutton changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #4 from Joel Hutton ---
Fixed on trunk with r11-5903-gf5b902a9af9d1cce6c540c7f71e02e22e45c23ef.