[Bug tree-optimization/103523] [11 Regression] vectorizable_induction generating code for modes without checking support for them by r11-7861-ge4180ab2f

2021-12-15 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Joel Hutton  ---
Fixed on 11 and trunk.

[Bug bootstrap/103688] [11 regression] build error after r11-9380

2021-12-15 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103688

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Joel Hutton  ---
Fixed on 11.

[Bug tree-optimization/103523] [11/12 Regression] vectorizable_induction generating code for modes without checking support for them by r11-7861-ge4180ab2f

2021-12-10 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Joel Hutton  ---
fixed on trunk

[Bug tree-optimization/103523] [11/12 Regression] SVE float auto-vect float format expand failure

2021-12-03 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103523

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #4 from Joel Hutton  ---
introduced 26th March by e4180ab2f

[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256

2021-03-10 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

Joel Hutton  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Joel Hutton  ---
Fixed on trunk.

[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256

2021-03-05 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

--- Comment #4 from Joel Hutton  ---
It seems it is vectorizing a 'MASK_STORE' into a 'SCATTER_STORE' when it should
be using a 'MASK_SCATTER_STORE'. Currently it's choosing between
IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE based on the
'using_partial_vectors' field.

  vec_loop_masks *loop_masks
    = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
       ? &LOOP_VINFO_MASKS (loop_vinfo)
       : NULL);
  vec_loop_lens *loop_lens
    = (loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
       ? &LOOP_VINFO_LENS (loop_vinfo)
       : NULL);

#define LOOP_VINFO_FULLY_MASKED_P(L)            \
  (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L)       \
   && !LOOP_VINFO_MASKS (L).is_empty ())

      if (memory_access_type == VMAT_GATHER_SCATTER)
        {
          tree scale = size_int (gs_info.scale);
          gcall *call;
          if (loop_masks)
            call = gimple_build_call_internal
              (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset,
               scale, vec_oprnd, final_mask);
          else
            call = gimple_build_call_internal
              (IFN_SCATTER_STORE, 4, dataref_ptr, vec_offset,
               scale, vec_oprnd);
          gimple_call_set_nothrow (call, true);
          vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
          new_stmt = call;
          break;
        }

      if (i > 0)
        /* Bump the vector pointer.  */
        dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
                                       gsi, stmt_info, bump);

      if (slp)
        vec_oprnd = vec_oprnds[i];
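
The selection described in this comment can be modelled in isolation. The
following is a minimal C sketch, not GCC source: the enum and both function
names are hypothetical, and the second function is only an assumed alternative
for comparison, not necessarily the change that was committed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two internal functions named above.  */
enum scatter_ifn { SCATTER_STORE, MASK_SCATTER_STORE };

/* Selection as in the excerpt: only the loop-level masks are consulted,
   so a mask carried by the statement itself has no influence.  */
static enum scatter_ifn
choose_by_loop_masks (bool loop_fully_masked, bool stmt_has_mask)
{
  (void) stmt_has_mask; /* Ignored here, which is the suspected problem.  */
  return loop_fully_masked ? MASK_SCATTER_STORE : SCATTER_STORE;
}

/* Assumed alternative: use the masked form whenever any mask applies,
   whether it comes from the loop or from the original statement.  */
static enum scatter_ifn
choose_by_any_mask (bool loop_fully_masked, bool stmt_has_mask)
{
  return (loop_fully_masked || stmt_has_mask)
         ? MASK_SCATTER_STORE : SCATTER_STORE;
}
```

With a conditional store in a loop that is not fully masked, the first function
returns SCATTER_STORE and the second returns MASK_SCATTER_STORE, which is the
difference this comment is pointing at.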

[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256

2021-03-03 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

--- Comment #3 from Joel Hutton  ---
Seems like the 'vect' pass is vectorizing a 'MASK_STORE' to a 'SCATTER_STORE'
and ignoring the mask.

testcase.c:7:29: note:  vect_is_simple_use: operand 0, type of def: constant
testcase.c:7:29: note:  created new init_stmt: vect_cst__65 = { 0, 0, 0, 0 };
testcase.c:7:29: note:  add new stmt: mask__60.11_66 = vect_cst__64 == vect_cst__65;
testcase.c:7:29: note:  -->vectorizing statement: _61 = &a[_10];
testcase.c:7:29: note:  -->vectorizing statement: .MASK_STORE (_61, 128B, _60, _31);
testcase.c:7:29: note:  transform statement.
testcase.c:7:29: note:  vect_is_simple_use: operand l_23(D) == 0, type of def: internal
testcase.c:7:29: note:  vect_is_simple_use: vectype vector(4)
testcase.c:7:29: note:  vect_is_simple_use: operand (long intD.9) j_24(D), type of def: internal
testcase.c:7:29: note:  vect_is_simple_use: vectype vector(4) long int
Applying pattern match.pd:139, generic-match.c:24056
Applying pattern match.pd:139, generic-match.c:24056
testcase.c:7:29: note:  transform store. ncopies = 1
testcase.c:7:29: note:  vect_get_vec_defs_for_operand: _31
testcase.c:7:29: note:  vect_is_simple_use: operand (long intD.9) j_24(D), type of def: internal
testcase.c:7:29: note:  def_stmt =  _31 = (long int) j_24(D);
testcase.c:7:29: note:  vect_get_vec_defs_for_operand: _60
testcase.c:7:29: note:  vect_is_simple_use: operand l_23(D) == 0, type of def: internal
testcase.c:7:29: note:  def_stmt =  _60 = l_23(D) == 0;
testcase.c:7:29: note:  create integer_type-pointer variable to type: long int  vectorizing a pointer ref: MEM[(long int *)&a]
Applying pattern match.pd:139, generic-match.c:27580
testcase.c:7:29: note:  created &a
testcase.c:7:29: note:  add new stmt: .SCATTER_STORE (vectp_a.12_67, { 0, 32, 64, 96 }, 1, vect__31.10_63);



before:

  [local count: 858993458]:
  # i_39 = PHI
  # ivtmp_16 = PHI
  _1 = (int) i_39;
  _2 = (long int) j_24(D);
  _45 = l_23(D) == 0;
  _46 = &a[_1];
  .MASK_STORE (_46, 128B, _45, _2);
  i.0_3 = (unsigned short) i_39;
  _4 = i.0_3 + 4;
  i_26 = (short int) _4;
  ivtmp_7 = ivtmp_16 - 1;
  if (ivtmp_7 != 0)
    goto ; [75.00%]
  else
    goto ; [25.00%]

  [local count: 644245087]:
  goto ; [100.00%]

  [local count: 214748368]:
  h_22 = h_33 + 1;
  if (_14 > h_22)
    goto ; [89.00%]
  else
    goto ; [11.00%]


after:

  # PT = null { D.3608 } (nonlocal)
  # ALIGN = 32, MISALIGN = 0
  # vectp_a.12_67 = PHI <&aD.3608(21), vectp_a.12_68(26)>
  # ivtmp_70 = PHI <0(21), ivtmp_71(26)>
  _10 = (intD.7) i_9;
  vect__31.10_63 = (vector(4) long intD.9) vect_cst__62;
  _31 = (long intD.9) j_24(D);
  mask__60.11_66 = vect_cst__64 == vect_cst__65;
  _60 = l_23(D) == 0;
  # PT = null { D.3608 } (nonlocal)
  _61 = &aD.3608[_10];
  # .MEM_69 = VDEF <.MEM_13>
  # USE = anything
  # CLB = anything
  .SCATTER_STORE (vectp_a.12_67, { 0, 32, 64, 96 }, 1, vect__31.10_63);
  # RANGE [0, 14] NONZERO 12
  i.0_27 = (unsigned short) i_9;
  # RANGE [4, 18] NONZERO 28
  _28 = i.0_27 + 4;
  # RANGE [4, 18] NONZERO 28
  i_29 = (short intD.18) _28;
  ivtmp_30 = ivtmp_11 - 1;
  # PT = null { D.3608 } (nonlocal)
  # ALIGN = 32, MISALIGN = 0
  vectp_a.12_68 = vectp_a.12_67 + 128;
  ivtmp_71 = ivtmp_70 + 1;
  if (ivtmp_71 < 1)
    goto ; [0.00%]
  else
    goto ; [100.00%]
;;    succ:   26 [never (adjusted)]  count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
;;            40 [always (adjusted)]  count:214748371 (estimated locally) (FALSE_VALUE,EXECUTABLE)

;;   basic block 26, loop depth 4, count 0 (estimated locally)
;;    prev block 22, next block 40, flags: (NEW, VISITED)
;;    pred:   22 [never (adjusted)]  count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
  goto ; [100.00%]
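
To make the consequence of dropping the mask concrete, here is a small C model
of the two store forms. It is only a sketch: the function names, the LANES
constant, and the use of element offsets (rather than the byte offsets with
scale 1 seen in the dump) are assumptions made for illustration, not GCC or SVE
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 4 /* Matches the vector(4) long int type in the dump.  */

/* Model of an unmasked scatter: every lane is written unconditionally.  */
static void
scatter_store (long *base, const size_t *offsets, const long *values)
{
  for (size_t i = 0; i < LANES; i++)
    base[offsets[i]] = values[i];
}

/* Model of a masked scatter: a lane is written only if its mask bit is set.  */
static void
mask_scatter_store (long *base, const size_t *offsets, const long *values,
                    const bool *mask)
{
  for (size_t i = 0; i < LANES; i++)
    if (mask[i])
      base[offsets[i]] = values[i];
}
```

If the condition (l == 0 in the testcase) is false for every lane, the masked
form leaves memory untouched, while the unmasked form still writes all four
elements. That is the wrong-code behaviour the dumps suggest.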

[Bug target/99102] [11 Regression] SVE: Wrong code with -O2 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=256

2021-02-25 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99102

Joel Hutton  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2021-02-25

--- Comment #2 from Joel Hutton  ---
Bisect shows

46c705e70e078f6a1920d92e49042125d5e18495 is the first bad commit
commit 46c705e70e078f6a1920d92e49042125d5e18495
Author: Richard Sandiford 
Date:   Wed Nov 11 11:42:46 2020 +

aarch64: Support SVE comparisons for unpacked integers

This patch adds support for comparing unpacked SVE integer vectors,
such as byte elements stored in the bottom bytes of halfword
containers.  It also adds support for selects between unpacked
SVE vectors (both integer and floating-point), since selects and
compares are closely tied via the vcond optab interface.



is the offending commit.

[Bug target/98196] [11 Regression] aarch64: Wrong code at -O3 -march=armv8.2-a+sve -msve-vector-bits=256 -fvect-cost-model=unlimited

2021-02-18 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98196

--- Comment #5 from Joel Hutton  ---
this appears to have been fixed on trunk by 

0411210fddbd3ec27c8dc1183f40f662712a2232
Author: Richard Sandiford 
Date:   Thu Dec 31 16:10:47 2020 +

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2021-02-11 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 98772, which changed state.

Bug 98772 Summary: Widening patterns causing missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/98772] Widening patterns causing missed vectorization

2021-02-11 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Joel Hutton  ---
fixed on trunk

[Bug tree-optimization/98772] Widening patterns causing missed vectorization

2021-01-21 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

--- Comment #2 from Joel Hutton  ---
Yes, it is aarch64, I have updated the field.

[Bug tree-optimization/98772] New: Widening patterns causing missed vectorization

2021-01-20 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772

Bug ID: 98772
   Summary: Widening patterns causing missed vectorization
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joelh at gcc dot gnu.org
  Target Milestone: ---

Disabling the widening patterns (widening_mult, widening_plus, widening_minus)
allows some testcases to be vectorized better. Currently a mix of scalar and
vector code is produced, because the patterns are recognized and substituted
but vectorization then fails with 'no optab'. When a pattern is recognized for
16 bytes -> 16 shorts, a pair of 8 byte -> 8 short instructions is presumed,
and the datatypes chosen in 'vectorizable_conversion' are 'vectype_in' of 8
bytes and 'vectype_out' of 8 shorts. This causes scalar code to be emitted
where these patterns were recognized.
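
For readers unfamiliar with the '16 bytes -> 16 shorts' case, the following C
sketch models a widening multiply done as a pair of 8-lane operations. It is
purely illustrative: the function names are hypothetical and this is plain
scalar C, not the vectorizer's representation or actual NEON intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* One 8 byte -> 8 short widening multiply: each product is computed in the
   wider type, so 255 * 255 does not wrap.  */
static void
widen_mult_8 (uint16_t out[8], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) ((uint16_t) a[i] * (uint16_t) b[i]);
}

/* 16 bytes -> 16 shorts expressed as a low half and a high half, i.e. the
   pair of 8-lane operations the pattern presumes.  */
static void
widen_mult_16 (uint16_t out[16], const uint8_t a[16], const uint8_t b[16])
{
  widen_mult_8 (out, a, b);
  widen_mult_8 (out + 8, a + 8, b + 8);
}
```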


For the following testcase, compiled with gcc -O3:

#include <stdint.h>
extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict pix2)
{
  for( int y = 0; y < 4; y++ )
  {
    for( int x = 0; x < 4; x++ )
      d[x + y*4] = pix1[x] * pix2[x];
    pix1 += 16;
    pix2 += 16;
  }
}

The following output is seen, processing 8 elements per cycle using scalar
instructions and 8 elements per cycle using vector instructions.

wdiff:
.LFB0:
.cfi_startproc
ldrb w3, [x1, 32]
ldrb w6, [x2, 32]
ldrb w8, [x1, 33]
ldrb w5, [x2, 33]
ldrb w4, [x1, 34]
mul w3, w3, w6
ldrb w7, [x1, 35]
fmov s0, w3
ldrb w3, [x2, 34]
mul w8, w8, w5
ldrb w9, [x2, 35]
ldrb w6, [x2, 48]
ldrb w5, [x1, 49]
ins v0.h[1], w8
mul w3, w4, w3
mul w7, w7, w9
ldrb w4, [x1, 48]
ldrb w8, [x2, 49]
ldrb w9, [x2, 50]
ins v0.h[2], w3
ldrb w3, [x1, 51]
mul w6, w6, w4
ldrb w4, [x1, 50]
mul w5, w5, w8
ldrb w8, [x2, 51]
ldr d2, [x1]
ins v0.h[3], w7
ldr d1, [x2]
mul w4, w4, w9
ldr d4, [x1, 16]
ldr d3, [x2, 16]
mul w1, w3, w8
ins v0.h[4], w6
zip1 v2.2s, v2.2s, v4.2s
zip1 v1.2s, v1.2s, v3.2s
ins v0.h[5], w5
umull   v1.8h, v1.8b, v2.8b
ins v0.h[6], w4
ins v0.h[7], w1
stp q1, q0, [x0]
ret


If the widening multiply pattern is disabled, e.g.:

-  { vect_recog_widen_mult_pattern, "widen_mult" },
+  //{ vect_recog_widen_mult_pattern, "widen_mult" },
in tree-vect-patterns.c

then the same testcase is able to process 16 elements per cycle using vector
instructions. 

wdiff:
.LFB0:
.cfi_startproc
ldr b3, [x1, 33]
ldr b2, [x2, 33]
ldr b1, [x1, 32]
ldr b0, [x2, 32]
ldr b5, [x1, 34]
ins v1.b[1], v3.b[0]
ldr b4, [x2, 34]
ins v0.b[1], v2.b[0]
ldr b3, [x1, 35]
ldr b2, [x2, 35]
ldr b19, [x1, 48]
ins v1.b[2], v5.b[0]
ldr b17, [x2, 48]
ins v0.b[2], v4.b[0]
ldr b18, [x1, 49]
ldr b16, [x2, 49]
ldr b7, [x1, 50]
ins v1.b[3], v3.b[0]
ldr b6, [x2, 50]
ins v0.b[3], v2.b[0]
ldr b5, [x1, 51]
ldr b4, [x2, 51]
ldr d3, [x1]
ins v1.b[4], v19.b[0]
ldr d2, [x2]
ins v0.b[4], v17.b[0]
ldr d19, [x1, 16]
ldr d17, [x2, 16]
ins v1.b[5], v18.b[0]
zip1 v3.2s, v3.2s, v19.2s
ins v0.b[5], v16.b[0]
zip1 v2.2s, v2.2s, v17.2s
ins v1.b[6], v7.b[0]
umull   v2.8h, v2.8b, v3.8b
ins v0.b[6], v6.b[0]
ins v1.b[7], v5.b[0]
ins v0.b[7], v4.b[0]
umull   v0.8h, v0.8b, v1.8b
stp q2, q0, [x0]
ret
.cfi_endproc

Note the use of two umull instructions.



The same can be seen for widening plus and widening minus.

It appears to be due to the way that vectype_in is chosen in
vectorizable_conversion.

In vectorizable_conversion (tree-vect-stmts.c:4626), vect_is_simple_use fills
the &vectype1_in parameter, which in turn fills vectype_in.



during slp vectorization vect_is_simple_use uses the slp tree vectype:

tree-vect-stmts.c, from line 11369:

  if (slp_node)
    {
      slp_tree child = SLP_TREE_CHILDREN (slp_node)[operand];
      *slp_def = child;
      *vectype = SLP_TREE_VECTYPE (child);
      if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
        {
          *op = gimple_get_lhs (SLP_TREE_REPRESENTATIVE (child)->stmt);
          return vect_is_simple_use (*op, vinfo, dt, def_stmt_info_out);
        }



for 'v

[Bug tree-optimization/98133] [11 Regression] ICE in vectorizable_conversion, at tree-vect-stmts.c:4690

2021-01-14 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98133

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Joel Hutton  ---
Fixed on trunk, appears to be a duplicate of PR97929

[Bug tree-optimization/98133] [11 Regression] ICE in vectorizable_conversion, at tree-vect-stmts.c:4690

2021-01-14 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98133

Joel Hutton  changed:

   What|Removed |Added

 CC||joelh at gcc dot gnu.org

--- Comment #3 from Joel Hutton  ---
This is fixed on trunk, I believe this is a duplicate of PR97929. 

Fixed on trunk with r11-5903-gf5b902a9af9d1cce6c540c7f71e02e22e45c23ef

[Bug tree-optimization/97929] [11 Regression] ICE: in exact_div, at poly-int.h:2219 (vect_get_num_vectors)

2020-12-10 Thread joelh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97929

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Joel Hutton  ---
Fixed on trunk with r11-5903-gf5b902a9af9d1cce6c540c7f71e02e22e45c23ef