[Bug target/53101] Recognize casts to sub-vectors

2021-08-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

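Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |8.0
             Status|NEW                         |RESOLVED

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---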
GCC 5+ just produces a stack increment/decrement but no stores (unlike 4.9.x).
GCC 6+ is able to remove the stack increment/decrement.
GCC 7+ uses BIT_FIELD_REF on the gimple level.
GCC 8+ expands the BIT_FIELD_REF to use vec_select directly, and we get:
(insn 6 5 7 (set (reg:TI 90)
        (vec_select:TI (subreg:V2TI (reg/v:V4DF 88 [ x ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "/app/example.cpp":9 -1
     (nil))

So this is fully fixed for GCC 8+ with incremental steps along the way.
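
For readers without the original report at hand, the kind of cast under discussion can be sketched as follows. This is an assumed reconstruction, not the testcase attached to the PR: the names `v4df`, `v2df` and `low_half` are made up for illustration, and GCC's generic vector extension stands in for `__m256d`/`__m128d`.

```c
/* Illustrative only: generic vector types standing in for __m256d/__m128d.  */
typedef double v4df __attribute__ ((vector_size (32)));
typedef double v2df __attribute__ ((vector_size (16), may_alias));

/* Read the low half of a 4 x double vector through a pointer cast.
   This is the "cast to sub-vector" pattern named in the PR title.  */
static v2df
low_half (v4df x)
{
  return *(v2df *) &x;
}
```

Compiling a function like this with something like `-O2 -mavx` and comparing the assembly across GCC releases should show the progression described above (stores and reloads, then only a stack adjustment, then a direct extraction).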

[Bug target/53101] Recognize casts to sub-vectors

2012-11-16 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-16 23:03:47 UTC ---
Created attachment 28713
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28713
Tweak on the patch of PR48037

This is a slight extension of Richard's patch for PR 48037. It needs testing
and doesn't solve the real problem (which is at the RTL level).

I am not sure if BIT_FIELD_REF (a mention of that tree in doc/generic.texi
would be nice) is allowed to refer to an unaligned subvector, the patch allows
it as long as the elements are aligned. See PR 55359 for a related issue.
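
To make the alignment question concrete: for a vector of four doubles, a 128-bit reference at bit position 64 selects elements 1 and 2. That position is a multiple of the 64-bit element size but not of the 128-bit sub-vector size, which is the case the patch accepts. A rough C model follows; the helper name and the byte-copy approach are illustrative assumptions, not GCC internals.

```c
#include <string.h>

/* Hypothetical model of BIT_FIELD_REF <src, size_bits, pos_bits> for a
   source laid out as a plain array of doubles: copy size_bits bits
   starting pos_bits bits into src.  Both values are assumed to be
   multiples of 8 here.  */
static void
bit_field_ref_model (double *dst, const double *src,
                     unsigned size_bits, unsigned pos_bits)
{
  memcpy (dst, (const unsigned char *) src + pos_bits / 8, size_bits / 8);
}
```

With `src = {1, 2, 3, 4}`, a call with `size_bits = 128` and `pos_bits = 64` fills `dst` with the middle pair, i.e. a sub-vector that is element-aligned but not aligned to its own size.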


[Bug target/53101] Recognize casts to sub-vectors

2012-11-11 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-11 22:18:13 UTC ---
PR 48037 seems related (it was the scalar case).


[Bug target/53101] Recognize casts to sub-vectors

2012-05-06 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-05-06 19:55:58 UTC ---
Getting a BIT_FIELD_REF in gimple is simple enough. If I remember correctly,
the patch below was enough (it also fixes the fact that the current code will
generate a BIT_FIELD_REF for true elements of the vector and one past the end,
but not further, and I am not sure why that one-past-the-end behavior would be
wanted).

I did it because I thought it would be easier to convince the compiler to
expand that to a vec_select, and I tried patching get_inner_reference and
extract_bit_field_1, but no luck, I give up for now.



--- gimplify.c (revision 187205)
+++ gimplify.c (working copy)
@@ -4262,11 +4262,16 @@ gimple_fold_indirect_ref (tree t)
       else if (TREE_CODE (optype) == COMPLEX_TYPE
                && useless_type_conversion_p (type, TREE_TYPE (optype)))
         return fold_build1 (REALPART_EXPR, type, op);
       /* *(foo *)&vectorfoo => BIT_FIELD_REF<vectorfoo,...> */
       else if (TREE_CODE (optype) == VECTOR_TYPE
-               && useless_type_conversion_p (type, TREE_TYPE (optype)))
+               && (useless_type_conversion_p (type, TREE_TYPE (optype))
+                   || (TREE_CODE (type) == VECTOR_TYPE
+                       && useless_type_conversion_p (TREE_TYPE (type),
+                                                     TREE_TYPE (optype))
+                       && TYPE_VECTOR_SUBPARTS (type)
+                          < TYPE_VECTOR_SUBPARTS (optype))))
         {
           tree part_width = TYPE_SIZE (type);
           tree index = bitsize_int (0);
           return fold_build3 (BIT_FIELD_REF, type, op, part_width, index);
         }
@@ -4283,22 +4288,29 @@ gimple_fold_indirect_ref (tree t)
       STRIP_NOPS (addr);
       addrtype = TREE_TYPE (addr);
 
       /* ((foo*)&vectorfoo)[1] -> BIT_FIELD_REF<vectorfoo,...> */
       if (TREE_CODE (addr) == ADDR_EXPR
-          && TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
-          && useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (addrtype)))
-          && host_integerp (off, 1))
+          && host_integerp (off, 1)
+          && ((TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE
+               && useless_type_conversion_p (type,
+                                             TREE_TYPE (TREE_TYPE (addrtype))))
+              || (TREE_CODE (type) == VECTOR_TYPE
+                  && useless_type_conversion_p (TREE_TYPE (type),
+                                                TREE_TYPE (TREE_TYPE (addrtype)))
+                  && TYPE_VECTOR_SUBPARTS (type)
+                     < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))))
         {
           unsigned HOST_WIDE_INT offset = tree_low_cst (off, 1);
           tree part_width = TYPE_SIZE (type);
           unsigned HOST_WIDE_INT part_widthi
-            = tree_low_cst (part_width, 0) / BITS_PER_UNIT;
+            = tree_low_cst (part_width, 0);
+          unsigned HOST_WIDE_INT orig_widthi
+            = tree_low_cst (TYPE_SIZE (TREE_TYPE (addrtype)), 0);
           unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
           tree index = bitsize_int (indexi);
-          if (offset / part_widthi
-              <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
+          if (indexi + part_widthi <= orig_widthi)
             return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
                                 part_width, index);
         }
 
       /* ((foo*)&complexfoo)[1] -> __imag__ complexfoo */
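
The change to the range check in the second hunk is easier to see with numbers. The sketch below restates the old and new conditions as standalone C; the function names and the `BITS_PER_BYTE` constant are stand-ins chosen for illustration, and it assumes 8-bit bytes and a 4 x double (256-bit) source vector in the examples.

```c
#include <stdbool.h>

enum { BITS_PER_BYTE = 8 };  /* stand-in for GCC's BITS_PER_UNIT */

/* Old condition: byte offset divided by the element size in bytes,
   compared with <= against the number of elements.  */
static bool
old_range_check (unsigned long offset_bytes, unsigned long part_bits,
                 unsigned long nunits)
{
  unsigned long part_bytes = part_bits / BITS_PER_BYTE;
  return offset_bytes / part_bytes <= nunits;
}

/* New condition: the referenced bit range must end within the
   original vector.  */
static bool
new_range_check (unsigned long offset_bytes, unsigned long part_bits,
                 unsigned long orig_bits)
{
  unsigned long index_bits = offset_bytes * BITS_PER_BYTE;
  return index_bits + part_bits <= orig_bits;
}
```

For a 64-bit element at byte offset 32 (one past the end of four doubles), the old condition evaluates 32 / 8 = 4 <= 4 and accepts, while the new one evaluates 256 + 64 <= 256 and rejects. For a 128-bit sub-vector, the new condition accepts byte offsets 0, 8 and 16 and rejects 24.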


[Bug target/53101] Recognize casts to sub-vectors

2012-05-03 Thread marc.glisse at normalesup dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #4 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-03 19:19:00 UTC ---
(define_peephole2
  [(set (mem:VI8F_256 (match_operand 2))
        (match_operand:VI8F_256 1 "register_operand"))
   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
        (mem:<ssehalfvecmode> (match_dup 2)))]
  "TARGET_AVX"
  [(set (match_dup 0)
        (vec_select:<ssehalfvecmode> (match_dup 1)
                                     (parallel [(const_int 0) (const_int 1)])))]
)

(and similar for VI4F_256) is much less hackish than the XEXP stuff. I was
quite sure I'd tested exactly this and it didn't work, but now it looks like it
does :-/

Except that following http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html ,
this is not the right place to try and add such logic. That's a good thing
because it is way too fragile, another instruction can easily squeeze between
the two sets and disable the peephole.


[Bug target/53101] Recognize casts to sub-vectors

2012-05-01 Thread marc.glisse at normalesup dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 15:10:26 UTC ---
(In reply to comment #1)
> We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
> example).
> This gets expanded to
>
> (insn 6 5 7 (set (reg:OI 63)
>         (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
>      (nil))
>
> (insn 7 6 8 (set (reg:V2DF 60 [ retval ])
>         (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
>      (nil))
>
> but that should be perfectly optimizable.

A bit hard for me (never touched those md files before)... This obviously
incorrect code does the transformation:

(define_peephole2
[
(set
 (match_operand:V8SF 2 "memory_operand")
 (match_operand:V8SF 1 "register_operand")
)
(set
 (match_operand:V4SF 0 "register_operand")
 (match_operand:V4SF 3 "memory_operand")
)
]
  "TARGET_AVX"
[(const_int 0)]
{
  emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1]));
  DONE;
})

(the code in this experiment uses __v4sf and __v8sf instead of __m128d/__m256d
in the description above)

but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
trying a match_dup refuses to compile because of the mode mismatch, so I don't
know how to constrain 2 and 3 to be the same. I tried adding some (subreg:
...) in there, but it didn't match, and looking at the rtl peephole dump, there
isn't any subreg there.

Then maybe peephole isn't the right place, but that's the only one where I
managed to get something that compiles and is executed by the compiler on this
testcase.


[Bug target/53101] Recognize casts to sub-vectors

2012-05-01 Thread marc.glisse at normalesup dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 17:17:42 UTC ---
(In reply to comment #2)
> but operands[2] and operands[3] don't compare equal with rtx_equal_p, and
> trying a match_dup refuses to compile because of the mode mismatch, so I don't
> know how to constrain 2 and 3 to be the same.

rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))

seems to give the right answer in the 3 manual tests I did. Currently checking
if the testsuite finds something. It is very likely not the right way to do it,
but I didn't find any inspiring pattern in the .md files.

Then I'll see if I understand how the fancy macros make it possible to have a
single piece of code for all modes, and if instead of calling
gen_vec_extract_lo_v8sf I shouldn't give a replacement pattern like (set
(match_dup 0) (vec_select (match_dup 1) (const_int 0))).


[Bug target/53101] Recognize casts to sub-vectors

2012-04-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-04-24
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-04-24 11:13:33 UTC ---
We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
example).
This gets expanded to

(insn 6 5 7 (set (reg:OI 63)
        (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
     (nil))

(insn 7 6 8 (set (reg:V2DF 60 [ retval ])
        (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
     (nil))

but that should be perfectly optimizable.