[Bug target/53101] Recognize casts to sub-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |8.0 Status|NEW |RESOLVED --- Comment #8 from Andrew Pinski --- GCC 5+ just produces a stack increment/decrement but no stores (unlike 4.9.x). GCC 6+ is able to remove the stack increment/decrement. GCC 7+ uses BIT_FIELD_REF on the gimple level. GCC 8+ expands the BIT_FIELD_REF to use vec_select directly and get: (insn 6 5 7 (set (reg:TI 90) (vec_select:TI (subreg:V2TI (reg/v:V4DF 88 [ x ]) 0) (parallel [ (const_int 0 [0]) ]))) "/app/example.cpp":9 -1 (nil)) So this is fully fixed for GCC 8+ with incremental steps along the way.
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #7 from Marc Glisse glisse at gcc dot gnu.org 2012-11-16 23:03:47 UTC --- Created attachment 28713 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28713 Tweak on the patch of PR48037 This is a slight extension of Richard's patch for PR 48037. It needs testing and doesn't solve the real problem (which is at the RTL level). I am not sure if BIT_FIELD_REF (a mention of that tree in doc/generic.texi would be nice) is allowed to refer to an unaligned subvector, the patch allows it as long as the elements are aligned. See PR 55359 for a related issue.
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #6 from Marc Glisse glisse at gcc dot gnu.org 2012-11-11 22:18:13 UTC --- PR 48037 seems related (it was the scalar case).
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 Marc Glisse glisse at gcc dot gnu.org changed: What|Removed |Added CC||glisse at gcc dot gnu.org --- Comment #5 from Marc Glisse glisse at gcc dot gnu.org 2012-05-06 19:55:58 UTC --- Getting a BIT_FIELD_REF in gimple is simple enough. If I remember correctly, the patch below was enough (it also fixes the fact the current code will generate a BIT_FIELD_REF for true elements of the vector and one past the end, but not further, and I am not sure why that one-past-the-end behavior would be wanted). I did it because I thought it would be easier to convince the compiler to expand that to a vec_select, and I tried patching get_inner_reference and extract_bit_field_1, but no luck, I give up for now. --- gimplify.c(revision 187205) +++ gimplify.c(working copy) @@ -4262,11 +4262,16 @@ gimple_fold_indirect_ref (tree t) else if (TREE_CODE (optype) == COMPLEX_TYPE useless_type_conversion_p (type, TREE_TYPE (optype))) return fold_build1 (REALPART_EXPR, type, op); /* *(foo *)vectorfoo = BIT_FIELD_REFvectorfoo,... */ else if (TREE_CODE (optype) == VECTOR_TYPE -useless_type_conversion_p (type, TREE_TYPE (optype))) +(useless_type_conversion_p (type, TREE_TYPE (optype)) + || (TREE_CODE (type) == VECTOR_TYPE +useless_type_conversion_p (TREE_TYPE (type), + TREE_TYPE (optype)) +TYPE_VECTOR_SUBPARTS (type) + TYPE_VECTOR_SUBPARTS (optype { tree part_width = TYPE_SIZE (type); tree index = bitsize_int (0); return fold_build3 (BIT_FIELD_REF, type, op, part_width, index); } @@ -4283,22 +4288,29 @@ gimple_fold_indirect_ref (tree t) STRIP_NOPS (addr); addrtype = TREE_TYPE (addr); /* ((foo*)vectorfoo)[1] - BIT_FIELD_REFvectorfoo,... */ if (TREE_CODE (addr) == ADDR_EXPR - TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE - useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (addrtype))) - host_integerp (off, 1)) + host_integerp (off, 1) + ((TREE_CODE (TREE_TYPE (addrtype)) == VECTOR_TYPE +useless_type_conversion_p (type, +TREE_TYPE (TREE_TYPE (addrtype + || (TREE_CODE (type) == VECTOR_TYPE + useless_type_conversion_p (TREE_TYPE (type), + TREE_TYPE (TREE_TYPE (addrtype))) + TYPE_VECTOR_SUBPARTS (type) + TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype) { unsigned HOST_WIDE_INT offset = tree_low_cst (off, 1); tree part_width = TYPE_SIZE (type); unsigned HOST_WIDE_INT part_widthi -= tree_low_cst (part_width, 0) / BITS_PER_UNIT; += tree_low_cst (part_width, 0); + unsigned HOST_WIDE_INT orig_widthi += tree_low_cst (TYPE_SIZE (TREE_TYPE (addrtype)), 0); unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT; tree index = bitsize_int (indexi); - if (offset / part_widthi - = TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype))) + if (indexi + part_widthi = orig_widthi) return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0), part_width, index); } /* ((foo*)complexfoo)[1] - __imag__ complexfoo */
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #4 from Marc Glisse marc.glisse at normalesup dot org 2012-05-03 19:19:00 UTC --- (define_peephole2 [(set (mem:VI8F_256 (match_operand 2)) (match_operand:VI8F_256 1 register_operand)) (set (match_operand:ssehalfvecmode 0 register_operand) (mem:ssehalfvecmode (match_dup 2)))] TARGET_AVX [(set (match_dup 0) (vec_select:ssehalfvecmode (match_dup 1) (parallel [(const_int 0) (const_int 1)])))] ) (and similar for VI4F_256) is much less hackish than the XEXP stuff. I was quite sure I'd tested exactly this and it didn't work, but now it looks like it does :-/ Except that following http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html , this is not the right place to try and add such logic. That's a good thing because it is way too fragile, another instruction can easily squeeze between the two sets and disable the peephole.
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #2 from Marc Glisse marc.glisse at normalesup dot org 2012-05-01 15:10:26 UTC --- (In reply to comment #1) We get MEM[(T * {ref-all})x] for the casting (not a BIT_FIELD_REF for example). This gets expanded to (insn 6 5 7 (set (reg:OI 63) (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1 (nil)) (insn 7 6 8 (set (reg:V2DF 60 [ retval ]) (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1 (nil)) but that should be perfectly optimizable. A bit hard for me (never touched those md files before)... This obviously incorrect code does the transformation: (define_peephole2 [ (set (match_operand:V8SF 2 memory_operand) (match_operand:V8SF 1 register_operand) ) (set (match_operand:V4SF 0 register_operand) (match_operand:V4SF 3 memory_operand) ) ] TARGET_AVX [(const_int 0)] { emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1])); DONE; }) (the code in this experiment uses __v4sf and __v8sf instead of __m128d/__m256d in the description above) but operands[2] and operands[3] don't compare equal with rtx_equal_p, and trying a match_dup refuses to compile because of the mode mismatch, so I don't know how to constrain 2 and 3 to be the same. I tried adding some (subreg: ...) in there, but it didn't match, and looking at the rtl peephole dump, there isn't any subreg there. Then maybe peephole isn't the right place, but that's the only one where I managed to get something that compiles and is executed by the compiler on this testcase.
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #3 from Marc Glisse marc.glisse at normalesup dot org 2012-05-01 17:17:42 UTC --- (In reply to comment #2) but operands[2] and operands[3] don't compare equal with rtx_equal_p, and trying a match_dup refuses to compile because of the mode mismatch, so I don't know how to constrain 2 and 3 to be the same. rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0)) seems to give the right answer in the 3 manual tests I did. Currently checking if the testsuite finds something. It is very likely not the right way to do it, but I didn't find any inspiring pattern in the .md files. Then I'll see if I understand how the fancy macros make it possible to have a single piece of code for all modes, and if instead of calling gen_vec_extract_lo_v8sf I shouldn't give a replacement pattern like (set (match_dup 0) (vec_select (match_dup 1) (const_int 0))).
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 Richard Guenther rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2012-04-24 Ever Confirmed|0 |1 --- Comment #1 from Richard Guenther rguenth at gcc dot gnu.org 2012-04-24 11:13:33 UTC --- We get MEM[(T * {ref-all})x] for the casting (not a BIT_FIELD_REF for example). This gets expanded to (insn 6 5 7 (set (reg:OI 63) (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1 (nil)) (insn 7 6 8 (set (reg:V2DF 60 [ retval ]) (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1 (nil)) but that should be perfectly optimizable.