On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: > "H.J. Lu" <hjl.to...@gmail.com> writes: >> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >>> "H.J. Lu" <hjl.to...@gmail.com> writes: >>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >>>>> "H.J. Lu" <hjl.to...@gmail.com> writes: >>>>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote: >>>>>>> "H.J. Lu" <hjl.to...@gmail.com> writes: >>>>>>>> On 10/17/18, Marc Glisse <marc.gli...@inria.fr> wrote: >>>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote: >>>>>>>>> >>>>>>>>>> We may simplify >>>>>>>>>> >>>>>>>>>> (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0) >>>>>>>>>> >>>>>>>>>> to X when mode of X is the same as of mode of subreg. >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> we already have code to simplify vec_select(vec_merge): >>>>>>>>> >>>>>>>>> /* If we select elements in a vec_merge that all come from >>>>>>>>> the >>>>>>>>> same >>>>>>>>> operand, select from that operand directly. */ >>>>>>>>> >>>>>>>>> It would make sense to me to make the subreg transform as similar >>>>>>>>> to >>>>>>>>> it >>>>>>>>> as >>>>>>>>> possible, in particular you don't need to special case >>>>>>>>> vec_duplicate, >>>>>>>>> the >>>>>>>>> transformation would see that everything comes from the first >>>>>>>>> vector, >>>>>>>>> produce (subreg (vec_duplicate X) 0), and let another >>>>>>>>> transformation >>>>>>>>> optimize that. >>>>>>> >>>>>>> Sorry, didn't see this before the OK. >>>>>>> >>>>>>>> What do you mean by another transformation? If simplify_subreg >>>>>>>> doesn't >>>>>>>> return X for >>>>>>>> >>>>>>>> (subreg (vec_merge (vec_duplicate X) >>>>>>>> (vector) >>>>>>>> (const_int ((1 << N) | M))) >>>>>>>> (N * sizeof (X))) >>>>>>>> >>>>>>>> >>>>>>>> no further transformation will be done. >>>>>>> >>>>>>> I think the point was that we should transform: >>>>>>> >>>>>>> (subreg (vec_merge X >>>>>>> (vector) >>>>>>> (const_int ((1 << N) | M))) >>>>>>> (N * sizeof (X))) >>>>>>> >>>>>>> into: >>>>>>> >>>>>>> simplify_gen_subreg (outermode, X, innermode, byte) >>>>>>> >>>>>>> which should further simplify when X is a vec_duplicate. >>>>>> >>>>>> But sizeof (X) is the size of scalar of vec_dup. How do we >>>>>> check the mask of vec_merge? >>>>> >>>>> Yeah, should be sizeof (outermode) (which was the same thing >>>>> in the original pattern, but not here). >>>>> >>>>> Richard >>>>> >>>> >>>> Like this >>>> >>>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c >>>> index b0cf3bbb2a9..e12b5c0e165 100644 >>>> --- a/gcc/simplify-rtx.c >>>> +++ b/gcc/simplify-rtx.c >>>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx >>>> op, >>>> return NULL_RTX; >>>> } >>>> >>>> - /* Return X for >>>> - (subreg (vec_merge (vec_duplicate X) >>>> + /* Simplify >>>> + (subreg (vec_merge (X) >>>> (vector) >>>> (const_int ((1 << N) | M))) >>>> - (N * sizeof (X))) >>>> + (N * sizeof (outermode))) >>>> + to >>>> + (subreg ((X) (N * sizeof (outermode))) >>> >>> Stray "(": (subreg (X) (N * sizeof (outermode))) >>> >>> OK with that change if it passes testing. >> >> The self-test failed for 32-bit compiler: >> >> expected: (reg:QI 342) >> actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI >> 342)) >> (reg:V128QI 343) >> (const_int 65 [0x41])) 64) >> >> since >> >> && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0) >> >> works only up to vectors with 64 elements for 32-bit compilers. > > Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit > compilers, I guess the test invoked UB and so only worked by chance > for 64-bit compilers? > >> Should we limit the self-test to vectors with 64 elements? > > HOST_BITS_PER_WIDE_INT elements probably. I wondered at first whether > we should try to support CONST_WIDE_INT, but that just changes the limit > to WIDE_INT_MAX_PRECISION.
This is the patch I am going to check in. -- H.J.
From 9c491755021816acb10fae03e897aa9932e84e9e Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Thu, 18 Oct 2018 12:56:23 -0700 Subject: [PATCH] Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is the maximum number of vector elements. * simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT. (test_vector_ops_duplicate): Likewise. --- gcc/simplify-rtx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index ccf92166356..2ff68ceb4e3 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -6611,6 +6611,7 @@ simplify_subreg (machine_mode outermode, rtx op, */ unsigned int idx; if (constant_multiple_p (byte, GET_MODE_SIZE (outermode), &idx) + && idx < HOST_BITS_PER_WIDE_INT && GET_CODE (op) == VEC_MERGE && GET_MODE_INNER (innermode) == outermode && CONST_INT_P (XEXP (op, 2)) @@ -6861,6 +6862,8 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg) rtx vector_reg = make_test_reg (mode); for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++) { + if (i >= HOST_BITS_PER_WIDE_INT) + break; rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1)); rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask); poly_uint64 offset = i * GET_MODE_SIZE (inner_mode); -- 2.17.2