Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

H.J. Lu Thu, 18 Oct 2018 14:08:38 -0700

On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
> "H.J. Lu" <hjl.to...@gmail.com> writes:
>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>>>> On 10/18/18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>>>> "H.J. Lu" <hjl.to...@gmail.com> writes:
>>>>>>>> On 10/17/18, Marc Glisse <marc.gli...@inria.fr> wrote:
>>>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>>>>
>>>>>>>>>> We may simplify
>>>>>>>>>>
>>>>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>>>>
>>>>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>>>>
>>>>>>>>>      /* If we select elements in a vec_merge that all come from
>>>>>>>>> the
>>>>>>>>> same
>>>>>>>>>         operand, select from that operand directly.  */
>>>>>>>>>
>>>>>>>>> It would make sense to me to make the subreg transform as similar
>>>>>>>>> to
>>>>>>>>> it
>>>>>>>>> as
>>>>>>>>> possible, in particular you don't need to special case
>>>>>>>>> vec_duplicate,
>>>>>>>>> the
>>>>>>>>> transformation would see that everything comes from the first
>>>>>>>>> vector,
>>>>>>>>> produce (subreg (vec_duplicate X) 0), and let another
>>>>>>>>> transformation
>>>>>>>>> optimize that.
>>>>>>>
>>>>>>> Sorry, didn't see this before the OK.
>>>>>>>
>>>>>>>> What do you mean by another transformation? If simplify_subreg
>>>>>>>> doesn't
>>>>>>>> return X for
>>>>>>>>
>>>>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>>>>                     (vector)
>>>>>>>>                     (const_int ((1 << N) | M)))
>>>>>>>>          (N * sizeof (X)))
>>>>>>>>
>>>>>>>>
>>>>>>>> no further transformation will be done.
>>>>>>>
>>>>>>> I think the point was that we should transform:
>>>>>>>
>>>>>>>   (subreg (vec_merge X
>>>>>>>                      (vector)
>>>>>>>                      (const_int ((1 << N) | M)))
>>>>>>>           (N * sizeof (X)))
>>>>>>>
>>>>>>> into:
>>>>>>>
>>>>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>>>>
>>>>>>> which should further simplify when X is a vec_duplicate.
>>>>>>
>>>>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>>>>> check the mask of vec_merge?
>>>>>
>>>>> Yeah, should be sizeof (outermode) (which was the same thing
>>>>> in the original pattern, but not here).
>>>>>
>>>>> Richard
>>>>>
>>>>
>>>> Like this
>>>>
>>>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>>>> index b0cf3bbb2a9..e12b5c0e165 100644
>>>> --- a/gcc/simplify-rtx.c
>>>> +++ b/gcc/simplify-rtx.c
>>>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx
>>>> op,
>>>>        return NULL_RTX;
>>>>      }
>>>>
>>>> -  /* Return X for
>>>> -  (subreg (vec_merge (vec_duplicate X)
>>>> +  /* Simplify
>>>> +  (subreg (vec_merge (X)
>>>>             (vector)
>>>>             (const_int ((1 << N) | M)))
>>>> -     (N * sizeof (X)))
>>>> +     (N * sizeof (outermode)))
>>>> +     to
>>>> +  (subreg ((X) (N * sizeof (outermode)))
>>>
>>> Stray "(": (subreg (X) (N * sizeof (outermode)))
>>>
>>> OK with that change if it passes testing.
>>
>> The self-test failed for 32-bit compiler:
>>
>> expected: (reg:QI 342)
>>   actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI
>> 342))
>>         (reg:V128QI 343)
>>         (const_int 65 [0x41])) 64)
>>
>> since
>>
>> && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
>>
>> works only up to vectors with 64 elements for 32-bit compilers.
>
> Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit
> compilers, I guess the test invoked UB and so only worked by chance
> for 64-bit compilers?
>
>> Should we limit the self-test to vectors with 64 elements?
>
> HOST_BITS_PER_WIDE_INT elements probably.  I wondered at first whether
> we should try to support CONST_WIDE_INT, but that just changes the limit
> to WIDE_INT_MAX_PRECISION.


This is the patch I am going to check in.

-- 
H.J.

From 9c491755021816acb10fae03e897aa9932e84e9e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Thu, 18 Oct 2018 12:56:23 -0700
Subject: [PATCH] Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT

Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is
the maximum number of vector elements.

	* simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to
	HOST_BITS_PER_WIDE_INT.
	(test_vector_ops_duplicate): Likewise.
---
 gcc/simplify-rtx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ccf92166356..2ff68ceb4e3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6611,6 +6611,7 @@ simplify_subreg (machine_mode outermode, rtx op,
    */
   unsigned int idx;
   if (constant_multiple_p (byte, GET_MODE_SIZE (outermode), &idx)
+      && idx < HOST_BITS_PER_WIDE_INT
       && GET_CODE (op) == VEC_MERGE
       && GET_MODE_INNER (innermode) == outermode
       && CONST_INT_P (XEXP (op, 2))
@@ -6861,6 +6862,8 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
       rtx vector_reg = make_test_reg (mode);
       for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
 	{
+	  if (i >= HOST_BITS_PER_WIDE_INT)
+	    break;
 	  rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
 	  rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
 	  poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
-- 
2.17.2

Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

Reply via email to