Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

Jeff Law Thu, 27 Jun 2019 08:37:02 -0700

On 6/27/19 9:34 AM, Jakub Jelinek wrote:
> On Thu, Jun 27, 2019 at 09:24:58AM -0600, Jeff Law wrote:
>> On 6/27/19 12:05 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
>>>> Yes, the patch works OK. I'll regression test it and push it later today.
>>>
>>> I think it caused
>>> +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
>>> which admittedly already is xfailed on various targets.
>>> We now newly vectorize those loops and there is no FRE or similar pass
>>> after vectorization to clean it up, in particular optimize the
>>> a[8] and a[9] loads given the MEM <vector(2) int> [(int *)&a + 32B]
>>> store:
>>>   MEM <vector(2) int> [(int *)&a + 32B] = { 64, 81 };
>>>   _13 = a[8];
>>>   res_6 = _13 + 140;
>>>   _18 = a[9];
>>>   res_15 = res_6 + _18;
>>>   a ={v} {CLOBBER};
>>>   return res_15;
>>>
>>> Shall we xfail it, or is there a plan to enable FRE after vectorization,
>>> or similar pass that would be able to do similar memory optimizations?
>>> Note, the RTL passes are able to optimize it in the end in this testcase.
>> I wonder if we could logically break up the vector store within DOM.  If
>> we did that we'd end up with a[8] and a[9] in DOM's expression hash
>> table.  That would allow us to replace the loads into _13 and _18 with
>> constants and the rest should just fall out.
>>
>> Care to open a BZ?  If so, go ahead and assign it to me.
> 
> I think Richi is working on adding fre3 now.
Yea, I saw that later.  I think Richi's message indicated he wanted a
late fre pass, so even if DOM was to capture this, it may not eliminate
the desire for a late fre pass.
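
For anyone following along, here is a rough sketch of the kind of code
involved. It is reconstructed from the dump quoted above, not copied from
pr84512.c, so the function name sum_of_squares and the exact loop shape are
assumptions. The first loop fills a[] with squares and the second sums them.
Once the fill loop is vectorized, the last two elements are written by a
single vector store ({ 64, 81 } at offset 32B), and the scalar loads of a[8]
and a[9] are no longer trivially forwarded from it.

```c
/* Hypothetical reconstruction of the loop pattern discussed above.
   Without vectorization the whole function folds to "return 285;".  */
int sum_of_squares(void)
{
    int a[10];

    /* Fill loop: with 8-byte vectors, a[8] and a[9] can be written
       by one vector(2) int store.  */
    for (int i = 0; i < 10; i++)
        a[i] = i * i;

    /* Reduction loop: the sum of a[0..7] is 140, matching the
       constant in the dump; a[8] and a[9] stay as scalar loads.  */
    int res = 0;
    for (int i = 0; i < 10; i++)
        res += a[i];

    return res;
}
```

The idea of splitting the vector store in DOM would amount to recording
a[8] = 64 and a[9] = 81 as separate entries, so that the two loads become
constants and res folds to 140 + 64 + 81.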


jeff
