On 6/27/19 9:34 AM, Jakub Jelinek wrote:
> On Thu, Jun 27, 2019 at 09:24:58AM -0600, Jeff Law wrote:
>> On 6/27/19 12:05 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
>>>> Yes, the patch works OK. I'll regression test it and push it later today.
>>>
>>> I think it caused
>>> +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
>>> which admittedly already is xfailed on various targets.
>>> We now newly vectorize those loops and there is no FRE or similar pass
>>> after vectorization to clean it up, in particular optimize the
>>> a[8] and a[9] loads given the MEM <vector(2) int> [(int *)&a + 32B]
>>> store:
>>>   MEM <vector(2) int> [(int *)&a + 32B] = { 64, 81 };
>>>   _13 = a[8];
>>>   res_6 = _13 + 140;
>>>   _18 = a[9];
>>>   res_15 = res_6 + _18;
>>>   a ={v} {CLOBBER};
>>>   return res_15;
>>>
>>> Shall we xfail it, or is there a plan to enable FRE after vectorization,
>>> or similar pass that would be able to do similar memory optimizations?
>>> Note, the RTL passes are able to optimize it in the end in this testcase.
>> I wonder if we could logically break up the vector store within DOM. If
>> we did that we'd end up with a[8] and a[9] in DOM's expression hash
>> table. That would allow us to replace the loads into _13 and _18 with
>> constants and the rest should just fall out.
>>
>> Care to open a BZ? If so, go ahead and assign it to me.
>
> I think Richi is working on adding fre3 now.

Yeah, I saw that later. I think Richi's message indicated he wanted a late
FRE pass, so even if DOM were to capture this, it may not eliminate the
desire for a late FRE pass.
jeff
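
For anyone following along, here is a rough C sketch of what the dump
above boils down to. It is not the actual testcase source: the loop in
sum_of_squares() is only an assumed shape that happens to be consistent
with the constants in the dump (140 is the sum of the squares 0..7, and
64 and 81 are the last two), and all three function names are made up
for illustration. The 32B offset is assumed to mean element 8 of an
array of 4-byte ints.

```c
#include <string.h>

/* Assumed shape of the source: fill a[] with squares, then sum them.
   Hypothetical reconstruction, not copied from the testsuite.  */
int sum_of_squares(void)
{
  int a[10];
  for (int i = 0; i < 10; ++i)
    a[i] = i * i;

  int res = 0;
  for (int i = 0; i < 10; ++i)
    res += a[i];
  return res;
}

/* Hand-written mirror of the GIMPLE tail quoted above.  The two-lane
   vector store is modelled with memcpy; the scalar loads of a[8] and
   a[9] still go through memory.  */
int vectorized_tail(void)
{
  int a[10] = { 0 };
  const int lanes[2] = { 64, 81 };

  memcpy(&a[8], lanes, sizeof lanes);  /* MEM <vector(2) int> [&a + 32B] */
  int t13 = a[8];                      /* _13 = a[8];          */
  int res6 = t13 + 140;                /* res_6 = _13 + 140;   */
  int t18 = a[9];                      /* _18 = a[9];          */
  int res15 = res6 + t18;              /* res_15 = res_6 + _18 */
  return res15;
}

/* What is left once the vector store is viewed as two scalar stores
   (a[8] = 64; a[9] = 81;) and the loads are replaced by those
   constants: the sum folds to a single constant.  */
int after_store_forwarding(void)
{
  return 64 + 140 + 81;
}
```

All three functions return 285, which is the value the scan-tree-dump
pattern looks for. The middle one corresponds to what the optimized dump
contains today; the last one is what either a late FRE or a DOM that
records the individual vector lanes would be expected to leave behind.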