https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104408
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #4)
> (In reply to Richard Biener from comment #3)
> > match.pd just does canonicalization here. SLP discovery could handle this
> > in the existing swap operands or reassoc support but I guess the desire here
> > is to pull out a Complex SLP pattern.
>
> Yes, though also to optimize the case where you don't have the optab,
> currently the generated code is much worse at -Ofast.
>
> > So - no perfect idea yet how to reliably match a Complex pattern here but
> > trying to attack this from the match.pd side sounds wrong.
>
> Well the problem is that the scalar code is suboptimal too, even without
> matching a complex pattern, so the epilogue here does an extra sub on each
> unrolled step.

Well, the issue is then why the scalar code is not optimized (yes, it's
not so easy).

> So I initially figured we'd want to not perform the canonicalization if it's
> coming at the expense of sharing. However that looks harder than I thought at
> first as there are multiple points in fold-const.c that will try and force
> this form.

Yep. The canonicalization likely happens early, before we do CSE.

> I can probably fix the epilogue post vectorization but that seemed like a
> worse solution.

Well, the CSE opportunity needs to be realized despite the canonicalization.
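
For illustration only, here is a minimal sketch of how a canonicalization can come "at the expense of sharing". It is not the testcase from this PR: the function names and the specific rewrite shown (-(a - b) becoming b - a) are assumptions chosen to make the trade-off visible. Both functions compute the same values, but the second needs two subtractions unless a later CSE-like step recognizes that b - a is the negation of a - b.

```c
/* Illustrative only: hypothetical functions, not taken from the PR. */

/* Shared form: the difference a - b is computed once and reused. */
void shared_form(double a, double b, double c, double d,
                 double *x, double *y)
{
    double t = a - b;   /* one subtraction */
    *x = t * c;
    *y = -t * d;        /* negation of the shared value */
}

/* Canonicalized form (assumed rewrite): -(a - b) has become (b - a).
   The link to a - b is no longer syntactically visible, so a second
   subtraction is emitted unless a later pass recovers the sharing. */
void canonical_form(double a, double b, double c, double d,
                    double *x, double *y)
{
    *x = (a - b) * c;   /* first subtraction */
    *y = (b - a) * d;   /* second subtraction */
}
```

Whether the extra subtraction survives depends on pass ordering, which is the point made above: if the rewrite happens before CSE runs, CSE has to see through the canonical form to recover the shared value.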