https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114010

--- Comment #5 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
Also, I further investigated the codegen difference in the second example (zip
+ umlal vs umull) and it looks to be some sort of RTL ordering + combine issue.

Specifically, when we expand the RTL for the example there are some very
slight ordering differences where some non-dependent insns have swapped order.
One of these happens to precede a relevant vector statement, and then in one
case combine does the umlal transformation but in the other it does not.

Afaik combine has some limits on the instruction window that it looks at, so
it seems plausible that ordering differences in RTL can later turn into major
codegen differences in a number of ways. Other differences seem to come from
register allocation, as you mentioned.
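
To illustrate the kind of ordering sensitivity meant here, below is a toy model
in Python. It is not GCC's combine pass and does not reflect how combine really
tracks dependencies; the tuple encoding, the `combine` function and the
`WINDOW` limit are all made up for the sketch. It only shows how a limited
look-back can make a multiply+add fusion depend on where an unrelated insn
happens to sit.

```python
# Toy model only: NOT GCC's combine pass. The insn encoding (op, dest, srcs)
# and the look-back limit WINDOW are hypothetical, chosen for illustration.
WINDOW = 1


def combine(insns, window=WINDOW):
    """Fuse 'mul' + 'add' into 'mla' if the mul is within `window` insns
    before the add. Assumes the mul result has no other users."""
    out = list(insns)
    i = 0
    while i < len(out):
        op, dest, srcs = out[i]
        if op == "add":
            lowest = max(0, i - window)
            # Scan backwards, but only as far as the window allows.
            for j in range(i - 1, lowest - 1, -1):
                prev_op, prev_dest, prev_srcs = out[j]
                if prev_op == "mul" and prev_dest in srcs:
                    acc = tuple(s for s in srcs if s != prev_dest)
                    out[i] = ("mla", dest, acc + prev_srcs)
                    del out[j]
                    i -= 1  # the fused insn moved one slot up
                    break
        i += 1
    return out


mov = ("mov", "x", ("y",))          # unrelated, non-dependent insn
mul = ("mul", "t", ("a", "b"))
add = ("add", "acc", ("acc", "t"))

order_a = [mov, mul, add]           # mul directly precedes add
order_b = [mul, mov, add]           # same insns, mov and mul swapped

print([insn[0] for insn in combine(order_a)])  # ['mov', 'mla']
print([insn[0] for insn in combine(order_b)])  # ['mul', 'mov', 'add']
```

Both sequences compute the same thing, yet only the first gets the fused
multiply-accumulate in this model, because in the second the unrelated insn
pushes the multiply outside the look-back window.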

This doesn't yet provide any useful insight into whether the vectorization
improvements are accidental or not.
