https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Tamar,
I think this is where I need to apologize.  Combine is now canonicalizing these
equivalent RTL expressions to the zero_extend form, on the assumption that zero
extension has no data dependency and is cheaper or at worst the same speed on
many targets.  Unfortunately, for aarch64 there are patterns (splitters or
peephole2s) for optimizing the sign_extend version that don't exist for the
zero_extend version [even though the instruction set is symmetric and should
handle both sxtw/uxtw].  Technically, these were just missed optimizations
before, but I'm guessing my changes (to both trees and RTL) have changed the
form that the backend encounters, leading to a code quality regression.
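
As a rough illustration of why the two forms can be interchangeable, here is a
minimal C sketch.  It assumes the equivalence comes from the sign bit being
known to be zero, which is one common case and not necessarily the exact
situation in this PR.  The helper names (extend_signed, extend_unsigned,
extensions_agree) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits (what sxtw computes).
   The uint32_t -> int32_t conversion is implementation-defined in ISO C
   for values above INT32_MAX; GCC defines it as modulo 2^32. */
static uint64_t extend_signed(uint32_t x)
{
    return (uint64_t)(int64_t)(int32_t)x;
}

/* Zero-extend a 32-bit value to 64 bits (what uxtw computes). */
static uint64_t extend_unsigned(uint32_t x)
{
    return (uint64_t)x;
}

/* Returns 1 when both extensions give the same 64-bit result, which is
   the case whenever bit 31 of x is clear. */
static int extensions_agree(uint32_t x)
{
    return extend_signed(x) == extend_unsigned(x);
}
```

When the compiler can prove that bit 31 is clear (for example after masking
with 0x7fffffff), either extension is valid, so a pass such as combine is free
to pick one canonical form.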

This should be easy to fix; I just need to get up to speed on the instructions
that aarch64 supports, and which zero extended forms are currently missing.
I'm sure that if GCC instead canonicalized to the sign_extend form, other
targets would show similar asymmetries (it's only when things change that
anyone notices the difference).  I'll see if I can come up with a fix over the
weekend.
