https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99434

--- Comment #2 from cqwrteur <unlvsur at live dot com> ---
(In reply to Andrew Pinski from comment #1)
> This is just a register allocation issue dealing with mulx and TImode.
> 
> If mulq was used instead (that is without -march=native), all of the
> functions are done correctly.

I do not think so. I think GCC generally handles patterns like this badly. I
have even found a way to produce different suboptimal results deterministically.

For example:
https://godbolt.org/z/PbobYG

Whenever GCC deals with shifts like >>32 or >>64, it generates slower code.
This reproduces even without -march=native.
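
A minimal sketch of the kind of >>64 and >>32 pattern in question, for
illustration only. It is not the code behind the Godbolt link; the function
names mul_hi_shift and mul_hi_split are hypothetical, and unsigned __int128 is
a GCC/Clang extension rather than standard C++. Both functions compute the high
64 bits of a 64x64-bit product, so an ideal compiler could reduce each to a
single widening multiply.

```cpp
#include <cstdint>

// Hypothetical example functions (assumed names, not from the bug report).

// High 64 bits of a 64x64 -> 128-bit product, extracted with >>64.
inline std::uint64_t mul_hi_shift(std::uint64_t a, std::uint64_t b) noexcept
{
    unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
    return static_cast<std::uint64_t>(product >> 64);
}

// The same value assembled from 32-bit partial products, using >>32.
inline std::uint64_t mul_hi_split(std::uint64_t a, std::uint64_t b) noexcept
{
    std::uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    std::uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    std::uint64_t lo_lo = a_lo * b_lo;
    std::uint64_t hi_lo = a_hi * b_lo;
    std::uint64_t lo_hi = a_lo * b_hi;
    std::uint64_t hi_hi = a_hi * b_hi;

    // Sum of the middle terms plus the carry out of the low word;
    // this cannot overflow 64 bits.
    std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xffffffffu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

Comparing the assembly for the two functions at -O2, with and without
-march=native, is one way to check whether both are lowered to the same
mulq/mulx sequence.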

Clang, by contrast, generates exactly the same assembly, which means my code
is correct. GCC handles this case wrong.

It looks like we need more optimizations on trees for these patterns.
