https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93177
--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> >the cntlz ones are not, for example
>
> :)  It has been a long time since I touched this but I would not doubt I
> messed up this too.

It's nastiness in the generic builtins.  __builtin_clz(0) is undefined,
even if it *is* defined for the machine pattern.  This is so that code
using the builtin can be portable.

Unfortunately there is no good way (or I don't know it, anyway) to do
something like

    int f(int x) { return x ? __builtin_clz(x) : 32; }

so that it compiles to just a cntlzw insn (instead, it currently does a
branch and stuff :-( ).

> __mulh* intrinsics are better implemented these days using either 64bit or
> 128bit multiples.

Yup.

> __l[hwd]brx/__st[hwd]brx intrinsics are better implemented as
> __builtin_bswap* followed by load/stored these days (the bswap builtins did
> not exist back then or optimized)

Yup.

> Many of the other intrinsics should be implemented as non inline-asm too,
> even fma, should be done using __builtin_fma :).

Yup :-)

GCC has come a long way, since Cell :-)  You can reliably write many things
just as high-level C code now, and trust that well-optimised machine code
falls out.
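
For illustration only, here is a rough sketch of what "plain C instead of
inline asm" could look like for the cases discussed in this comment.  All
function names below are made up for the example and are not the existing
intrinsics.  It assumes a 32-bit int, GCC's __builtin_clz and
__builtin_bswap32, and that >> on a negative signed value is an arithmetic
shift (which GCC documents).  Which instructions the compiler actually
selects for any of this has not been checked here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count leading zeros with a defined result for 0.  The guard is needed
   because __builtin_clz(0) is undefined at the C level. */
static inline int clz_or_32(uint32_t x)
{
    return x ? __builtin_clz(x) : 32;
}

/* High 32 bits of a signed 32x32 multiply, via a widening 64-bit multiply.
   Assumes >> on a negative value is an arithmetic shift. */
static inline int32_t mulhw_c(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Unsigned variant of the above. */
static inline uint32_t mulhwu_c(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Byte-reversed 32-bit load: ordinary load, then swap.
   memcpy avoids alignment and aliasing problems. */
static inline uint32_t load_u32_byterev(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return __builtin_bswap32(v);
}

/* Byte-reversed 32-bit store: swap, then ordinary store. */
static inline void store_u32_byterev(void *p, uint32_t v)
{
    v = __builtin_bswap32(v);
    memcpy(p, &v, sizeof v);
}

/* Store then load through the byte-reversed helpers; the two swaps
   cancel, so this returns v regardless of host endianness. */
static inline uint32_t byterev_roundtrip(uint32_t v)
{
    unsigned char buf[sizeof v];
    store_u32_byterev(buf, v);
    return load_u32_byterev(buf);
}

/* A few sanity checks on the helpers above. */
static inline void self_check(void)
{
    assert(clz_or_32(0) == 32);
    assert(clz_or_32(0x80000000u) == 0);
    assert(mulhw_c(0x40000000, 4) == 1);
    assert(byterev_roundtrip(0xdeadbeefu) == 0xdeadbeefu);
}
```

Calling self_check() from a small main() is enough to try this out.  The
same pattern would apply to the 16-bit and 64-bit byte-reversed variants
using __builtin_bswap16 and __builtin_bswap64.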