> Do we have evidence that targets properly cost XOR vs SUB RTXen?
>
> It might actually be a reload optimization - when the constant is
> available in a register use 'sub', when it needs to be reloaded
> use 'xor'?
>
> That said, I wonder if the fallout of changing some SUB to XOR
> is bigger than the benefit when we do it early (missed combines, etc.)?

Regarding fallout, I have now done a bootstrap and regtest for various
backends.  No change on Power9, s390x and aarch64.  On x86 there is one
additional FAIL in pr78103-3.c:

    unsigned long long
    bar (unsigned int x)
    {
      return __CHAR_BIT__ * sizeof (unsigned int) - 1 - __builtin_clz (x);
    }

is supposed to become

    bsrl    %edi, %eax
    ret

but now is

    bsrl    %edi, %eax
    xorl    $31, %eax
    xorq    $31, %rax
    ret

The x86 backend has various splitters that catch and simplify something
like

    (xor (minus (const_int 63) (clz (match_operand))) (const_int 63))

to (bsr ...).  From a quick glance, there are several combinations of 31,
63, xor and clz which would need to be duplicated(?) to match against the
changed patterns.  Perhaps xor is always cheaper on x86 and a simple
change from

    (minus (const_int 63) (...))

to

    (xor (const_int 63) (...))

would be sufficient, but this would still need to be reviewed separately.
Needing to keep both patterns (as neither minus nor xor can be considered
"more canonical" than the other) seems like an annoyance.

Regards
 Robin
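
For reference, a minimal sketch of why the two forms are interchangeable for this testcase: when the constant C is all ones in its low bits (31 or 63) and the other operand lies in [0, C], the subtraction never borrows, so C - v equals C ^ v. The helper names below are hypothetical and not part of the testsuite, and the code assumes the bit width of unsigned int is a power of two (true for the targets mentioned above) and that x is nonzero, since __builtin_clz (0) is undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Subtraction form, as written in the testcase.  x must be nonzero.  */
static unsigned
msb_index_sub (unsigned x)
{
  return CHAR_BIT * sizeof (unsigned) - 1 - __builtin_clz (x);
}

/* XOR form.  Equivalent as long as (width - 1) is all ones in its low
   bits and clz (x) does not exceed it.  x must be nonzero.  */
static unsigned
msb_index_xor (unsigned x)
{
  return (CHAR_BIT * sizeof (unsigned) - 1) ^ __builtin_clz (x);
}

/* Hypothetical helper: check both forms for every possible position of
   the most significant set bit, with the lower bits clear and with the
   lower bits all set.  */
static void
check_forms_agree (void)
{
  for (size_t bit = 0; bit < CHAR_BIT * sizeof (unsigned); bit++)
    {
      unsigned lo = 1u << bit;       /* only the top bit set */
      unsigned hi = lo | (lo - 1);   /* top bit and everything below */

      assert (msb_index_sub (lo) == bit);
      assert (msb_index_xor (lo) == bit);
      assert (msb_index_sub (hi) == bit);
      assert (msb_index_xor (hi) == bit);
    }
}
```

Calling check_forms_agree () from a test driver exercises every clz result from 0 to width - 1. It says nothing about which RTL form a backend costs lower or matches in its splitters, which is the open question above.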