https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116919
--- Comment #2 from Artemiy Volkov <artemiy at synopsys dot com> --- (In reply to Jeffrey A. Law from comment #1) > Confirmed. I see that you're looking at the crcu8 code. If you're looking > to optimize that function, you *really* want Mariam's code that detects CRC > loops and can generate a table lookup or clmul sequence (when zbc is > enabled). It'll optimize that function and the points where it gets inlined > to synthesize crcu16. Thank you, that's some outstanding work. I was able to install v1 of the series and it's safe to say that it beats this patch when it comes to optimizing that function. :) > > Your patch may still well be useful since I suspect this shows up elsewhere. > It's probably worth submitting upstream. > > I think the big question is whether or not forcing zero extension for that > case is going to hurt others where we rely on the sign extension of > constants to produce a constant that is more readily handled by li. Though > if the code is strictly handling cases where we have a sub-word op and we > want to widen it to a word sized op, then it may not be that big of a deal. Yeah, I think the scope here is slightly broader than just coremark; and yes, the idea is to strictly only handle widening to word size to eliminate the zext. Could you please help me understand how to construct a (RISC-V) counterexample that this approach could potentially affect negatively? > > The other way to approach this (and I'm not convinced it's actually better) > would be to have a define_insn_and_split pattern to match this: > > Trying 14 -> 15: > 14: r141:SI=r138:SI|0xffffffffffff8000 > REG_DEAD r138:SI > REG_EQUAL r138:SI|0xffffffffffff8000 > 15: r138:SI=zero_extend(r141:SI#0) > REG_DEAD r141:SI > Failed to match this instruction: > (set (reg/v:SI 138 [ crc ]) > (ior:SI (reg/v:SI 138 [ crc ]) > (const_int 32768 [0x8000]))) > > But I'm generally not a fan of using define_insn_and_split for this kind of > case (2 insn -> 2 insn split) as it mucks up the costing model in combine.cc. To add to your argument a bit against handling this in combine: will this also work when the loop is unrolled? To my best knowledge, if instead of 1 user insn 14 has 8 (as it would in case of crcu8()), then combine will have a much harder time converting the negative constant to the positive one. Besides, wouldn't the split pattern have to be added for many different arches?