https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116919

--- Comment #2 from Artemiy Volkov <artemiy at synopsys dot com> ---
(In reply to Jeffrey A. Law from comment #1)
> Confirmed.  I see that you're looking at the crcu8 code.  If you're looking
> to optimize that function, you *really* want Mariam's code that detects CRC
> loops and can generate a table lookup or clmul sequence (when zbc is
> enabled).  It'll optimize that function and the points where it gets inlined
> to synthesize crcu16.

Thank you, that's some outstanding work. I was able to install v1 of the series
and it's safe to say that it beats this patch when it comes to optimizing that
function. :)

> 
> Your patch may still well be useful since I suspect this shows up elsewhere.
> It's probably worth submitting upstream.
> 
> I think the big question is whether or not forcing zero extension for that
> case is going to hurt others where we rely on the sign extension of
> constants to produce a constant that is more readily handled by li.  Though
> if the code is strictly handling cases where we have a sub-word op and we
> want to widen it to a word sized op, then it may not be that big of a deal.

Yeah, I think the scope here is slightly broader than just coremark; and yes,
the idea is to strictly only handle widening to word size to eliminate the
zext. Could you please help me understand how to construct a (RISC-V)
counterexample that this approach could potentially affect negatively?

> 
> The other way to approach this (and I'm not convinced it's actually better)
> would be to have a define_insn_and_split pattern to match this:
> 
> Trying 14 -> 15: 
>    14: r141:SI=r138:SI|0xffffffffffff8000
>       REG_DEAD r138:SI
>       REG_EQUAL r138:SI|0xffffffffffff8000
>    15: r138:SI=zero_extend(r141:SI#0)
>       REG_DEAD r141:SI
> Failed to match this instruction:
> (set (reg/v:SI 138 [ crc ])
>     (ior:SI (reg/v:SI 138 [ crc ]) 
>         (const_int 32768 [0x8000])))
> 
> But I'm generally not a fan of using define_insn_and_split for this kind of
> case (2 insn -> 2 insn split) as it mucks up the costing model in combine.cc.

To add to your argument a bit against handling this in combine: will this also
work when the loop is unrolled? To my best knowledge, if instead of 1 user insn
14 has 8 (as it would in case of crcu8()), then combine will have a much harder
time converting the negative constant to the positive one. Besides, wouldn't
the split pattern have to be added for many different arches?

Reply via email to