On Wed, Nov 11, 2020 at 11:17:32AM +0100, Philipp Tomsich wrote:
> From: Philipp Tomsich <p...@gnu.org>
> 
> The function
>     long f(long a)
>     {
>       return(a & 0xFFFFFFFFull) << 3;
>     }
> is folded into
>     _1 = a_2(D) << 3;
>     _3 = _1 & 34359738360;
> whereas the construction
>     return (a & 0xFFFFFFFFull) * 8;
> results in
>     _1 = a_2(D) & 4294967295;
>     _3 = _1 * 8;
> 
> This leads to suboptimal code-generation for RISC-V (march=rv64g), as
> the shifted constant needs to be expanded into 3 RTX and 2 RTX (one
> each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm
> the combine pass (a sequence of 5 RTX are not considered):
>       li      a5,1            # tmp78,        # 23    [c=4 l=4]  *movdi_64bit/1
>       slli    a5,a5,35        #, tmp79, tmp78 # 24    [c=4 l=4]  ashldi3
>       addi    a5,a5,-8        #, tmp77, tmp79 # 9     [c=4 l=4]  adddi3/1
>       slli    a0,a0,3         #, tmp76, tmp80 # 6     [c=4 l=4]  ashldi3
>       and     a0,a0,a5        # tmp77,, tmp76 # 15    [c=4 l=4]  anddi3/0
>       ret                     # 28    [c=0 l=4]  simple_return
> instead of:
>       slli    a0,a0,32        #, tmp76, tmp79 # 26    [c=4 l=4]  ashldi3
>       srli    a0,a0,29        #,, tmp76       # 27    [c=4 l=4]  lshrdi3
>       ret                                     # 24    [c=0 l=4]  simple_return
> 
> We address this by adding a simplification for
>    (a << s) & M, where ((M >> s) << s) == M
> to
>    (a & M_unshifted) << s, where M_unshifted := (M >> s)
> which undistributes the LSHIFT.

This is problematic; we already have a rule that goes the other way:
/* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
   (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
 (for bit_op (bit_and bit_xor bit_ior)
  (simplify
   (shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
     (bit_op (shift (convert @0) @1) { mask; }))))))
and we don't want the two rules to keep fighting against each other.
It is better to keep one form canonical and to decide only right before
expansion (isel pass) or during expansion, e.g. based on target costs,
whether (X << C1) & (C2 << C1) is better expanded as is,
or as (X & C2) << C1, or as (X << C3) >> C4.
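
For illustration only (this is not part of the patch or of GCC), the three
candidate forms for the testcase above can be written out in plain C.
The function names are made up for this sketch, unsigned 64-bit arithmetic
is assumed so that the shifts are well defined, and C3 = 32, C4 = 29 are
taken from the slli/srli pair in the quoted assembly:

```c
#include <stdint.h>

/* Hypothetical helpers; C1 = 3 and C2 = 0xFFFFFFFF as in the testcase. */

/* (X << C1) & (C2 << C1): the form the existing rule produces. */
static uint64_t shift_then_mask(uint64_t x)
{
    return (x << 3) & (0xFFFFFFFFull << 3);   /* mask is 34359738360 */
}

/* (X & C2) << C1: the form the proposed simplification produces. */
static uint64_t mask_then_shift(uint64_t x)
{
    return (x & 0xFFFFFFFFull) << 3;
}

/* (X << C3) >> C4 with a logical right shift: the slli/srli pair. */
static uint64_t shift_pair(uint64_t x)
{
    return (x << 32) >> 29;
}

/* Returns 1 if all three forms compute the same value for x. */
static int forms_agree(uint64_t x)
{
    uint64_t r = shift_then_mask(x);
    return r == mask_then_shift(x) && r == shift_pair(x);
}
```

All three compute the same value, so which one to emit is purely a
question of target cost, not of correctness.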

        Jakub
