> On 5 Sep 2017, at 9:35 AM, Michael Clark <[email protected]> wrote:
>
>>
>> On 19 Aug 2017, at 4:10 AM, Richard Henderson <[email protected]> wrote:
>>
>> On 08/17/2017 03:29 PM, Michael Clark wrote:
>>> hand coded x86 asm (no worse because the sar depends on the lea)
>>>
>>> sx5(int):
>>> shl edi, 27
>>> sar edi, 27
>>> movsx eax, dl
>>
>> Typo in the register, but I know what you mean. More interestingly, edi
>> already has the sign-extended value, so "mov eax, edi" sufficies (saving one
>> byte or perhaps allowing better register allocation).
>>
>> That said, if anyone is tweaking x86-64 patterns on this, consider
>>
>> sx5(int):
>> rorx eax, edi, 5
>> sar eax, 27
>>
>> where the (newish) bmi2 rorx instruction allows the data to be moved into
>> place
>> while rotating, avoiding the extra move entirely.
>
> The patterns might be hard to match unless we can get the gimple expansions
> for bitfield access to use the mode of the enclosing type.
>
> For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is
> generating two shifts on QI mode sub-registers, each with a sign-extend.
>
> For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is
> generating two shifts on HI mode sub-registers, each with a sign-extend.
>
> The bitfield expansion logic is selecting the narrowest type that can hold
> the bitwidth of the field, versus the bitwidth of the enclosing type and this
> appears to be in the gimple to rtl transform, possibly in expr.c.
>
> Using the mode of the enclosing type for bitfield member access expansion
> would likely solve this problem. I suspect that with the use of caches and
> word sized register accesses, that this sort of change would be reasonable to
> make for 32-bit and 64-bit targets.
>
> I also noticed something I didn’t spot earlier. On RV32 the sign extend and
> shifts are correctly coalesced into 27 bit shifts in the combine stage. We
> are presently looking into another redundant sign extension issue on RV64
> that could potentially be related. It could be that the shift coalescing
> optimisation doesn’t happen unless the redundant sign extensions are
> eliminated early in combine by simplify_rtx. i.e. the pattern is more complex
> due to sign_extend ops that are not eliminated.
>
> - https://cx.rv8.io/g/2FxpNw
>
> RV64 and Aarch64 both appear to have the issue but with different expansions
> for the shift and extend pattern due to the mode of access (QI or HI). Field
> accesses above 16 bits create SI mode accesses and generate the expected
> code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it
> appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI
> mode and HI mode loads in some bitfield test cases when a struct is passed by
> pointer but has no effect on this particular issue. This shows the codegen
> for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as
> SLOW_BYTE_ACCESS defined to 1.
>
> - https://cx.rv8.io/g/TyXnoG
>
> A target independent fix that would solve the issue on ARM and RISC-V would
> be to access bitfield members with the mode of the bitfield member's
> enclosing type instead of the smallest mode that can hold the bitwidth of the
> type. If we had a char or short member in the struct, I can understand the
> use of QI and HI mode, as we would need narrorwer accesses due to alignment
> issues, but in this case the member is in int and one would expect this to
> expand to SI mode accesses if the enclosing type is SI mode.
It appears that 64-bit targets which promote to an SI mode subregister of DI
mode register prevents combine from coalescing the shift and sign_extend. The
only difference I can spot between RV32 which coalesces the shift during
combine, and RV64 and other 64-bit targets which do not coalesce the shifts
during combine, is that the 64-bit targets have promoted the shift operand to a
SI mode subregister of a DI mode register, whereas on RV 32 the shift operand
is simply an SI mode register. I suspect there is some code in simplify-rtx.c
that is missing the sub register case however I’m still trying to isolate the
code that coalesces shift and sign_extend.
I’ve found a similar, perhaps smaller, but related case where a shift is not
coalesced however in this case it is also not coalesced on RV32
https://cx.rv8.io/g/fPdk2F
> riscv64:
>
> sx5(int):
> slliw a0,a0,3
> slliw a0,a0,24
> sraiw a0,a0,24
> sraiw a0,a0,3
> ret
>
> sx17(int):
> slliw a0,a0,15
> sraiw a0,a0,15
> ret
>
> riscv32:
>
> sx5(int):
> slli a0,a0,27
> srai a0,a0,27
> ret
>
> sx17(int):
> slli a0,a0,15
> srai a0,a0,15
> ret
>
> aarch64:
>
> sx5(int):
> sbfiz w0, w0, 3, 5
> asr w0, w0, 3
> ret
>
> sx17(int):
> sbfx w0, w0, 0, 17
> ret