> On 5 Sep 2017, at 9:35 AM, Michael Clark <michaeljcl...@mac.com> wrote:
> 
>> 
>> On 19 Aug 2017, at 4:10 AM, Richard Henderson <r...@twiddle.net> wrote:
>> 
>> On 08/17/2017 03:29 PM, Michael Clark wrote:
>>> hand coded x86 asm (no worse because the sar depends on the lea)
>>> 
>>>     sx5(int):
>>>       shl edi, 27
>>>       sar edi, 27
>>>       movsx eax, dl
>> 
>> Typo in the register, but I know what you mean.  More interestingly, edi
>> already has the sign-extended value, so "mov eax, edi" sufficies (saving one
>> byte or perhaps allowing better register allocation).
>> 
>> That said, if anyone is tweaking x86-64 patterns on this, consider
>> 
>> sx5(int):
>>      rorx    eax, edi, 5
>>      sar     eax, 27
>> 
>> where the (newish) bmi2 rorx instruction allows the data to be moved into 
>> place
>> while rotating, avoiding the extra move entirely.
> 
> The patterns might be hard to match unless we can get the gimple expansions 
> for bitfield access to use the mode of the enclosing type.
> 
> For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is 
> generating two shifts on QI mode sub-registers, each with a sign-extend.
> 
> For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is 
> generating two shifts on HI mode sub-registers, each with a sign-extend.
> 
> The bitfield expansion logic is selecting the narrowest type that can hold 
> the bitwidth of the field, versus the bitwidth of the enclosing type and this 
> appears to be in the gimple to rtl transform, possibly in expr.c.
> 
> Using the mode of the enclosing type for bitfield member access expansion 
> would likely solve this problem. I suspect that with the use of caches and 
> word sized register accesses, that this sort of change would be reasonable to 
> make for 32-bit and 64-bit targets.
> 
> I also noticed something I didn’t spot earlier. On RV32 the sign extend and 
> shifts are correctly coalesced into 27 bit shifts in the combine stage. We 
> are presently looking into another redundant sign extension issue on RV64 
> that could potentially be related. It could be that the shift coalescing 
> optimisation doesn’t happen unless the redundant sign extensions are 
> eliminated early in combine by simplify_rtx. i.e. the pattern is more complex 
> due to sign_extend ops that are not eliminated.
> 
> - https://cx.rv8.io/g/2FxpNw
> 
> RV64 and Aarch64 both appear to have the issue but with different expansions 
> for the shift and extend pattern due to the mode of access (QI or HI). Field 
> accesses above 16 bits create SI mode accesses and generate the expected 
> code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it 
> appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI 
> mode and HI mode loads in some bitfield test cases when a struct is passed by 
> pointer but has no effect on this particular issue. This shows the codegen 
> for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as 
> SLOW_BYTE_ACCESS defined to 1.
> 
> - https://cx.rv8.io/g/TyXnoG
> 
> A target independent fix that would solve the issue on ARM and RISC-V would 
> be to access bitfield members with the mode of the bitfield member's 
> enclosing type instead of the smallest mode that can hold the bitwidth of the 
> type. If we had a char or short member in the struct, I can understand the 
> use of QI and HI mode, as we would need narrorwer accesses due to alignment 
> issues, but in this case the member is in int and one would expect this to 
> expand to SI mode accesses if the enclosing type is SI mode.

It appears that 64-bit targets which promote to an SI mode subregister of DI 
mode register prevents combine from coalescing the shift and sign_extend. The 
only difference I can spot between RV32 which coalesces the shift during 
combine, and RV64 and other 64-bit targets which do not coalesce the shifts 
during combine, is that the 64-bit targets have promoted the shift operand to a 
SI mode subregister of a DI mode register, whereas on RV 32 the shift operand 
is simply an SI mode register. I suspect there is some code in simplify-rtx.c 
that is missing the sub register case however I’m still trying to isolate the 
code that coalesces shift and sign_extend.

I’ve found a similar, perhaps smaller, but related case where a shift is not 
coalesced however in this case it is also not coalesced on RV32

        https://cx.rv8.io/g/fPdk2F

> riscv64:
> 
>       sx5(int):
>         slliw a0,a0,3
>         slliw a0,a0,24
>         sraiw a0,a0,24
>         sraiw a0,a0,3
>         ret
> 
>       sx17(int):
>         slliw a0,a0,15
>         sraiw a0,a0,15
>         ret
> 
> riscv32:
> 
>       sx5(int):
>         slli a0,a0,27
>         srai a0,a0,27
>         ret
> 
>       sx17(int):
>         slli a0,a0,15
>         srai a0,a0,15
>         ret
> 
> aarch64:
> 
>       sx5(int):
>         sbfiz w0, w0, 3, 5
>         asr w0, w0, 3
>         ret
> 
>       sx17(int):
>         sbfx w0, w0, 0, 17
>         ret

Reply via email to