> On 19 Aug 2017, at 4:10 AM, Richard Henderson <r...@twiddle.net> wrote:
>
> On 08/17/2017 03:29 PM, Michael Clark wrote:
>> hand coded x86 asm (no worse because the sar depends on the lea)
>>
>> sx5(int):
>>         shl edi, 27
>>         sar edi, 27
>>         movsx eax, dl
>
> Typo in the register, but I know what you mean.  More interestingly, edi
> already has the sign-extended value, so "mov eax, edi" suffices (saving one
> byte or perhaps allowing better register allocation).
>
> That said, if anyone is tweaking x86-64 patterns on this, consider
>
> sx5(int):
>         rorx eax, edi, 5
>         sar eax, 27
>
> where the (newish) bmi2 rorx instruction allows the data to be moved into
> place while rotating, avoiding the extra move entirely.
The patterns might be hard to match unless we can get the gimple expansions for bitfield access to use the mode of the enclosing type.

For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc generates two shifts on QI mode sub-registers, each with a sign extend; for fields under 16 bits it generates two shifts on HI mode sub-registers, each with a sign extend. The bitfield expansion logic selects the narrowest mode that can hold the bitwidth of the field rather than the mode of the enclosing type; this appears to happen in the gimple-to-RTL transform, possibly in expr.c. Using the mode of the enclosing type for bitfield member access expansion would likely solve this problem. Given caches and word-sized register accesses, I suspect this sort of change would be reasonable to make for 32-bit and 64-bit targets.

I also noticed something I didn't spot earlier: on RV32 the sign extend and shifts are correctly coalesced into 27-bit shifts in the combine stage. We are presently looking into another redundant sign-extension issue on RV64 that could potentially be related. It could be that the shift coalescing optimisation doesn't happen unless the redundant sign extensions are eliminated early in combine by simplify_rtx, i.e. the pattern is more complex due to sign_extend ops that are not eliminated.

- https://cx.rv8.io/g/2FxpNw

RV64 and AArch64 both appear to have the issue, but with different expansions for the shift-and-extend pattern due to the mode of access (QI or HI). Field accesses above 16 bits create SI mode accesses and generate the expected code.

The RISC-V compiler has the "#define SLOW_BYTE_ACCESS 1" patch, however it appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI mode and HI mode loads in some bitfield test cases when a struct is passed by pointer, but it has no effect on this particular issue.
This shows the codegen for the fix to the SLOW_BYTE_ACCESS issue, i.e. proof that the compiler has SLOW_BYTE_ACCESS defined to 1:

- https://cx.rv8.io/g/TyXnoG

A target-independent fix that would solve the issue on ARM and RISC-V would be to access bitfield members with the mode of the bitfield member's enclosing type, instead of the smallest mode that can hold the bitwidth of the field. If we had a char or short member in the struct, I could understand the use of QI and HI mode, as we would need narrower accesses due to alignment issues; but in this case the member is an int, and one would expect the access to expand to SI mode if the enclosing type is SI mode.

riscv64:

sx5(int):
        slliw a0,a0,3
        slliw a0,a0,24
        sraiw a0,a0,24
        sraiw a0,a0,3
        ret

sx17(int):
        slliw a0,a0,15
        sraiw a0,a0,15
        ret

riscv32:

sx5(int):
        slli a0,a0,27
        srai a0,a0,27
        ret

sx17(int):
        slli a0,a0,15
        srai a0,a0,15
        ret

aarch64:

sx5(int):
        sbfiz w0, w0, 3, 5
        asr w0, w0, 3
        ret

sx17(int):
        sbfx w0, w0, 0, 17
        ret