> On 5 Sep 2017, at 9:35 AM, Michael Clark <michaeljcl...@mac.com> wrote: > >> >> On 19 Aug 2017, at 4:10 AM, Richard Henderson <r...@twiddle.net> wrote: >> >> On 08/17/2017 03:29 PM, Michael Clark wrote: >>> hand coded x86 asm (no worse because the sar depends on the lea) >>> >>> sx5(int): >>> shl edi, 27 >>> sar edi, 27 >>> movsx eax, dl >> >> Typo in the register, but I know what you mean. More interestingly, edi >> already has the sign-extended value, so "mov eax, edi" sufficies (saving one >> byte or perhaps allowing better register allocation). >> >> That said, if anyone is tweaking x86-64 patterns on this, consider >> >> sx5(int): >> rorx eax, edi, 5 >> sar eax, 27 >> >> where the (newish) bmi2 rorx instruction allows the data to be moved into >> place >> while rotating, avoiding the extra move entirely. > > The patterns might be hard to match unless we can get the gimple expansions > for bitfield access to use the mode of the enclosing type. > > For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is > generating two shifts on QI mode sub-registers, each with a sign-extend. > > For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is > generating two shifts on HI mode sub-registers, each with a sign-extend. > > The bitfield expansion logic is selecting the narrowest type that can hold > the bitwidth of the field, versus the bitwidth of the enclosing type and this > appears to be in the gimple to rtl transform, possibly in expr.c. > > Using the mode of the enclosing type for bitfield member access expansion > would likely solve this problem. I suspect that with the use of caches and > word sized register accesses, that this sort of change would be reasonable to > make for 32-bit and 64-bit targets. > > I also noticed something I didn’t spot earlier. On RV32 the sign extend and > shifts are correctly coalesced into 27 bit shifts in the combine stage. We > are presently looking into another redundant sign extension issue on RV64 > that could potentially be related. It could be that the shift coalescing > optimisation doesn’t happen unless the redundant sign extensions are > eliminated early in combine by simplify_rtx. i.e. the pattern is more complex > due to sign_extend ops that are not eliminated. > > - https://cx.rv8.io/g/2FxpNw > > RV64 and Aarch64 both appear to have the issue but with different expansions > for the shift and extend pattern due to the mode of access (QI or HI). Field > accesses above 16 bits create SI mode accesses and generate the expected > code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch however it > appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI > mode and HI mode loads in some bitfield test cases when a struct is passed by > pointer but has no effect on this particular issue. This shows the codegen > for the fix to the SLOW_BYTE_ACCESS issue. i.e. proof that the compiler as > SLOW_BYTE_ACCESS defined to 1. > > - https://cx.rv8.io/g/TyXnoG > > A target independent fix that would solve the issue on ARM and RISC-V would > be to access bitfield members with the mode of the bitfield member's > enclosing type instead of the smallest mode that can hold the bitwidth of the > type. If we had a char or short member in the struct, I can understand the > use of QI and HI mode, as we would need narrorwer accesses due to alignment > issues, but in this case the member is in int and one would expect this to > expand to SI mode accesses if the enclosing type is SI mode.
It appears that 64-bit targets which promote to an SI mode subregister of DI mode register prevents combine from coalescing the shift and sign_extend. The only difference I can spot between RV32 which coalesces the shift during combine, and RV64 and other 64-bit targets which do not coalesce the shifts during combine, is that the 64-bit targets have promoted the shift operand to a SI mode subregister of a DI mode register, whereas on RV 32 the shift operand is simply an SI mode register. I suspect there is some code in simplify-rtx.c that is missing the sub register case however I’m still trying to isolate the code that coalesces shift and sign_extend. I’ve found a similar, perhaps smaller, but related case where a shift is not coalesced however in this case it is also not coalesced on RV32 https://cx.rv8.io/g/fPdk2F > riscv64: > > sx5(int): > slliw a0,a0,3 > slliw a0,a0,24 > sraiw a0,a0,24 > sraiw a0,a0,3 > ret > > sx17(int): > slliw a0,a0,15 > sraiw a0,a0,15 > ret > > riscv32: > > sx5(int): > slli a0,a0,27 > srai a0,a0,27 > ret > > sx17(int): > slli a0,a0,15 > srai a0,a0,15 > ret > > aarch64: > > sx5(int): > sbfiz w0, w0, 3, 5 > asr w0, w0, 3 > ret > > sx17(int): > sbfx w0, w0, 0, 17 > ret