On Fri, Aug 18, 2017 at 10:56:10PM +1200, Michael Clark wrote:
> 
> > On 18 Aug 2017, at 10:41 PM, Gabriel Paubert <paub...@iram.es> wrote:
> > 
> > On Fri, Aug 18, 2017 at 10:29:04AM +1200, Michael Clark wrote:
> >> Sorry I had to send again as my Apple mailer is munging emails. I’ve 
> >> disabled RTF.
> >> 
> >> 
> >> This one is quite interesting:
> >> 
> >> - https://cx.rv8.io/g/WXWMTG
> >> 
> >> It’s another target independent bug. x86 is using some LEA followed by SAR 
> >> trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice. In any case 
> >> RISC-V seems like a nice target to try to fix this codegen for, as its 
> >> less risk than attempting a fix in x86 ;-)
> >> 
> >> - https://github.com/riscv/riscv-gcc/issues/89
> >> 
> >> code:
> >> 
> >>    template <typename T, unsigned B>
> >>    inline T signextend(const T x)
> >>    {
> >>    struct {T x:B;} s;
> >>    return s.x = x;
> >>    }
> >> 
> >>    int sx5(int x) {
> >>            return signextend<signed int,5>(x);
> >>    }
> >> 
> >> riscv asm:
> >> 
> >>    sx5(int):
> >>      slliw a0,a0,3
> >>      slliw a0,a0,24
> >>      sraiw a0,a0,24
> >>      sraiw a0,a0,3
> >>      ret
> >> 
> >> hand coded riscv asm
> >> 
> >>    sx5(int):
> >>      slliw a0,a0,27
> >>      sraiw a0,a0,27
> >>      ret
> >> 
> >> x86 asm:
> >> 
> >>    sx5(int):
> >>      lea eax, [0+rdi*8]
> >>      sar al, 3
> >>      movsx eax, al
> >>      ret
> >> 
> >> hand coded x86 asm (no worse because the sar depends on the lea)
> >> 
> >>    sx5(int):
> >>      shl edi, 27
> >>      sar edi, 27
> >>      movsx eax, dl
> > 
> > Huh? dl is not a subreg of edi!
> > 
> > s/edi/edx/ and it may work.
> > 
> > dil can also be used, but only on 64 bit.
> 
> Sorry I meant dil on x86-64. I was sure that it was possible to extend into 
> another register. 

It is, but only from the first 4 registers on 32 bit: AL, BL, CL, and
DL can be accessed as 8 bit registers, as well as the high part of the
16 bit registers: AH, BH, CH and DH.

This is essentially a left-over of the 16 bit processors (8086 to
80286), carried over without change in the 32 bit processors.

64 bit extensions made registers more orthogonal in this respect, but
there is still some weirdness due to historical reasons.

> I have not done much i386 asm so I am unaware of the constraints. 

Fortunate you are, especially since you hopefully did not have to live
through the 16 bit nightmare of segments and near and far addresses.

>Can the source and dest registers for movsx not differ on i386? I thought they 
>could.

They can, but only from the first 4 registers. On the other hand
movsx eax,dh (in Intel's syntax) is perfectly valid.

> In any case, the plot thickens…
> 
> I believe we have bugs on both RISC-V and Aarch64.

Maybe, I don't know either arch well enough, but RISC-V looks intuitive
enough to understand that it is bad both in terms of performance and
code size.

Another arch you could try is Power.

> I found that it at least appears like it is transitioning to a char or short 
> as the break is at 24 and 16 depending on the width, and values over 16 work 
> as one would expect.
> 
> Here is an updated test program: https://cx.rv8.io/g/M9ewNf
> 
>       template <typename T, unsigned B>
>       inline T signextend(const T x)
>       {
>         struct {T x:B;} s;
>         return s.x = x;
>       }
> 
>       int sx3(int x) { return signextend<signed int,3>(x); }
>       int sx5(int x) { return signextend<signed int,5>(x); }
>       int sx11(int x) { return signextend<signed int,11>(x); }
>       int sx14(int x) { return signextend<signed int,14>(x); }
>       int sx19(int x) { return signextend<signed int,19>(x); }
> 
> I filed a bug on riscv-gcc but I think it is target independent code given 
> there appears to be an issue on Aarch64. AFAICT, Aarch64 should generate a 
> single sbfx for all of the test functions.
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> Should I file a bug on GCC bugzilla given it looks to be target independent?

I think so, but you should get confirmation from someone else.

> 
> On RISC-V, the codegen is much more obviously wrong, but essentially the same 
> thing is happening on Aarch64 but there is only one additional instruction 
> instead of two.
> 
>       sx3(int):
>         slliw a0,a0,5
>         slliw a0,a0,24
>         sraiw a0,a0,24
>         sraiw a0,a0,5
>         ret
>       sx5(int):
>         slliw a0,a0,3
>         slliw a0,a0,24
>         sraiw a0,a0,24
>         sraiw a0,a0,3
>         ret
>       sx11(int):
>         slliw a0,a0,5
>         slliw a0,a0,16
>         sraiw a0,a0,16
>         sraiw a0,a0,5
>         ret
>       sx14(int):
>         slliw a0,a0,2
>         slliw a0,a0,16
>         sraiw a0,a0,16
>         sraiw a0,a0,2
>         ret
>       sx19(int):
>         slliw a0,a0,13
>         sraiw a0,a0,13
>         ret
> 
> 

Reply via email to