Re: [i386] Scalar DImode instructions on XMM registers
Hi,

I am adding Vladimir and Richard to CC.

I tried to solve a similar problem with FP math years ago by having -mfpmath=sse,387. The idea was to allow the use of i387 registers when the SSE ones run out, and possibly also to model the fact that the Pentium 4 had faster i387 additions than SSE additions. I also had some plans to extend this to mixed SSE/MMX/GPR integer arithmetic, but never got to that. This did not really fly because the register allocator was not really able to understand it (I made a patch to regclass to propagate the classes and figure out which operations need to stay in i387 and which in SSE to avoid reloading, but that never got in). I believe Vladimir did some work on this with IRA (his allocator is able to spill GPRs into SSE registers and do a few other tricks).

Also, I believe it was more or less Richard's design decision to avoid using (paradoxical) subregs for vector conversions, because these have funny implications. The code for handling the upper parts of paradoxical subregs is controlled by macros around SUBREG_PROMOTED_VAR_P, but I do not think it will handle V1DI->V2DI conversions smoothly without some middle-end hacking (it will probably try to produce zero extensions).

While we are on SSE instructions, it would be great to finally teach copy_by_pieces/store_by_pieces to use vector instructions (these are more compact and either equally fast or faster on some CPUs). I hope to get to this, but it would be great if someone beat me to it.

Honza

> 2015-04-24 13:27 GMT+03:00 Marc Glisse:
> > On Fri, 24 Apr 2015, Uros Bizjak wrote:
> >
> >> Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
> >> pseudo). IIRC, there is some functionality in the compiler that is
> >> able to tell if the highpart of the paradoxical register is zeroed.
> >
> >
> > Those are not currently legal (I tried to change that):
> > https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
> > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
> >
> > In this case, a subreg:V2DI of DImode should work.
> >
> > --
> > Marc Glisse
>
> Thank you for your tips! It seems to work; I will try it and see what it
> gives us for i386.
>
> Thanks,
> Ilya
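Honza's copy_by_pieces/store_by_pieces remark can be illustrated at the source level. The sketch below is hand-written C, not GCC's by-pieces code: `copy16_gpr` and `copy16_sse` are hypothetical names. The first copies 16 bytes in 4-byte pieces (the word size on 32-bit i386); the second uses one unaligned SSE2 load/store pair (`_mm_loadu_si128`/`_mm_storeu_si128`, i.e. movdqu). What the compiler actually emits for either depends on its own expansion of memcpy, so treat this as an illustration of instruction count, not as a benchmark.

```c
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Copy 16 bytes in 4-byte pieces through an integer temporary,
   roughly the shape of a by-pieces copy on 32-bit i386 GPRs. */
void copy16_gpr(void *dst, const void *src)
{
    for (int i = 0; i < 16; i += 4) {
        uint32_t piece;
        memcpy(&piece, (const char *)src + i, 4);
        memcpy((char *)dst + i, &piece, 4);
    }
}

/* Copy the same 16 bytes with one unaligned SSE2 load and one
   unaligned SSE2 store; falls back to the GPR version without SSE2. */
void copy16_sse(void *dst, const void *src)
{
#ifdef __SSE2__
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    copy16_gpr(dst, src);
#endif
}
```

In the idealized case the SSE2 version is two instructions instead of four 32-bit loads plus four 32-bit stores, which is the compactness Honza mentions; whether it is also faster depends on the CPU.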
C++ exception handling optimization performance
Hello all,

With gcc, does the fact that some branch results in a C++ exception affect the performance of a function when that exception branch isn't entered? In other words, does the presence of a throw affect the optimizer in any way?

--
David Sankel
Stellar Science Ltd Co - Stellar Scientific Software Solutions
Re: [i386] Scalar DImode instructions on XMM registers
2015-04-24 13:27 GMT+03:00 Marc Glisse:
> On Fri, 24 Apr 2015, Uros Bizjak wrote:
>
>> Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
>> pseudo). IIRC, there is some functionality in the compiler that is
>> able to tell if the highpart of the paradoxical register is zeroed.
>
>
> Those are not currently legal (I tried to change that):
> https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
>
> In this case, a subreg:V2DI of DImode should work.
>
> --
> Marc Glisse

Thank you for your tips! It seems to work; I will try it and see what it
gives us for i386.

Thanks,
Ilya
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, 24 Apr 2015, Uros Bizjak wrote:

> Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
> pseudo). IIRC, there is some functionality in the compiler that is
> able to tell if the highpart of the paradoxical register is zeroed.

Those are not currently legal (I tried to change that):
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html

In this case, a subreg:V2DI of DImode should work.

--
Marc Glisse
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, Apr 24, 2015 at 12:14 PM, Uros Bizjak wrote:
>>>>> I was looking into PR65105 and tried to generate SSE computation for a
>>>>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>>>>> SSE I have to use vector variants.
>>>>
>>>> Is this approach really better than having two add/addc instructions?
>>>
>>> FYI, V1DI mode was introduced because XMM shift insns were used to
>>> shift DImode values. The cost of moves from/to an integer DImode reg pair
>>> was disastrous.
>>>
>>> Uros.
>>
>> Does it mean I have to add V1DI instructions for all opcodes I want to
>> transform (add, sub, mul, or, and, etc.)?
>
> No.
>
> Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
> pseudo). IIRC, there is some functionality in the compiler that is
> able to tell if the highpart of the paradoxical register is zeroed.

Probably you can even generate a paradoxical V2DImode subreg of DImode. I'm not sure whether, in this case, the register allocator degenerates the mode of the resulting hard register to DImode, but it is worth a try.

Uros.
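Why an undefined upper half is harmless for an add: paddq adds the two 64-bit lanes independently, so whatever sits in the upper lane of a paradoxical V2DI subreg cannot change the low-lane result. The sketch below uses SSE2 intrinsics and fills the upper lanes with caller-supplied junk to show this; `add64_low_lane` is a hypothetical helper, not anything in GCC.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Add two 64-bit values with a full-width 128-bit paddq.  The upper
   lanes hold arbitrary junk, mimicking the undefined upper part of a
   paradoxical V2DI subreg; only the low lane is returned. */
uint64_t add64_low_lane(uint64_t a, uint64_t b,
                        uint64_t junk_a, uint64_t junk_b)
{
#ifdef __SSE2__
    /* _mm_set_epi64x takes (high lane, low lane). */
    __m128i va = _mm_set_epi64x((long long)junk_a, (long long)a);
    __m128i vb = _mm_set_epi64x((long long)junk_b, (long long)b);
    __m128i sum = _mm_add_epi64(va, vb);   /* lane-wise, no cross-lane carry */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, sum); /* store the low 64 bits only */
    return out;
#else
    (void)junk_a;
    (void)junk_b;
    return a + b;
#endif
}
```

The same reasoning does not carry over to operations that mix lanes, such as shuffles or horizontal adds.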
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, Apr 24, 2015 at 12:09 PM, Ilya Enkovich wrote:
>>>> I was looking into PR65105 and tried to generate SSE computation for a
>>>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>>>> SSE I have to use vector variants.
>>>
>>> Is this approach really better than having two add/addc instructions?
>>
>> FYI, V1DI mode was introduced because XMM shift insns were used to
>> shift DImode values. The cost of moves from/to an integer DImode reg pair
>> was disastrous.
>>
>> Uros.
>
> Does it mean I have to add V1DI instructions for all opcodes I want to
> transform (add, sub, mul, or, and, etc.)?

No.

Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
pseudo). IIRC, there is some functionality in the compiler that is
able to tell if the highpart of the paradoxical register is zeroed.

Uros.
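On the hardware side, one fact that such highpart tracking can rely on is that a 64-bit movq load into an XMM register clears bits 127:64. The sketch below loads a value with `_mm_loadl_epi64` and reads back the upper lane; `high_half_after_movq` is a hypothetical helper, and it says nothing about how GCC's RTL passes represent or track this property.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Load a 64-bit value into the low half of an XMM register with movq
   and return the upper 64 bits, which movq sets to zero. */
uint64_t high_half_after_movq(const uint64_t *p)
{
#ifdef __SSE2__
    __m128i v = _mm_loadl_epi64((const __m128i *)p);
    __m128i hi = _mm_unpackhi_epi64(v, v);  /* copy lane 1 into lane 0 */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, hi);
    return out;
#else
    (void)p;
    return 0;  /* no XMM registers to inspect on this target */
#endif
}
```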
Re: [i386] Scalar DImode instructions on XMM registers
2015-04-24 12:49 GMT+03:00 Uros Bizjak:
> On Fri, Apr 24, 2015 at 11:45 AM, Uros Bizjak wrote:
>> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich
>> wrote:
>>
>>> I was looking into PR65105 and tried to generate SSE computation for a
>>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>>> SSE I have to use vector variants.
>>
>> Is this approach really better than having two add/addc instructions?
>
> FYI, V1DI mode was introduced because XMM shift insns were used to
> shift DImode values. The cost of moves from/to an integer DImode reg pair
> was disastrous.
>
> Uros.

Does it mean I have to add V1DI instructions for all opcodes I want to
transform (add, sub, mul, or, and, etc.)?

Ilya
Re: [i386] Scalar DImode instructions on XMM registers
2015-04-24 12:45 GMT+03:00 Uros Bizjak:
> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich
> wrote:
>
>> I was looking into PR65105 and tried to generate SSE computation for a
>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>> SSE I have to use vector variants.
>
> Is this approach really better than having two add/addc instructions?

We surely shouldn't apply this to every DI instruction; we should compute transformation costs. It is profitable if not many conversions are required: it helps to relax GPR pressure, and we expect it to be profitable for mul. Performance tests will show whether this is useful. I want to make a small prototype and try it.

Ilya

>
> Uros.
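Ilya's point that the conversion should be cost-driven rather than applied to every DImode insn can be sketched as a per-chain check. Everything below is hypothetical: `struct chain_stats`, `convert_chain_profitable`, and the cost parameters are illustrative names with made-up units, not GCC's cost tables, and GPR pressure (which Ilya also mentions) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical summary of a chain of DImode insns that could be
   moved to XMM registers. */
struct chain_stats {
    int n_insns;        /* DImode operations in the chain */
    int n_conversions;  /* GPR <-> XMM transfers needed at the chain's edges */
};

/* Toy profitability check: convert only if the per-insn savings
   outweigh the cost of the conversions.  All costs are caller-supplied,
   made-up units. */
bool convert_chain_profitable(struct chain_stats s, int gpr_pair_cost,
                              int sse_cost, int conversion_cost)
{
    int gain = s.n_insns * (gpr_pair_cost - sse_cost);
    int penalty = s.n_conversions * conversion_cost;
    return gain > penalty;
}
```

With, say, a cost of 2 for an add/adc pair, 1 for paddq and 3 per transfer, a four-insn chain with one conversion is accepted while a single insn needing two conversions is rejected, which matches the "not many conversions" condition in the message.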
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, Apr 24, 2015 at 11:45 AM, Uros Bizjak wrote:
> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich
> wrote:
>
>> I was looking into PR65105 and tried to generate SSE computation for a
>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>> SSE I have to use vector variants.
>
> Is this approach really better than having two add/addc instructions?

FYI, V1DI mode was introduced because XMM shift insns were used to
shift DImode values. The cost of moves from/to an integer DImode reg pair
was disastrous.

Uros.
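For readers unfamiliar with the XMM shifts Uros mentions: SSE2 has a 64-bit lane shift (psllq, exposed as `_mm_sll_epi64`). The sketch below is hand-written C with intrinsics, not the back end's actual pattern; `shl64_xmm` is a hypothetical helper. Getting `x` into and out of the XMM register is exactly the GPR<->XMM traffic whose cost Uros describes.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Shift a 64-bit value left by n bits, doing the shift itself in an
   XMM register with psllq rather than as a GPR-pair sequence.
   Like psllq, counts of 64 or more yield 0. */
uint64_t shl64_xmm(uint64_t x, unsigned n)
{
#ifdef __SSE2__
    __m128i v = _mm_loadl_epi64((const __m128i *)&x);  /* movq load */
    __m128i count = _mm_cvtsi32_si128((int)n);         /* count in low 64 bits */
    __m128i r = _mm_sll_epi64(v, count);               /* psllq */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, r);              /* movq store */
    return out;
#else
    return n < 64 ? x << n : 0;
#endif
}
```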
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich wrote:
> I was looking into PR65105 and tried to generate SSE computation for a
> simple 64bit a + b + c sequence. Having no scalar integer instructions in
> SSE I have to use vector variants.

Is this approach really better than having two add/addc instructions?

Uros.
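For comparison, the GPR alternative Uros refers to is the *adddi3_doubleword pattern in Ilya's original RTL dump, which on i386 is split into a 32-bit add of the low words followed by an add-with-carry (adc) of the high words. A plain C sketch of that split, with the carry flag modeled explicitly (`add64_as_pair` is a hypothetical name):

```c
#include <stdint.h>

/* A 64-bit add done as two 32-bit halves: the low-word add produces
   a carry, and the high-word add consumes it. */
uint64_t add64_as_pair(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
    uint32_t lo = a_lo + b_lo;          /* add: low words */
    uint32_t carry = lo < a_lo;         /* CF set if the low add wrapped */
    uint32_t hi = a_hi + b_hi + carry;  /* adc: high words plus carry */
    return ((uint64_t)hi << 32) | lo;
}
```

Each 64-bit add therefore costs two GPR instructions and two registers on i386, which is the baseline a single paddq has to beat once conversion costs are included.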
Fwd: [i386] Scalar DImode instructions on XMM registers
Hi,

I was looking into PR65105 and tried to generate SSE computation for a simple 64bit a + b + c sequence. Having no scalar integer instructions in SSE I have to use vector variants.

Original RTL:

(insn 3 2 4 2 (set (reg/v:DI 91 [ b ])
        (mem/c:DI (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [1 b+0 S8 A32])) test.c:3 89 {*movdi_internal}
     (nil))
(insn 8 5 9 2 (parallel [
            (set (reg:DI 94 [ D.1813 ])
                (plus:DI (mem/c:DI (reg/f:SI 16 argp) [1 a+0 S8 A32])
                    (reg/v:DI 91 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:DI 91 [ b ])
            (nil))))
(insn 9 8 14 2 (parallel [
            (set (reg:DI 93 [ D.1813 ])
                (plus:DI (reg:DI 94 [ D.1813 ])
                    (mem/c:DI (plus:SI (reg/f:SI 16 argp)
                            (const_int 16 [0x10])) [1 c+0 S8 A32])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:DI 94 [ D.1813 ])
            (nil))))

Transformed RTL:

(insn 3 2 4 2 (set (reg:V1DI 91)
        (mem/c:V1DI (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [1 b+0 S8 A32])) test.c:3 1077 {*movv1di_internal}
     (nil))
(insn 17 5 8 2 (set (reg:V1DI 95)
        (mem/c:V1DI (reg/f:SI 16 argp) [1 a+0 S8 A32])) test.c:4 -1
     (nil))
(insn 8 17 24 2 (set (reg:V2DI 94 [ D.1813 ])
        (plus:V2DI (reg:V2DI 95)
            (reg/v:V2DI 91 [ b ]))) test.c:4 2949 {*addv2di3}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:V2DI 91 [ b ])
            (nil))))
(insn 24 8 9 2 (set (reg:V1DI 100)
        (mem/c:V1DI (plus:SI (reg/f:SI 16 argp)
                (const_int 16 [0x10])) [1 c+0 S8 A32])) test.c:4 -1
     (nil))
(insn 9 24 18 2 (set (reg:V2DI 93 [ D.1813 ])
        (plus:V2DI (reg:V2DI 94 [ D.1813 ])
            (reg:V2DI 100))) test.c:4 2949 {*addv2di3}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:V2DI 94 [ D.1813 ])
            (nil))))

The problem is that all loads are removed as dead code during the subreg pass:

DCE: Deleting insn 24
deleting insn with uid = 24.
DCE: Deleting insn 17
deleting insn with uid = 17.
DCE: Deleting insn 3
deleting insn with uid = 3.

Is there a way to handle it without adding a fake addv1di instruction for MMX registers?

Thanks,
Ilya
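The test case itself is not included in the message. Judging from the RTL (three 64-bit arguments at argp offsets 0, 8 and 16, i.e. passed on the stack on 32-bit i386), it is presumably something like `sum3` below; that function is a guessed reconstruction, and `sum3_sse` is a hypothetical hand-written SSE2 equivalent of what the transformed RTL is aiming for (movq loads into XMM registers, then paddq additions).

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Guessed shape of test.c: a + b + c on three 64-bit arguments. */
int64_t sum3(int64_t a, int64_t b, int64_t c)
{
    return a + b + c;
}

/* The same computation written with SSE2 intrinsics: each argument is
   loaded into the low lane of an XMM register and added with paddq. */
int64_t sum3_sse(int64_t a, int64_t b, int64_t c)
{
#ifdef __SSE2__
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);  /* movq, upper lane zeroed */
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);
    __m128i vc = _mm_loadl_epi64((const __m128i *)&c);
    __m128i sum = _mm_add_epi64(_mm_add_epi64(va, vb), vc);  /* two paddq */
    int64_t out;
    _mm_storel_epi64((__m128i *)&out, sum);
    return out;
#else
    return sum3(a, b, c);
#endif
}
```

Written this way, each load is a 64-bit movq whose result is used as a full 128-bit operand, which is the V1DI-to-V2DI relationship the thread discusses expressing with a paradoxical subreg.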