Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Jan Hubicka
Hi,
I am adding Vladimir and Richard to CC. I tried to solve a similar problem
with FP math years ago by having -mfpmath=sse,387. The idea was to allow
use of i387 registers when SSE ones run out and possibly also model the fact
that Pentium4 had faster i387 additions than SSE additions. I also had some
plans to extend this to mixed SSE/MMX/GPR integer arithmetic, but never
got to that.

This did not really fly because the register allocator was not really able
to understand it (I made a patch to regclass to propagate the classes and
figure out which operations need to stay in i387 and which in SSE to avoid
reloading, but that never got in).

I believe Vladimir did some work on this with IRA (it is able to spill GPR
regs into SSE and do a few other tricks).

Also I believe it was kind of Richard's design decision to avoid use of
(paradoxical) subregs for vector conversions because these have funny
implications.

The code for handling upper parts of paradoxical subregs is controlled by
macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
V1DI->V2DI conversions fluently without some middle-end hacking. (it will
probably try to produce zero extensions)

While we are on SSE instructions, it would be great to finally teach
copy_by_pieces/store_by_pieces to use vector instructions (these are more
compact and either equally fast or faster on some CPUs). I hope to get to
this, but it would be great if someone beat me to it.
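
A minimal sketch of what "by pieces" versus "by vector" means for a
16-byte block, written with the GCC/Clang vector_size extension. The
function names and the v16qu typedef are made up for this illustration and
are not GCC internals; which instructions the compiler actually emits
depends on target and flags.

```c
#include <stdint.h>
#include <string.h>

/* 16 bytes treated as one 128-bit value (GCC/Clang vector extension). */
typedef uint8_t v16qu __attribute__((vector_size(16)));

/* Piecewise copy: four separate 32-bit loads and stores. */
void copy16_by_words(void *dst, const void *src)
{
    for (int i = 0; i < 4; i++) {
        uint32_t w;
        memcpy(&w, (const unsigned char *)src + 4 * i, sizeof w);
        memcpy((unsigned char *)dst + 4 * i, &w, sizeof w);
    }
}

/* Same copy expressed as a single 128-bit load and store.  memcpy is used
   so that no alignment of src/dst is assumed. */
void copy16_by_vector(void *dst, const void *src)
{
    v16qu v;
    memcpy(&v, src, sizeof v);
    memcpy(dst, &v, sizeof v);
}
```

Both functions produce the same bytes in dst; the second one simply gives
the compiler the opportunity to use one vector move instead of four
integer moves.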

Honza

> 2015-04-24 13:27 GMT+03:00 Marc Glisse :
> > On Fri, 24 Apr 2015, Uros Bizjak wrote:
> >
> >> Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
> >> pseudo). IIRC, there is some functionality in the compiler that is
> >> able to tell if the highpart of the paradoxical register is zeroed.
> >
> >
> > Those are not currently legal (I tried to change that)
> > https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
> > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
> >
> > In this case, a subreg:V2DI of DImode should work.
> >
> > --
> > Marc Glisse
> 
> Thank you for your tips! It seems to work, will try and see what it
> gives us for i386.
> 
> Thanks,
> Ilya


C++ exception handling optimization performance

2015-04-24 Thread David Sankel

Hello all,

With gcc, does the fact that some branch results in a C++ exception
affect the performance of a function when that exception branch isn't
entered? In other words, does the presence of a throw affect the
optimizer in any way?


-- David Sankel

--
David Sankel 
Stellar Science Ltd Co - Stellar Scientific Software Solutions




Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Ilya Enkovich
2015-04-24 13:27 GMT+03:00 Marc Glisse :
> On Fri, 24 Apr 2015, Uros Bizjak wrote:
>
>> Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
>> pseudo). IIRC, there is some functionality in the compiler that is
>> able to tell if the highpart of the paradoxical register is zeroed.
>
>
> Those are not currently legal (I tried to change that)
> https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
>
> In this case, a subreg:V2DI of DImode should work.
>
> --
> Marc Glisse

Thank you for your tips! It seems to work, will try and see what it
gives us for i386.

Thanks,
Ilya


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Marc Glisse

On Fri, 24 Apr 2015, Uros Bizjak wrote:


Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
pseudo). IIRC, there is some functionality in the compiler that is
able to tell if the highpart of the paradoxical register is zeroed.


Those are not currently legal (I tried to change that)
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html

In this case, a subreg:V2DI of DImode should work.

--
Marc Glisse


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Uros Bizjak
On Fri, Apr 24, 2015 at 12:14 PM, Uros Bizjak  wrote:

>>>>> I was looking into PR65105 and tried to generate SSE computation for a
>>>>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>>>>> SSE I have to use vector variants.
>>>>
>>>> Is this approach really better than having two add/addc instructions?
>>>
>>> FYI, V1DI mode was introduced because XMM shift insn were used to
>>> shift DImode values. The cost of moves from/to integer DImode reg pair
>>> was disastrous.
>>>
>>> Uros.
>>
>> Does it mean I have to add V1DI instructions for all opcodes I want to
>> transform (add,sub,mul,or,and, etc.)?
>
> No.
>
> Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
> pseudo). IIRC, there is some functionality in the compiler that is
> able to tell if the highpart of the paradoxical register is zeroed.

Probably you can even generate a paradoxical V2DImode subreg of DImode.
I'm not sure if in this case the register allocator degenerates the mode
of the resulting hard register to DImode; it is worth a try.
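
A small C sketch of why the contents of the upper half do not matter for
a lane-wise operation such as addition, using the GCC/Clang vector_size
extension. The junk_a/junk_b parameters are hypothetical stand-ins for
whatever happens to be in the high part of the wider register; this
models the arithmetic only, not the RTL subreg semantics.

```c
#include <stdint.h>

/* Two unsigned 64-bit lanes in one 128-bit value. */
typedef uint64_t v2du __attribute__((vector_size(16)));

/* Add two 64-bit scalars held in lane 0 of a 2x64-bit vector.  The upper
   lanes are filled with caller-supplied junk to show they have no effect
   on the lane 0 result. */
uint64_t add_low_lane(uint64_t a, uint64_t junk_a,
                      uint64_t b, uint64_t junk_b)
{
    v2du va = { a, junk_a };
    v2du vb = { b, junk_b };
    v2du sum = va + vb;      /* each lane is added independently */
    return sum[0];           /* only the low lane is read back */
}
```

Because the lanes are independent, no carry crosses from one lane to the
other, so the low 64 bits are the same whatever the high 64 bits hold.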

Uros.


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Uros Bizjak
On Fri, Apr 24, 2015 at 12:09 PM, Ilya Enkovich  wrote:

>>>> I was looking into PR65105 and tried to generate SSE computation for a
>>>> simple 64bit a + b + c sequence. Having no scalar integer instructions in
>>>> SSE I have to use vector variants.
>>>
>>> Is this approach really better than having two add/addc instructions?
>>
>> FYI, V1DI mode was introduced because XMM shift insn were used to
>> shift DImode values. The cost of moves from/to integer DImode reg pair
>> was disastrous.
>>
>> Uros.
>
> Does it mean I have to add V1DI instructions for all opcodes I want to
> transform (add,sub,mul,or,and, etc.)?

No.

Please try to generate a paradoxical subreg (V2DImode subreg of V1DImode
pseudo). IIRC, there is some functionality in the compiler that is
able to tell if the highpart of the paradoxical register is zeroed.

Uros.


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Ilya Enkovich
2015-04-24 12:49 GMT+03:00 Uros Bizjak :
> On Fri, Apr 24, 2015 at 11:45 AM, Uros Bizjak  wrote:
>> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich  
>> wrote:
>>
>>> I was looking into PR65105 and tried to generate SSE computation for a
>>> simple 64bit  a + b + c sequence. Having no scalar integer instructions in
>>> SSE I have to use vector variants.
>>
>> Is this approach really better than having two add/addc instructions?
>
> FYI, V1DI mode was introduced because XMM shift insn were used to
> shift DImode values. The cost of moves from/to integer DImode reg pair
> was disastrous.
>
> Uros.

Does it mean I have to add V1DI instructions for all opcodes I want to
transform (add,sub,mul,or,and, etc.)?

Ilya


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Ilya Enkovich
2015-04-24 12:45 GMT+03:00 Uros Bizjak :
> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich  
> wrote:
>
>> I was looking into PR65105 and tried to generate SSE computation for a
>> simple 64bit  a + b + c sequence. Having no scalar integer instructions in
>> SSE I have to use vector variants.
>
> Is this approach really better than having two add/addc instructions?

We surely shouldn't apply this to every DI instruction; we should compute
transformation costs first. It is profitable if not many conversions are
required, it helps to relax GPR pressure, and we expect it to be
profitable for mul. Performance tests will show if this is useful. I
want to make a small prototype and try it.
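
As a rough sketch of the kind of cost check meant here: the struct, its
field names and the formula below are hypothetical and are not taken from
GCC; the cost values would have to come from the target's cost tables or
from measurements.

```c
/* Hypothetical description of one chain of DImode operations that could
   be moved from GPR pairs to XMM registers. */
struct chain_cost {
    int n_ops;           /* DImode operations in the chain */
    int n_conversions;   /* GPR <-> XMM moves needed at the chain boundary */
    int gpr_op_cost;     /* cost of one operation done on a GPR pair */
    int sse_op_cost;     /* cost of the same operation on an XMM register */
    int conversion_cost; /* cost of one GPR <-> XMM move */
};

/* Positive result: the vector form is estimated to be cheaper. */
int chain_gain(const struct chain_cost *c)
{
    int saved = c->n_ops * (c->gpr_op_cost - c->sse_op_cost);
    int spent = c->n_conversions * c->conversion_cost;
    return saved - spent;
}

/* Convert only when the estimated gain outweighs the conversions. */
int should_convert(const struct chain_cost *c)
{
    return chain_gain(c) > 0;
}
```

With these made-up inputs, a chain of four operations with one conversion
is converted, while the same chain with two conversions is not.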

Ilya

>
> Uros.


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Uros Bizjak
On Fri, Apr 24, 2015 at 11:45 AM, Uros Bizjak  wrote:
> On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich  
> wrote:
>
>> I was looking into PR65105 and tried to generate SSE computation for a
>> simple 64bit  a + b + c sequence. Having no scalar integer instructions in
>> SSE I have to use vector variants.
>
> Is this approach really better than having two add/addc instructions?

FYI, V1DI mode was introduced because XMM shift insn were used to
shift DImode values. The cost of moves from/to integer DImode reg pair
was disastrous.

Uros.


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Uros Bizjak
On Fri, Apr 24, 2015 at 11:22 AM, Ilya Enkovich  wrote:

> I was looking into PR65105 and tried to generate SSE computation for a
> simple 64bit  a + b + c sequence. Having no scalar integer instructions in
> SSE I have to use vector variants.

Is this approach really better than having two add/addc instructions?

Uros.


Fwd: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Ilya Enkovich
Hi,

I was looking into PR65105 and tried to generate SSE computation for a
simple 64-bit a + b + c sequence. Since SSE has no scalar integer
instructions, I have to use the vector variants.
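
For reference, a C sketch of the two forms being compared. The scalar
function is an assumed reconstruction of the test case (the exact source
of test.c is not shown in this message), and the vector form uses the
GCC/Clang vector_size extension to model doing the additions in the low
lane of a 2x64-bit vector. Unsigned types are used so that wraparound is
well defined.

```c
#include <stdint.h>

/* Two unsigned 64-bit lanes in one 128-bit value. */
typedef uint64_t v2du __attribute__((vector_size(16)));

/* Scalar form.  On 32-bit x86 each 64-bit addition is normally done on a
   GPR pair with an add/adc sequence. */
uint64_t sum3_scalar(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}

/* Place a 64-bit scalar in lane 0; lane 1 is zeroed here for simplicity. */
static v2du to_low_lane(uint64_t x)
{
    v2du v = { x, 0 };
    return v;
}

/* Vector form: both additions are lane-wise 2x64-bit adds and only lane 0
   is read back as the result. */
uint64_t sum3_vector(uint64_t a, uint64_t b, uint64_t c)
{
    v2du sum = to_low_lane(a) + to_low_lane(b) + to_low_lane(c);
    return sum[0];
}
```

Both functions return the same value for all inputs; the question in this
thread is only which form is cheaper on i386.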

Original RTL:

(insn 3 2 4 2 (set (reg/v:DI 91 [ b ])
        (mem/c:DI (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [1 b+0 S8 A32])) test.c:3 89 {*movdi_internal}
     (nil))
(insn 8 5 9 2 (parallel [
            (set (reg:DI 94 [ D.1813 ])
                (plus:DI (mem/c:DI (reg/f:SI 16 argp) [1 a+0 S8 A32])
                    (reg/v:DI 91 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:DI 91 [ b ])
            (nil))))
(insn 9 8 14 2 (parallel [
            (set (reg:DI 93 [ D.1813 ])
                (plus:DI (reg:DI 94 [ D.1813 ])
                    (mem/c:DI (plus:SI (reg/f:SI 16 argp)
                            (const_int 16 [0x10])) [1 c+0 S8 A32])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:DI 94 [ D.1813 ])
            (nil))))

Transformed RTL:

(insn 3 2 4 2 (set (reg:V1DI 91)
        (mem/c:V1DI (plus:SI (reg/f:SI 16 argp)
                (const_int 8 [0x8])) [1 b+0 S8 A32])) test.c:3 1077 {*movv1di_internal}
     (nil))
(insn 17 5 8 2 (set (reg:V1DI 95)
        (mem/c:V1DI (reg/f:SI 16 argp) [1 a+0 S8 A32])) test.c:4 -1
     (nil))
(insn 8 17 24 2 (set (reg:V2DI 94 [ D.1813 ])
        (plus:V2DI (reg:V2DI 95)
            (reg/v:V2DI 91 [ b ]))) test.c:4 2949 {*addv2di3}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:V2DI 91 [ b ])
            (nil))))
(insn 24 8 9 2 (set (reg:V1DI 100)
        (mem/c:V1DI (plus:SI (reg/f:SI 16 argp)
                (const_int 16 [0x10])) [1 c+0 S8 A32])) test.c:4 -1
     (nil))
(insn 9 24 18 2 (set (reg:V2DI 93 [ D.1813 ])
        (plus:V2DI (reg:V2DI 94 [ D.1813 ])
            (reg:V2DI 100))) test.c:4 2949 {*addv2di3}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:V2DI 94 [ D.1813 ])
            (nil))))

The problem is that all the loads are removed as dead code during the subreg pass:

DCE: Deleting insn 24
deleting insn with uid = 24.
DCE: Deleting insn 17
deleting insn with uid = 17.
DCE: Deleting insn 3
deleting insn with uid = 3.


Is there a way to handle this without adding a fake addv1di instruction
for MMX registers?

Thanks,
Ilya