RE: [PATCH] loading float member of parameter stored via int registers

2023-01-03 Thread Hu, Lin1 via Gcc-patches
Sorry for send this mail. I enter the wrong command line.

-Original Message-
From: Gcc-patches  On Behalf 
Of Segher Boessenkool
Sent: Tuesday, January 3, 2023 5:00 PM
To: Andrew Pinski 
Cc: Jiufu Guo ; Jiufu Guo via Gcc-patches 
; Richard Biener ; Richard 
Biener ; dje@gmail.com; li...@gcc.gnu.org; 
jeffreya...@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int registers

Hi!

On Fri, Dec 30, 2022 at 12:30:04AM -0800, Andrew Pinski wrote:
> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool 
>  wrote:
> > Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> > it does not allow tying a scalar float to anything else.  No such 
> > thing is required, or good apparently.  I wonder why we have such 
> > restrictions at all in rs6000; is it just unfortunate history, was 
> > it good at one point in time?
> 
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK 
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P 
> (mode1, mode2) should be true. If they differ for any r, you should 
> define this hook to return false unless some other mechanism ensures 
> the accessibility of the value in a narrower mode.
> 
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>  modes and DImode.  */

That comment is incorrect.  See fctiw for example, which defines only the 
SImode part of the result (the other bits are undefined).

> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>   if (TARGET_P8_VECTOR && (mode == SImode))
> return 1;
> 
>   if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true 
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for 
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.

It means that older CPUs do not have as many instructions to do scalar integer 
operations in vector registers, making it (almost) always a losing proposition 
to put scalar integers there.  On newer CPUs it is not quite as bad, there is a 
full(er) complement of instructions to do such things in vector regs, just a 
bit slower than on GPRs.

But yeah we might need to fix hard_regno_mode_ok if we change tieable.


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2023-01-03 Thread Segher Boessenkool
Hi!

On Fri, Dec 30, 2022 at 12:30:04AM -0800, Andrew Pinski wrote:
> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
>  wrote:
> > Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> > it does not allow tying a scalar float to anything else.  No such thing
> > is required, or good apparently.  I wonder why we have such restrictions
> > at all in rs6000; is it just unfortunate history, was it good at one
> > point in time?
> 
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
> (mode1, mode2) should be true. If they differ for any r, you should
> define this hook to return false unless some other mechanism ensures
> the accessibility of the value in a narrower mode.
> 
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>  modes and DImode.  */

That comment is incorrect.  See fctiw for example, which defines only
the SImode part of the result (the other bits are undefined).

> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>   if (TARGET_P8_VECTOR && (mode == SImode))
> return 1;
> 
>   if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.

It means that older CPUs do not have as many instructions to do scalar
integer operations in vector registers, making it (almost) always a
losing proposition to put scalar integers there.  On newer CPUs it is
not quite as bad, there is a full(er) complement of instructions to do
such things in vector regs, just a bit slower than on GPRs.

But yeah we might need to fix hard_regno_mode_ok if we change tieable.


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2023-01-02 Thread Jiufu Guo via Gcc-patches


Hi,

Andrew Pinski  writes:

> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
>  wrote:
>>
>> Hi!
>>
>> On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
>> > Considering the limitations of CSE, I try to find other places
>> > to handle this issue, and notice DSE can optimize below code:
>> > "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
>> >
>> > So, I drafted a patch to update DSE to handle DI->DF/SF.
>> > The patch updates "extract_low_bits" to get mode change
>> > with subreg.
>> >
>> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
>> > index b12b0e000c2..5e36331082c 100644
>> > --- a/gcc/expmed.cc
>> > +++ b/gcc/expmed.cc
>> > @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode 
>> > src_mode, rtx src)
>> >
>> >if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>> >  return NULL_RTX;
>> > -  if (!targetm.modes_tieable_p (int_mode, mode))
>> > +  if (!targetm.modes_tieable_p (int_mode, mode)
>> > +  && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
>> > +&& GET_MODE_CLASS (mode) == MODE_FLOAT
>> > +&& GET_MODE_CLASS (src_mode) == MODE_INT))
>> >  return NULL_RTX;
>> >
>> >src = gen_lowpart (src_int_mode, src);
>>
>> Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
>> it does not allow tying a scalar float to anything else.  No such thing
>> is required, or good apparently.  I wonder why we have such restrictions
>> at all in rs6000; is it just unfortunate history, was it good at one
>> point in time?
>
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
> (mode1, mode2) should be true. If they differ for any r, you should
> define this hook to return false unless some other mechanism ensures
> the accessibility of the value in a narrower mode.
>
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>  modes and DImode.  */
>
> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>   if (TARGET_P8_VECTOR && (mode == SImode))
> return 1;
>
>   if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.
>
Thanks for your great comments!

modes_tieable_p is invoked by a few places besides extract_low_bits, so
updating this hook to relax the restriction may benefit more passes.

We may update modes_tieable_p for more cases as possible.
A hacked patch for "float vs. int" is listed at the end of this mail.

While back to the issue of this PR: optimize float loading which is
stored from the int register.  DSE works more on basicblock, so updating
modes_tieable_p (or extract_low_bits) can not handle some cases like:

double __attribute__ ((noipa)) foo_df (DF arg, int flag)
{
  if (flag == 2)
 return arg.a[3];
  return 0.0;
}

I'm thinking a way to handle this case.


BR,
Jeff (Jiufu)

>
> Thanks,
> Andrew Pinski
>
>>
>>
>> Segher

(To be refined.)
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..8088a608be6 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1959,6 +1959,17 @@ rs6000_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
+  
+  if ((GET_MODE_CLASS (mode1) == MODE_FLOAT
+   && (GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD
+  || (TARGET_P8_VECTOR && (mode2 == SImode))
+  || (TARGET_P9_VECTOR && (mode2 == QImode || mode2 == HImode
+  || (GET_MODE_CLASS (mode2) == MODE_FLOAT
+ && (GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD
+ || (TARGET_P8_VECTOR && (mode1 == SImode))
+ || (TARGET_P9_VECTOR && (mode1 == QImode || mode1 == HImode)
+return true;
+
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
   || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
 return mode1 == mode2;
---


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-30 Thread Andrew Pinski via Gcc-patches
On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
> > Considering the limitations of CSE, I try to find other places
> > to handle this issue, and notice DSE can optimize below code:
> > "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
> >
> > So, I drafted a patch to update DSE to handle DI->DF/SF.
> > The patch updates "extract_low_bits" to get mode change
> > with subreg.
> >
> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > index b12b0e000c2..5e36331082c 100644
> > --- a/gcc/expmed.cc
> > +++ b/gcc/expmed.cc
> > @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode 
> > src_mode, rtx src)
> >
> >if (!targetm.modes_tieable_p (src_int_mode, src_mode))
> >  return NULL_RTX;
> > -  if (!targetm.modes_tieable_p (int_mode, mode))
> > +  if (!targetm.modes_tieable_p (int_mode, mode)
> > +  && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
> > +&& GET_MODE_CLASS (mode) == MODE_FLOAT
> > +&& GET_MODE_CLASS (src_mode) == MODE_INT))
> >  return NULL_RTX;
> >
> >src = gen_lowpart (src_int_mode, src);
>
> Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> it does not allow tying a scalar float to anything else.  No such thing
> is required, or good apparently.  I wonder why we have such restrictions
> at all in rs6000; is it just unfortunate history, was it good at one
> point in time?

The documentation for TARGET_MODES_TIEABLE_P says the following:
If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
(r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
(mode1, mode2) should be true. If they differ for any r, you should
define this hook to return false unless some other mechanism ensures
the accessibility of the value in a narrower mode.

even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
  /* The float registers (except for VSX vector modes) can only hold floating
 modes and DImode.  */

TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
  if (TARGET_P8_VECTOR && (mode == SImode))
return 1;

  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
return 1;
Which I suspect that means rs6000_modes_tieable_p should return true
for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
TARGET_P9_VECTOR and SFmode and QImode/HImode too.


Thanks,
Andrew Pinski

>
>
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-29 Thread Segher Boessenkool
Hi!

On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
> Considering the limitations of CSE, I try to find other places
> to handle this issue, and notice DSE can optimize below code:
> "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
> 
> So, I drafted a patch to update DSE to handle DI->DF/SF.
> The patch updates "extract_low_bits" to get mode change 
> with subreg.
> 
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index b12b0e000c2..5e36331082c 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode 
> src_mode, rtx src)
>  
>if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>  return NULL_RTX;
> -  if (!targetm.modes_tieable_p (int_mode, mode))
> +  if (!targetm.modes_tieable_p (int_mode, mode)
> +  && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
> +&& GET_MODE_CLASS (mode) == MODE_FLOAT
> +&& GET_MODE_CLASS (src_mode) == MODE_INT))
>  return NULL_RTX;
>  
>src = gen_lowpart (src_int_mode, src);

Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
it does not allow tying a scalar float to anything else.  No such thing
is required, or good apparently.  I wonder why we have such restrictions
at all in rs6000; is it just unfortunate history, was it good at one
point in time?


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-29 Thread Jiufu Guo via Gcc-patches
Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Segher Boessenkool  writes:
>>
>>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
 > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool 
 > :
 > There are at least six very different kinds of subreg:
 > 
 > 0) Lvalue subregs.  Most archs have no use for it, and it can be
 >   expressed much more clearly and cleanly always.
 > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
 >   this will go away.
 > 2) Subregs of hard registers.  Do not use, there are much better ways to
 >   write subregs of a non-zero byte offset, and for zero offset this is
 >   non-canonical RTL.
 > 3) Bitcast subregs.  In principle they go from one mode to another mode
 >   of the same size (but read on).
 > 4) Paradoxical subregs.  A concept completely separate from the rest,
 >   different rules for everything, it has to be special cased almost
 >   everywhere, it would be better if it was a separate rtx_code imo.
 > 5) Finally, normal subregs, taking a contiguous span of bits from some
 >   value.
 > 
 > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
 > written as just one subreg, as you say.  And a 4) of a 5) is just
 > invalid afaics (and let's not talk about 0)..2) anymore :-) )
 > 
 >> Note whether targets actually support subreg operations needs to be 
 >> queried and I’m not sure how subreg with offset validation should work 
 >> there.
 > 
 > But 3) is always valid, no?  On pseudos
>> I also has similar question: do we need to query/recog if "SF(SI#0)" is
>> valid on the target, or it would always work (even through reload)?
>> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
>> and "SF(DI#4)" is not valid. 
 
 Yes, but it will eventually result in a spill/reload which is
 undesirable when we created this from CSE from a load.  So I think
 for CSE we do want to know whether a spill will definitely not
 occur.
>>>
>>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>>> allows integer registers, can things work at all?
>>>
>>> Anyway, the normal way to test if some RTL is valid is to just generate
>>> it (using validate_change) and then do apply_change_group, which then
>>> cancels the changes if they do not work.  CSE already does some of
>>> this.
>> validate_change seems ok. Thanks!
>>>
>>> (I am doubtful doing any of this in CSE is a good idea fwiw).
>> Understand your concern! Especially when we need to emit additional
>> inns in CSE.
>> While CSE does some similar work. It transforms
>> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
>> and "see if a MEM has already been loaded with a widening operation;
>> if it has, we can use a subreg of that." (only for int modes).
>> So, it may be acceptable to do this in CSE (maybe still seems
>> hacking).
>
> This maybe works for "DI to DF", because "mode converting
> subreg:DF(x:DI,0)" is cheaper than "mem load DF([sf:DI])". Then
> "y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".
>
> While for "subreg:SF(x:SI,0)", in CSE, it may not cheaper.
> So, it may be doubtful for "convert DI to SF" in CSE.

Considering the limitations of CSE, I try to find other places
to handle this issue, and notice DSE can optimize below code:
"[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".

So, I drafted a patch to update DSE to handle DI->DF/SF.
The patch updates "extract_low_bits" to get mode change 
with subreg.

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b12b0e000c2..5e36331082c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode 
src_mode, rtx src)
 
   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
 return NULL_RTX;
-  if (!targetm.modes_tieable_p (int_mode, mode))
+  if (!targetm.modes_tieable_p (int_mode, mode)
+  && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
+  && GET_MODE_CLASS (mode) == MODE_FLOAT
+  && GET_MODE_CLASS (src_mode) == MODE_INT))
 return NULL_RTX;
 
   src = gen_lowpart (src_int_mode, src);
---

While I am aware that DSE is not good at cross basic-block.
e.g. for code, it was not optimized.

typedef struct SF {float a[4];int i1; int i2; } SF;
int foo_si (SF a, int flag)
{
  if (flag == 2)
return a.i1 + a.i2;
  return 0;
}


So, we may back to previous ideas which we discussed in
early of this thread.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html


BR,
Jeff (Jiufu)

>
> Any comments or suggestions?
>
>
> BR,
> Jeff (Jiufu)
>
>>
>> Thanks for so great comments!
>>
>> BR,
>> Jeff (Jiufu)
>>
>>>
>>>
>>> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-27 Thread Jiufu Guo via Gcc-patches
Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Segher Boessenkool  writes:
>
>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool 
>>> > :
>>> > There are at least six very different kinds of subreg:
>>> > 
>>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>>> >   expressed much more clearly and cleanly always.
>>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>>> >   this will go away.
>>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>>> >   write subregs of a non-zero byte offset, and for zero offset this is
>>> >   non-canonical RTL.
>>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>>> >   of the same size (but read on).
>>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>>> >   different rules for everything, it has to be special cased almost
>>> >   everywhere, it would be better if it was a separate rtx_code imo.
>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>>> >   value.
>>> > 
>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>>> > 
>>> >> Note whether targets actually support subreg operations needs to be 
>>> >> queried and I’m not sure how subreg with offset validation should work 
>>> >> there.
>>> > 
>>> > But 3) is always valid, no?  On pseudos
> I also has similar question: do we need to query/recog if "SF(SI#0)" is
> valid on the target, or it would always work (even through reload)?
> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
> and "SF(DI#4)" is not valid. 
>>> 
>>> Yes, but it will eventually result in a spill/reload which is
>>> undesirable when we created this from CSE from a load.  So I think
>>> for CSE we do want to know whether a spill will definitely not
>>> occur.
>>
>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>> allows integer registers, can things work at all?
>>
>> Anyway, the normal way to test if some RTL is valid is to just generate
>> it (using validate_change) and then do apply_change_group, which then
>> cancels the changes if they do not work.  CSE already does some of
>> this.
> validate_change seems ok. Thanks!
>>
>> (I am doubtful doing any of this in CSE is a good idea fwiw).
> Understand your concern! Especially when we need to emit additional
> inns in CSE.
> While CSE does some similar work. It transforms
> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
> and "see if a MEM has already been loaded with a widening operation;
> if it has, we can use a subreg of that." (only for int modes).
> So, it may be acceptable to do this in CSE (maybe still seems
> hacking).

This maybe works for "DI to DF", because "mode converting
subreg:DF(x:DI,0)" is cheaper than "mem load DF([sf:DI])". Then
"y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".

While for "subreg:SF(x:SI,0)", in CSE, it may not cheaper.
So, it may be doubtful for "convert DI to SF" in CSE.

Any comments or suggestions?


BR,
Jeff (Jiufu)

>
> Thanks for so great comments!
>
> BR,
> Jeff (Jiufu)
>
>>
>>
>> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-26 Thread Jiufu Guo via Gcc-patches
Hi,

Segher Boessenkool  writes:

> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool 
>> > :
>> > There are at least six very different kinds of subreg:
>> > 
>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>> >   expressed much more clearly and cleanly always.
>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>> >   this will go away.
>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>> >   write subregs of a non-zero byte offset, and for zero offset this is
>> >   non-canonical RTL.
>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>> >   of the same size (but read on).
>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>> >   different rules for everything, it has to be special cased almost
>> >   everywhere, it would be better if it was a separate rtx_code imo.
>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>> >   value.
>> > 
>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>> > 
>> >> Note whether targets actually support subreg operations needs to be 
>> >> queried and I’m not sure how subreg with offset validation should work 
>> >> there.
>> > 
>> > But 3) is always valid, no?  On pseudos
I also has similar question: do we need to query/recog if "SF(SI#0)" is
valid on the target, or it would always work (even through reload)?
I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
and "SF(DI#4)" is not valid. 
>> 
>> Yes, but it will eventually result in a spill/reload which is
>> undesirable when we created this from CSE from a load.  So I think
>> for CSE we do want to know whether a spill will definitely not
>> occur.
>
> Does it cause reloads though?  On any sane backend?  If no movsf pattern
> allows integer registers, can things work at all?
>
> Anyway, the normal way to test if some RTL is valid is to just generate
> it (using validate_change) and then do apply_change_group, which then
> cancels the changes if they do not work.  CSE already does some of
> this.
validate_change seems ok. Thanks!
>
> (I am doubtful doing any of this in CSE is a good idea fwiw).
Understand your concern! Especially when we need to emit additional
inns in CSE.
While CSE does some similar work. It transforms
"[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
and "see if a MEM has already been loaded with a widening operation;
if it has, we can use a subreg of that." (only for int modes).
So, it may be acceptable to do this in CSE (maybe still seems
hacking).

Thanks for so great comments!

BR,
Jeff (Jiufu)

>
>
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-26 Thread Jiufu Guo via Gcc-patches
Hi,

Segher Boessenkool  writes:

> Hi!
>
> On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
>> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
>> may not work well on pseudo,
>
> It is perfectly normal:
>   A hard register may be accessed in various modes throughout one
>   function, but each pseudo register is given a natural mode
>   and is accessed only in that mode.  When it is necessary to describe
>   an access to a pseudo register using a nonnatural mode, a @code{subreg}
>   expression is used.
>
> and:
>   @code{subreg} expressions are used to refer to a register in a machine
>   mode other than its natural one, or to refer to one register of
>   a multi-part @code{reg} that actually refers to several registers.
>
>   Each pseudo register has a natural mode.  If it is necessary to
>   operate on it in a different mode, the register must be
>   enclosed in a @code{subreg}.
>
> and we even have:
>   @item hard registers
>   It is seldom necessary to wrap hard registers in @code{subreg}s; such
>   registers would normally reduce to a single @code{reg} rtx.  This use of
>   @code{subreg}s is discouraged and may not be supported in the future.
>
Thanks so much for detailed explaination!

>> and 2. to convert high-part:DI to SF,
>> a "shift/rotate" is needed, and then we need to "emit shift insn"
>> in cse. I may need to update this patch.
>
> Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
> converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
> later).  We should arrive at those patterns, and we should try to not
> go via the more expensive formulations with shifts, which don't describe
> the hardware well, and which overestimate the cost of it.
Yes, understant!
>
> None of this belongs in generic code at all imo.  At expand time it
> should be expanded to something that works and can be optimised well,
> so not anything with :BLK (which has to be put in memory, something with
> unbounded size cannot be put in registers), not anything specifically
> tailored to any cpu, something nice and regular.  Using a subreg (of a
> pseudo!) is the standard way of writing a bitcast.
>
> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> integer bitcast to an IEEE SP number, and our machine description should
> make it work nicely.
Right!  So, I'm thinking a way: in generic code, we may generated
"shift+(subreg:SF (reg:SI) 0)"; and at somewhere (maybe in combiner),
using "mtvsr.." to replace the "shift+subreg".

BR,
Jeff (Jiufu)

>
>
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool 
> > :
> > There are at least six very different kinds of subreg:
> > 
> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
> >   expressed much more clearly and cleanly always.
> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
> >   this will go away.
> > 2) Subregs of hard registers.  Do not use, there are much better ways to
> >   write subregs of a non-zero byte offset, and for zero offset this is
> >   non-canonical RTL.
> > 3) Bitcast subregs.  In principle they go from one mode to another mode
> >   of the same size (but read on).
> > 4) Paradoxical subregs.  A concept completely separate from the rest,
> >   different rules for everything, it has to be special cased almost
> >   everywhere, it would be better if it was a separate rtx_code imo.
> > 5) Finally, normal subregs, taking a contiguous span of bits from some
> >   value.
> > 
> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> > written as just one subreg, as you say.  And a 4) of a 5) is just
> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
> > 
> >> Note whether targets actually support subreg operations needs to be 
> >> queried and I’m not sure how subreg with offset validation should work 
> >> there.
> > 
> > But 3) is always valid, no?  On pseudos.
> 
> Yes, but it will eventually result in a spill/reload which is undesirable 
> when we created this from CSE from a load.  So I think for CSE we do want to 
> know whether a spill will definitely not occur.

Does it cause reloads though?  On any sane backend?  If no movsf pattern
allows integer registers, can things work at all?

Anyway, the normal way to test if some RTL is valid is to just generate
it (using validate_change) and then do apply_change_group, which then
cancels the changes if they do not work.  CSE already does some of this.

(I am doubtful doing any of this in CSE is a good idea fwiw).


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Richard Biener via Gcc-patches



> Am 23.12.2022 um 17:55 schrieb Segher Boessenkool 
> :
> 
> On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
 Am 23.12.2022 um 15:48 schrieb Segher Boessenkool 
 :
>>> None of this belongs in generic code at all imo.  At expand time it
>>> should be expanded to something that works and can be optimised well,
>>> so not anything with :BLK (which has to be put in memory, something with
>>> unbounded size cannot be put in registers), not anything specifically
>>> tailored to any cpu, something nice and regular.  Using a subreg (of a
>>> pseudo!) is the standard way of writing a bitcast.
>>> 
>>> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
>>> integer bitcast to an IEEE SP number, and our machine description should
>>> make it work nicely.
>> 
>> There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a 
>> Highpart bitcast.
> 
> There are at least six very different kinds of subreg:
> 
> 0) Lvalue subregs.  Most archs have no use for it, and it can be
>   expressed much more clearly and cleanly always.
> 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>   this will go away.
> 2) Subregs of hard registers.  Do not use, there are much better ways to
>   write subregs of a non-zero byte offset, and for zero offset this is
>   non-canonical RTL.
> 3) Bitcast subregs.  In principle they go from one mode to another mode
>   of the same size (but read on).
> 4) Paradoxical subregs.  A concept completely separate from the rest,
>   different rules for everything, it has to be special cased almost
>   everywhere, it would be better if it was a separate rtx_code imo.
> 5) Finally, normal subregs, taking a contiguous span of bits from some
>   value.
> 
> Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> written as just one subreg, as you say.  And a 4) of a 5) is just
> invalid afaics (and let's not talk about 0)..2) anymore :-) )
> 
>> Note whether targets actually support subreg operations needs to be queried 
>> and I’m not sure how subreg with offset validation should work there.
> 
> But 3) is always valid, no?  On pseudos.

Yes, but it will eventually result in a spill/reload which is undesirable when 
we created this from CSE from a load.  So I think for CSE we do want to know 
whether a spill will definitely not occur.

Richard 
> 
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
> > Am 23.12.2022 um 15:48 schrieb Segher Boessenkool 
> > :
> > None of this belongs in generic code at all imo.  At expand time it
> > should be expanded to something that works and can be optimised well,
> > so not anything with :BLK (which has to be put in memory, something with
> > unbounded size cannot be put in registers), not anything specifically
> > tailored to any cpu, something nice and regular.  Using a subreg (of a
> > pseudo!) is the standard way of writing a bitcast.
> > 
> > So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> > integer bitcast to an IEEE SP number, and our machine description should
> > make it work nicely.
> 
> There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a Highpart 
> bitcast.

There are at least six very different kinds of subreg:

0) Lvalue subregs.  Most archs have no use for it, and it can be
   expressed much more clearly and cleanly always.
1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
   this will go away.
2) Subregs of hard registers.  Do not use, there are much better ways to
   write subregs of a non-zero byte offset, and for zero offset this is
   non-canonical RTL.
3) Bitcast subregs.  In principle they go from one mode to another mode
   of the same size (but read on).
4) Paradoxical subregs.  A concept completely separate from the rest,
   different rules for everything, it has to be special cased almost
   everywhere, it would be better if it was a separate rtx_code imo.
5) Finally, normal subregs, taking a contiguous span of bits from some
   value.

Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
written as just one subreg, as you say.  And a 4) of a 5) is just
invalid afaics (and let's not talk about 0)..2) anymore :-) )

> Note whether targets actually support subreg operations needs to be queried 
> and I’m not sure how subreg with offset validation should work there.

But 3) is always valid, no?  On pseudos.


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Richard Biener via Gcc-patches



> Am 23.12.2022 um 15:48 schrieb Segher Boessenkool 
> :
> 
> Hi!
> 
>> On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
>> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
>> may not work well on pseudo,
> 
> It is perfectly normal:
>  A hard register may be accessed in various modes throughout one
>  function, but each pseudo register is given a natural mode
>  and is accessed only in that mode.  When it is necessary to describe
>  an access to a pseudo register using a nonnatural mode, a @code{subreg}
>  expression is used.
> 
> and:
>  @code{subreg} expressions are used to refer to a register in a machine
>  mode other than its natural one, or to refer to one register of
>  a multi-part @code{reg} that actually refers to several registers.
> 
>  Each pseudo register has a natural mode.  If it is necessary to
>  operate on it in a different mode, the register must be
>  enclosed in a @code{subreg}.
> 
> and we even have:
>  @item hard registers
>  It is seldom necessary to wrap hard registers in @code{subreg}s; such
>  registers would normally reduce to a single @code{reg} rtx.  This use of
>  @code{subreg}s is discouraged and may not be supported in the future.
> 
>> and 2. to convert high-part:DI to SF,
>> a "shift/rotate" is needed, and then we need to "emit shift insn"
>> in cse. I may need to update this patch.
> 
> Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
> converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
> later).  We should arrive at those patterns, and we should try to not
> go via the more expensive formulations with shifts, which don't describe
> the hardware well, and which overestimate the cost of it.
> 
> None of this belongs in generic code at all imo.  At expand time it
> should be expanded to something that works and can be optimised well,
> so not anything with :BLK (which has to be put in memory, something with
> unbounded size cannot be put in registers), not anything specifically
> tailored to any cpu, something nice and regular.  Using a subreg (of a
> pseudo!) is the standard way of writing a bitcast.
> 
> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> integer bitcast to an IEEE SP number, and our machine description should
> make it work nicely.

There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a Highpart 
bitcast.  Note whether targets actually support subreg operations needs to be 
queried and I’m not sure how subreg with offset validation should work there.

Richard 

> 
> 
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Segher Boessenkool
Hi!

On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
> may not work well on pseudo,

It is perfectly normal:
  A hard register may be accessed in various modes throughout one
  function, but each pseudo register is given a natural mode
  and is accessed only in that mode.  When it is necessary to describe
  an access to a pseudo register using a nonnatural mode, a @code{subreg}
  expression is used.

and:
  @code{subreg} expressions are used to refer to a register in a machine
  mode other than its natural one, or to refer to one register of
  a multi-part @code{reg} that actually refers to several registers.

  Each pseudo register has a natural mode.  If it is necessary to
  operate on it in a different mode, the register must be
  enclosed in a @code{subreg}.

and we even have:
  @item hard registers
  It is seldom necessary to wrap hard registers in @code{subreg}s; such
  registers would normally reduce to a single @code{reg} rtx.  This use of
  @code{subreg}s is discouraged and may not be supported in the future.

> and 2. to convert high-part:DI to SF,
> a "shift/rotate" is needed, and then we need to "emit shift insn"
> in cse. I may need to update this patch.

Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
later).  We should arrive at those patterns, and we should try to not
go via the more expensive formulations with shifts, which don't describe
the hardware well, and which overestimate the cost of it.

None of this belongs in generic code at all imo.  At expand time it
should be expanded to something that works and can be optimised well,
so not anything with :BLK (which has to be put in memory, something with
unbounded size cannot be put in registers), not anything specifically
tailored to any cpu, something nice and regular.  Using a subreg (of a
pseudo!) is the standard way of writing a bitcast.

So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
integer bitcast to an IEEE SP number, and our machine description should
make it work nicely.


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Jiufu Guo via Gcc-patches
HI,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Thu, 22 Dec 2022, guojiufu wrote:
>>
>>> Hi,
>>> 
>>> On 2022-12-21 15:30, Richard Biener wrote:
>>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>>> > 
>>> >> Hi,
>>> >> 
>>> >> This patch is fixing an issue about parameter accessing if the
>>> >> parameter is struct type and passed through integer registers, and
>>> >> there is floating member is accessed. Like below code:
>>> >> 
>>> >> typedef struct DF {double a[4]; long l; } DF;
>>> >> double foo_df (DF arg){return arg.a[3];}
>>> >> 
>>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>>> >> this case.
>>> > 
>>> > So why do we end up spilling for PPC?
>>> 
>>> Good question! According to GCC source code (in function.cc/expr.cc),
>>> it is common behavior: using "word_mode" to store the parameter to stack,
>>> And using the field's mode (e.g. float mode) to load from the stack.
>>> But with some tries, I fail to construct cases on many platforms.
>>> So, I convert the fix to a target hook and implemented the rs6000 part
>>> first.
>>> 
>>> > 
>>> > struct X { int i; float f; };
>>> > 
>>> > float foo (struct X x)
>>> > {
>>> >   return x.f;
>>> > }
>>> > 
>>> > does pass the structure in $RDI on x86_64 and we manage (with
>>> > optimization, with -O0 we spill) to generate
>>> > 
>>> > shrq$32, %rdi
>>> > movd%edi, %xmm0
>>> > 
>>> > and RTL expansion generates
>>> > 
>>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>>> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
>>> >  (nil))
>>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>> > (insn 6 3 7 2 (parallel [
>>> > (set (reg:DI 85)
>>> > (ashiftrt:DI (reg/v:DI 83 [ x ])
>>> > (const_int 32 [0x20])))
>>> > (clobber (reg:CC 17 flags))
>>> > ]) "t.c":5:11 -1
>>> >  (nil))
>>> > (insn 7 6 8 2 (set (reg:SI 86)
>>> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>>> >  (nil))
>>> > 
>>> > I would imagine that for the ppc case we only see the subreg here
>>> > which should be even easier to optimize.
>>> > 
>>> > So how's this not fixable by providing proper patterns / subreg
>>> > capabilities?  Looking a bit at the RTL we have the issue might
>>> > be that nothing seems to handle CSE of
>>> > 
>>> 
>>> This case is also related to 'parameter on struct', PR89310 is
>>> just for this case. On trunk, it is fixed.
>>> One difference: the parameter is in DImode, and passed via an
>>> integer register for "{int i; float f;}".
>>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>>> and stored to stack during the argument setup.
>>
>> OK, so this would be another case for "heuristics" to use
>> sth different than word_mode for storing, but of course
>> the arguments are in integer registers and using different
>> modes can for example prohibit store-multiple instruction use.
>>
>> As I said in the related thread an RTL expansion time "SRA"
>> with the incoming argument assignment in mind could make
>> more optimal decisions for these kind of special-cases.
>
> Thanks a lot for your comments!
>
> Yeap! Using SRA-like analysis during expansion for parameter
> and returns (and may also some field accessing) would be a
> generic improvement for this kind of issue (PR101926 collected
> a lot of them).
> While we may still need some work for various ABIs and different
> targets, to analyze where the 'struct field' come from
> (int/float/vector/.. registers, or stack) and how the struct
> need to be handled (keep in pseudo or store in the stack).
> This may indicate a mount of changes for param_setup code.
>
> To reduce risk, I'm just draft straightforward patches for
> special cases currently, Like:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> and this patch.
>
>>
>>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>>> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
>>> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>>> >  (expr_list:REG_DEAD (reg:DI 6 6)
>>> > (nil)))
>>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>>> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>>> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>>> > 576 {*movdf_hardfloat64}
>>> >  (nil))
>>> > 
>>> > Possibly because the store and load happen in a different mode?  Can
>>> > you see why CSE doesn't handle this (producing a subreg)?  On
>>> 
>>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>>> are using a different mode, and then CSE does not optimize it.  This
>>> patch makes the store and load using the same mode (DImode), and then
>>> leverage CSE to handle it.
>>

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-23 Thread Jiufu Guo via Gcc-patches
Hi,

Segher Boessenkool  writes:

> On Thu, Dec 22, 2022 at 11:28:01AM +, Richard Biener wrote:
>> On Thu, 22 Dec 2022, Jiufu Guo wrote:
>> > To reduce risk, I'm just draft straightforward patches for
>> > special cases currently, Like:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
>> > and this patch.
>> 
>> Heh, yes - though I'm not fond of special-casing things.  RTL
>> expansion is already full of special cases :/
>
> And many of those are not useful at all (would be done by later passes),
> or are actively harmful.  Not to mention that expand is currently one of
> the most impregnable and undebuggable RTL passes.
>
> But there are also many things done during expand that although they
> should be done somewhat later, aren't actually done later at all
> currently.  So that needs fixing.
>
> Maybe things should go via an intermediate step, where all the decisions
> can be made, and then later we just have to translate the "low Gimple"
> or "RTL-Gimple" ("Rimple"?) to RTL.  A format that is looser in many
> ways than either RTL or Gimple.  A bit like Generic in that way.

Thanks for all your great comments!

BR,
Jeff (Jiufu)
>
>
> Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-22 Thread Segher Boessenkool
On Thu, Dec 22, 2022 at 11:28:01AM +, Richard Biener wrote:
> On Thu, 22 Dec 2022, Jiufu Guo wrote:
> > To reduce risk, I'm just draft straightforward patches for
> > special cases currently, Like:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> > and this patch.
> 
> Heh, yes - though I'm not fond of special-casing things.  RTL
> expansion is already full of special cases :/

And many of those are not useful at all (would be done by later passes),
or are actively harmful.  Not to mention that expand is currently one of
the most impregnable and undebuggable RTL passes.

But there are also many things done during expand that although they
should be done somewhat later, aren't actually done later at all
currently.  So that needs fixing.

Maybe things should go via an intermediate step, where all the decisions
can be made, and then later we just have to translate the "low Gimple"
or "RTL-Gimple" ("Rimple"?) to RTL.  A format that is looser in many
ways than either RTL or Gimple.  A bit like Generic in that way.


Segher


Re: [PATCH] loading float member of parameter stored via int registers

2022-12-22 Thread Richard Biener via Gcc-patches
On Thu, 22 Dec 2022, Jiufu Guo wrote:

> 
> Hi,
> 
> Richard Biener  writes:
> 
> > On Thu, 22 Dec 2022, guojiufu wrote:
> >
> >> Hi,
> >> 
> >> On 2022-12-21 15:30, Richard Biener wrote:
> >> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
> >> > 
> >> >> Hi,
> >> >> 
> >> >> This patch is fixing an issue about parameter accessing if the
> >> >> parameter is struct type and passed through integer registers, and
> >> >> there is floating member is accessed. Like below code:
> >> >> 
> >> >> typedef struct DF {double a[4]; long l; } DF;
> >> >> double foo_df (DF arg){return arg.a[3];}
> >> >> 
> >> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> >> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
> >> >> this case.
> >> > 
> >> > So why do we end up spilling for PPC?
> >> 
> >> Good question! According to GCC source code (in function.cc/expr.cc),
> >> it is common behavior: using "word_mode" to store the parameter to stack,
> >> And using the field's mode (e.g. float mode) to load from the stack.
> >> But with some tries, I fail to construct cases on many platforms.
> >> So, I convert the fix to a target hook and implemented the rs6000 part
> >> first.
> >> 
> >> > 
> >> > struct X { int i; float f; };
> >> > 
> >> > float foo (struct X x)
> >> > {
> >> >   return x.f;
> >> > }
> >> > 
> >> > does pass the structure in $RDI on x86_64 and we manage (with
> >> > optimization, with -O0 we spill) to generate
> >> > 
> >> > shrq$32, %rdi
> >> > movd%edi, %xmm0
> >> > 
> >> > and RTL expansion generates
> >> > 
> >> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> >> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
> >> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
> >> >  (nil))
> >> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> >> > (insn 6 3 7 2 (parallel [
> >> > (set (reg:DI 85)
> >> > (ashiftrt:DI (reg/v:DI 83 [ x ])
> >> > (const_int 32 [0x20])))
> >> > (clobber (reg:CC 17 flags))
> >> > ]) "t.c":5:11 -1
> >> >  (nil))
> >> > (insn 7 6 8 2 (set (reg:SI 86)
> >> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
> >> >  (nil))
> >> > 
> >> > I would imagine that for the ppc case we only see the subreg here
> >> > which should be even easier to optimize.
> >> > 
> >> > So how's this not fixable by providing proper patterns / subreg
> >> > capabilities?  Looking a bit at the RTL we have the issue might
> >> > be that nothing seems to handle CSE of
> >> > 
> >> 
> >> This case is also related to 'parameter on struct', PR89310 is
> >> just for this case. On trunk, it is fixed.
> >> One difference: the parameter is in DImode, and passed via an
> >> integer register for "{int i; float f;}".
> >> But for "{double a[4]; long l;}", the parameter is in BLKmode,
> >> and stored to stack during the argument setup.
> >
> > OK, so this would be another case for "heuristics" to use
> > sth different than word_mode for storing, but of course
> > the arguments are in integer registers and using different
> > modes can for example prohibit store-multiple instruction use.
> >
> > As I said in the related thread an RTL expansion time "SRA"
> > with the incoming argument assignment in mind could make
> > more optimal decisions for these kind of special-cases.
> 
> Thanks a lot for your comments!
> 
> Yeap! Using SRA-like analysis during expansion for parameter
> and returns (and may also some field accessing) would be a
> generic improvement for this kind of issue (PR101926 collected
> a lot of them).
> While we may still need some work for various ABIs and different
> targets, to analyze where the 'struct field' come from
> (int/float/vector/.. registers, or stack) and how the struct
> need to be handled (keep in pseudo or store in the stack).
> This may indicate a mount of changes for param_setup code.
> 
> To reduce risk, I'm just draft straightforward patches for
> special cases currently, Like:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> and this patch.

Heh, yes - though I'm not fond of special-casing things.  RTL
expansion is already full of special cases :/

So basically what I'd do is SRA analysis as if the aggregates
were initialized by copies from the argument registers (and
stack slots) and then try to avoid expanding the aggregate
PARM_DECL itself but aim at fully scalarizing that with only
the copies from the argument space remaining.

> >
> >> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> >> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
> >> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
> >> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
> >> >  (expr_list:REG_DEAD (reg:DI 6 6)
> >> > (nil)))
> >> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> >> > (note 10 7 15 2 NOTE_INSN_DELETED)
> >> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
> >> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
> >> > (const_int 56 

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-22 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Thu, 22 Dec 2022, guojiufu wrote:
>
>> Hi,
>> 
>> On 2022-12-21 15:30, Richard Biener wrote:
>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>> > 
>> >> Hi,
>> >> 
>> >> This patch is fixing an issue about parameter accessing if the
>> >> parameter is struct type and passed through integer registers, and
>> >> there is floating member is accessed. Like below code:
>> >> 
>> >> typedef struct DF {double a[4]; long l; } DF;
>> >> double foo_df (DF arg){return arg.a[3];}
>> >> 
>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> >> this case.
>> > 
>> > So why do we end up spilling for PPC?
>> 
>> Good question! According to GCC source code (in function.cc/expr.cc),
>> it is common behavior: using "word_mode" to store the parameter to stack,
>> And using the field's mode (e.g. float mode) to load from the stack.
>> But with some tries, I fail to construct cases on many platforms.
>> So, I convert the fix to a target hook and implemented the rs6000 part
>> first.
>> 
>> > 
>> > struct X { int i; float f; };
>> > 
>> > float foo (struct X x)
>> > {
>> >   return x.f;
>> > }
>> > 
>> > does pass the structure in $RDI on x86_64 and we manage (with
>> > optimization, with -O0 we spill) to generate
>> > 
>> > shrq$32, %rdi
>> > movd%edi, %xmm0
>> > 
>> > and RTL expansion generates
>> > 
>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
>> >  (nil))
>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>> > (insn 6 3 7 2 (parallel [
>> > (set (reg:DI 85)
>> > (ashiftrt:DI (reg/v:DI 83 [ x ])
>> > (const_int 32 [0x20])))
>> > (clobber (reg:CC 17 flags))
>> > ]) "t.c":5:11 -1
>> >  (nil))
>> > (insn 7 6 8 2 (set (reg:SI 86)
>> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>> >  (nil))
>> > 
>> > I would imagine that for the ppc case we only see the subreg here
>> > which should be even easier to optimize.
>> > 
>> > So how's this not fixable by providing proper patterns / subreg
>> > capabilities?  Looking a bit at the RTL we have the issue might
>> > be that nothing seems to handle CSE of
>> > 
>> 
>> This case is also related to 'parameter on struct', PR89310 is
>> just for this case. On trunk, it is fixed.
>> One difference: the parameter is in DImode, and passed via an
>> integer register for "{int i; float f;}".
>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>> and stored to stack during the argument setup.
>
> OK, so this would be another case for "heuristics" to use
> sth different than word_mode for storing, but of course
> the arguments are in integer registers and using different
> modes can for example prohibit store-multiple instruction use.
>
> As I said in the related thread an RTL expansion time "SRA"
> with the incoming argument assignment in mind could make
> more optimal decisions for these kind of special-cases.

Thanks a lot for your comments!

Yeap! Using SRA-like analysis during expansion for parameter
and returns (and may also some field accessing) would be a
generic improvement for this kind of issue (PR101926 collected
a lot of them).
While we may still need some work for various ABIs and different
targets, to analyze where the 'struct field' come from
(int/float/vector/.. registers, or stack) and how the struct
need to be handled (keep in pseudo or store in the stack).
This may indicate a mount of changes for param_setup code.

To reduce risk, I'm just draft straightforward patches for
special cases currently, Like:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
and this patch.

>
>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
>> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>> >  (expr_list:REG_DEAD (reg:DI 6 6)
>> > (nil)))
>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>> > 576 {*movdf_hardfloat64}
>> >  (nil))
>> > 
>> > Possibly because the store and load happen in a different mode?  Can
>> > you see why CSE doesn't handle this (producing a subreg)?  On
>> 
>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>> are using a different mode, and then CSE does not optimize it.  This
>> patch makes the store and load using the same mode (DImode), and then
>> leverage CSE to handle it.
>
> So can we instead fix CSE to consider replacing insn 15 above with
>
>  (insn 15 (set (reg/i:DF 33 1)
>(subreg:DF (reg/f:DI 6 6)))
>

Thanks for your suggestion! I 

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-21 Thread Richard Biener via Gcc-patches
On Thu, 22 Dec 2022, guojiufu wrote:

> Hi,
> 
> On 2022-12-21 15:30, Richard Biener wrote:
> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
> > 
> >> Hi,
> >> 
> >> This patch is fixing an issue about parameter accessing if the
> >> parameter is struct type and passed through integer registers, and
> >> there is floating member is accessed. Like below code:
> >> 
> >> typedef struct DF {double a[4]; long l; } DF;
> >> double foo_df (DF arg){return arg.a[3];}
> >> 
> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
> >> this case.
> > 
> > So why do we end up spilling for PPC?
> 
> Good question! According to GCC source code (in function.cc/expr.cc),
> it is common behavior: using "word_mode" to store the parameter to stack,
> And using the field's mode (e.g. float mode) to load from the stack.
> But with some tries, I fail to construct cases on many platforms.
> So, I convert the fix to a target hook and implemented the rs6000 part
> first.
> 
> > 
> > struct X { int i; float f; };
> > 
> > float foo (struct X x)
> > {
> >   return x.f;
> > }
> > 
> > does pass the structure in $RDI on x86_64 and we manage (with
> > optimization, with -O0 we spill) to generate
> > 
> > shrq$32, %rdi
> > movd%edi, %xmm0
> > 
> > and RTL expansion generates
> > 
> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
> >  (nil))
> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 6 3 7 2 (parallel [
> > (set (reg:DI 85)
> > (ashiftrt:DI (reg/v:DI 83 [ x ])
> > (const_int 32 [0x20])))
> > (clobber (reg:CC 17 flags))
> > ]) "t.c":5:11 -1
> >  (nil))
> > (insn 7 6 8 2 (set (reg:SI 86)
> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
> >  (nil))
> > 
> > I would imagine that for the ppc case we only see the subreg here
> > which should be even easier to optimize.
> > 
> > So how's this not fixable by providing proper patterns / subreg
> > capabilities?  Looking a bit at the RTL we have the issue might
> > be that nothing seems to handle CSE of
> > 
> 
> This case is also related to 'parameter on struct', PR89310 is
> just for this case. On trunk, it is fixed.
> One difference: the parameter is in DImode, and passed via an
> integer register for "{int i; float f;}".
> But for "{double a[4]; long l;}", the parameter is in BLKmode,
> and stored to stack during the argument setup.

OK, so this would be another case for "heuristics" to use
sth different than word_mode for storing, but of course
the arguments are in integer registers and using different
modes can for example prohibit store-multiple instruction use.

As I said in the related thread an RTL expansion time "SRA"
with the incoming argument assignment in mind could make
more optimal decisions for these kind of special-cases.

> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
> >  (expr_list:REG_DEAD (reg:DI 6 6)
> > (nil)))
> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> > (note 10 7 15 2 NOTE_INSN_DELETED)
> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
> > 576 {*movdf_hardfloat64}
> >  (nil))
> > 
> > Possibly because the store and load happen in a different mode?  Can
> > you see why CSE doesn't handle this (producing a subreg)?  On
> 
> Yes, exactly! For "{double a[4]; long l;}", because the store and load
> are using a different mode, and then CSE does not optimize it.  This
> patch makes the store and load using the same mode (DImode), and then
> leverage CSE to handle it.

So can we instead fix CSE to consider replacing insn 15 above with

 (insn 15 (set (reg/i:DF 33 1)
   (subreg:DF (reg/f:DI 6 6)))

?  One peculiarity is that this introduces a hardreg use in this
special case.

Richard.

> > the GIMPLE side we'd happily do that (but we don't see the argument
> > setup).
> 
> Thanks for your comments!
> 
> 
> BR,
> Jeff (Jiufu)
> 
> > 
> > Thanks,
> > Richard.
> > 
> >> This patch updates the behavior when loading floating members of a
> >> parameter: if that floating member is stored via integer register,
> >> then loading it as integer mode first, and converting it to floating
> >> mode.
> >> 
> >> I also thought of a method: before storing the register to stack,
> >> convert it to float mode first. While there are some cases that may
> >> still prefer to keep an integer register store.
> >> 
> >> Bootstrap and regtest passes on ppc64{,le}.
> >> I would ask for help to review for comments and if this patch is
> >> acceptable for the trunk.
> >> 
> >> 
> >> BR,
> >> Jeff (Jiufu)

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-21 Thread guojiufu via Gcc-patches

Hi,

On 2022-12-21 15:30, Richard Biener wrote:

On Wed, 21 Dec 2022, Jiufu Guo wrote:


Hi,

This patch is fixing an issue about parameter accessing if the
parameter is struct type and passed through integer registers, and
there is floating member is accessed. Like below code:

typedef struct DF {double a[4]; long l; } DF;
double foo_df (DF arg){return arg.a[3];}

On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
generated.  While instruction "mtvsrd 1, 6" would be enough for
this case.


So why do we end up spilling for PPC?


Good question! According to GCC source code (in function.cc/expr.cc),
it is common behavior: using "word_mode" to store the parameter to 
stack,

And using the field's mode (e.g. float mode) to load from the stack.
But with some tries, I fail to construct cases on many platforms.
So, I convert the fix to a target hook and implemented the rs6000 part
first.



struct X { int i; float f; };

float foo (struct X x)
{
  return x.f;
}

does pass the structure in $RDI on x86_64 and we manage (with
optimization, with -O0 we spill) to generate

shrq$32, %rdi
movd%edi, %xmm0

and RTL expansion generates

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
(reg:DI 5 di [ x ])) "t.c":4:1 -1
 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
(set (reg:DI 85)
(ashiftrt:DI (reg/v:DI 83 [ x ])
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) "t.c":5:11 -1
 (nil))
(insn 7 6 8 2 (set (reg:SI 86)
(subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
 (nil))

I would imagine that for the ppc case we only see the subreg here
which should be even easier to optimize.

So how's this not fixable by providing proper patterns / subreg
capabilities?  Looking a bit at the RTL we have the issue might
be that nothing seems to handle CSE of



This case is also related to 'parameter on struct', PR89310 is
just for this case. On trunk, it is fixed.
One difference: the parameter is in DImode, and passed via an
integer register for "{int i; float f;}".
But for "{double a[4]; long l;}", the parameter is in BLKmode,
and stored to stack during the argument setup.


(note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
(const_int 56 [0x38])) [2 arg+24 S8 A64])
(reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
 (expr_list:REG_DEAD (reg:DI 6 6)
(nil)))
(note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
(note 10 7 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:DF 33 1)
(mem/c:DF (plus:DI (reg/f:DI 110 sfp)
(const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) 
"t.c":2:40

576 {*movdf_hardfloat64}
 (nil))

Possibly because the store and load happen in a different mode?  Can
you see why CSE doesn't handle this (producing a subreg)?  On


Yes, exactly! For "{double a[4]; long l;}", because the store and load
are using a different mode, and then CSE does not optimize it.  This
patch makes the store and load using the same mode (DImode), and then
leverage CSE to handle it.


the GIMPLE side we'd happily do that (but we don't see the argument
setup).


Thanks for your comments!


BR,
Jeff (Jiufu)



Thanks,
Richard.


This patch updates the behavior when loading floating members of a
parameter: if that floating member is stored via integer register,
then loading it as integer mode first, and converting it to floating
mode.

I also thought of a method: before storing the register to stack,
convert it to float mode first. While there are some cases that may
still prefer to keep an integer register store.

Bootstrap and regtest passes on ppc64{,le}.
I would ask for help to review for comments and if this patch is
acceptable for the trunk.


BR,
Jeff (Jiufu)

PR target/108073

gcc/ChangeLog:

* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
macro definition.
(rs6000_loading_int_convert_to_float): New hook implement.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (loading_int_convert_to_float): New hook.
* expr.cc (expand_expr_real_1): Updated to use the new hook.
* target.def (loading_int_convert_to_float): New hook.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Update.
* gcc.target/powerpc/pr108073.c: New test.

---
 gcc/config/rs6000/rs6000.cc | 70 
+

 gcc/doc/tm.texi |  6 ++
 gcc/doc/tm.texi.in  |  2 +
 gcc/expr.cc | 15 +
 gcc/target.def  | 11 
 gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++
 7 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c

diff --git 

Re: [PATCH] loading float member of parameter stored via int registers

2022-12-20 Thread Richard Biener via Gcc-patches
On Wed, 21 Dec 2022, Jiufu Guo wrote:

> Hi,
> 
> This patch is fixing an issue about parameter accessing if the
> parameter is struct type and passed through integer registers, and
> there is floating member is accessed. Like below code:
> 
> typedef struct DF {double a[4]; long l; } DF;
> double foo_df (DF arg){return arg.a[3];}
> 
> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> generated.  While instruction "mtvsrd 1, 6" would be enough for
> this case.

So why do we end up spilling for ppc?

struct X { int i; float f; };

float foo (struct X x)
{
  return x.f;
}

does pass the structure in $RDI on x86_64 and we manage (with 
optimization, with -O0 we spill) to generate

shrq$32, %rdi
movd%edi, %xmm0

and RTL expansion generates

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
(reg:DI 5 di [ x ])) "t.c":4:1 -1
 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
(set (reg:DI 85)
(ashiftrt:DI (reg/v:DI 83 [ x ])
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) "t.c":5:11 -1
 (nil))
(insn 7 6 8 2 (set (reg:SI 86)
(subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
 (nil))

I would imagine that for the ppc case we only see the subreg here
which should be even easier to optimize.

So how's this not fixable by providing proper patterns / subreg
capabilities?  Looking a bit at the RTL we have the issue might
be that nothing seems to handle CSE of

(note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
(const_int 56 [0x38])) [2 arg+24 S8 A64])
(reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
 (expr_list:REG_DEAD (reg:DI 6 6)
(nil)))
(note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
(note 10 7 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:DF 33 1)
(mem/c:DF (plus:DI (reg/f:DI 110 sfp)
(const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40 
576 {*movdf_hardfloat64}
 (nil))

Possibly because the store and load happen in a different mode?  Can
you see why CSE doesn't handle this (producing a subreg)?  On
the GIMPLE side we'd happily do that (but we don't see the argument
setup).

Thanks,
Richard.

> This patch updates the behavior when loading floating members of a
> parameter: if that floating member is stored via integer register,
> then loading it as integer mode first, and converting it to floating
> mode.
> 
> I also thought of a method: before storing the register to stack,
> convert it to float mode first. While there are some cases that may
> still prefer to keep an integer register store.   
>
> 
> Bootstrap and regtest passes on ppc64{,le}.
> I would ask for help to review for comments and if this patch is
> acceptable for the trunk.
> 
> 
> BR,
> Jeff (Jiufu)
> 
>   PR target/108073
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>   macro definition.
>   (rs6000_loading_int_convert_to_float): New hook implement.
>   * doc/tm.texi: Regenerated.
>   * doc/tm.texi.in (loading_int_convert_to_float): New hook.
>   * expr.cc (expand_expr_real_1): Updated to use the new hook.
>   * target.def (loading_int_convert_to_float): New hook.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/pr102024.C: Update.
>   * gcc.target/powerpc/pr108073.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 70 +
>  gcc/doc/tm.texi |  6 ++
>  gcc/doc/tm.texi.in  |  2 +
>  gcc/expr.cc | 15 +
>  gcc/target.def  | 11 
>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++
>  7 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index b3a609f3aa3..af676eea276 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
> invalid_arg_for_unprototyped_fn
>  
> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT 
> rs6000_loading_int_convert_to_float
> +
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>  
> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
> typelist, const_tree funcdecl, const
> : NULL;
>  }
>  
> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> +static rtx
> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx