Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-09-12 Thread John David Anglin
Committed.

Regards,
Dave

On 2020-08-22 7:24 p.m., Roger Sayle wrote:
> Hi Dave,
> I actually think using plus_xor_ior operator is useful.  It means that if 
> combine,
> inlining or some other RTL simplification generates these variants, these 
> forms
> will still be recognized by the backend.  It's more typing, but the compiler 
> produces
> better code.
>
> Here's what I have so far, but please feel free to modify anything.  I'll 
> leave the
> rest to you.
>
> With this patch:
>
> unsigned long long rotl4(unsigned long long x)
> {
>   return (x<<4) | (x>>60);
> }
>
> unsigned long long rotr4(unsigned long long x)
> {
>   return (x<<60) | (x>>4);
> }
>
> which previously generated:
>
> rotl4:depd,z %r26,59,60,%r28
>   extrd,u %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
>
> rotr4:extrd,u %r26,59,60,%r28
>   depd,z %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
>
> now produces:
>
> rotl4:bve (%r2)
>   shrpd %r26,%r26,60,%r28
>
> rotr4:bve (%r2)
>   shrpd %r26,%r26,4,%r28
>
>
> I'm guessing this is very similar to what you were thinking (or what I 
> described previously).
>
> Many thanks again for trying out these patches/suggestions for me.
>
> Best regards,
> Roger
> --
>
> -Original Message-
> From: John David Anglin  
> Sent: 22 August 2020 23:09
> To: Roger Sayle ; 'GCC Patches' 
> 
> Cc: 'Jeff Law' 
> Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT
>
> On 2020-08-22 12:01 p.m., Roger Sayle wrote:
>> I suspect that the issue with the 64-bit patterns is that the second 
>> variant of pa.md's define_insn "shrpdi4" is unlikely ever to match as 
>> (minus:DI (const_int 64) x) is never "canonical" when x is itself a 
>> CONST_INT.  Splitting this define_insn into two (or three see below) 
>> separate forms; the first as it currently is and the second (as you 
>> suggest) with
>>  "TARGET_64BIT
>>&& INTVAL (operands[3]) + INTVAL (operands[4]) == 64"
>> should do the trick.
> I will go ahead and add the basic patterns.  It seems it would be best if I 
> avoid using the "plus_xor_ior_operator".  It also seems the 32-bit patterns 
> should avoid it.
>> My first impression was that the DImode shrpd instructions would be 
>> most useful for implementing TI mode shifts, but that TI mode isn't 
>> supported by hppa64.  But then I noticed that the more immediate 
>> benefit would be in supporting rotrdi3 and rotldi3 on TARGET_64BIT 
>> that currently don't have expanders nor insns defined.  Here GCC 
>> currently generates three instructions where a single shrpd would be 
>> optimal.
> It turns out we now need to support TI mode and __int128 for libgomp.  The 
> hppa64-hpux target won't boot without it.  I had just added a change to 
> support TI mode but it's untested.
>
> Regards,
> Dave
>
> --
> John David Anglin  dave.ang...@bell.net
>


-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-27 Thread Richard Biener via Gcc-patches
On Thu, Aug 27, 2020 at 9:17 AM Roger Sayle  wrote:
>
>
> >On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> >> These more accurate target rtx_costs are used by the
> >> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
> >> decide whether applying strength reduction would be profitable.  This test 
> >> case, slsr-13.c, assumes that two multiplications by four are
> >> cheaper than two multiplications by five.   (I believe) This is not the 
> >> case on hppa which
> >> has a sh2add instruction, that performs a multiplication by five in
> >> one cycle, or exactly the same cost as performing a left shift by two
> >> (i.e. a multiplication by four).  Oddly, I also believe this isn't the
> >> case on x86_64, where the similar lea instruction is (sometimes) as 
> >> efficient as left shift by two bits.
> >This looks like a regression.
> >
> >gcc-10 (prepatch):
> >
> >addl %r25,%r26,%r28
> >sh2addl %r25,%r28,%r25
> >sh2addl %r26,%r28,%r26
> >addl %r26,%r28,%r28
> >bv %r0(%r2)
> >addl %r28,%r25,%r28
> >
> >   [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  slsr_11 = s_3(D) * 4;
> >  x2_6 = x1_4 + slsr_11;
> >  slsr_12 = c_2(D) * 4;
> >  x3_8 = x1_4 + slsr_12;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> >gcc-11 (with patch):
> >
> >addl %r25,%r26,%r19
> >sh2addl %r26,%r26,%r28
> >addl %r28,%r25,%r28
> >sh2addl %r25,%r25,%r25
> >addl %r28,%r19,%r28
> >addl %r25,%r26,%r26
> >bv %r0(%r2)
> >addl %r28,%r26,%r28
> >
> >   [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  a2_5 = s_3(D) * 5;
> >  x2_6 = c_2(D) + a2_5;
> >  a3_7 = c_2(D) * 5;
> >  x3_8 = s_3(D) + a3_7;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> > Regards,
> > Dave
>
> There are two interesting (tree-optimization) observations here.  The first 
> is that at the tree-ssa
> level both of these gimple sequences look to have exactly the same cost, 
> seven assignments on
> a target where *4 is the same cost as *5.  The gimple doesn't attempt to 
> model the sh?add/lea
> instructions that combine may find, so at RTL expansion both sequences look 
> equivalent.  One
> fix may be to have gimple-ssa-strength-reduction.c just prefer 
> multiplications by 2, 4 and 8,
> even on targets that have a single cycle "mul" instruction.
>
> The second observation is why isn't tree-ssa-reassoc.c doing something here.  
> The test case
> is evaluating (s+c)+(s+5*c)+(5*s+c), and this strength reduction test is 
> expecting this to turn
> into "tmp=s+c;  return tmp+(tmp+4*c)+(4*s+tmp" which is clever and an 
> improvement, but
> overlooks the obvious reassociation 7*(s+c).  Indeed LLVM does this in three 
> instructions:

reassoc doesn't work on signed types

>
> tmp1 = s+c;
> tmp2 = tmp1<<3;
> return tmp2-tmp1;
>
> Although the PA backend is (mostly) innocent in this, the lowest impact 
> fix/work around is
> to have multiplications by 2, 4 and 8 return COSTS_N_INSNS(1)-1, to indicate 
> a preference
> when splitting ties.  I'll prepare a patch.
>
> Roger
> --
>
>


RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-27 Thread Roger Sayle


>On 2020-08-26 5:23 p.m., Roger Sayle wrote:
>> These more accurate target rtx_costs are used by the 
>> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to 
>> decide whether applying strength reduction would be profitable.  This test 
>> case, slsr-13.c, assumes that two multiplications by four are
>> cheaper than two multiplications by five.   (I believe) This is not the case 
>> on hppa which
>> has a sh2add instruction, that performs a multiplication by five in 
>> one cycle, or exactly the same cost as performing a left shift by two 
>> (i.e. a multiplication by four).  Oddly, I also believe this isn't the 
>> case on x86_64, where the similar lea instruction is (sometimes) as 
>> efficient as left shift by two bits.
>This looks like a regression.
>
>gcc-10 (prepatch):
>
>addl %r25,%r26,%r28
>sh2addl %r25,%r28,%r25
>sh2addl %r26,%r28,%r26
>addl %r26,%r28,%r28
>bv %r0(%r2)
>addl %r28,%r25,%r28
>
>   [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  slsr_11 = s_3(D) * 4;
>  x2_6 = x1_4 + slsr_11;
>  slsr_12 = c_2(D) * 4;
>  x3_8 = x1_4 + slsr_12;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
>gcc-11 (with patch):
>
>addl %r25,%r26,%r19
>sh2addl %r26,%r26,%r28
>addl %r28,%r25,%r28
>sh2addl %r25,%r25,%r25
>addl %r28,%r19,%r28
>addl %r25,%r26,%r26
>bv %r0(%r2)
>addl %r28,%r26,%r28
>
>   [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  a2_5 = s_3(D) * 5;
>  x2_6 = c_2(D) + a2_5;
>  a3_7 = c_2(D) * 5;
>  x3_8 = s_3(D) + a3_7;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
> Regards,
> Dave

There are two interesting (tree-optimization) observations here.  The first is 
that at the tree-ssa
level both of these gimple sequences look to have exactly the same cost, seven 
assignments on
a target where *4 is the same cost as *5.  The gimple doesn't attempt to model 
the sh?add/lea
instructions that combine may find, so at RTL expansion both sequences look 
equivalent.  One
fix may be to have gimple-ssa-strength-reduction.c just prefer multiplications 
by 2, 4 and 8,
even on targets that have a single cycle "mul" instruction.

The second observation is why isn't tree-ssa-reassoc.c doing something here.  
The test case
is evaluating (s+c)+(s+5*c)+(5*s+c), and this strength reduction test is 
expecting this to turn
into "tmp=s+c;  return tmp+(tmp+4*c)+(4*s+tmp" which is clever and an 
improvement, but
overlooks the obvious reassociation 7*(s+c).  Indeed LLVM does this in three 
instructions:

tmp1 = s+c;
tmp2 = tmp1<<3;
return tmp2-tmp1;

Although the PA backend is (mostly) innocent in this, the lowest impact 
fix/work around is
to have multiplications by 2, 4 and 8 return COSTS_N_INSNS(1)-1, to indicate a 
preference
when splitting ties.  I'll prepare a patch.

Roger
--




Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread John David Anglin
On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> These more accurate target rtx_costs are used by the 
> gimple-ssa-strength-reduction.c
> (via a call to mult_by_coeff_cost) to decide whether applying strength 
> reduction would
> be profitable.  This test case, slsr-13.c, assumes that two multiplications 
> by four are
> cheaper than two multiplications by five.   (I believe) This is not the case 
> on hppa which
> has a sh2add instruction, that performs a multiplication by five in one 
> cycle, or exactly
> the same cost as performing a left shift by two (i.e. a multiplication by 
> four).  Oddly, I
> also believe this isn't the case on x86_64, where the similar lea instruction 
> is (sometimes)
> as efficient as left shift by two bits.
This looks like a regression.

gcc-10 (prepatch):

    addl %r25,%r26,%r28
    sh2addl %r25,%r28,%r25
    sh2addl %r26,%r28,%r26
    addl %r26,%r28,%r28
    bv %r0(%r2)
    addl %r28,%r25,%r28

   [local count: 1073741824]:
  x1_4 = c_2(D) + s_3(D);
  slsr_11 = s_3(D) * 4;
  x2_6 = x1_4 + slsr_11;
  slsr_12 = c_2(D) * 4;
  x3_8 = x1_4 + slsr_12;
  _1 = x1_4 + x2_6;
  x_9 = _1 + x3_8;
  return x_9;

gcc-11 (with patch):

    addl %r25,%r26,%r19
    sh2addl %r26,%r26,%r28
    addl %r28,%r25,%r28
    sh2addl %r25,%r25,%r25
    addl %r28,%r19,%r28
    addl %r25,%r26,%r26
    bv %r0(%r2)
    addl %r28,%r26,%r28

   [local count: 1073741824]:
  x1_4 = c_2(D) + s_3(D);
  a2_5 = s_3(D) * 5;
  x2_6 = c_2(D) + a2_5;
  a3_7 = c_2(D) * 5;
  x3_8 = s_3(D) + a3_7;
  _1 = x1_4 + x2_6;
  x_9 = _1 + x3_8;
  return x_9;

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-26 at 22:23 +0100, Roger Sayle wrote:
> The failure of slsr-13.c is not caused by the patchh3.txt, but the previous 
> patchh2.txt
> that's now on mainline and the gcc-10 branch.  That change provided more 
> accurate
> rtx_costs for hppa, and solved the performance problems with synth_mult.
> 
> These more accurate target rtx_costs are used by the 
> gimple-ssa-strength-reduction.c
> (via a call to mult_by_coeff_cost) to decide whether applying strength 
> reduction would
> be profitable.  This test case, slsr-13.c, assumes that two multiplications 
> by four are
> cheaper than two multiplications by five.   (I believe) This is not the case 
> on hppa which
> has a sh2add instruction, that performs a multiplication by five in one 
> cycle, or exactly
> the same cost as performing a left shift by two (i.e. a multiplication by 
> four).  Oddly, I
> also believe this isn't the case on x86_64, where the similar lea instruction 
> is (sometimes)
> as efficient as left shift by two bits.
Yea, you can do a multiplication by 5 cheap on the PA.  While the x86 can too, I
don't think it's as clear cut a win as the PA, so they may not cost the same as 
a
multiply by 4 or left shift by 2.


> 
> I suspect that slsr-13.c should be expected to fail on some platforms 
> depending upon 
> a targets instruction set/timings.
Sounds like you're right since it depends on mult_by_coeff_cost under the hood 
:(
 I presume you or John will xfail it for the PA.


> 
> Unfortunately, to complicate things in our case, it appears that after RTL 
> optimizations,
> performing this strength reduction actually does results in fewer 
> instructions on the PA,
> so it's the right thing to do.  I'll need to study the logic in 
> gimple-ssa-strength to see
> how mult_by_coeff cost is being used; cost(x*4) == cost(x*5), but cost(x*4+y) 
> < cost(x*5+y).
Yea, x*4+y is cheaper than x*5+y on the PA.  The first is a single sh2add, the
second requires an additional add instruction.

Jeff



RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Roger Sayle


The failure of slsr-13.c is not caused by the patchh3.txt, but the previous 
patchh2.txt
that's now on mainline and the gcc-10 branch.  That change provided more 
accurate
rtx_costs for hppa, and solved the performance problems with synth_mult.

These more accurate target rtx_costs are used by the 
gimple-ssa-strength-reduction.c
(via a call to mult_by_coeff_cost) to decide whether applying strength 
reduction would
be profitable.  This test case, slsr-13.c, assumes that two multiplications by 
four are
cheaper than two multiplications by five.   (I believe) This is not the case on 
hppa which
has a sh2add instruction, that performs a multiplication by five in one cycle, 
or exactly
the same cost as performing a left shift by two (i.e. a multiplication by 
four).  Oddly, I
also believe this isn't the case on x86_64, where the similar lea instruction 
is (sometimes)
as efficient as left shift by two bits.

I suspect that slsr-13.c should be expected to fail on some platforms depending 
upon 
a targets instruction set/timings.

Unfortunately, to complicate things in our case, it appears that after RTL 
optimizations,
performing this strength reduction actually does results in fewer instructions 
on the PA,
so it's the right thing to do.  I'll need to study the logic in 
gimple-ssa-strength to see
how mult_by_coeff cost is being used; cost(x*4) == cost(x*5), but cost(x*4+y) < 
cost(x*5+y).

My apologies for the inconvenience.

Roger
--

-Original Message-
From: John David Anglin  
Sent: 26 August 2020 21:34
To: l...@redhat.com; Roger Sayle ; 'GCC Patches' 

Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

On 2020-08-26 4:08 p.m., Jeff Law wrote:
> It 3-stages, but trips:
> Tests that now fail, but worked before (5 tests):
>
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
In my last Linux build, these failed:

FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 4" 2
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 5" 0

The above didn't include patchh3.txt.  Seems to be a difference in quoting.  
I'll check.
Linux build is close to running this test.

These tests didn't fail on 32-bit hpux without patchh2.txt or in previous Linux 
build.

Dave

--
John David Anglin  dave.ang...@bell.net




Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread John David Anglin
On 2020-08-26 4:08 p.m., Jeff Law wrote:
> It 3-stages, but trips:
> Tests that now fail, but worked before (5 tests):
>
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
In my last Linux build, these failed:

FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 4" 2
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 5" 0

The above didn't include patchh3.txt.  Seems to be a difference in quoting.  
I'll check.
Linux build is close to running this test.

These tests didn't fail on 32-bit hpux without patchh2.txt or in previous Linux 
build.

Dave

-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Jeff Law via Gcc-patches
On Sun, 2020-08-23 at 00:24 +0100, Roger Sayle wrote:
> Hi Dave,
> I actually think using plus_xor_ior operator is useful.  It means that if 
> combine,
> inlining or some other RTL simplification generates these variants, these 
> forms
> will still be recognized by the backend.  It's more typing, but the compiler 
> produces
> better code.
> 
> Here's what I have so far, but please feel free to modify anything.  I'll 
> leave the
> rest to you.
> 
> With this patch:
> 
> unsigned long long rotl4(unsigned long long x)
> {
>   return (x<<4) | (x>>60);
> }
> 
> unsigned long long rotr4(unsigned long long x)
> {
>   return (x<<60) | (x>>4);
> }
> 
> which previously generated:
> 
> rotl4:depd,z %r26,59,60,%r28
>   extrd,u %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
> 
> rotr4:extrd,u %r26,59,60,%r28
>   depd,z %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
> 
> now produces:
> 
> rotl4:bve (%r2)
>   shrpd %r26,%r26,60,%r28
> 
> rotr4:bve (%r2)
>   shrpd %r26,%r26,4,%r28
> 
> 
> I'm guessing this is very similar to what you were thinking (or what I 
> described previously).
> 
> Many thanks again for trying out these patches/suggestions for me.
So I put this one into the tester overnight. 

It 3-stages, but trips:
Tests that now fail, but worked before (5 tests):

gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.target/hppa/shadd-2.c scan-assembler-times sh.add 2

I think we've already discussed shadd-2.  It's not immediately clear if the slsr
failure is due to this change or something different -- the latter seems like a
definite possibility as we're changing things as we leave gimple, but the test 
is
checking the result of the gimple optimizers.

Your call on how to proceed.  I can put additional patches into the tester
easily, so if you've got something you want investigated, don't hesitate to 
reach
out.

jeff



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-25 Thread John David Anglin
On 2020-08-25 10:05 a.m., Jeff Law wrote:
> On Sat, 2020-08-22 at 09:52 +0100, Roger Sayle wrote:
>> Hi Dave,
>>
>> It's great to hear from you.  It's been a long while.
>>
>> Sorry, doh! yes, there's a mistake in my patch (that I introduced when I
>> renumbered
>> the operands in the shd's define_expand to be the more logical 0, 1, 2, 3,
>> then 4).
>> Sorry for the inconvenience [due to my lack of familiarity with PA-RISC
>> assembly].
>> Hopefully you should get much better mileage out of the attached revision.
> FWIW this version 3-staged in my tester and is currently running the 
> testsuite.
>  It should finish testing later today.
>
> If David doesn't have objections, then I suggest going forward with the patch
> independent of the others you've got queued up.
The patch is fine.  Testing completed this morning on hpux and there are no 
regressions.

Please install on trunk and gcc-10 branch.

Thanks very much,
Dave

-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-25 Thread Jeff Law via Gcc-patches
On Sat, 2020-08-22 at 09:52 +0100, Roger Sayle wrote:
> Hi Dave,
> 
> It's great to hear from you.  It's been a long while.
> 
> Sorry, doh! yes, there's a mistake in my patch (that I introduced when I
> renumbered
> the operands in the shd's define_expand to be the more logical 0, 1, 2, 3,
> then 4).
> Sorry for the inconvenience [due to my lack of familiarity with PA-RISC
> assembly].
> Hopefully you should get much better mileage out of the attached revision.
FWIW this version 3-staged in my tester and is currently running the testsuite.
 It should finish testing later today.

If David doesn't have objections, then I suggest going forward with the patch
independent of the others you've got queued up.

jeff
> 



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-21 at 13:53 +0100, Roger Sayle wrote:
> This patch improves the code generated on PA-RISC for DImode
> (double word) left shifts by small constants (1-31).  This target
> has a very cool shd instruction that can be recognized by combine
> for simple shifts, but relying on combine is fragile for more
> complicated functions.  This patch tweaks pa.md's ashldi3 expander,
> to form the optimal two instruction shd/zdep sequence at RTL
> expansion time.
> 
> As an example of the benefits of this approach, the simple function
> 
> unsigned long long u9(unsigned long long x) { return x*9; }
> 
> currently generates 9 instructions
> 
> u9: copy %r25,%r28
> copy %r26,%r29
> extru %r26,2,3,%r21
> zdep %r25,28,29,%r19
> zdep %r26,28,29,%r20
> or %r21,%r19,%r19
> add %r29,%r20,%r29
> addc %r28,%r19,%r28
> bv,n %r0(%r2)
> 
> and with this patch now requires only 7:
> 
> u9: copy %r25,%r28
> copy %r26,%r29
> shd %r26,%r25,29,%r19
> zdep %r26,28,29,%r20
> add %r29,%r20,%r29
> addc %r28,%r19,%r28
> bv,n %r0(%r2)
> 
> 
> This improvement is a first step towards getting synth_mult to
> behave sanely on hppa (PR middle-end/87256).
> 
> Unfortunately, it's been a long while since I've had access to a
> hppa system, so apart from building a cross-compiler and looking at
> the assembler it generates, this patch is completely untested.
> I was wondering whether Dave or Jeff (or someone else with access
> to real hardware) might "spin" this patch for me?
So while I have an R390 down in the basement, it overheats and I don't think 
I've
turned it on in a few years.

What I do for testing is exploit QEMU user mode emulation.  I have a little root
filesystem with HPPA binaries -- there's enough bits in that filesystem to build
binutils, gcc, glibc & the linux kernel as well as run the testsuite.

The root filesystems are here:

https://github.com/JeffreyALaw/chroots

The Jenkins tester spins the PA once week.  It failed its last run due to
regressions in the analyzer.  I didn't realize it was running the static 
analyzer
for the qemu emulated targets, which I turned off yesterday and re-started the 
PA
build.

http://3.14.90.209:8080/job/hppa-linux-gnu/



Jeff
> 



RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-22 Thread Roger Sayle

Hi Dave,
I actually think using plus_xor_ior operator is useful.  It means that if 
combine,
inlining or some other RTL simplification generates these variants, these forms
will still be recognized by the backend.  It's more typing, but the compiler 
produces
better code.

Here's what I have so far, but please feel free to modify anything.  I'll leave 
the
rest to you.

With this patch:

unsigned long long rotl4(unsigned long long x)
{
  return (x<<4) | (x>>60);
}

unsigned long long rotr4(unsigned long long x)
{
  return (x<<60) | (x>>4);
}

which previously generated:

rotl4:  depd,z %r26,59,60,%r28
extrd,u %r26,3,4,%r26
bve (%r2)
or %r26,%r28,%r28

rotr4:  extrd,u %r26,59,60,%r28
depd,z %r26,3,4,%r26
bve (%r2)
or %r26,%r28,%r28

now produces:

rotl4:  bve (%r2)
shrpd %r26,%r26,60,%r28

rotr4:  bve (%r2)
shrpd %r26,%r26,4,%r28


I'm guessing this is very similar to what you were thinking (or what I 
described previously).

Many thanks again for trying out these patches/suggestions for me.

Best regards,
Roger
--

-Original Message-
From: John David Anglin  
Sent: 22 August 2020 23:09
To: Roger Sayle ; 'GCC Patches' 

Cc: 'Jeff Law' 
Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

On 2020-08-22 12:01 p.m., Roger Sayle wrote:
> I suspect that the issue with the 64-bit patterns is that the second 
> variant of pa.md's define_insn "shrpdi4" is unlikely ever to match as 
> (minus:DI (const_int 64) x) is never "canonical" when x is itself a 
> CONST_INT.  Splitting this define_insn into two (or three see below) 
> separate forms; the first as it currently is and the second (as you 
> suggest) with
>   "TARGET_64BIT
> && INTVAL (operands[3]) + INTVAL (operands[4]) == 64"
> should do the trick.
I will go ahead and add the basic patterns.  It seems it would be best if I 
avoid using the "plus_xor_ior_operator".  It also seems the 32-bit patterns 
should avoid it.
>
> My first impression was that the DImode shrpd instructions would be 
> most useful for implementing TI mode shifts, but that TI mode isn't 
> supported by hppa64.  But then I noticed that the more immediate 
> benefit would be in supporting rotrdi3 and rotldi3 on TARGET_64BIT 
> that currently don't have expanders nor insns defined.  Here GCC 
> currently generates three instructions where a single shrpd would be 
> optimal.
It turns out we now need to support TI mode and __int128 for libgomp.  The 
hppa64-hpux target won't boot without it.  I had just added a change to support 
TI mode but it's untested.

Regards,
Dave

--
John David Anglin  dave.ang...@bell.net



patchh3.log
Description: Binary data
diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 6350c68..5f04c02 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -6604,32 +6604,82 @@
(set_attr "length" "4")])
 
 ; Shift right pair word 0 to 31 bits.
-(define_insn "shrpsi4"
-  [(set (match_operand:SI 0 "register_operand" "=r,r")
-   (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r,r")
-  (minus:SI (const_int 32)
-(match_operand:SI 3 "shift5_operand" "q,n")))
-   (lshiftrt:SI (match_operand:SI 2 "register_operand" "r,r")
-(match_dup 3]
+(define_insn "*shrpsi4_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (match_operator:SI 4 "plus_xor_ior_operator"
+ [(ashift:SI (match_operand:SI 1 "register_operand" "r")
+ (minus:SI (const_int 32)
+   (match_operand:SI 3 "register_operand" "q")))
+  (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
+   (match_dup 3))]))]
   ""
-  "@
-   {vshd %1,%2,%0|shrpw %1,%2,%%sar,%0}
-   {shd|shrpw} %1,%2,%3,%0"
+  "{vshd %1,%2,%0|shrpw %1,%2,%%sar,%0}"
+  [(set_attr "type" "shift")
+   (set_attr "length" "4")])
+
+(define_insn "*shrpsi4_2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (match_operator:SI 4 "plus_xor_ior_operator"
+ [(lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
+   (match_operand:SI 3 "register_operand" "q"))
+  (ashift:SI (match_operand:SI 1 "register_operand" "r")
+ (minus:SI (const_int 32)
+   (match_dup 3)))]))]
+  ""

Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-22 Thread John David Anglin
On 2020-08-22 12:01 p.m., Roger Sayle wrote:
> I suspect that the issue with the 64-bit patterns is that the second variant
> of 
> pa.md's define_insn "shrpdi4" is unlikely ever to match as (minus:DI
> (const_int 64) x)
> is never "canonical" when x is itself a CONST_INT.  Splitting this
> define_insn
> into two (or three see below) separate forms; the first as it currently is
> and the
> second (as you suggest) with 
>   "TARGET_64BIT
> && INTVAL (operands[3]) + INTVAL (operands[4]) == 64"
> should do the trick.
I will go ahead and add the basic patterns.  It seems it would be best if I 
avoid
using the "plus_xor_ior_operator".  It also seems the 32-bit patterns should 
avoid it.
>
> My first impression was that the DImode shrpd instructions would be most
> useful for implementing TI mode shifts, but that TI mode isn't supported by
> hppa64.  But then I noticed that the more immediate benefit would be in
> supporting rotrdi3 and rotldi3 on TARGET_64BIT that currently don't have
> expanders nor insns defined.  Here GCC currently generates three
> instructions
> where a single shrpd would be optimal.
It turns out we now need to support TI mode and __int128 for libgomp.  The 
hppa64-hpux target won't
boot without it.  I had just added a change to support TI mode but it's 
untested.

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net



RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-22 Thread Roger Sayle


Hi Dave,

Many thanks for your help.

I suspect that the issue with the 64-bit patterns is that the second variant
of 
pa.md's define_insn "shrpdi4" is unlikely ever to match as (minus:DI
(const_int 64) x)
is never "canonical" when x is itself a CONST_INT.  Splitting this
define_insn
into two (or three see below) separate forms; the first as it currently is
and the
second (as you suggest) with 
"TARGET_64BIT
  && INTVAL (operands[3]) + INTVAL (operands[4]) == 64"
should do the trick.

My first impression was that the DImode shrpd instructions would be most
useful for implementing TI mode shifts, but that TI mode isn't supported by
hppa64.  But then I noticed that the more immediate benefit would be in
supporting rotrdi3 and rotldi3 on TARGET_64BIT that currently don't have
expanders nor insns defined.  Here GCC currently generates three
instructions
where a single shrpd would be optimal.

I'm sure that folks are aware of this but something I learned/found strange
was
the implications of using (match_operator:SI 5 "plus_xor_ior_operator" x y).
Clearly it's nice to have these patterns match IOR, PLUS and XOR (as
recently
pointed out by Jakub in a recent review), but using match_operator has two
side-effects.  The first is that recog's machinery doesn't know these
operators 
are commutative, so two variants of a define_insn using match_operator need
to be specified, i.e. (... y x).  The second side-effect is that for
generation/expansion
it's impossible (inefficient?) to use the corresponding gen_foo functions.
It would
be nice if match_operator:5 resulted in the gen_function taking an enum
rtx_code as the corresponding argument, to specify IOR or PLUS or XOR.
Alas no.  So instead I added the extremely light weight define_expand
for SImode shd_internal to select IOR arbitrarily as the preferred RTL.
You're right, that a similar strategy using a DImode shrpd_internal would
be required for the 64-bit form; but the two are different; it's not that
the
current name would be affected (or should be changed).

Hence the define_expand for rotrdi3 and rotldi3 would/should (for constant
shifts) call gen_shrpd_internal to create an insn that matches one of the
define_insns.

I'd be happy to write that patch, but I personally have no way of testing
it.
It's a shame there isn't a hppa64 machine on the GCC compile farm.

[p.s. I've ordered Gerry Kane's "PA-RISC 2.0 architecture" book from amazon,
so I'll hopefully understand more of what I'm talking about when it
arrives].

Thanks again.
Roger
--

-Original Message-
From: John David Anglin  
Sent: 22 August 2020 13:58
To: Roger Sayle ; 'GCC Patches'

Cc: 'Jeff Law' 
Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

Hi Roger,

Started a test of your latest version.

It appears we miss 64-bit patterns similar to these:
(define_insn ""
  [(set (match_operand:SI 0 "register_operand" "=r")
    (match_operator:SI 5 "plus_xor_ior_operator"
  [(ashift:SI (match_operand:SI 1 "register_operand" "r")
  (match_operand:SI 3 "const_int_operand" "n"))
   (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
    (match_operand:SI 4 "const_int_operand" "n"))]))]
  "INTVAL (operands[3]) + INTVAL (operands[4]) == 32"
  "{shd|shrpw} %1,%2,%4,%0"
  [(set_attr "type" "shift")
   (set_attr "length" "4")])

(define_insn ""
  [(set (match_operand:SI 0 "register_operand" "=r")
    (match_operator:SI 5 "plus_xor_ior_operator"
  [(lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
    (match_operand:SI 4 "const_int_operand" "n"))
   (ashift:SI (match_operand:SI 1 "register_operand" "r")
  (match_operand:SI 3 "const_int_operand" "n"))]))]
  "INTVAL (operands[3]) + INTVAL (operands[4]) == 32"
  "{shd|shrpw} %1,%2,%4,%0"
  [(set_attr "type" "shift")
   (set_attr "length" "4")])

I'm wondering if it would be useful to define the 64-bit equivalents using
the "shrpd" instruction.
If that's the case, then I think it would be good to rename "shd_internal"
to "shrpw_internal".

Regards,
Dave

On 2020-08-22 4:52 a.m., Roger Sayle wrote:
> Hi Dave,
>
> It's great to hear from you.  It's been a long while.
>
> Sorry, doh! yes, there's a mistake in my patch (that I introduced when 
> I renumbered the operands in the shd's define_expand to be t

Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-22 Thread John David Anglin
Hi Roger,

Started a test of your latest version.

It appears we miss 64-bit patterns similar to these:
(define_insn ""
  [(set (match_operand:SI 0 "register_operand" "=r")
    (match_operator:SI 5 "plus_xor_ior_operator"
  [(ashift:SI (match_operand:SI 1 "register_operand" "r")
  (match_operand:SI 3 "const_int_operand" "n"))
   (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
    (match_operand:SI 4 "const_int_operand" "n"))]))]
  "INTVAL (operands[3]) + INTVAL (operands[4]) == 32"
  "{shd|shrpw} %1,%2,%4,%0"
  [(set_attr "type" "shift")
   (set_attr "length" "4")])

(define_insn ""
  [(set (match_operand:SI 0 "register_operand" "=r")
    (match_operator:SI 5 "plus_xor_ior_operator"
  [(lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
    (match_operand:SI 4 "const_int_operand" "n"))
   (ashift:SI (match_operand:SI 1 "register_operand" "r")
  (match_operand:SI 3 "const_int_operand" "n"))]))]
  "INTVAL (operands[3]) + INTVAL (operands[4]) == 32"
  "{shd|shrpw} %1,%2,%4,%0"
  [(set_attr "type" "shift")
   (set_attr "length" "4")])

I'm wondering if it would be useful to define the 64-bit equivalents using the 
"shrpd" instruction.
If that's the case, then I think it would be good to rename "shd_internal" to 
"shrpw_internal".

Regards,
Dave

On 2020-08-22 4:52 a.m., Roger Sayle wrote:
> Hi Dave,
>
> It's great to hear from you.  It's been a long while.
>
> Sorry, doh! yes, there's a mistake in my patch (that I introduced when I
> renumbered
> the operands in the shd's define_expand to be the more logical 0, 1, 2, 3,
> then 4).
> Sorry for the inconvenience [due to my lack of familiarity with PA-RISC
> assembly].
> Hopefully you should get much better mileage out of the attached revision.
>
> Thanks again (and my sincere apologies),
> Roger
> --
>
> -Original Message-
> From: John David Anglin  
> Sent: 21 August 2020 20:00
> To: Roger Sayle ; 'GCC Patches'
> 
> Cc: 'Jeff Law' 
> Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT
>
> Hi Roger,
>
> On 2020-08-21 8:53 a.m., Roger Sayle wrote:
>> I was wondering whether Dave or Jeff (or someone else with access to 
>> real hardware) might "spin" this patch for me?
> This may be totally unrelated to this patch but I hit this error in stage2
> testing your change:
> build/genattrtab ../../gcc/gcc/common.md ../../gcc/gcc/config/pa/pa.md
> insn-conditions.md \
>     -Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c
> genattrtab: Internal error: abort in attr_alt_union, at genattrtab.c:2383
>
> It's great that you are back helpting with the middle-end.
>
> Regards,
> Dave
>
> --
> John David Anglin  dave.ang...@bell.net
>


-- 
John David Anglin  dave.ang...@bell.net



RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-22 Thread Roger Sayle

Hi Dave,

It's great to hear from you.  It's been a long while.

Sorry, doh! yes, there's a mistake in my patch (that I introduced when I
renumbered
the operands in the shd's define_expand to be the more logical 0, 1, 2, 3,
then 4).
Sorry for the inconvenience [due to my lack of familiarity with PA-RISC
assembly].
Hopefully you should get much better mileage out of the attached revision.

Thanks again (and my sincere apologies),
Roger
--

-Original Message-
From: John David Anglin  
Sent: 21 August 2020 20:00
To: Roger Sayle ; 'GCC Patches'

Cc: 'Jeff Law' 
Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

Hi Roger,

On 2020-08-21 8:53 a.m., Roger Sayle wrote:
> I was wondering whether Dave or Jeff (or someone else with access to 
> real hardware) might "spin" this patch for me?
This may be totally unrelated to this patch but I hit this error in stage2
testing your change:
build/genattrtab ../../gcc/gcc/common.md ../../gcc/gcc/config/pa/pa.md
insn-conditions.md \
    -Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c
genattrtab: Internal error: abort in attr_alt_union, at genattrtab.c:2383

It's great that you are back helpting with the middle-end.

Regards,
Dave

--
John David Anglin  dave.ang...@bell.net

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 6350c68..713ff17 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -6416,9 +6416,32 @@
   [(set (match_operand:DI 0 "register_operand" "")
(ashift:DI (match_operand:DI 1 "lhs_lshift_operand" "")
   (match_operand:DI 2 "arith32_operand" "")))]
-  "TARGET_64BIT"
+  ""
   "
 {
+  if (!TARGET_64BIT)
+{
+  if (REG_P (operands[0]) && GET_CODE (operands[2]) == CONST_INT)
+   {
+ unsigned HOST_WIDE_INT shift = UINTVAL (operands[2]);
+ if (shift >= 1 && shift <= 31)
+   {
+ rtx dst = operands[0];
+ rtx src = force_reg (DImode, operands[1]);
+ emit_insn (gen_shd_internal (gen_highpart (SImode, dst),
+  gen_lowpart (SImode, src),
+  GEN_INT (32-shift),
+  gen_highpart (SImode, src),
+  GEN_INT (shift)));
+ emit_insn (gen_ashlsi3 (gen_lowpart (SImode, dst),
+ gen_lowpart (SImode, src),
+ GEN_INT (shift)));
+ DONE;
+   }
+   }
+  /* Fallback to using optabs.c's expand_doubleword_shift.  */
+  FAIL;
+}
   if (GET_CODE (operands[2]) != CONST_INT)
 {
   rtx temp = gen_reg_rtx (DImode);
@@ -6705,6 +6728,15 @@
   [(set_attr "type" "shift")
(set_attr "length" "4")])
 
+(define_expand "shd_internal"
+  [(set (match_operand:SI 0 "register_operand")
+   (ior:SI
+ (lshiftrt:SI (match_operand:SI 1 "register_operand")
+  (match_operand:SI 2 "const_int_operand"))
+ (ashift:SI (match_operand:SI 3 "register_operand")
+(match_operand:SI 4 "const_int_operand"]
+  "")
+
 (define_insn ""
   [(set (match_operand:SI 0 "register_operand" "=r")
(and:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")


Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-21 Thread John David Anglin
Hi Roger,

On 2020-08-21 8:53 a.m., Roger Sayle wrote:
> I was wondering whether Dave or Jeff (or someone else with access
> to real hardware) might "spin" this patch for me?
This may be totally unrelated to this patch but I hit this error in stage2 
testing your change:
build/genattrtab ../../gcc/gcc/common.md ../../gcc/gcc/config/pa/pa.md 
insn-conditions.md \
    -Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c
genattrtab: Internal error: abort in attr_alt_union, at genattrtab.c:2383

It's great that you are back helpting with the middle-end.

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net