from:"David Edelsohn"

Re: [RS6000] Correct PIC_OFFSET_TABLE_REGNUM

2016-05-04 Thread David Edelsohn

On Wed, May 4, 2016 at 1:30 AM, Alan Modra  wrote:
> Leaving this as r30 results in pic_offset_table_rtx of (reg 30)
> for -m64, which is completely bogus.  Various rtl analysis predicate
> functions treat pic_offset_table_rtx specially.
>
> Bootstrapped etc.  OK to apply?
>
> * config/rs6000/rs6000.h (PIC_OFFSET_TABLE_REGNUM): Correct.
>
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 230ca43..9647106 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2050,7 +2050,10 @@ do {  \
> to allocate such a register (if necessary).  */
>
>  #define RS6000_PIC_OFFSET_TABLE_REGNUM 30
> -#define PIC_OFFSET_TABLE_REGNUM (flag_pic ? RS6000_PIC_OFFSET_TABLE_REGNUM : INVALID_REGNUM)
> +#define PIC_OFFSET_TABLE_REGNUM \
> +  (TARGET_TOC ? TOC_REGISTER   \
> +   : flag_pic ? RS6000_PIC_OFFSET_TABLE_REGNUM \
> +   : INVALID_REGNUM)
>
>  #define TOC_REGISTER (TARGET_MINIMAL_TOC ? RS6000_PIC_OFFSET_TABLE_REGNUM : 2)

Okay.

Thanks, David
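
The selection order in the revised macro can be modelled outside GCC.  The
following is only a minimal sketch, not GCC code: target_toc,
target_minimal_toc and flag_pic are plain parameters standing in for the real
target flags, and the function names are hypothetical.  The register numbers
(30 and 2) and the order of the tests are taken from the quoted diff.

```c
#include <limits.h>

/* Stand-in for GCC's INVALID_REGNUM; the value is an assumption chosen
   only so that it cannot collide with a real register number.  */
#define INVALID_REGNUM UINT_MAX
#define RS6000_PIC_OFFSET_TABLE_REGNUM 30u

/* Models TOC_REGISTER from the quoted header.  */
static unsigned int
toc_register (int target_minimal_toc)
{
  return target_minimal_toc ? RS6000_PIC_OFFSET_TABLE_REGNUM : 2u;
}

/* Models the patched PIC_OFFSET_TABLE_REGNUM: the TOC test comes first,
   then flag_pic, and only then the "no such register" answer.  */
static unsigned int
pic_offset_table_regnum (int target_toc, int target_minimal_toc, int flag_pic)
{
  return (target_toc ? toc_register (target_minimal_toc)
          : flag_pic ? RS6000_PIC_OFFSET_TABLE_REGNUM
          : INVALID_REGNUM);
}
```

Under this model a TOC target without minimal TOC always gets register 2,
whatever the value of flag_pic, whereas the old definition would have
answered 30 whenever flag_pic was set.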

Re: Enabling -frename-registers?

2016-05-03 Thread David Edelsohn

On Tue, May 3, 2016 at 6:52 PM, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 05/03/2016 11:26 PM, David Edelsohn wrote:
>>
>> Optimizations enabled by default at -O2 should show an overall net
>> benefit -- that is the general justification that we have used in the
>> past.  I request that this change be reverted until more compelling
>> evidence of benefit is presented.
>
> Shrug. Done. I was going to look at adding some more smarts to it to clean
> up after the register allocator; I guess I shan't bother.

Why not add the smarts and then x86, POWER, ARM and other
architectures can test the performance benefit?  If that change makes
the benefit more consistent, the motivation for making the option the
default at -O2 will be clearer.

Thanks, David

Re: Enabling -frename-registers?

2016-05-03 Thread David Edelsohn

On Fri, Apr 29, 2016 at 10:21 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On April 29, 2016 3:48:37 PM GMT+02:00, David Edelsohn <dje@gmail.com> 
> wrote:
>>On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt <bschm...@redhat.com>
>>wrote:
>>>
>>>
>>> On 04/29/2016 03:42 PM, David Edelsohn wrote:
>>>>
>>>> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt <bschm...@redhat.com>
>>>> wrote:
>>>>>
>>>>> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>>>>>
>>>>>>
>>>>>> How has this shown general benefit for all architectures to deserve
>>>>>> enabling it by default at -O2?
>>>>>
>>>>>
>>>>>
>>>>> It should improve postreload scheduling in general, and it can also help
>>>>> clear up bad code generation left behind by register allocation.
>>>>
>>>>
>>>> Did you test the actual performance benefit on any architectures,
>>>> especially architectures other than x86?
>>>
>>>
>>> No. If that's the standard, I'll back out the change.
>>
>>It seems rather strange to enable an optimization by default across
>>all targets without even knowing the performance impact.
>>
>>I'm eager to learn the opinion of others about this.
>
> It shows overall benefit on Itanic and ups and downs on x86.
>
> It's stage1 and the easiest way to get feedback for all archs is to enable
> it by default.

Bernd and Richard,

The IBM LTC team has tested the benefit of -frename-registers at -O2
and sees no net benefit for PowerPC -- some benchmarks improve
slightly but others degrade slightly (a few percent).  You mentioned
no overall benefit for x86.  Although you mentioned benefit for
Itanium, it is neither a primary nor a secondary architecture target for GCC
and continues to have limited adoption.  Andreas also reported a
bootstrap comparison failure for Itanium due to the change.

If I understand correctly, the change to enable -frename-registers was
motivated by PR 59173 for SSE.  The performance experiments, bootstrap
failure, and testsuite fallout do not provide good justification for
enabling this optimization by default at -O2 for all targets.  If this
helps SSE, this optimization can be enabled using target-specific
override mechanisms for that target.

Optimizations enabled by default at -O2 should show an overall net
benefit -- that is the general justification that we have used in the
past.  I request that this change be reverted until more compelling
evidence of benefit is presented.

Thanks, David

[wwwdocs] 2015 ACM Software System Award announcement

2016-04-29 Thread David Edelsohn

*** index.html  27 Apr 2016 13:06:08 -  1.1005
--- index.html  29 Apr 2016 17:44:45 -
*** mission statement.
*** 47,52 
--- 47,56 
  News
  

+ http://www.acm.org/awards/2015-technical-awards;>2015
ACM Software System Award
+ [2016-04-29]
+ 
+
  GCC 6.1 released
  [2016-04-27]

Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn

On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt <bschm...@redhat.com> wrote:
>
>
> On 04/29/2016 03:42 PM, David Edelsohn wrote:
>>
>> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt <bschm...@redhat.com>
>> wrote:
>>>
>>> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>>>
>>>>
>>>> How has this shown general benefit for all architectures to deserve
>>>> enabling it by default at -O2?
>>>
>>>
>>>
>>> It should improve postreload scheduling in general, and it can also help
>>> clear up bad code generation left behind by register allocation.
>>
>>
>> Did you test the actual performance benefit on any architectures,
>> especially architectures other than x86?
>
>
> No. If that's the standard, I'll back out the change.

It seems rather strange to enable an optimization by default across
all targets without even knowing the performance impact.

I'm eager to learn the opinion of others about this.

Thanks, David

Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn

On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>
>> How has this shown general benefit for all architectures to deserve
>> enabling it by default at -O2?
>
>
> It should improve postreload scheduling in general, and it can also help
> clear up bad code generation left behind by register allocation.

Did you test the actual performance benefit on any architectures,
especially architectures other than x86?

Thanks, David

Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn

How has this shown general benefit for all architectures to deserve
enabling it by default at -O2?

As an aside, this change seems to be the source of a new code
generation bug affecting the PPC kernel.

Thanks, David

[PATCH] Fix PR target/69810 (GCC 7)

2016-04-28 Thread David Edelsohn

This PR was fixed earlier with a patch that was deemed safe for GCC 6
through the removal of splitters for zero extend and sign extend to
HImode.

Now that trunk has opened for GCC 7 development, the following patch
restores the splitters and fixes the bug in the more aggressive manner
originally proposed: disallow patterns for extension to HImode by
removing HImode from the iterator.  The PowerPC architecture does not
provide any instructions that directly operate on HImode, so it's
better for GCC to operate on it as SUBREG except for loads and stores,
as this patch accomplishes.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

Thanks, David

PR target/69810
* config/rs6000/rs6000.md (EXTQI): Don't allow extension to HImode.
(zero_extendqi2_dot): Revert earlier conversion from
define_insn_and_split to define_insn.
(zero_extendqi2_dot2): Same.
(extendqi2_dot): Same.
(extendqi2_dot2): Same.

Index: rs6000.md
===
--- rs6000.md   (revision 235573)
+++ rs6000.md   (working copy)
@@ -322,7 +322,7 @@
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])

 ; Everything we can extend QImode to.
-(define_mode_iterator EXTQI [HI SI (DI "TARGET_POWERPC64")])
+(define_mode_iterator EXTQI [SI (DI "TARGET_POWERPC64")])

 ; Everything we can extend HImode to.
 (define_mode_iterator EXTHI [SI (DI "TARGET_POWERPC64")])
@@ -711,7 +711,7 @@
rlwinm %0,%1,0,0xff"
   [(set_attr "type" "load,shift")])

-(define_insn "*zero_extendqi2_dot"
+(define_insn_and_split "*zero_extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (zero_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -719,12 +719,19 @@
   "rs6000_gen_cell_microcode"
   "@
andi. %0,%1,0xff
-   rlwinm %0,%1,0,0xff\;cmpwi %2,%0,0"
+   #"
+  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(set (match_dup 0)
+   (zero_extend:EXTQI (match_dup 1)))
+   (set (match_dup 2)
+   (compare:CC (match_dup 0)
+   (const_int 0)))]
+  ""
   [(set_attr "type" "logical")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])

-(define_insn "*zero_extendqi2_dot2"
+(define_insn_and_split "*zero_extendqi2_dot2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (zero_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -733,7 +740,14 @@
   "rs6000_gen_cell_microcode"
   "@
andi. %0,%1,0xff
-   rlwinm %0,%1,0,0xff\;cmpwi %2,%0,0"
+   #"
+  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(set (match_dup 0)
+   (zero_extend:EXTQI (match_dup 1)))
+   (set (match_dup 2)
+   (compare:CC (match_dup 0)
+   (const_int 0)))]
+  ""
   [(set_attr "type" "logical")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
@@ -851,7 +865,7 @@
   "extsb %0,%1"
   [(set_attr "type" "exts")])

-(define_insn "*extendqi2_dot"
+(define_insn_and_split "*extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -859,12 +873,19 @@
   "rs6000_gen_cell_microcode"
   "@
extsb. %0,%1
-   extsb %0,%1\;cmpwi %2,%0,0"
+   #"
+  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(set (match_dup 0)
+   (sign_extend:EXTQI (match_dup 1)))
+   (set (match_dup 2)
+   (compare:CC (match_dup 0)
+   (const_int 0)))]
+  ""
   [(set_attr "type" "exts")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])

-(define_insn "*extendqi2_dot2"
+(define_insn_and_split "*extendqi2_dot2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -873,7 +894,14 @@
   "rs6000_gen_cell_microcode"
   "@
extsb. %0,%1
-   extsb %0,%1\;cmpwi %2,%0,0"
+   #"
+  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(set (match_dup 0)
+   (sign_extend:EXTQI (match_dup 1)))
+   (set (match_dup 2)
+   (compare:CC (match_dup 0)
+   (const_int 0)))]
+  ""
   [(set_attr "type" "exts")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])

Re: [PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread David Edelsohn

On Fri, Apr 22, 2016 at 6:02 AM, Szabolcs Nagy  wrote:
> Some gcc source files include standard headers after
> "system.h" but those headers may declare and use poisoned
> symbols, they also cannot be included before "system.h"
> because they might depend on macro definitions from there,
> so they must be included in system.h.
>
> This patch fixes the use of <list>, <map>, <set>, <vector>
> and <algorithm> headers, by using appropriate
> INCLUDE_{LIST, MAP, SET, VECTOR, ALGORITHM} macros.
> (Note that there are some other system header uses which
> did not get fixed.)
>
> Build tested on aarch64-*-gnu, sh-*-musl, x86_64-*-musl and
> bootstrapped x86_64-*-gnu (together with PATCH 1/2).
>
> is this ok for AIX?

It should be okay on AIX.

> OK for trunk?
>
> This would be nice to fix in gcc-6 too, because at least
> with musl libc the bootstrap is broken.
>
> gcc/ChangeLog:
>
> 2016-04-22  Szabolcs Nagy  
>
> * system.h (list, map, set, vector): Include conditionally.
> * auto-profile.c (INCLUDE_MAP, INCLUDE_SET): Define.
> * graphite-isl-ast-to-gimple.c (INCLUDE_MAP): Define.
> * ipa-icf.c (INCLUDE_LIST): Define.
> * config/aarch64/cortex-a57-fma-steering.c (INCLUDE_LIST): Define.
> * config/sh/sh.c (INCLUDE_VECTOR): Define.
> * config/sh/sh_treg_combine.cc (INCLUDE_ALGORITHM): Define.
> (INCLUDE_LIST, INCLUDE_VECTOR): Define.
> * cp/logic.cc (INCLUDE_LIST): Define.
> * fortran/trans-common.c (INCLUDE_MAP): Define.
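
The opt-in pattern described in the quoted patch can be sketched in plain C.
This is a hedged illustration rather than the contents of GCC's system.h:
INCLUDE_STRING is a hypothetical macro named by analogy with INCLUDE_MAP and
friends, legacy_alloc is a made-up identifier, and the central header is
inlined between the markers so that the example is a single file.

```c
/* Opt in before the central header is seen.  INCLUDE_STRING is a
   hypothetical macro, named by analogy with INCLUDE_MAP.  */
#define INCLUDE_STRING

/* ---- begin: contents a system.h-style header might have ---- */
#include <stddef.h>
#ifdef INCLUDE_STRING
# include <string.h>
#endif
/* Poisoning comes last, after every system header has been read, so
   the headers themselves never trip over a poisoned name.
   legacy_alloc is a made-up identifier used only for illustration.  */
#pragma GCC poison legacy_alloc
/* ---- end ---- */

/* Client code uses the standard header it opted into.  */
static size_t
name_length (const char *name)
{
  return strlen (name);
}
```

The ordering is the point of the design: a source file states what it needs
up front, and the one central header decides where the include happens
relative to the poisoning.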

Re: [PATCH] Fix PR target/70669 (allow __float128 to use direct move)

2016-04-14 Thread David Edelsohn

On Thu, Apr 14, 2016 at 6:43 PM, Michael Meissner
 wrote:
> When adding the basic __float128 support, I forgot to enable direct move
> support for moving __float128 between VSX registers and GPR registers.
>
> This patch enables using direct move for __float128 variables on Power8
> systems.  I bootstrapped the compiler and found no regressions with this
> patch.  Is it ok to apply to the GCC trunk?
>
> [gcc]
> 2016-04-14  Michael Meissner  
>
> PR target/70669
> * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Add
> direct move handlers for KFmode. Change TFmode handlers test from
> FLOAT128_IEEE_P to FLOAT128_VECTOR_P.
>
> [gcc/testsuite]
> 2016-04-14  Michael Meissner  
>
> PR target/70669
> * gcc.target/powerpc/pr70669.c: New test.

Okay.

Thanks, David

Re: [PATCH, rs6000] Add support for int versions of vec_adde

2016-04-13 Thread David Edelsohn

On Wed, Apr 13, 2016 at 10:47 AM, Bill Seurer  wrote:
> Here is an updated patch:
>
>
> This patch adds support for the signed and unsigned int versions of the
> vec_adde altivec builtins from the Power Architecture 64-Bit ELF V2 ABI
> OpenPOWER ABI for Linux Supplement (16 July 2015 Version 1.1). There are
> many of the builtins that are missing and this is the first of a series
> of patches to add them.
>
> There aren't instructions for the int versions of vec_adde so the
> output code is built from other built-ins that do have instructions
> which in this case is just two vec_adds.
>
> The new test cases are executable tests which verify that the generated
> code produces expected values. C macros were used so that the same
> test case could be used for both the signed and unsigned versions. An
> extra executable test case is also included to ensure that the modified
> support for the __int128 versions of vec_adde is not broken. The same
> test case could not be used for both int and __int128 because of some
> differences in loading and storing the vectors.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions. Is this ok for trunk?
>
> [gcc]
>
> 2016-04-06 Bill Seurer 
>
> * config/rs6000/rs6000-builtin.def (vec_adde): Change vec_adde to a
> special case builtin.
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
> ALTIVEC_BUILTIN_VEC_ADDE.
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
> support for ALTIVEC_BUILTIN_VEC_ADDE.
> * config/rs6000/rs6000.c (altivec_init_builtins): Add definition
> for __builtin_vec_adde.
>
> [gcc/testsuite]
>
> 2016-04-06 Bill Seurer 
>
> * gcc.target/powerpc/vec-adde.c: New test.
> * gcc.target/powerpc/vec-adde-int128.c: New test.

The revised patch is okay for GCC 7, with the changes requested by Segher.

Thanks, David
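
The "two vec_adds" expansion mentioned in the quoted description can be
modelled with scalar C.  This is a sketch under stated assumptions, not the
code the patch generates: the function name and LANES constant are
hypothetical, and masking the carry-in operand to its low bit is an
assumption based on the ABI describing the carry as having a value of 0 or 1.

```c
#include <stdint.h>

#define LANES 4

/* Scalar model of vec_adde on vectors of four unsigned ints: two
   lane-wise additions, the second adding only the low bit of the
   carry-in operand.  Unsigned arithmetic wraps modulo 2^32.  */
static void
model_vec_adde (const uint32_t a[LANES], const uint32_t b[LANES],
                const uint32_t carry_in[LANES], uint32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t sum = a[i] + b[i];         /* first vec_add */
      out[i] = sum + (carry_in[i] & 1u);  /* second vec_add */
    }
}
```

A lane holding 0xFFFFFFFF plus 0 with a carry-in of 1 wraps to 0 in this
model, as each lane is independent and no carry moves between lanes.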

Re: [PATCH], PR target/70640, Fix thinkos in PowerPC IEEE 128-bit floating point emulation

2016-04-12 Thread David Edelsohn

On Tue, Apr 12, 2016 at 3:11 PM, Michael Meissner
 wrote:
> After I moved the patches for PR 70381 to my internal branch for GCC 7.0
> submissions, I noticed test float128-1.c was failing.  I tracked it down to the
> fact that the pre-gcc7 branch uses LRA by default instead of reload.
>
> I tracked this down to using a "=" constraint on an input argument.  LRA
> deletes the insns that set up the input argument, since it believes it is
> an output-only argument:
>
> (define_insn "*ieee_128bit_vsx_neg2_internal"
>   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
> (neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
>(use (match_operand:V16QI 2 "register_operand" "=v"))]
>   "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
>   "xxlxor %x0,%x1,%x2"
>   [(set_attr "type" "vecsimple")])
>
> I have checked this by bootstrapping and doing a make check.  There were no
> regressions. Is it ok to check into the trunk?
>
> [gcc]
> 2016-04-12  Michael Meissner  
>
> PR target/70640
> * config/rs6000/rs6000.md (ieee_128bit_vsx_neg2_internal):
> Do not use "=" constraint on an input constraint.
> (ieee_128bit_vsx_abs2_internal): Likewise.
> (ieee_128bit_vsx_nabs2_internal): Likewise.
> (ieee_128bit_vsx_nabs2): Correct splitter so that it
> generates (neg (abs ...)) instead of (abs ...).
>
> [gcc/testsuite]
> 2016-04-12  Michael Meissner  
>
> PR target/70640
> * gcc.target/powerpc/pr70640.c: New test.

Okay.

Thanks, David

Re: [PATCH], Re-fix PR 70381 (disable -mfloat128 by default) and add workaround for PR 70589

2016-04-11 Thread David Edelsohn

On Thu, Apr 7, 2016 at 7:44 PM, Michael Meissner
 wrote:
> After applying the fix for PR 70381 to not enable -mfloat128 by default, I
> discovered the IEEE 128-bit floating point emulation routines in libgcc are no
> longer being built.
>
> The reason for this is the configuration test involved compiling this program:
>
> #pragma GCC target ("vsx,float128")
> __float128 add (__float128 *a) { return *a + *(a+1); }
>
> to see if the __float128 support was enabled.  Unfortunately, I discovered
> that you can't currently set/disable float128 via the target option attribute
> or target pragmas.  This is due to the fact that if -mfloat128 is disabled, the
> __float128 and __ibm128 keywords are not created.
>
> I raised this as a separate bug (70589).
>
> This patch does several things:
>
>1)   It disables using float128 in target attributes or target pragmas.
>
>2)   It fixes the configure test for software emulation to just see if the
> ISA 2.06 (vsx) instruction set is available. The makefile options in
> the PowerPC libgcc build ensures that -mfloat128 is used. I used
> similar logic to detect ISA 3.0 to see if we have support for the IEEE
> 128-bit floating point hardware.
>
>3)   I updated the documentation for -mfloat128.
>
>4)   I added two executable tests to verify that the float emulation is
> correct.  In working on adding the tests, I discovered I had the return
> value from main inverted, and the test would fail.
>
> I have run a bootstrap build and a make check to verify that the IEEE 128-bit
> floating point emulator in libgcc is indeed built. Are these patches ok to
> install in the GCC trunk?
>
> [gcc]
> 2016-04-07  Michael Meissner  
>
> PR target/70589
> * config/rs6000/rs6000.c (rs6000_opt_masks): Disable using the
> target attribute and pragma from changing the -mfloat128
> and -mfloat128-hardware options.
>
> * doc/extend.texi (Additional Floating Types): Document PowerPC
> __float128 restrictions.
>
> [libgcc]
> 2016-04-07  Michael Meissner  
>
> PR target/70381
> * configure.ac (powerpc*-*-linux*): Rework tests to build
> __float128 emulation routines to not depend on using #pragma GCC
> target to enable -mfloat128.
> * configure: Regenerate.
>
> [gcc/testsuite]
> 2016-04-07  Michael Meissner  
>
> PR target/70381
> * gcc.target/powerpc/float128-1.c: New tests to make sure the
> __float128 emulator is built and runs.
> * gcc.target/powerpc/float128-1.c: Likewise.
>
> * lib/target-supports.exp (check_ppc_float128_sw_available):
> Rework tests for __float128 software and hardware
> availability. Fix exit condition to return 0 on success.

This is okay.

Thanks, David

Re: [PATCH] PR70117, ppc long double isinf

2016-04-07 Thread David Edelsohn

On Thu, Apr 7, 2016 at 10:17 AM, Alan Modra  wrote:
> On Thu, Apr 07, 2016 at 11:32:58AM +0200, Richard Biener wrote:
>> That's good to know.  I think the patch is OK but please seek approval from
>> a ppc maintainer as well
>
> There's only one of those.  David?  Thread starts here
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00213.html

Yes, I have been following this entertaining thread.

This is okay.

By the way, xlc -qldbl128 should enable 128 bit.

Thanks, David

Re: [PATCH, rs6000] Add support for int versions of vec_adde

2016-04-05 Thread David Edelsohn

On Tue, Apr 5, 2016 at 3:36 PM, Bill Seurer  wrote:
> This patch adds support for the signed and unsigned int versions of the
> vec_adde altivec builtins from the Power Architecture 64-Bit ELF V2 ABI
> OpenPOWER ABI for Linux Supplement (16 July 2015 Version 1.1).  There are
> many of the builtins that are missing and this is the first of a series
> of patches to add them.
>
> There aren't instructions for the int versions of vec_adde so the
> output code is built from other built-ins that do have instructions
> which in this case is just two vec_adds.
>
> The new test cases are executable tests which verify that the generated
> code produces expected values.  C macros were used so that the same
> test case could be used for both the signed and unsigned versions.  An
> extra executable test case is also included to ensure that the modified
> support for the __int128 versions of vec_adde is not broken.  The same
> test case could not be used for both int and __int128 because of some
> differences in loading and storing the vectors.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> [gcc]
>
> 2016-04-06  Bill Seurer  
>
> * config/rs6000/rs6000-builtin.def (vec_adde): Change vec_adde to a
> special case builtin.
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins,
> altivec_resolve_overloaded_builtin): Remove ALTIVEC_BUILTIN_VEC_ADDE
> from altivec_overloaded_builtins structure.  Add support for it to
> altivec_resolve_overloaded_builtin function.
> * config/rs6000/rs6000.c (altivec_init_builtins): Add definition
> for __builtin_vec_adde.
>
> [gcc/testsuite]
>
> 2016-04-06  Bill Seurer  
>
> * gcc.target/powerpc/vec-adde.c: New test.
> * gcc.target/powerpc/vec-adde-int128.c: New test.
>
> Index: gcc/config/rs6000/rs6000-builtin.def
> ===
> --- gcc/config/rs6000/rs6000-builtin.def(revision 234745)
> +++ gcc/config/rs6000/rs6000-builtin.def(working copy)
> @@ -951,7 +951,6 @@ BU_ALTIVEC_X (VEC_EXT_V4SF, "vec_ext_v4sf", CO
> before we get to the point about classifying the builtin type.  */
>
>  /* 3 argument Altivec overloaded builtins.  */
> -BU_ALTIVEC_OVERLOAD_3 (ADDE,  "adde")
>  BU_ALTIVEC_OVERLOAD_3 (ADDEC, "addec")
>  BU_ALTIVEC_OVERLOAD_3 (MADD,   "madd")
>  BU_ALTIVEC_OVERLOAD_3 (MADDS,  "madds")
> @@ -1137,6 +1136,7 @@ BU_ALTIVEC_OVERLOAD_P (VCMPGT_P,   "vcmpgt_p")
>  BU_ALTIVEC_OVERLOAD_P (VCMPGE_P,   "vcmpge_p")
>
>  /* Overloaded Altivec builtins that are handled as special cases.  */
> +BU_ALTIVEC_OVERLOAD_X (ADDE,  "adde")
>  BU_ALTIVEC_OVERLOAD_X (CTF,   "ctf")
>  BU_ALTIVEC_OVERLOAD_X (CTS,   "cts")
>  BU_ALTIVEC_OVERLOAD_X (CTU,   "ctu")
> Index: gcc/config/rs6000/rs6000-c.c
> ===
> --- gcc/config/rs6000/rs6000-c.c(revision 234745)
> +++ gcc/config/rs6000/rs6000-c.c(working copy)
> @@ -842,11 +842,6 @@ const struct altivec_builtin_types altivec_overloa
>  RS6000_BTI_unsigned_V1TI, 0 },
>{ ALTIVEC_BUILTIN_VEC_ADDC, P8V_BUILTIN_VADDCUQ,
>  RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> -  { ALTIVEC_BUILTIN_VEC_ADDE, P8V_BUILTIN_VADDEUQM,
> -RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> -RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
> -  { ALTIVEC_BUILTIN_VEC_ADDE, P8V_BUILTIN_VADDEUQM,
> -RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, RS6000_BTI_V1TI },
>{ ALTIVEC_BUILTIN_VEC_ADDEC, P8V_BUILTIN_VADDECUQ,
>  RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
>  RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
> @@ -4515,6 +4510,59 @@ assignment for unaligned loads and stores");
>  warning (OPT_Wdeprecated, "vec_lvsr is deprecated for little endian; use \
>  assignment for unaligned loads and stores");
>
> +  if (fcode == ALTIVEC_BUILTIN_VEC_ADDE)
> +{
> +  /* vec_adde needs to be special cased because there is no instruction
> + for the {un}signed int version */

End the comment sentence with a period and two spaces.

> +  if (nargs != 3)
> +   {
> + error ("vec_adde only accepts 3 arguments");
> + return error_mark_node;
> +   }
> +
> +  tree arg0 = (*arglist)[0];
> +  tree arg0_type = TREE_TYPE (arg0);
> +  tree arg1 = (*arglist)[1];
> +  tree arg1_type = TREE_TYPE (arg1);
> +  tree arg2 = (*arglist)[2];
> +  tree arg2_type = TREE_TYPE (arg2);
> +
> +  /* All 3 arguments must be vectors of (signed or unsigned) (int or
> + __int128) and the types must match */

Same.

> +  if ((arg0_type != arg1_type) || (arg1_type != arg2_type))
> +   goto bad;
> +  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> +   goto bad;
>

Re: [PATCH] Disable guality tests for powerpc-linux

2016-03-29 Thread David Edelsohn

On Mon, Mar 28, 2016 at 8:38 PM, Bill Schmidt
 wrote:
> Hi,
>
> For a long time we've had hundreds of failing guality tests.  These
> failures don't seem to have any correlation with gdb functionality for
> POWER, which is working fine.  At this point the value of these tests to
> us seems questionable.  Fixing these is such low priority that it is
> unlikely we will ever get around to it.  In the meanwhile, the failures
> simply clutter up our regression test reports.  Thus I'd like to disable
> them, and that's what this test does.
>
> Verified to remove hundreds of failure messages on
> powerpc64le-unknown-linux-gnu. :)  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-03-28  Bill Schmidt  
>
> * g++.dg/guality/guality.exp: Disable for powerpc*-linux*.
> * gcc.dg/guality/guality.exp: Likewise.

Thanks for everyone else's suggestions.

As far as we understand, debugging quality on POWER is equivalent to
other targets.

There is an issue with PPC64 BE and AIX requiring an extra frame push
when debugging is enabled, which will cause differences between code
with debugging enabled and debugging disabled.  THIS WILL NOT BE
CHANGED.

We have no plans to make code generation a slave to the testsuite.
The testsuite is a tool; successful results from the testsuite are not
a goal unto themselves.

This patch is okay.

Thanks, David

Re: [RS6000, PATCH] PR70052, ICE compiling _Decimal128 test case

2016-03-29 Thread David Edelsohn

On Tue, Mar 29, 2016 at 6:14 AM, Alan Modra  wrote:
> On Fri, Mar 25, 2016 at 07:36:34PM +1030, Alan Modra wrote:
>> +2016-03-25  Alan Modra  
>> +
>> + PR target/70052
>> + * config/rs6000/constraints.md (j): Simplify.
>> + * config/rs6000/predicates.md (easy_fp_constant): Exclude
>> + decimal float 0.D.
>> + * config/rs6000/rs6000.md (zero_fp): New mode_attr.
>> + (mov_hardfloat, mov_hardfloat32, mov_hardfloat64,
>> +  mov_64bit_dm, mov_32bit): Use zero_fp in place of j
>> + in all constraint alternatives.
>> + (movtd_64bit_nodm): Delete "j" constraint alternative.
>> +
> [snip]
>> +2016-03-25  Alan Modra  
>> +
>> + * gcc.dg/dfp/pr70052.c: New test.
>> +
>
> Testing showed that this problem exists on the gcc-5 branch too.  I've
> backported the above and bootstrapped plus regression tested on
> powerpc64le-linux.  OK for gcc-5?

Okay.

Thanks, David

Re: [RS6000, PATCH] PR70052, ICE compiling _Decimal128 test case

2016-03-24 Thread David Edelsohn

On Thu, Mar 24, 2016 at 7:01 AM, Alan Modra  wrote:
> This fixes the PR70052 ICE by modifying easy_fp_constant to correctly
> return false for decimal floating point zero.  0.0D is not an all-zero
> bit pattern, at least, not the canonical form.
>
> I've also taken on Mike's suggestion of using a mode dependent
> constraint for insns that currently use "j".  Note that
> "easy_fp_constant" is already part of "input_operand" so in the usual
> case we ought to be prevented from generating 0.0D immediate
> constants.  However, in the past I've seen reload do some nasty tricks
> when pseudos don't get hard regs, and believe that a pseudo that is
> known to be equal to 0.0D may have the constant substituted with only
> constraints being checked, not the operand predicates.  So either the
> "j" constraint needs fixing to reject decimal float (as I had in my
> original patch) or not used with decimal float (Mike's approach).
> I left in a small tidy for "j" from my original patch.
>
> Bootstrapped and regression tested powerpc64le-linux.  OK to apply?
>
> gcc/
> PR target/70052
> * config/rs6000/constraints.md (j): Simplify.
> * config/rs6000/predicates.md (easy_fp_constant): Exclude
> decimal float 0.D.
> * config/rs6000/rs6000.md (zero_fp): New mode_attr.  Use in place
> of "j" in all constraints.

The patch did not convert all "j" constraints, so the ChangeLog needs
to be a little clearer to explain which alternatives required the
change.

> (movtd_64bit_nodm): Delete "j" constraint alternative.
> gcc/testsuite/
> * gcc.dg/dfp/pr70052.c: New.

Okay with that clarification.

Thanks, David
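
The remark that 0.0D is not an all-zero bit pattern can be checked by
building the encoding by hand.  The sketch below assumes the IEEE 754-2008
decimal128 format in its densely packed decimal (DPD) encoding, which is the
one PowerPC uses; the function name is hypothetical and the code only models
the layout, it does not use any decimal floating-point type.

```c
#include <stdint.h>

#define DEC128_EXPONENT_BIAS 6176

/* High 64 bits of the DPD decimal128 encoding of +0 with the given
   unbiased exponent.  With a zero coefficient the leading digit is 0,
   so the 5-bit combination field holds the top two bits of the biased
   exponent followed by 000, and the next 12 bits hold the rest of the
   exponent.  The low 64 bits of such a value are all zero.  */
static uint64_t
dec128_zero_high_word (int exponent)
{
  uint64_t biased = (uint64_t) (exponent + DEC128_EXPONENT_BIAS);
  return ((biased >> 12) << 61) | ((biased & 0xfffu) << 46);
}
```

For exponent 0 this gives 0x2208000000000000, which is nonzero because of
the exponent bias.  Only a zero with the smallest possible exponent (-6176)
would be all-zero bits, and that is not the form a literal such as 0.0D
denotes.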

Re: rs6000 stack_tie mishap again

2016-03-23 Thread David Edelsohn

First, SPE has not been maintained and there has been little
participation from Freescale.  I would rather deprecate all SPE
support.  The SPE ABI is broken by design.

I find the approach very heavy-handed.  If you want to enable the
target hook for SPE *only*, that's fine with me.  The description and
references to prior SPE prologue and epilogue changes do not confirm a
wider problem.

- David

Re: [PATCH] PR libgcc/70363, fix __float128 problem with non ISA-3.0 assembler

2016-03-22 Thread David Edelsohn

On Tue, Mar 22, 2016 at 4:33 PM, Michael Meissner
 wrote:
> This patch fixes PR libgcc/70363, which is a configuration issue if you build
> GCC 6.x with an assembler that does not support the ISA 3.0 instructions.  I
> missed one emulation function that needed to be a different name if the IFUNC
> functions added for ISA 3.0 support are not being built.
>
> I built a trunk compiler with a stock assembler, and did a program with a
> convert from __float128 to long double/__ibm128.  If I did not include the
> patch, the linker reported:
>
> -genoa-> ~/fsf-install-ppc64le/trunk-at9x/bin/gcc -O2 test-float128-6.c -DDEBUG && a.out
> /tmp/ccbCLWdO.o: In function `print_hex.constprop.0':
> test-float128-6.c:(.text+0x84): undefined reference to `__extendkftf2'
> collect2: error: ld returned 1 exit status
>
> If I build a compiler with the patch, it succeeds.  Is this patch ok to install
> in the trunk?
>
> 2016-03-22  Michael Meissner  
>
> PR libgcc/70363
> * config/rs6000/extendkftf2-sw.c (__extendkftf2_sw): If libgcc was
> built with an assembler that does not support ISA 3.0
> instructions, rename __extendkftf2_sw to __extendkftf2.

Okay.

Thanks, David

Re: [PATCH 4/4] [RS6000] Allow saving of fixed regs.

2016-03-21 Thread David Edelsohn

On Mon, Mar 21, 2016 at 9:08 AM, Alan Modra  wrote:
> As I noted a long time ago in the comment on fixed_reg_p, the real
> problem with saving fixed/global regs is that exception frame
> unwinding might restore them.  So don't emit eh_frame info for any
> such reg, and the unwinder won't restore them.
>
> Also, tidy rs6000_savres_strategy.  Delaying some checks means we
> won't iterate over regs quite so often.
>
> * config/rs6000/rs6000.c (rs6000_savres_strategy): Force inline
> restoring when fixed_reg_p, but allow out-of-line or stmw save.
> Check for user regs later to avoid unnecessary looping over regs.
> Merge user reg check with non-saved reg check.  Don't force
> inline VR restore when static chain used.
> (rs6000_frame_related): Omit eh_frame info for user regs when
> saving.
> (fixed_regs_p): Delete.

Okay.

Thanks for cleaning this up and thanks for splitting this into
multiple patches for review.

- David

Re: [PATCH 3/4] [RS6000] Split SAVRES_STRATEGY

2016-03-21 Thread David Edelsohn

On Mon, Mar 21, 2016 at 9:07 AM, Alan Modra  wrote:
> No functional change here.  A single bit becomes two bits, which
> always have the same value at the moment.  In preparation for the
> next patch.
>
> * config/rs6000/rs6000.c (SAVRES_MULTIPLE): Replace with..
> (SAVE_STRATEGY, REST_STRATEGY): ..this.  Renumber and sort enum.
> Update all uses.

Okay.

Thanks, David

Re: [PATCH 2/4] [RS6000] PR69645, -ffixed-reg ignored

2016-03-21 Thread David Edelsohn

On Mon, Mar 21, 2016 at 9:06 AM, Alan Modra  wrote:
> Treat -ffixed-reg as we do for global asm regs.  The only slightly
> complicated part of this patch is that the rs6000 backend itself sets
> fixed_regs[RS6000_PIC_OFFSET_TABLE_REGNUM] in some cases, which means
> we can't simply test fixed_regs[] to determine whether a reg appeared
> as -ffixed-reg.
>
> PR target/69645
> * config/rs6000/rs6000.c (fixed_reg_p): New function.
> (fixed_regs_p): Rename from global_regs_p.  Call fixed_reg_p.
> Update all uses.

Okay.

Thanks, David

Re: [PATCH 1/4] [RS6000] Simplify setting of fixed_regs[RS6000_PIC_OFFSET_TABLE_REGNUM]

2016-03-21 Thread David Edelsohn

On Mon, Mar 21, 2016 at 9:06 AM, Alan Modra  wrote:
> This makes the conditions look the same as other places that deal with
> RS6000_PIC_OFFSET_TABLE_REGNUM, eg. first_reg_to_save.  No functional
> changes.
>
> * config/rs6000/rs6000.c (rs6000_conditional_register_usage):
> Remove redundant PIC_OFFSET_TABLE_REGNUM test.  Replace with
> flag_pic test for Darwin.

Okay.

Thanks, David

Re: [PATCH] Fix rs6000 vector builtin macro handling if it is followed by a fn-like macro without arguments (PR target/70296)

2016-03-19 Thread David Edelsohn

On Fri, Mar 18, 2016 at 5:34 PM, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase is diagnosed as erroneous, because the preprocessor
> mishandles
>
> #define c(x) x
> vector c;
>
> and
>
> #define int(x) x
> vector int n;
>
> The thing is if a function-like macro is not followed by (, then it is kept
> as is, but the builtin conditional macro handling expects it always expands
> as something and calls cpp_get_token on it.  For non-function-like macros
> or function-like macros followed by ( that is not a problem, that
> cpp_get_token call just eats the macro token and pushes instead the
> replacement tokens, but for function-like macro not followed by ( it results
> in the token being dropped on the floor.
> So, in the above mentioned cases we preprocess it as
> vector ;
> and
> vector n;
> and when compiling, error on the first one, and (due to previous
> typedef int vector;) handle it at int n; rather than
> __attribute__((__vector)) int n;
>
> Fixed by peeking at the next token after the macro token (or more, if there
> are CPP_PADDING tokens) and if it is not followed by CPP_OPEN_PAREN, not
> calling cpp_get_token.  Unfortunately, cpp_macro structure is opaque outside
> of libcpp, so I had to add a helper function into libcpp.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-03-18  Jakub Jelinek  
>
> PR target/70296
> * include/cpplib.h (cpp_fun_like_macro_p): New prototype.
> * macro.c (cpp_fun_like_macro_p): New function.
>
> * config/rs6000/rs6000-c.c (rs6000_macro_to_expand): If IDENT is
> function-like macro, peek following token(s) if it is followed
> by CPP_OPEN_PAREN token with optional padding in between, and
> if not, don't treat it like a macro.
>
> * gcc.target/powerpc/altivec-36.c: New test.

I'm not an expert in this part of the compiler, but the rs6000 bits
are fine with me.

Thanks, David
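
The preprocessor rule at the heart of PR 70296 can be seen in plain C, without any AltiVec involvement. The sketch below is only an illustration: the macro `c` mirrors the name used in the quoted testcase but is given a body here, and the function names are invented for the example. A function-like macro name that is not followed by `(` is left alone as an ordinary identifier, which is the case the rs6000 hook previously mishandled.

```c
/* A function-like macro, as in the quoted testcase (body is illustrative). */
#define c(x) ((x) + 1)

/* Here 'c' is never followed by '(', so the preprocessor leaves it as an
   ordinary identifier and it names a local variable.  */
int plain_identifier(void)
{
  int c = 5;
  return c;
}

/* Here 'c' is followed by '(', so the macro is expanded to ((2) + 1).  */
int macro_invocation(void)
{
  return c(2);
}
```

Both uses are valid C; the bug described above was that the target hook consumed the macro token in the first situation as well.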

Re: [PATCH, rs6000] Add support for xxpermr and vpermr instructions

2016-03-19 Thread David Edelsohn

On Thu, Mar 17, 2016 at 2:58 PM, Kelvin Nilsen
 wrote:
>
> This patch adds support for two new Power9 instructions, xxpermr and vpermr,
> providing more efficient vector permutation operations on
> little-endian configurations. These new instructions are described in
> the Power ISA 3.0 document.  Selection of the new instructions is
> conditioned upon TARGET_P9_VECTOR and !VECTOR_ELT_ORDER_BIG.
>
> The patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu
> and powerpc64-unknown-linux-gnu with no regressions.  Is this ok for GCC 7
> when stage 1 opens?
>
> (A previous version of this patch was distributed and approved, but further
> experience with testing of P9 fusion instructions revealed a problem with
> that particular code expansion.  So this new revision of the patch omits the
> fusion instruction generation pattern.)
>
> Thanks.
>
> gcc/testsuite/ChangeLog:
>
> 2016-03-17  Kelvin Nilsen  
>
> * gcc.target/powerpc/p9-permute.c: Generalize test to run on
> big-endian Power9 in addition to little-endian Power9.
> * gcc.target/powerpc/p9-vpermr.c: New test.
>
>
> gcc/ChangeLog:
>
> 2016-03-17  Kelvin Nilsen  
>
> * config/rs6000/altivec.md: (UNSPEC_VPERMR): New unspec
> constant.
> (*altivec_vpermr__internal): New insn.
> * config/rs6000/rs6000.c (rs6000_expand_vector_set): If
> !BYTES_BIG_ENDIAN and TARGET_P9_VECTOR, expand using template
> that translates into new xxpermr or vpermr instructions.
> (altivec_expand_vec_perm_le): If TARGET_P9_VECTOR, expand using
> template that translates into new xxpermr or vpermr
> instructions.

This is okay for GCC 7.

Thanks, David

Re: [PATCH, testsuite] Fix ifcvt-4.c for PowerPC

2016-03-15 Thread David Edelsohn

On Mon, Mar 14, 2016 at 4:23 PM, Pat Haugen  wrote:
> As stated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232, this test
> needs -misel on powerpc to pass. Verified the following fixes the test on
> both powerpc64/powerpc64le. Ok for trunk?
>
> -Pat
>
> testsuite/ChangeLog:
> 2016-03-14  Pat Haugen  
>
> * gcc.dg/ifcvt-4.c: Add -misel for powerpc* and remove skip for
> powerpc64le.

The -misel flag will override the code generation, even if the
architecture setting doesn't support the instruction.  I guess this is
good enough for the compile-only test.

This is okay.

Thanks, David

Re: [PATCH, testsuite] Fix ifcvt-4.c for PowerPC

2016-03-14 Thread David Edelsohn

On Mon, Mar 14, 2016 at 7:35 PM, Jeff Law  wrote:
> On 03/14/2016 02:23 PM, Pat Haugen wrote:
>>
>> As stated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232, this
>> test needs -misel on powerpc to pass. Verified the following fixes the
>> test on both powerpc64/powerpc64le. Ok for trunk?
>>
>> -Pat
>>
>> testsuite/ChangeLog:
>> 2016-03-14  Pat Haugen  
>>
>>  * gcc.dg/ifcvt-4.c: Add -misel for powerpc* and remove skip for
>> powerpc64le.
>
> OK.
> jeff

The change is going to fail on PowerPC systems that don't support
ISEL, so it needs to be adjusted.

- David

Re: [PATCH] rs6000: Handle "d" output in the bd*z patterns (PR70098)

2016-03-13 Thread David Edelsohn

On Sun, Mar 13, 2016 at 2:52 PM, Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
> On Sat, Mar 12, 2016 at 09:59:12AM -0500, David Edelsohn wrote:
>> > 2016-03-12  Segher Boessenkool  <seg...@kernel.crashing.org>
>> >
>> > PR target/70098
>> > * config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
>> > *ctr_internal5, *ctr_internal6): Also allow "d" as output.
>> > (define_split for the GPR case): Use int_reg_operand instead of
>> > gpc_reg_operand for the output.
>> >
>> > gcc/testsuite/
>> > PR target/70098
>> > * g++.dg/pr70098.C: New testcase.
>>
>> This is okay.
>>
>> The testcase will need some XFAILs.
>
> That wasn't so easy.  I came up with the following; okay as well?
> (I'll fold it before committing).
>
>
> Segher
>
>
> gcc/testsuite/
> * lib/target-supports.exp (check_effective_target_powerpc64_no_dm):
> New function.

This is okay with me.

Should the testcase go in g++.dg or gcc.target/powerpc?

Thanks, David

Re: [PATCH] rs6000: Handle "d" output in the bd*z patterns (PR70098)

2016-03-12 Thread David Edelsohn

On Sat, Mar 12, 2016 at 8:55 AM, Segher Boessenkool
 wrote:
> In the rs6000 port, FLOAT_REGS can contain DImode values when compiling
> for 64-bit targets.  Some instructions (like "fcfid" in the testcase,
> convert from integer to DP float) only work on floating point registers.
> So, we do want to allow DImode in these regs.
>
> Now, in unusual cases IRA will assign FLOAT_REGS to some allocno where
> some insns cannot handle FLOAT_REGS there, so they will need a reload.
> Maybe IRA can be made smarter, but it isn't doing anything wrong here,
> so we should be able to handle it.
>
> The place it goes wrong is in the output of the *ctrdi_internal[1256]
> pattern: the "bdz" and "bdnz" instructions.  GCC refuses to do output
> reloads on JUMP_INSNs, probably because it is hard to do, needs different
> strategies than "normal" reloads do, and it cannot even be done at all
> for general patterns.  So JUMP_INSNs need to be able to handle every
> possible output for the register class used.
>
> These patterns already handle writing to "c" (the base insn case), and
> to "r", "m", and "c" or "l"; all those via splitters.  We just need to
> handle "d" as well.  That is what this patch does.  [A predicate in one
> of the splitters needs to be touched up so that the correct splitter
> is used for the FLOAT_REGS case.]
>
> But, that leaves another problem.  One of the insns that are split to
> is a move from a GPR to an FPR.  That works fine on targets with direct
> move (which does exactly that), i.e. power8 and up.  But older targets
> need memory to do the move, and this splitter runs after reload so
> it cannot allocate memory; and allocating memory beforehand for every
> bdnz insn is pretty horrible as well.
>
> This patch implements the easy part.  With it, power8 works, where it
> didn't before.
>
> Tested on powerpc64-linux, -m32 and -m64, -mlra and -mno-lra.  Also
> regstrapping on a power8 powerpc64le-linux.  Is this okay for trunk
> if that works as expected?
>
>
> Segher
>
>
> 2016-03-12  Segher Boessenkool  
>
> PR target/70098
> * config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
> *ctr_internal5, *ctr_internal6): Also allow "d" as output.
> (define_split for the GPR case): Use int_reg_operand instead of
> gpc_reg_operand for the output.
>
> gcc/testsuite/
> PR target/70098
> * g++.dg/pr70098.C: New testcase.

This is okay.

The testcase will need some XFAILs.

Thanks, David

Re: [PATCH], Fix PR 70131, disable (double)(int) optimization for power8

2016-03-11 Thread David Edelsohn

On Fri, Mar 11, 2016 at 5:41 PM, Michael Meissner
 wrote:
> As I was auditing rs6000.md for power9 changes, I noticed that changes I had
> made in 2010 for power7 weren't as effective with power8.
>
> The FCTIWZ/FCTIWUZ instructions convert the scalar floating point value to a
> 32-bit signed/unsigned integer in bits 32-63 of the floating point or vector
> register.  Unfortunately, the hardware does not guarantee that bits 0-31 are
> copies of the sign, so that it can be used as a valid 64-bit integer.  There is
> no conversion from 32-bit int to floating point.  This meant in the power7
> days, if you wanted to round a floating point value to 32-bit integer, you
> would need to do:
>
> convert to 32-bit integer
> store 32-bit value on the stack
> load 32-bit value to a GPR
> sign/zero extend it
> store 32-bit value to the stack
> load 32-bit value to a FPR/vector register.
>
> The optimization does a store/load to sign/zero extend, rather than going
> through the GPRs.
>
> On power8, we have a direct move instruction that copies the value between the
> register sets, and the compiler will generate this if the above optimization is
> turned off (which is what this patch does).
>
> There are other ways to sign/zero extend a value in the vector registers
> without doing a move using multiple instructions, but in practice direct move
> seems to be as fast as the other instructions.
>
> I bootstrapped the compiler and there were no regressions with this patch.
>
> I rebuilt the Spec 2006 benchmark suite, and there were 7 of the benchmarks that
> used this sequence somewhere in the code.  I ran those benchmarks with this
> patch, and compared them to the original benchmarks.  In 6 of the benchmarks,
> the run time was almost precisely the same.  The 416.gamess benchmark was about
> 2% faster, and there were no regressions.
>
> Is this patch ok to apply to the trunk?  I would like to apply it to the gcc 5
> branch as well.  Is this ok also?
>
> [gcc]
> 2016-03-11  Michael Meissner  
>
> PR target/70131
> * config/rs6000/rs6000.md (round322_fprs): Do not do the
> optimization if we have direct move.
> (roundu322_fprs): Likewise.
>
> [gcc/testsuite]
> 2016-03-11  Michael Meissner  
>
> PR target/70131
> * gcc.target/powerpc/ppc-round2.c: New test.

Okay for trunk and GCC 5.

Thanks, David
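
For reference, the source-level pattern named in the subject line, (double)(int), can be sketched as below. The function names are made up for illustration and are not taken from ppc-round2.c, whose contents are not shown in the thread. Which instruction sequence the compiler emits for these functions depends on the -mcpu setting and on whether the patch is applied; the sketch only shows the C semantics being implemented.

```c
/* Truncate toward zero to a 32-bit signed integer, then convert the
   integer back to double.  */
double round_trip_signed(double x)
{
  return (double)(int)x;
}

/* The unsigned variant of the same pattern.  The input must be in range
   for unsigned int, otherwise the conversion is undefined in C.  */
double round_trip_unsigned(double x)
{
  return (double)(unsigned int)x;
}
```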

Re: [PATCH, rs6000] Fix PR target/70168

2016-03-10 Thread David Edelsohn

On Thu, Mar 10, 2016 at 6:10 PM, Ulrich Weigand  wrote:
> Hello,
>
> this patch fixes PR target/70168, a wrong code generation problem
> caused by rs6000_expand_atomic_compare_and_swap not properly handling
> the case where changing retval clobbers newval due to a register overlap.
>
> Tested with no regressions on powerpc64le-linux on mainline
> and gcc-5-branch.
>
> OK for both?
>
> Bye,
> Ulrich
>
>
> ChangeLog:
>
> PR target/70168
> * config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
> Handle overlapping retval and newval.

Okay everywhere.

Thanks, David
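
The hazard Ulrich describes, where writing retval destroys newval because the two overlap, has a simple analogue in plain C. The sketch below is a non-atomic model written only to illustrate the ordering problem; it is not the RTL expander and does not show how the actual patch handles the overlap, which is not included in the thread. All names are hypothetical.

```c
/* Model of a compare-and-swap whose result slot may alias the slot
   holding the new value.  Writing the result first loses the new value. */
static void cas_clobbering(int *mem, int oldval, const int *newval, int *retval)
{
  *retval = *mem;              /* overwrites *newval when the two alias */
  if (*retval == oldval)
    *mem = *newval;            /* stores the clobbered value */
}

/* Same operation, but the new value is copied before the result slot
   is written, so an overlap is harmless.  */
static void cas_careful(int *mem, int oldval, const int *newval, int *retval)
{
  int saved_new = *newval;
  *retval = *mem;
  if (*retval == oldval)
    *mem = saved_new;
}

/* Memory holds 1, we expect 1 and want to store 7; the new value and the
   result share one slot.  Returns what ends up in memory.  */
int stored_after_clobbering(void)
{
  int mem = 1, slot = 7;
  cas_clobbering(&mem, 1, &slot, &slot);
  return mem;
}

int stored_after_careful(void)
{
  int mem = 1, slot = 7;
  cas_careful(&mem, 1, &slot, &slot);
  return mem;
}
```

In the clobbering version memory ends up holding the old value rather than 7, which is the kind of wrong-code result the PR reports.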

Re: [PATCH, rs6000] Add support for xxpermr and vpermr instructions

2016-03-09 Thread David Edelsohn

On Tue, Mar 8, 2016 at 11:24 AM, Kelvin Nilsen
 wrote:
>
> This patch adds support for two new Power9 instructions, xxpermr and vpermr,
> providing more efficient vector permutation operations on little-endian
> configurations. These new instructions are described in the Power ISA 3.0
> document.  Selection of the new instructions is conditioned upon
> TARGET_P9_VECTOR and !VECTOR_ELT_ORDER_BIG.
>
> The patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu with no regressions.  Is this ok for GCC 7 when
> stage 1 opens?

gcc/ChangeLog:

2016-03-07  Kelvin Nilsen  

* config/rs6000/rs6000.c (rs6000_expand_vector_set): If
!BYTES_BIG_ENDIAN and TARGET_P9_VECTOR, expand using template that
translates into new xxpermr or vpermr instructions.
(altivec_expand_vec_perm_le): If TARGET_P9_VECTOR, expand using
template that translates into new xxpermr or vpermr instructions.
* config/rs6000/altivec.md: (UNSPEC_VPERMR): New unspec constant.
(*altivec_vpermr__internal): New insn.

gcc/testsuite/ChangeLog:

2016-03-07  Kelvin Nilsen  

* gcc.target/powerpc/p9-permute.c: Generalize test to run on
big-endian Power9 in addition to little-endian Power9.
* gcc.target/powerpc/p9-vpermr.c: New test.

This patch is okay when GCC trunk re-opens for new features.

Thanks, David

P.S. In the future, please include the ChangeLog entry in the body of
the message, not a separate attachment.

Re: libgcc: On AIX, increase chances to find landing pads for exceptions

2016-03-01 Thread David Edelsohn

On Tue, Mar 1, 2016 at 7:09 AM, Michael Haubenwallner
 wrote:
> Hi David,
>
> On 02/10/2016 10:52 AM, Michael Haubenwallner wrote:
> 
>>> There are two remaining issues:
>>>
>>> 1) FDEs with overlapping ranges causing problems with exceptions.  I'm
>>> not sure of the best way to work around this.  Your patch is one
>>> possible solution.
>>
>> This patch is not meant as a final solution, but to improve current
>> situation with broken build systems exporting even _GLOBAL__ symbols.
>> I'm about to prepare another libtool patch to fix that one.
>
> so this is the libtool patch I'm about to submit.
>
> What do you think? Reasonable?

I don't think that Ian really cares about this.

I guess that the patch is reasonable, but the libtool command is
becoming extremely complicated.

- David

Re: [PATCH] Fix PR70011 (backlevel test case)

2016-02-29 Thread David Edelsohn

On Mon, Feb 29, 2016 at 11:49 AM, Bill Schmidt
 wrote:
> Hi,
>
> PR70011 identifies an old vectorization test that recently started
> failing on GCC 6 with POWER8 hardware.  This "failure" is that we now
> find vectorization of the test case to be profitable, where it didn't
> used to be.  A combination of two factors allowed this to become
> profitable here:  First, the POWER8 feature that unaligned vector
> accesses are supported by hardware; and second, some improvement in the
> vectorizer itself (vect_recog_mult_pattern now kicks in).
>
> The proposed fix herein is to XFAIL the test for vectorization failure
> for POWER subtargets that support efficient unaligned vector accesses.
> Since this also requires the vectorization improvement that only occurs
> in GCC 6, it makes sense to only make this change on trunk.
>
> I've verified the modified test on powerpc64le-unknown-linux-gnu
> (POWER8) and on powerpc64-unknown-linux-gnu (both POWER7 and POWER8) and
> everything works as expected.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-02-29  Bill Schmidt  
>
> PR target/70011
> * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr299925.c:
> XFAIL when hardware supports efficient unaligned storage access.

Okay.

Thanks, David

Re: [PATCH] Fix "no-vsx" target attribute handling (PR target/69969)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 2:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> Most of the errors and warnings in rs6000_option_override_internal
> are emitted only if the particular option is explicit, e.g.
>   if (TARGET_P9_DFORM && !TARGET_P9_VECTOR)
> {
>   if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
> error ("-mpower9-dform requires -mpower9-vector");
>   rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
> }
> and many others, which is right, but for the
> -mallow-movmisalign requires -mvsx
> error it doesn't do this, so if say -mcpu=power8 compiled TU
> contains a routine with target ("no-vsx") attribute, we get this
> error, even when the user hasn't done anything we should complain about.
>
> Fixed by following what we do for the other options, bootstrapped/regtested
> on powerpc64le-linux (and powerpc64-linux, but regtest is still pending
> there).  Ok for trunk?
>
> 2016-02-26  Jakub Jelinek  
>
> PR target/69969
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
> complain about -mallow-movmisalign without -mvsx if
> TARGET_ALLOW_MOVMISALIGN was not set explicitly.
>
> * gcc.target/powerpc/pr69969.c: New test.

Seems reasonable.  LGTM

Thanks, David
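
The convention Jakub refers to, diagnosing an inconsistent option pair only when the user set the dependent option explicitly and otherwise quietly clearing it, can be modelled in a few lines of C. This is a sketch under stated assumptions: the flag names and both functions are hypothetical stand-ins, not the real rs6000_isa_flags machinery.

```c
/* Hypothetical option bits standing in for the real target flags.  */
enum { OPT_VSX = 1u << 0, OPT_ALLOW_MOVMISALIGN = 1u << 1 };

/* Returns nonzero when an error should be reported: the dependent option
   is on without its prerequisite AND the user asked for it explicitly.  */
int movmisalign_should_error(unsigned flags, unsigned explicit_flags)
{
  int inconsistent = (flags & OPT_ALLOW_MOVMISALIGN) && !(flags & OPT_VSX);
  return inconsistent && (explicit_flags & OPT_ALLOW_MOVMISALIGN) != 0;
}

/* In either case the dependent option is cleared when its prerequisite
   is missing, so later code sees a consistent set of flags.  */
unsigned movmisalign_resolve(unsigned flags)
{
  if ((flags & OPT_ALLOW_MOVMISALIGN) && !(flags & OPT_VSX))
    flags &= ~(unsigned)OPT_ALLOW_MOVMISALIGN;
  return flags;
}
```

With this shape, a function carrying a "no-vsx" target attribute in a -mcpu=power8 translation unit corresponds to the implicit case: the option is dropped without a diagnostic.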

Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 1:52 PM, Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
> On Thu, Feb 25, 2016 at 10:52:29AM -0500, David Edelsohn wrote:
>> Please add a short comment explaining why rs6000_insn_for_shift_mask
>> doesn't need to match the logic in rs6000_is_valid_shift_mask
>> converting rotates to simple shifts.
>
> I added this comment:
>
> --- trunk/gcc/config/rs6000/rs6000.c 2016/02/26 18:17:02 233754
> +++ trunk/gcc/config/rs6000/rs6000.c 2016/02/26 18:49:18 233755
> @@ -17438,9 +17438,12 @@ rs6000_insn_for_shift_mask (machine_mode
> operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>operands[3] = GEN_INT (31 - nb);
>operands[4] = GEN_INT (31 - ne);
> +  /* This insn can also be a 64-bit rotate with mask that really makes
> +it just a shift right (with mask); the %h below are to adjust for
> +that situation (shift count is >= 32 in that case).  */
>if (dot)
> -   return "rlw%I2nm. %0,%1,%2,%3,%4";
> -  return "rlw%I2nm %0,%1,%2,%3,%4";
> +   return "rlw%I2nm. %0,%1,%h2,%3,%4";
> +  return "rlw%I2nm %0,%1,%h2,%3,%4";
>  }
>
>gcc_unreachable ();

Thanks!

- David

Re: [PATCH] Fix powerpc shift/rotate/mask insn handling (PR target/69946)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 11:02 AM, Jakub Jelinek  wrote:
> Hi!
>
> Segher has added last year a few routines for the shift/rotate + mask
> patterns, insns always have one predicate which tests if PowerPC supports
> such pattern, and another that emits the instruction for it.
>
> The testcase in the patch is miscompiled, we end up with an instruction
> with out of bound shift count, because there is disagreement between the
> analysis phase, which does some changes (changes shifts by 0
> into rotate and some rotates into left or right shifts (for the last one
> with changed shift count)), but those changes are only done virtually,
> the predicate can't change the instruction, it either accepts it or rejects
> it.  Then during output, we don't perform those changes, treat it as
> rotate or left or right shift just based on what the actual IL has.
> In some cases it is harmless, in other cases, as the testcase shows, it is
> harmful.
>
> Also, I've noticed that the preparation phase of both
> rs6000_is_valid_shift_mask and rs6000_is_valid_insert_mask is pretty much
> the same (the only difference is that the latter requires the shift/rotate
> count to be always CONST_INT, while the former allows also a REG in there).
>
> So, what the following patch does is that it moves the preparation
> from the predicate functions to a new static inline helper function,
> and then uses that function in both the predicate functions, and also
> in both the output functions.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-02-26  Jakub Jelinek  
>
> PR target/69946
> * config/rs6000/rs6000.c (rs6000_is_valid_shift_mask_1): New
> function.
> (rs6000_is_valid_shift_mask, rs6000_is_valid_insert_mask): Use it.
> (rs6000_insn_for_shift_mask, rs6000_insn_for_insert_mask): Likewise.
> Adjust for possible changes of shift/rotate CODE and shift count SH.
>
> * gcc.dg/pr69946.c: New test.

Segher's patch from Wednesday should fix this.

We can address the duplication separately in GCC 7.

Thanks, David

Re: [PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 9:18 AM, Bill Schmidt
 wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a
> year ago, but had a proposed patch by Mike Meissner that was never
> vetted and committed.  I've reviewed the patch and tested it on GCC 5
> and GCC 6, and with the patch applied we see the test pass for both
> 32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit
> on a Power8 little-endian platform.
>
> As I understand it, the test case got out of sync with the
> implementation in GCC 5, and this rewrite of the test case restores
> order.  I've verified that the original compilation options from Andreas
> Schwab for this test case result in correct generation of lxsdx, which
> was not the case with the original report.
>
> The test case is extremely different in GCC 4.9.  As Mike has noted in
> the PR, the -mupper-regs support does not exist in GCC 4.9, so the
> rewritten test case does not apply there.
>
> As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and
> powerpc64le-unknown-linux-gnu (-m64).  Is this ok for trunk and GCC 5?
>
> Thanks,
> Bill
>
>
> 2016-02-26  Michael Meissner 
> Bill Schmidt  
>
> * gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired
> functionality for both 32-bit and 64-bit.

Okay.

Thanks, David

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 12:08 AM, Richard Henderson  wrote:
> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
>
> In the PR, Alan raises some good points, but I don't believe that we can
> address those for gcc6.  A new rtl reassoc optimization that takes loop
> invariance into account will have to wait.
>
> But we do need to take care of the rs6000 ice that results, and that's the
> bulk of the patch -- allowing CA to be sorted to any register of the plus
> chain.
>
> Some notes:
>
> ca_operand doesn't work as written, since CA_REGNO is not an available
> register, and thus doesn't satisfy register_operand.
>
> Is there any particular reason that subf<>3_carry_in_m1 was written with
> minus rather than plus like all of the other patterns?

Segher wrote the rs6000 carry infrastructure, so he's the best one to comment.

- David

Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)

2016-02-25 Thread David Edelsohn

On Wed, Feb 24, 2016 at 5:57 PM, Segher Boessenkool
 wrote:
> Some DImode rotate-right-and-mask can be implemented best with a rlwinm
> instruction: those that could be a lshiftrt instead of a rotatert, while
> the mask is not right-aligned.  Why the rotate in the testcase is not
> optimised to a plain shift is another question, but we need to handle
> it here anyway.  We compute the shift amount for a 64-bit rotate.  This
> is 32 too high in this case; if we print using %h that is masked out (and
> this doesn't silently let through invalid instructions, everything is
> checked by rs6000_is_valid_shift_mask which is much more thorough).
>
> Built and tested on powerpc64-linux, -m32,-m64 and -mlra,-mno-lra.  Also
> tested the new test on powerpc64le-linux (where the test is skipped).
> Is this okay for trunk?
>
>
> Segher
>
>
> 2016-02-24  Segher Boessenkool  
>
> PR target/69946
> * config/rs6000/rs6000.c (rs6000_insn_for_shift_mask): Print rlwinm
> shift amount using %h.
>
> gcc/testsuite/
> * pr69946.c: New file.

Okay.

Please add a short comment explaining why rs6000_insn_for_shift_mask
doesn't need to match the logic in rs6000_is_valid_shift_mask
converting rotates to simple shifts.

Thanks, David
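
The arithmetic behind Segher's explanation can be checked with a small C model. It is a sketch under assumptions, not the ISA definition: `rlwinm_model` takes the mask directly instead of MB/ME fields, the `sh & 31` step stands in for printing the count with %h, and the equivalence only holds when the mask is confined to the low 32 - n bits, so that the 64-bit rotate right by n behaves like a shift right within the low word.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n (n is reduced modulo 32).  */
static uint32_t rotl32(uint32_t v, unsigned n)
{
  n &= 31;
  return (v << n) | (v >> ((32 - n) & 31));
}

/* The DImode operation: rotate right by n, then AND with a mask.  */
uint64_t rotr64_and_mask(uint64_t x, unsigned n, uint64_t mask)
{
  n &= 63;
  uint64_t r = (x >> n) | (x << ((64 - n) & 63));
  return r & mask;
}

/* Simplified model of rlwinm: rotate the low word left by the low five
   bits of the shift count, then AND with a 32-bit mask.  */
uint64_t rlwinm_model(uint64_t x, unsigned sh, uint32_t mask)
{
  return rotl32((uint32_t)x, sh & 31) & mask;
}
```

For a rotate right by 4, the equivalent 64-bit rotate-left count is 60, which is 32 too high for a word rotate; masking it to five bits gives 28, and both functions then agree for any mask within the low 28 bits.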

[PATCH] Fix PR target/69810

2016-02-23 Thread David Edelsohn

Anton reported a latent bug in the rs6000 port discovered with csmith.
Splitters for extendqihi2 and zero_extendqihi2 can generate invalid
compare RTL.  PowerPC can load and store bytes or halfwords, but
computations operate on registers.  Currently the extend patterns
exist for HImode, although no instructions directly operate on
registers in that mode.

For GCC 7, I plan to disable the extendqihi2 and zero_extendqihi2
patterns, forcing GCC to use SUBREGs, but that is too dangerous for
Stage 4.

In the interim, this patch converts the splitters to normal combiner
patterns that directly emit the two instruction sequence when cr0 is
not available.  There is not a lot of scheduling opportunity after the
splitter, so not a huge degradation.  The temporary is allocated as
HImode, but always is a full register.  The compare is hard coded as
cmpw, but the condition bits will be the same for sign-extended or
zero-extended QImode whether the register is interpreted as word or
doubleword.

PR target/69810
* config/rs6000/rs6000.md (zero_extendqi2_dot): Convert from
define_insn_and_split to define_insn.
(zero_extendqi2_dot2): Same.
(extendqi2_dot): Same.
(extendqi2_dot2): Same.

Bootstrapped on powerpc-ibm-aix7.1.0.0 and powerpc64le-linux.

Thanks, David
Index: rs6000.md
===
--- rs6000.md   (revision 232439)
+++ rs6000.md   (working copy)
@@ -701,7 +701,7 @@
rlwinm %0,%1,0,0xff"
   [(set_attr "type" "load,shift")])
 
-(define_insn_and_split "*zero_extendqi2_dot"
+(define_insn "*zero_extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (zero_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -709,19 +709,12 @@
   "rs6000_gen_cell_microcode"
   "@
andi. %0,%1,0xff
-   #"
-  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
-  [(set (match_dup 0)
-   (zero_extend:EXTQI (match_dup 1)))
-   (set (match_dup 2)
-   (compare:CC (match_dup 0)
-   (const_int 0)))]
-  ""
+   rlwinm %0,%1,0,0xff\;cmpwi %2,%0,0"
   [(set_attr "type" "logical")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
-(define_insn_and_split "*zero_extendqi2_dot2"
+(define_insn "*zero_extendqi2_dot2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (zero_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -730,14 +723,7 @@
   "rs6000_gen_cell_microcode"
   "@
andi. %0,%1,0xff
-   #"
-  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
-  [(set (match_dup 0)
-   (zero_extend:EXTQI (match_dup 1)))
-   (set (match_dup 2)
-   (compare:CC (match_dup 0)
-   (const_int 0)))]
-  ""
+   rlwinm %0,%1,0,0xff\;cmpwi %2,%0,0"
   [(set_attr "type" "logical")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
@@ -855,7 +841,7 @@
   "extsb %0,%1"
   [(set_attr "type" "exts")])
 
-(define_insn_and_split "*extendqi2_dot"
+(define_insn "*extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -863,19 +849,12 @@
   "rs6000_gen_cell_microcode"
   "@
extsb. %0,%1
-   #"
-  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
-  [(set (match_dup 0)
-   (sign_extend:EXTQI (match_dup 1)))
-   (set (match_dup 2)
-   (compare:CC (match_dup 0)
-   (const_int 0)))]
-  ""
+   extsb %0,%1\;cmpwi %2,%0,0"
   [(set_attr "type" "exts")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
-(define_insn_and_split "*extendqi2_dot2"
+(define_insn "*extendqi2_dot2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
(compare:CC (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,r"))
(const_int 0)))
@@ -884,14 +863,7 @@
   "rs6000_gen_cell_microcode"
   "@
extsb. %0,%1
-   #"
-  "&& reload_completed && cc_reg_not_cr0_operand (operands[2], CCmode)"
-  [(set (match_dup 0)
-   (sign_extend:EXTQI (match_dup 1)))
-   (set (match_dup 2)
-   (compare:CC (match_dup 0)
-   (const_int 0)))]
-  ""
+   extsb %0,%1\;cmpwi %2,%0,0"
   [(set_attr "type" "exts")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
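
The claim in the message above, that a hard-coded word compare yields the same condition bits as a doubleword compare for an extended QImode value, can be checked exhaustively in C. The sketch below is an illustration only: `sign_of` stands in for the LT/GT/EQ outcome of a signed compare against zero, and the helper names are invented.

```c
#include <stdint.h>

/* Stand-in for the LT/GT/EQ result of a signed compare against zero.  */
static int sign_of(int64_t v)
{
  return (v > 0) - (v < 0);
}

/* For every byte value, extend it to a full 64-bit register both ways and
   check that comparing the low word gives the same outcome as comparing
   the whole doubleword.  Returns 1 when all 256 values agree.  */
int extended_byte_compares_agree(void)
{
  for (int b = 0; b < 256; b++)
    {
      int64_t zext = b;                       /* zero-extended byte */
      int64_t sext = b < 128 ? b : b - 256;   /* sign-extended byte */

      /* Both values lie in [-128, 255], so narrowing to 32 bits
         preserves them exactly.  */
      if (sign_of(zext) != sign_of((int32_t)zext))
        return 0;
      if (sign_of(sext) != sign_of((int32_t)sext))
        return 0;
    }
  return 1;
}
```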

Re: PPC libgcc IEEE128 soft-fp exception/rounding fixes

2016-02-22 Thread David Edelsohn

libgcc
* config/rs6000/sfp-machine.h:
(_FP_DECL_EX): Declare _fpsr as a union of u64 and double.
(FP_TRAPPING_EXCEPTIONS): Return a bitmask of trapping
exceptions.
(FP_INIT_ROUNDMODE): Read the fpscr instead of writing
a mystery value.
(FP_ROUNDMODE): Update the usage of _fpscr.

Okay.

Thanks, David

Re: [PATCH, rs6000] PR 66337: Improve Compliance with Power ABI

2016-02-18 Thread David Edelsohn

> Ulrich Weigand wrote:
>> Kelvin Nilsen wrote:
>
>> This patch has bootstrapped and tested on powerpc64le-unknown-linux-gnu and
>> powerpc64be-unknown-linux-gnu (both 32-bit and 64-bit) and
>> powerpc64-unknown-freebsd11.0 (big endian) with no regressions.  Is it ok to
>> fix this on the trunk?
>>
>> The problem described in PR66337
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66337) is that compiling for
>> PowerPC targets with the -malign-power command-line option results in
>> invalid field offsets for certain structure members.  As identified in the
>> problem report, this results from a macro definition present in both
>> config/rs6000/{freebsd64,linux64}.h, which in both cases is introduced by
>> the comment:
>>
>> /* PowerPC64 Linux word-aligns FP doubles when -malign-power is given. */
>>
>> I have consulted various ABI documents, including "64-bit PowerPC ELF
>> Application Binary Interface Supplement 1.9"
>> (http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html), "Power
>> Architecture 64-Bit ELF V2 ABI Specification"
>> (https://members.openpowerfoundation.org/document/dl/576), and "Power
>> Architecture(R) 32-bit Application Binary Interface Supplement 1.0 -
>> Linux(R) & Embedded"
>> (https://www.power.org/documentation/power-architecture-32-bit-abi-supplement-1-0-embeddedlinuxunified/).
>> I have not been able to find any basis for this comment and thus am
>> concluding that the comment and existing implementation are incorrect.
>>
>> The implemented patch removes the comment and changes the macro definition
>> so that field alignment calculations on 64-bit architectures ignore the
>> -malign-power command-line option.  With this fix, the test case identified
>> in the PR behaves as was expected by the submitter.
>
> There seems to be some confusion here.  First of all, on Linux and FreeBSD,
> the *default* behavior is -malign-natural, which matches what the Linux ABI
> specifies.  Using -malign-power on *Linux* is an explicit instruction to
> the compiler to *deviate* from the documented ABI.
>
> The only effect that the deviation has on Linux is to change the alignment
> requirements for certain structure elements.  Your patch removes this change,
> making -malign-power fully a no-op on Linux and FreeBSD.  This doesn't seem
> to be particularly useful ...  If you don't want the effect, you can simply
> not use that switch.
>
> To my understanding, the intent of providing that switch was to allow
> creating code that is compatible with code produced by some other compilers
> that do not adhere to the Linux ABI, but some other ABI.  In particular,
> my understanding is that the *AIX* ABI has these alignment requirements.
> And in fact, GCC on *AIX* defaults to -malign-power.
>
> Looking at PR 66337, the submitter actually refers to the behaviour of
> GCC on AIX, so I'm not sure how Linux is even relevant here.  (Maybe
> there is something wrong in how GCC implements the AIX ABI.  But I'm
> not really familiar with AIX, so I can't help much with that.)

AIX does not use natural alignment.  For historical reasons, the
maximum alignment of double is word alignment.  In an attempt to
correct the alignment mistake, the AIX POWER ABI increases the
alignment of structures whose first element is double to double word.
XLC increases the alignment of the member but GCC does not.
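
To make the two alignment rules concrete, here is a minimal sketch in C.
It does not query any compiler; it only models the arithmetic described
above.  The function names and the example struct { char c; double d; }
are hypothetical, chosen for illustration, and the value 8 vs. 4 for the
alignment of double is an assumption standing in for "natural" vs.
"word" alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN.
   ALIGN must be a non-zero power of two.  */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical model of struct { char c; double d; }.
   DOUBLE_ALIGN is 8 for natural alignment and 4 for the word-aligned
   rule.  Returns the byte offset at which member d would be placed.  */
size_t offset_of_d(size_t double_align)
{
    size_t end_of_c = 1;   /* char c occupies byte 0 */
    return align_up(end_of_c, double_align);
}

/* Hypothetical model of the rule described in the text for a struct
   that contains a double under word alignment: the struct as a whole
   is doubleword aligned only when the double is the first member.  */
size_t power_struct_align(int double_is_first)
{
    return double_is_first ? 8 : 4;
}
```

Under these assumptions, offset_of_d(8) places d at byte 8, while
offset_of_d(4) places it at byte 4, which is the kind of field-offset
difference the PR is about.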

GCC allows use of AIX POWER ABI alignment in ELF for some early
customers.  The option has nothing to do with Linux ABIs nor embedded
ABIs.

Thanks for the patch, but it is not addressing the correct problem.
The issue is specifically about GCC compatibility with XLC for AIX
ABI.

Thanks, David

Re: [PATCH], PR 68404 patch #4 (fix earlyclobber problem on power8 fusion)

2016-02-18 Thread David Edelsohn

On Thu, Feb 18, 2016 at 11:45 AM, Michael Meissner
 wrote:
> This patch to rs6000.md (which is essentially the same as #3) fixes the
> problem by removing the early clobber.  The patches to predicates.md, and the
> fusion tests revert my changes on February 9th that originally 'solved' the
> problem by not allowing fusion of ADDI values.  We have tested the fix using a
> combined profiled and LTO bootstrap build and it does not cause any
> regressions.  Because machine independent changes can mask the bug at times,
> we also did a profiled/LTO build on the subversion revision that showed up
> the problem.  Is this ok to install in the trunk?
>
> Since gcc 5 contains the exact same early clobber, I would like to also back
> port the change to GCC 5.
>
> [gcc]
> 2016-02-18  Michael Meissner  
>
> PR target/68404
> * config/rs6000/predicates.md (fusion_gpr_addis): Revert
> 2016-02-09 change.
>
> * config/rs6000/rs6000.md (fusion_gpr_load_): Remove
> earlyclobber from target.  Use wF constraint for fused memory
> address.
> (fusion_gpr___load): Likewise.
>
> [gcc/testsuites]
> 2016-02-18  Michael Meissner  
>
> PR target/68404
> * gcc.target/powerpc/fusion.c: Revert the 2016-02-09 change.
> * gcc.target/powerpc/fusion3.c: Likewise.

This is okay for trunk and GCC 5 branch.

Thanks, David

Re: [PATCH, rs6000] Fix pasto resulting in wrong instruction from builtins for lvxl

2016-02-16 Thread David Edelsohn

On Tue, Feb 16, 2016 at 4:44 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> On Tue, 2016-02-16 at 11:40 -0800, David Edelsohn wrote:
>> This is okay, but how about starting with a testcase for this?
>
> Fair enough.  Here's the revised patch with a test, which I've verified
> on powerpc64-unknown-linux-gnu.  Ok to proceed?
>
> Thanks!
>
> Bill
>
>
> [gcc]
>
> 2016-02-16  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (*altivec_lvxl__internal): Output
> correct instruction.
>
> [gcc/testsuite]
>
> 2016-02-16  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/vec-cg.c: New test.

Okay.

Thanks, David

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread David Edelsohn

On Mon, Feb 15, 2016 at 4:24 PM, Alan Modra <amo...@gmail.com> wrote:
> On Mon, Feb 15, 2016 at 06:42:35AM -0800, David Edelsohn wrote:
>> Is there still an issue with the constraints used for movdi_internal64?
>
> Yes and no.  No because we shouldn't be attempting DI moves between vsx
> regs and gprs.  Yes because we ought to allow DImode in vsx regs, but
> fixing that is likely not trivial.
>
> Do we want to backport the PR68973 fixes to gcc-5 and gcc-4.9?  We are
> exposed to the reload_vsx_from_gprsf bug there, I think, but TFmode
> won't be IEEE.

Backporting to 5 and 4.9 branches is okay with me.

Thanks, David

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread David Edelsohn

On Mon, Feb 15, 2016 at 4:36 AM, Alan Modra  wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
>> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
>> > > Another concern I had about this, besides using %L in asm output (what
>> > > forces TFmode to use just fprs?), is what happens when we're using
>> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
>> >
>> > Good point that it breaks if the default long double (TFmode) type is IEEE
>> > 128-bit floating point.  We would need to have two patterns, one that uses
>> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff
>> > before going down the road of IEEE 128-bit floating point.
>>
>> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
>> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
>> then we probably shouldn't be using it.
>
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.
>
>> Another option might be to use TDmode to allocate a scratch register pair.
>
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.
>
> Here's the revised patch.  To recap, the main bug fixes here are:
> - stop reload_vsx_from_gprsf splitter from emitting a move not
> handled by movdi_internal64
> - don't use TFmode, which cannot now be assumed to be IBM
> double-double.
> Secondary to that, not using or passing around TFmode means the %L
> restriction no longer matters, and constraints on the reload temp reg
> can be relaxed.
>
> Bootstrapped and regression tested powerpc64-linux biarch and
> powerpc64le-linux.  OK David?
>
> PR target/68973
> * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
> rather than attempting to use movdi_internal64.  Remove op0_di.
> (p8_mtvsrd_df, p8_mtvsrd_sf): New.
> (p8_mtvsrd_1, p8_mtvsrd_2): Delete.
> (p8_mtvsrwz): New.
> (p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
> (p8_xxpermdi_): Take two DF inputs rather than one TF.
> (p8_fmrgow_): Likewise.
> (reload_vsx_from_gpr): Make clobber IF.  Adjust for above
> changes.
> (reload_fpr_from_gpr): Similarly. Use "d" for op0 constraint.
> * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.
>

Okay.

Is there still an issue with the constraints used for movdi_internal64?

Thanks, David

Re: [PATCH], PR 68404 patch #2 (disable power8/power9 fusion on PowerPC)

2016-02-11 Thread David Edelsohn

On Wed, Feb 10, 2016 at 2:46 PM, Jakub Jelinek  wrote:
> On Wed, Feb 10, 2016 at 05:42:17PM -0500, Michael Meissner wrote:
>> This patch disables -mcpu=power8/-mtune=power8 from setting -mpower8-fusion
>> and -mcpu=power9/-mtune=power9 from setting -mpower9-fusion.  I will look at
>> the earlyclobber that Bernd Schmidt mentioned, but for now it may be safest
>> to just disable it for GCC 6.0.
>>
>> I built it on a little endian power8 system, and there were no regressions.
>> Is it ok to install?
>
> Doesn't this mean the bug is still there, just not enabled unless
> -mpower[89]-fusion (ok, perhaps mitigated by the previous workaround patch)?
> Wouldn't it be better to just forcefully clear the options (and thus ignore
> them) for the time being if they are known to be broken?
>
>> [gcc]
>> 2016-02-10  Michael Meissner  
>>
>>   PR target/68404
>>   * config/rs6000/predicates.md (fusion_gpr_addis): Revert
>>   2016-02-09 change.
>>
>>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Do not set
>>   power8/power9 fusion by default.
>>   (ISA_3_0_MASKS_SERVER): Likewise.
>>
>>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>>   code setting -mpower8-fusion if -mtune=power8 and -mpower9-fusion
>>   if -mtune=power9.
>>
>>   * doc/invoke.texi (RS/6000 and PowerPC Options): Document that
>>   -mpower8-fusion and -mpower9-fusion are not set by default.
>>
>> [gcc/testsuites]
>> 2016-02-10  Michael Meissner  
>>
>>   PR target/68404
>>   * gcc.target/powerpc/fusion.c: Do not assume that -mtune=power8
>>   sets -mpower8-fusion or -mtune=power9 sets -mpower9-fusion.
>>   * gcc.target/powerpc/fusion2.c: Likewise.
>>   * gcc.target/powerpc/fusion3.c: Likewise.

Because of the more recent patches that should fix the cause of this
failure, this set of patches is now moot.

- David

Re: [PATCH], PR 68404 patch #3 (fix earlyclobber problem on power8 fusion)

2016-02-11 Thread David Edelsohn

On Thu, Feb 11, 2016 at 1:43 PM, Michael Meissner
 wrote:
> After looking at Bernd Schmidt and Jakub Jelinek's suggestions, I came to the
> conclusion that earlyclobber was not needed in this case, and I removed it.  I
> bootstrapped the compiler using profiledbootstrap and lto options and it
> succeeded in building and running make check.  Just to be sure, I also did a
> profiledbootstrap with LTO and -O3 and it built fine.  Is it ok to install
> these patches?
>
> I decided to keep the changes to the testsuite explicitly passing the fusion
> switches, rather than letting -mtune=power8/power9 set them, but I can be
> persuaded to restore the 3 tests to the way they were before February 9th.
>
> [gcc]
> 2016-02-11  Michael Meissner  
>
> PR target/68404
> * config/rs6000/predicates.md (fusion_gpr_addis): Revert
> 2016-02-09 change.
>
> * config/rs6000/rs6000.md (fusion_gpr_load_): Remove
> earlyclobber from target.  Use wF constraint for fused memory
> address.
> (fusion_gpr___load): Likewise.
>
> [gcc/testsuites]
> 2016-02-11  Michael Meissner  
>
> PR target/68404
> * gcc.target/powerpc/fusion.c: Do not assume that -mtune=power8
> sets -mpower8-fusion or -mtune=power9 sets -mpower9-fusion.
> * gcc.target/powerpc/fusion2.c: Likewise.
> * gcc.target/powerpc/fusion3.c: Likewise.
>
> Since gcc 5.0 also has the earlyclobber in the pattern, I would like to apply
> the same change to gcc 5.x (after testing of course), even though we haven't
> yet run into the problem with GCC 5.x.  Is this ok as well?

This is okay for trunk and GCC 5 branch.

Did you test the patch with the first patch reverted?  The first patch
also was correct and fixed a problem, but it also allows this
underlying bug to appear more prominently.  I want to ensure that the
patch was compared with a version of the compiler that elicited the
failure symptoms.

Thanks, David

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-11 Thread David Edelsohn

On Thu, Feb 11, 2016 at 10:38 AM, Ulrich Weigand <uweig...@de.ibm.com> wrote:
> David Edelsohn wrote:
>> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote:
>> > This is PR68973 part 2, the failure of a boost test, caused by a
>> > splitter emitting an invalid move in reload_vsx_from_gprsf:
>> >   emit_move_insn (op0_di, op2);
>> >
>> > op0 can be any vsx reg, but the mtvsrd destination constraint in
>> > movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
>> > we have that constraint so left movdi_internal64 alone and used a
>> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
>> > by reload_vsx_from_gpr.  When looking at those, I noticed we're
>> > restricted to fprs there too so fixed that as well.  (We can't use %L
>> > in asm operands that must accept vsx.)
>>
>> Michael, please review the "wj" constraint in these patterns.
>>
>> Alan, the explanation says that it uses a special p8_mtvsrd similar to
>> p8_mtvsrd_[12], but does not explain why the two similar patterns are
>> removed.  The incorrect use of %L implies those patterns, but the
>> change is to p8_xxpermdi_ that is not mentioned in the
>> ChangeLog.
>>
>> I also would appreciate Uli's comments on this direction because of
>> his reload expertise.
>
> For the most part, this patch doesn't really change anything in the
> interaction with reload as far as I can see.  The changes introduced
> by the patch do make sense to me.  In particular, replacing the two
> patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts
> of a TFmode register pair with a single pattern p8_mtvsrd that just
> works on any DFmode register (leaving the split into high/low to the
> caller if necessary) seems to simplify things.
>
>> > +  /* You might think that we could use op0 as one temp and a DF clobber
>> > + as the other, but you'd be wrong.  These secondary_reload patterns
>> > + don't check that the clobber is different to the destination, which
>> > + is probably a reload bug.  */
>
> It's not a bug, it's deliberate behavior.  The reload registers allocated
> for secondary reload clobbers may overlap the destination, since in many
> cases you simply move the input to the reload register, and then the
> reload register to the destination (where the latter move can be a no-op
> if it is possible to allocate the reload register and the destination
> into the same physical register).  If you need two separate intermediate
> values, you need to allocate a secondary reload register of a larger
> mode (as is already done in the pattern).
>
>> >/* Also use the destination register to hold the unconverted DImode value.
>> >   This is conceptually a separate value from OP0, so we use gen_rtx_REG
>> >   rather than simplify_gen_subreg.  */
>> > -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
>> > +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
>> > +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));
>
> While this was not introduced by this patch, I'm a little bit concerned
> about the hard-coded use of REGNO here, which will crash if op0 at this
> point happens to be a SUBREG instead of a REG.  This is extremely unlikely
> at this point in reload, but not 100% impossible, e.g. if op0 for some
> reason is one of the "magic" registers like the stack pointer.
>
> That's why it is in general better to use simplify_gen_subreg or one of
> gen_highpart/gen_lowpart, which will handle SUBREG correctly as well.
> I'm not sure why it matters whether this is "conceptually a separate
> value" as the comment argues ...
>
>> >/* Move SF value to upper 32-bits for xscvspdpn.  */
>> >emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> > -  emit_move_insn (op0_di, op2);
>> > -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
>> > +  emit_insn (gen_p8_mtvsrd (op0_df, op2));
>> > +  emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf));
>
> The sequence of modes used for op0 here is a bit weird.  First,
> op0 is loaded as DFmode by mtvsrd, then it is silently re-
> interpreted as V4SFmode when used as input to xscvspdpn, which
> gets a DFmode output that is again silently re-interpreted as
> SFmode.
>
> This isn't really wrong as such, just maybe a bit confusing.
> Maybe instead have p8_mtvsrd use DImode as output (instead of
> DFmode), which shouldn't be any harder to use in the
> reload_vsx_from_gpr splitter, and keep using the
> vsx_xscvspdpn_directmove pattern?
>
> [ This of course reinforces the question why we have p8_mtvsrd
> in the first place, instead of just allowing this use directly
> in movdi_internal64 itself.  ]

Good question: is p8_mtvsrd really necessary if the movdi_internal64
is updated to use the correct constraints?

The patch definitely is going in the right direction.  Can we remove
more unnecessary bits?

Thanks, David

Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-11 Thread David Edelsohn

On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra  wrote:
> This is PR68973 part 2, the failure of a boost test, caused by a
> splitter emitting an invalid move in reload_vsx_from_gprsf:
>   emit_move_insn (op0_di, op2);
>
> op0 can be any vsx reg, but the mtvsrd destination constraint in
> movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
> we have that constraint so left movdi_internal64 alone and used a
> special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
> by reload_vsx_from_gpr.  When looking at those, I noticed we're
> restricted to fprs there too so fixed that as well.  (We can't use %L
> in asm operands that must accept vsx.)

Michael, please review the "wj" constraint in these patterns.

Alan, the explanation says that it uses a special p8_mtvsrd similar to
p8_mtvsrd_[12], but does not explain why the two similar patterns are
removed.  The incorrect use of %L implies those patterns, but the
change is to p8_xxpermdi_ that is not mentioned in the
ChangeLog.

I also would appreciate Uli's comments on this direction because of
his reload expertise.

Thanks, David

>
> Bootstrapped and regression tested powerpc64le-linux.  powerpc64-linux
> biarch -mcpu=power7 bootstrap still in progress.  OK to apply assuming
> no regressions found?
>
> PR target/68973
> * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Delete.
> * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Rewrite splitter.
> (p8_mtvsrd): New.
> (p8_mtvsrd_1, p8_mtvsrd_2): Delete.
> (reload_vsx_from_gpr): Adjust to use p8_mtvsrd.
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index cdbf873..745293b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7543,29 +7543,22 @@
> (set_attr "type" "three")])
>
>  ;; Move 128 bit values from GPRs to VSX registers in 64-bit mode
> -(define_insn "p8_mtvsrd_1"
> -  [(set (match_operand:TF 0 "register_operand" "=ws")
> -   (unspec:TF [(match_operand:DI 1 "register_operand" "r")]
> +(define_insn "p8_mtvsrd"
> +  [(set (match_operand:DF 0 "register_operand" "=ws")
> +   (unspec:DF [(match_operand:DI 1 "register_operand" "r")]
>UNSPEC_P8V_MTVSRD))]
>"TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "mtvsrd %0,%1"
> -  [(set_attr "type" "mftgpr")])
> -
> -(define_insn "p8_mtvsrd_2"
> -  [(set (match_operand:TF 0 "register_operand" "+ws")
> -   (unspec:TF [(match_dup 0)
> -   (match_operand:DI 1 "register_operand" "r")]
> -  UNSPEC_P8V_MTVSRD))]
> -  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "mtvsrd %L0,%1"
> +  "mtvsrd %x0,%1"
>[(set_attr "type" "mftgpr")])
>
>  (define_insn "p8_xxpermdi_"
>[(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa")
> -   (unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")]
> -UNSPEC_P8V_XXPERMDI))]
> +   (unspec:FMOVE128_GPR [
> +   (match_operand:DF 1 "register_operand" "ws")
> +   (match_operand:DF 2 "register_operand" "ws")]
> +   UNSPEC_P8V_XXPERMDI))]
>"TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "xxpermdi %x0,%1,%L1,0"
> +  "xxpermdi %x0,%x1,%x2,0"
>[(set_attr "type" "vecperm")])
>
>  (define_insn_and_split "reload_vsx_from_gpr"
> @@ -7581,13 +7574,18 @@
>  {
>rtx dest = operands[0];
>rtx src = operands[1];
> -  rtx tmp = operands[2];
> +  /* You might think that we could use op0 as one temp and a DF clobber
> + as the other, but you'd be wrong.  These secondary_reload patterns
> + don't check that the clobber is different to the destination, which
> + is probably a reload bug.  */
> +  rtx tmp_hi = gen_rtx_REG (DFmode, REGNO (operands[2]));
> +  rtx tmp_lo = gen_rtx_REG (DFmode, REGNO (operands[2]) + 1);
>rtx gpr_hi_reg = gen_highpart (DImode, src);
>rtx gpr_lo_reg = gen_lowpart (DImode, src);
>
> -  emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg));
> -  emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg));
> -  emit_insn (gen_p8_xxpermdi_ (dest, tmp));
> +  emit_insn (gen_p8_mtvsrd (tmp_hi, gpr_hi_reg));
> +  emit_insn (gen_p8_mtvsrd (tmp_lo, gpr_lo_reg));
> +  emit_insn (gen_p8_xxpermdi_ (dest, tmp_hi, tmp_lo));
>DONE;
>  }
>[(set_attr "length" "12")
> @@ -7622,16 +7620,18 @@
>rtx op0 = operands[0];
>rtx op1 = operands[1];
>rtx op2 = operands[2];
> +
>/* Also use the destination register to hold the unconverted DImode value.
>   This is conceptually a separate value from OP0, so we use gen_rtx_REG
>   rather than simplify_gen_subreg.  */
> -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
> +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));
>rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0);
>
>/* Move SF value to upper 32-bits for xscvspdpn.  */
>emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -

Re: libgcc: On AIX, increase chances to find landing pads for exceptions

2016-02-10 Thread David Edelsohn

On Wed, Feb 10, 2016 at 1:52 AM, Michael Haubenwallner
<michael.haubenwall...@ssi-schaefer.com> wrote:
>
> On 02/08/2016 02:59 PM, David Edelsohn wrote:
>> Runtime linking is disabled by default on AIX, and I disabled it for
>> libstdc++.
>
> For large applications mainly developed on/for Linux I do prefer/need
> runtime linking even on AIX. Still I do believe there is no AIX-based
> reason to leave runtime linking disabled, but build-/linktime issues
> instead that cause things to fail with runtime linking enabled.

What do you mean by the term "runtime linking"?  Runtime linking means
runtime overloading of symbols -- preloading -- not dynamic linking
and loading.  dlopen does not require runtime linking.  There also are
issues with searching for shared objects with .a or .so file
extension, but that can be addressed separately.

Runtime linking causes every global, exported function call to be
invoked through indirect glue code.  And each function must be
inserted into the TOC.  The indirect call overhead is very expensive,
and potential TOC overflow can cause even more performance
degradation.
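
As a rough analogy only (this is plain C, not AIX glue code or the real
TOC), the difference between a call bound at link time and a call that
goes through a rebindable table slot can be sketched as follows.  The
table, the function names and the rebind helper are all hypothetical,
introduced for illustration.

```c
typedef int (*fn_ptr)(int);

int add_one(int x) { return x + 1; }
int add_two(int x) { return x + 2; }

/* Hypothetical stand-in for a table with one slot per exported
   function; each exported function costs one slot.  */
fn_ptr export_table[1] = { add_one };

/* Direct call: the target is fixed when the program is linked.  */
int call_direct(int x)
{
    return add_one(x);
}

/* Indirect call: load the target from the table, then branch to it.
   Every call pays for the extra load, but the slot can be rebound.  */
int call_via_table(int x)
{
    return export_table[0](x);
}

/* Rebind the slot, loosely analogous to a symbol being overridden
   when the program is loaded.  */
void rebind(fn_ptr replacement)
{
    export_table[0] = replacement;
}
```

After rebind(add_two), call_via_table follows the new target while
call_direct is unaffected; that flexibility is what the extra
indirection and the extra table entries pay for.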

Your statement of no AIX-based reason to leave runtime linking
disabled is fundamentally flawed.

>
>> There are two remaining issues:
>>
>> 1) FDEs with overlapping ranges causing problems with exceptions.  I'm
>> not sure of the best way to work around this.  Your patch is one
>> possible solution.
>
> This patch is not meant as a final solution, but to improve current
> situation with broken build systems exporting even _GLOBAL__ symbols.
> I'm about to prepare another libtool patch to fix that one.
>
>> 2) AIX linker garbage collection conflicting with scanning for
>> symbols.  collect2 scanning needs to better emulate SVR4 linker
>> semantics for object files and archives.
>
> Probably collect2 should filter the symbol list originating in either
> an explicit -bexport:file or the -bexpall/-bexpfull flags and pass the
> resulting symbol list as explicit -bexport:file only to the AIX linker?

-bexpall and -bexpfull cause numerous problems by re-exporting symbols.

All of the suggestions will produce programs that function, but have
severe performance impacts and unintended consequences that you seem
to be ignoring.

- David

>
> /haubi/
>
>>
>> Thanks, David
>>
>>
>> On Mon, Feb 8, 2016 at 7:14 AM, Michael Haubenwallner
>> <michael.haubenwall...@ssi-schaefer.com> wrote:
>>> Hi David,
>>>
>>> still experiencing exception-not-caught problems with gcc-4.2.4 on AIX
>>> leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in
>>> 2004 already, ought to be fixed by some different commit since 3.4.0.
>>>
>>> As long as build systems (even libtool right now) on AIX do export these
>>> _GLOBAL__* symbols from shared libraries, overlapping frame-base address
>>> ranges may become registered, even if newer gcc (seen with 4.8) does name
>>> the FDE symbols more complex to reduce these chances.
>>>
>>> But still, just think of linking some static library into multiple shared
>>> libraries and/or the main executable. Or sometimes there is just need for
>>> some hackery to override a shared object's implementation detail and rely
>>> on runtime linking to do the override at runtime.
>>>
>>> Agreed both is "wrong" to some degree, but the larger an application is,
>>> the higher is the chance for this to happen.
>>>
>>> Thoughts?
>>>
>>> Thanks!
>>> /haubi/

Re: [PATCH], PR target/68404, Fix PowerPC fusion error

2016-02-09 Thread David Edelsohn

On Tue, Feb 9, 2016 at 9:49 AM, Michael Meissner
 wrote:
> This patch fixes PR 68404, which created an insn for the fusion operation when
> accessing an array with a large constant offset that the downstream passes
> (regrename in particular) don't like.  Because fusion in general adds so little
> to the performance of power8, I just stopped the compiler from generating
> this case for GCC 6.  In the GCC 7 timeframe, I likely will revisit fusion for
> power9 support.  I ran a spec 2006 benchmark suite comparing the current
> behavior and the fix for PR 68404, and it was in the noise level (mcf was 1%
> slower, others ranged from 0.3% slower to 0.4% faster).
>
> I did a bootstrap build, including a bootstrap profiled build with LTO (which
> is how the problem was found) and it was fine.  I rewrote 2 of the 3 fusion
> tests so that they use fusion from a medium code toc entry instead of accessing
> an array element with a constant index over 65536 bytes.
>
> Is this patch ok to apply?  If you would prefer, I can eliminate the code
> inside of the fusion_gpr_addis predicate instead of using #if 0.
>
> [gcc]
> 2016-02-08  Michael Meissner  
>
> PR target/68404
> * config/rs6000/predicates.md (fusion_gpr_addis): Prevent fusing
> an ADDIS that adds a pointer to a large constant that sets the
> upper16 bits with a load operation.
>
> [gcc/testsuite]
> 2016-02-08  Michael Meissner  
>
> PR target/68404
> * gcc.target/powerpc/fusion.c: Rewrite test to use TOC fusion
> instead of accessing a really large array.
> * gcc.target/powerpc/fusion3.c: Likewise.

Please remove the code entirely, not #if 0.

Okay with that change.

Thanks, David

Re: libgcc: On AIX, increase chances to find landing pads for exceptions

2016-02-08 Thread David Edelsohn

Runtime linking is disabled by default on AIX, and I disabled it for libstdc++.

There are two remaining issues:

1) FDEs with overlapping ranges causing problems with exceptions.  I'm
not sure of the best way to work around this.  Your patch is one
possible solution.

2) AIX linker garbage collection conflicting with scanning for
symbols.  collect2 scanning needs to better emulate SVR4 linker
semantics for object files and archives.

Thanks, David


On Mon, Feb 8, 2016 at 7:14 AM, Michael Haubenwallner
 wrote:
> Hi David,
>
> still experiencing exception-not-caught problems with gcc-4.2.4 on AIX
> leads me to some patch proposed in http://gcc.gnu.org/PR13878 back in
> 2004 already, ought to be fixed by some different commit since 3.4.0.
>
> As long as build systems (even libtool right now) on AIX do export these
> _GLOBAL__* symbols from shared libraries, overlapping frame-base address
> ranges may become registered, even if newer gcc (seen with 4.8) does name
> the FDE symbols more complex to reduce these chances.
>
> But still, just think of linking some static library into multiple shared
> libraries and/or the main executable. Or sometimes there is just need for
> some hackery to override a shared object's implementation detail and rely
> on runtime linking to do the override at runtime.
>
> Agreed both is "wrong" to some degree, but the larger an application is,
> the higher is the chance for this to happen.
>
> Thoughts?
>
> Thanks!
> /haubi/

Re: [PATCH] PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges.

2016-02-06 Thread David Edelsohn

On Fri, Feb 5, 2016 at 3:27 PM, Jeff Law  wrote:
> On 02/05/2016 06:43 AM, Alexander Fomin wrote:
>>
>> Hi!
>>
>> A version of this patch was submitted about a year ago by Igor
>> Zamyatin. It's an attempt to fix PR rtl-optimization/64081 by enabling
>> RTL loop unrolling for duplicated exit blocks and back edges.
>>
>> At the time it caused AIX bootstrap failure, but now it's OK according
>> to David's testing. I've also bootstrapped and regtested it on
>> x86_64-linux-gnu.
>>
>> Is it still OK for trunk now, or do you consider this v7 stuff?
>> Anyway, it's a regression.
>>
>> Thanks,
>> Alexander
>> ---
>> gcc/
>>
>> PR rtl-optimization/64081
>> * loop-iv.c (def_pred_latch_p): New function.
>> (latch_dominating_def): Allow specific cases with non-single
>> definitions.
>> (iv_get_reaching_def): Likewise.
>> (check_complex_exit_p): New function.
>> (check_simple_exit): Use check_complex_exit_p to allow certain
>> cases with exits not executing on any iteration.
>>
>> gcc/testsuite
>>
>> PR rtl-optimization/64081
>> * gcc.dg/pr64081.c: New test.
>
> Normally I'd say that if it was approved before, then it's still good to go
> since there haven't been major conceptual changes in this code since the
> patch was originally written and now.
>
> However, in this instance the patch had been reported to cause problems on
> AIX, problems that we can't reproduce now -- which makes me want to be more
> cautious.  Was it a problem with the patch, or some other latent issue -- we
> don't know at this point.
>
> So I think the way to go is to apply this patch on top of r219827 where it
> caused the AIX failure.  Then bootstrap on aix and determine the root cause
> of the AIX bootstrap failure.  If it's this patch, then update the patch
> as needed.  If the patch is just exposing a latent bug elsewhere, we should
> evaluate whether that latent bug has been fixed before
> applying this fix to the trunk.
>
> It's considerably more work, but ISTM it's the right thing to do.

I'm on the fence about this patch.  I definitely don't think that it
should be merged for GCC 6.

If the patch were to be proposed during Stage 1 for GCC 7 and had not
caused bootstrap problems for AIX, no one would have any question.

The problem is we don't know if the patch exposed a latent bug that
independently was fixed after the patch was reverted or if the patch
still contains a bug that has been rendered latent by another change.

Another approach to track down the cause would be to bisect which
patch fixed the bootstrap failure if the patch had not been reverted.

I agree with Jeff that a statement that "the original patch magically
works now" is not a good justification for merging it -- at least not
for GCC 6.

Thanks, David

Re: [PATCH, rs6000] Fix type attribute for a few insns

2016-02-04 Thread David Edelsohn

On Thu, Feb 4, 2016 at 9:33 PM, Pat Haugen  wrote:
> The following patch fixes a few insns that were specifying an incorrect
> 'type' attribute.
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> -Pat
>
>
> 2016-02-04  Pat Haugen  
>
> * config/rs6000/crypto.md (crypto_vpermxor_): Correct insn
> type.
> * config/rs6000/rs6000.md (mov_hardfloat): Likewise.
> (*ieee128_mfvsrd_64bit): Likewise.
> (*ieee128_mfvsrd_32bit): Likewise.

Okay.

Thanks, David

Re: [PATCH] Fix -mcpu=power8 atomic expansion (PR target/69644)

2016-02-04 Thread David Edelsohn

On Thu, Feb 4, 2016 at 6:33 AM, Alan Modra <amo...@gmail.com> wrote:
> On Wed, Feb 03, 2016 at 05:34:17PM -0500, David Edelsohn wrote:
>> On Wed, Feb 3, 2016 at 5:28 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> > Hi!
>> >
>> > rs6000_expand_atomic_compare_and_swap uses oldval directly in
>> > a comparison instruction, but oldval might be a CONST_INT not suitable
>> > for the instruction (such as in the testcase below in SImode comparison
>> > 0x8000 constant).  We need to force those into register if they don't
>> > satisfy the predicate.
>> >
>> > Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>> >
>> > 2016-02-03  Jakub Jelinek  <ja...@redhat.com>
>> >
>> > PR target/69644
>> > * config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
>> > Force oldval into register if it does not satisfy
>> > reg_or_short_operand predicate.  Fix up formatting.
>> >
>> > * gcc.dg/pr69644.c: New test.
>>
>> Okay.
>
> This needs to go on gcc-5 and gcc-4.9 branches too, where it fixes
> pr69146.  pr69146 and pr69644 are dups.  OK to apply to the branches?

Okay with me, but coordinate with Jakub.

Thanks, David

Re: [PATCH], PR 69667, Fix PowerPC long double failure with -mlra

2016-02-04 Thread David Edelsohn

On Thu, Feb 4, 2016 at 3:39 PM, Michael Meissner
 wrote:
> This patch fixes a bug where LRA would abort when compiling a C++ program with
> -mlra.  I tracked this down to using the "ws" constraint for TFmode, TDmode,
> and IFmode, but those types are limited to just the traditional floating point
> registers (ws on power8 targets all of the VSX registers).
>
> With this patch, it eliminates the last use of the "wm" constraint.  However,
> since it is a documented constraint, I am not proposing to delete the
> constraint.
>
> I built a bootstrapped compiler on little endian power8, there were no
> regression errors.  Is it ok to check in this patch?
>
> [gcc]
> 2016-02-04  Michael Meissner  
>
> PR target/69667
> * config/rs6000/rs6000.md (mov_64bit_dm): Use 'd' constraint
> instead of 'ws', and 'wh' instead of 'wm' since TFmode/IFmode are
> not allowed into the traditional Altivec registers.
> (movtd_64bit_nodm): Likewise.
> (mov_32bit, FMOVE128_FPR iterator): Likewise.
>
> [gcc/testsuite]
> 2016-02-04  Michael Meissner  
>
> PR target/69667
> * g++.dg/pr69667.C: New file.

Okay.

Thanks, David

Re: [PATCH] bootstrap/69611

2016-02-03 Thread David Edelsohn

this patch fixes bootstrap on FreeBSD PowerPC and hopefully all other
PowerPC targets which do not have float128 support.

The patch itself is a bandaid to survive stage4. We have to come up
with a better solution for FreeBSD and all other soft float targets
which do not support float128.

The patch was tested by Michael Meissner on different POWER machines.

Ok to commit to trunk?

TIA,
Andreas

2016-02-03  Andreas Tobler  

PR bootstrap/69611
* config/rs6000/sfp-machine.h: Guard __sfp_exceptions with
__FLOAT128__ to compile only for __float128 capable targets.


Okay.

Thanks, David

Re: [PATCH] Fix -mcpu=power8 atomic expansion (PR target/69644)

2016-02-03 Thread David Edelsohn

On Wed, Feb 3, 2016 at 5:28 PM, Jakub Jelinek  wrote:
> Hi!
>
> rs6000_expand_atomic_compare_and_swap uses oldval directly in
> a comparison instruction, but oldval might be a CONST_INT not suitable
> for the instruction (such as in the testcase below in SImode comparison
> 0x8000 constant).  We need to force those into register if they don't
> satisfy the predicate.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-02-03  Jakub Jelinek  
>
> PR target/69644
> * config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
> Force oldval into register if it does not satisfy reg_or_short_operand
> predicate.  Fix up formatting.
>
> * gcc.dg/pr69644.c: New test.

Okay.

Thanks, David
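
For readers following the thread, the range problem described above can be
sketched in plain C.  This is an illustration only, not GCC code: it assumes
the immediate field of the compare is a signed 16-bit value, and the helper
name fits_signed_16 is hypothetical.  Under that assumption 0x8000 is just
out of range and would have to be forced into a register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): true if VALUE fits a signed
   16-bit immediate field, i.e. lies in [-32768, 32767].  */
static bool
fits_signed_16 (int64_t value)
{
  return value >= -32768 && value <= 32767;
}
```

With this check, 0x7fff is accepted while 0x8000 is rejected, which is the
case the testcase in the PR exercises.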

Re: [PATCH], PR 69461, PowerPC specific fix for toc-fusion

2016-02-03 Thread David Edelsohn

On Wed, Feb 3, 2016 at 6:34 PM, Michael Meissner
 wrote:
> In PR 69461, Vlad mentioned that in rs6000_legitimate_address_p, I was trying
> to validate an address for TOC fusion, but I was using a predicate that looked
> for a MEM instead of an address.
>
> I bootstrapped the compiler on a little endian power8 and there were no
> regressions.  In addition, Segher Boessenkool says that Vlad's patch and
> this patch together fix a lot of the errors that he was looking at.
>
> Is the patch ok to check in?
>
> 2016-02-03  Michael Meissner  
> Vladimir Makarov  
>
> PR target/69461
> * config/rs6000/rs6000.c (rs6000_legitimate_address_p): Fix thinko
> in validating fused toc addresses.

Okay.

Thanks, David

Re: [RS6000] ABI_V4 init of toc section

2016-02-01 Thread David Edelsohn

On Fri, Jan 29, 2016 at 11:38 AM, Alan Modra  wrote:
> Since 4c4a180d, LTO has turned off flag_pic when linking a fixed
> position executable.  This results in flag_pic being zero in
> rs6000_file_start, and no definition of ".LCTOC1".
>
> However, when we get to actually emitting code, flag_pic may be on
> again, and references made to ".LCTOC1".  How flag_pic comes to be
> enabled again is quite a story.  It goes like this..  If a function is
> compiled with -fPIC then sysv4.h SUBTARGET_OVERRIDE_OPTIONS will set
> TARGET_RELOCATABLE.  Conversely, if TARGET_RELOCATABLE is set and
> flag_pic is zero, then SUBTARGET_OVERRIDE_OPTIONS will set flag_pic=2.
> It also happens that TARGET_RELOCATABLE is a bit in rs6000_isa_flags,
> which is handled by rs6000_function_specific_save and
> rs6000_function_specific_restore.  That last fact means lto streaming
> keeps track of the state of TARGET_RELOCATABLE for functions, and when
> options are restored for a given function we'll set flag_pic=2 if the
> function was originally compiled with -fPIC.  That's bad because it
> defeats the purpose of the 4c4a180d lto change, resulting in worse
> optimization of ppc32 executables.  What's more, we don't seem to turn
> off flag_pic once it is on.
>
> We should really untangle the flag_pic/TARGET_RELOCATABLE mess, but
> that change is probably a little dangerous for stage4.  Instead, this
> patch removes the toc symbol initialization from file_start and does
> so when the first item is emitted to the toc, or after the function
> epilogue in the cases where we emit code to initialize a toc pointer
> but don't actually use it (-O0 mostly, I think).
>
> Bootstrapped and regression tested powerpc64-linux biarch with all
> languages enabled.  OK to apply?
>
> PR target/68662
> * config/rs6000/rs6000.c (need_toc_init): New var, set it
> whenever toc_label_name used.
> (rs6000_file_start): Don't set up toc section here,
> (rs6000_output_function_epilogue): do so here instead,
> (rs6000_xcoff_file_start): and here.
> * config/rs6000/rs6000.md (load_toc_aix_si): Set need_toc_init.
> (load_toc_aix_di): Likewise.

This is okay as an interim fix for GCC 6.

Thanks, David

Re: [RS6000] lqarx and stqcx. registers

2016-02-01 Thread David Edelsohn

On Sun, Jan 31, 2016 at 5:28 PM, Alan Modra  wrote:
> lqarx RT and stqcx. RS are valid only with even numbered gprs.  The
> predicate to enforce this happens to allow a loophole, closed by this
> patch.
>
> This pattern created by combine:
> Trying 8 -> 9:
> Successfully matched this instruction:
> (set (subreg:PTI (reg:TI 155 [ D.2357 ]) 0)
> (unspec_volatile:PTI [
> (mem/v:TI (reg/v/f:DI 157 [ mptr ]) [-1  S16 A128])
> ] UNSPECV_LL))
>
> is seen by reload as needing to reload pseudo 155 in TI mode, which
> has no requirement that the reg be even.  Apparently, nothing checks
> the predicate again after reload.
>
> We only see this problem on gcc-5 and gcc-4.9, because on gcc-6 we
> don't define WORD_REGISTER_OPERATIONS and combine happens to have a
> bug in simplify_set that prevents it creating the problem subregs.
> See https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02377.html
>
> Bootstrapped and regression tested powerpc64-linux biarch on master
> both with and without the combine bug, and on gcc-5.  OK for master
> and active branches?
>
> gcc/
> PR target/69548
> * config/rs6000/predicates.md (quad_int_reg_operand): Don't
> allow subregs.
> gcc/testsuite/
> * gcc.target/powerpc/pr69548.c: New test.

Okay.

Thanks, David
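
As a rough illustration of the constraint Alan describes (even-numbered GPRs
only), the check amounts to the following.  This is a sketch, not the
quad_int_reg_operand predicate itself; the function name valid_quad_gpr and
the assumption of 32 GPRs numbered 0..31 are mine.

```c
#include <stdbool.h>

/* Hypothetical check (not the GCC predicate): a quad-word operand may
   start only at an even-numbered GPR.  Assumes GPRs are numbered 0..31,
   so the highest valid starting register is 30.  */
static bool
valid_quad_gpr (unsigned int regno)
{
  return regno < 32 && (regno & 1) == 0;
}
```

The loophole in the thread is that a subreg of a TImode pseudo can pass the
predicate before reload and then be assigned an odd register afterwards.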

Re: [PATCH, rs6000] Fix PR65546

2016-01-29 Thread David Edelsohn

On Thu, Jan 28, 2016 at 5:41 PM, Bill Schmidt
 wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65546 identifies a failure
> in gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c.  The test case hasn't
> kept up with changes in the vectorizer, so it's looking for the wrong
> error message.  Also, the error message should be conditioned by a check
> for support of unaligned memory accesses.  This patch corrects these
> problems.
>
> For 4.9 and 5, the error message needs to be similarly changed.
> However, for these earlier releases, the check for misalignment support
> doesn't apply.
>
> Verified on powerpc64le-unknown-linux-gnu for both -mcpu=power7 and
> -mcpu=power8, which differ in their support for misalignment.  Is this
> ok for trunk?  Provided verification succeeds on 4.9 and 5, is the
> revised test ok for those releases?
>
> Thanks,
> Bill
>
>
> 2016-01-28  Bill Schmidt  
>
> PR target/65546
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Correct
> condition being checked, and disable it when the target supports
> misaligned loads and stores.

Okay.

Thanks, David

Re: [RS6000] ABI_V4 init of toc section

2016-01-29 Thread David Edelsohn

On Fri, Jan 29, 2016 at 11:38 AM, Alan Modra  wrote:
> Since 4c4a180d, LTO has turned off flag_pic when linking a fixed
> position executable.  This results in flag_pic being zero in
> rs6000_file_start, and no definition of ".LCTOC1".
>
> However, when we get to actually emitting code, flag_pic may be on
> again, and references made to ".LCTOC1".  How flag_pic comes to be
> enabled again is quite a story.  It goes like this..  If a function is
> compiled with -fPIC then sysv4.h SUBTARGET_OVERRIDE_OPTIONS will set
> TARGET_RELOCATABLE.  Conversely, if TARGET_RELOCATABLE is set and
> flag_pic is zero, then SUBTARGET_OVERRIDE_OPTIONS will set flag_pic=2.
> It also happens that TARGET_RELOCATABLE is a bit in rs6000_isa_flags,
> which is handled by rs6000_function_specific_save and
> rs6000_function_specific_restore.  That last fact means lto streaming
> keeps track of the state of TARGET_RELOCATABLE for functions, and when
> options are restored for a given function we'll set flag_pic=2 if the
> function was originally compiled with -fPIC.  That's bad because it
> defeats the purpose of the 4c4a180d lto change, resulting in worse
> optimization of ppc32 executables.  What's more, we don't seem to turn
> off flag_pic once it is on.
>
> We should really untangle the flag_pic/TARGET_RELOCATABLE mess, but
> that change is probably a little dangerous for stage4.  Instead, this
> patch removes the toc symbol initialization from file_start and does
> so when the first item is emitted to the toc, or after the function
> epilogue in the cases where we emit code to initialize a toc pointer
> but don't actually use it (-O0 mostly, I think).
>
> Bootstrapped and regression tested powerpc64-linux biarch with all
> languages enabled.  OK to apply?
>
> PR target/68662
> * config/rs6000/rs6000.c (need_toc_init): New var, set it
> whenever toc_label_name used.
> (rs6000_file_start): Don't set up toc section here,
> (rs6000_output_function_epilogue): do so here instead,
> (rs6000_xcoff_file_start): and here.
> * config/rs6000/rs6000.md (load_toc_aix_si): Set need_toc_init.
> (load_toc_aix_di): Likewise.

I'm worried about how this is going to interact with AIX.  AIX
assembler is single pass and this patch moves the initialization from
the beginning of the file to the end of the file, which means there
will be references to a label whose definition is delayed until the
end.

- David

Re: [PATCH, rs6000] Disable static branch prediction in absence of real profile data

2016-01-27 Thread David Edelsohn

On Wed, Jan 27, 2016 at 6:10 PM, Pat Haugen  wrote:
> The following patch prevents static prediction if we don't have real profile
> data. Testing on SPEC CPU2006 showed a couple improvements in specint and
> specfp neutral. Bootstrap/regtest on powerpc64 with no new regressions. Ok
> for trunk?
>
> -Pat
>
>
> 2016-01-27  Pat Haugen  
>
> * config/rs6000/rs6000.c (output_cbranch): Don't statically predict
> branches if using guessed profile.

Okay.

Thanks, David

Re: [PATCH] add test for target/17381 - Unnecessary register move for float extend

2016-01-27 Thread David Edelsohn

On Wed, Jan 27, 2016 at 5:38 PM, Martin Sebor  wrote:
> The attached patch adds a test for the apparently long fixed
> bug.
>
> FWIW, I've been trying to close out some of these old bugs and
> while it doesn't seem to be done consistently, it occurs to me
> that it might be nice to add tests for them.  Please let me
> know if you don't think it's worth the trouble (not just mine
> but also reviewing the tests and maintaining them).

Assuming this passes, the additional test is okay.

Thanks, David

Re: [PATCH 1/4] Make SRA scalarize constant-pool loads

2016-01-27 Thread David Edelsohn

On Wed, Jan 27, 2016 at 6:36 PM, Jeff Law <l...@redhat.com> wrote:
> On 01/27/2016 12:39 PM, David Edelsohn wrote:
>>
>> The new sra-17.c and sra-18.c tests fail on AIX because the regex is
>> too restrictive -- AIX labels don't have exactly the same format.  On
>> AIX, the labels in the dumps look like "LC..0" instead of ".LC0".
>>
>> This patch adds "*" and ".*" so that the "." prepended to LC is
>> optional and to allow characters between the "LC" and the "0".
>>
>> I needed extra escapes for the sra-17.c line that matches multiple
>> times - for no apparent reason.
>
> The joys of expect/tcl.  I just keep escaping until the regex that I
> developed outside the suite works.  I have been trying to get away from
> using .* though.  The longest match nature sometimes gives surprising
> results.  In theory .*? ought to work better, but I haven't tried using it
> much.
>
> Anyway, the change looks fine to me.

Segher pointed out to me that my revised regex was matching multiple
lines, so it was not triggering multiple times without the restriction
on the pattern.

A revised, tighter patch uses "?"

Index: sra-17.c
===
--- sra-17.c(revision 232904)
+++ sra-17.c(working copy)
@@ -15,5 +15,5 @@
   abort ();
 }

-/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
"esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4
"esra" } } */
+/* { dg-final { scan-tree-dump-times "Removing load: a =
\\\*\\.?LC\\.?\\.?0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR\\.\[0-9_\]+ =
\\\*\\.?LC\\.?\\.?0\\\[" 4 "esra" } } */
Index: sra-18.c
===
--- sra-18.c(revision 232904)
+++ sra-18.c(working copy)
@@ -21,8 +21,8 @@
   abort ();
 }

-/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
"esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "Removing load: a =
\\\*\\.?LC\\.?\\.?0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR\\.\[0-9_\]+ =
\\\*\\.?LC\\.?\\.?0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR\\.\[0-9_\]+ =
\\\*\\.?LC\\.?\\.?0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR\\.\[0-9_\]+ =
\\\*\\.?LC\\.?\\.?0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR\\.\[0-9_\]+ =
\\\*\\.?LC\\.?\\.?0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
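
The tighter pattern can be tried outside the testsuite.  The sketch below
uses POSIX extended regular expressions rather than Tcl's engine, so it only
approximates what scan-tree-dump-times does; the function name matches_label
is hypothetical, and the pattern is the unescaped form of the first
directive above.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if LINE contains a constant-pool label
   reference in either the ELF form (*.LC0;) or the AIX form (*LC..0;).
   Each "\\.?" makes one literal dot optional, so nothing else may appear
   between "LC" and "0".  */
static bool
matches_label (const char *line)
{
  regex_t re;
  bool ok;

  if (regcomp (&re, "\\*\\.?LC\\.?\\.?0;", REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Unlike ".*", the optional dots cannot run across unrelated text, which is
why this form no longer matches more than intended.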

Re: [PATCH 1/4] Make SRA scalarize constant-pool loads

2016-01-27 Thread David Edelsohn

The new sra-17.c and sra-18.c tests fail on AIX because the regex is
too restrictive -- AIX labels don't have exactly the same format.  On
AIX, the labels in the dumps look like "LC..0" instead of ".LC0".

This patch adds "*" and ".*" so that the "." prepended to LC is
optional and to allow characters between the "LC" and the "0".

I needed extra escapes for the sra-17.c line that matches multiple
times - for no apparent reason.

Okay?

Thanks, David

Index: sra-17.c
===
--- sra-17.c(revision 232872)
+++ sra-17.c(working copy)
@@ -15,5 +15,5 @@
   abort ();
 }

-/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
"esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4
"esra" } } */
+/* { dg-final { scan-tree-dump-times "Removing load: a =
\\\*.*LC.*0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*\\.*LC\\.*0\\\[" 4 "esra" } } */
Index: sra-18.c
===
--- sra-18.c(revision 232872)
+++ sra-18.c(working copy)
@@ -21,8 +21,8 @@
   abort ();
 }

-/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
"esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
-/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "Removing load: a =
\\\*.*LC.*0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.*LC.*0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.*LC.*0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.*LC.*0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
\\\*.*LC.*0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */

Re: [PATCH, 4.9, rs6000, testsuite] Fix PR69479

2016-01-27 Thread David Edelsohn

On Tue, Jan 26, 2016 at 4:46 PM, Bill Schmidt
 wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69479 notes that
> gcc.dg/and-1.c fails a scan-assembler-not test for nand, but the test
> does pass in subsequent releases.  The test author indicates in comment
> #1 that we can just remove this test for powerpc*-*-*, which this patch
> does.  Verified for 4.9 on powerpc64le-unknown-linux-gnu.  Ok to commit
> to that branch?
>
> Thanks,
> Bill
>
>
> 2016-01-26  Bill Schmidt  
>
> * gcc.dg/and-1.c: Remove nand test for powerpc*-*-*.

Sigh. This testcase should have been placed in gcc.target.

This patch is okay.

Thanks, David

Re: [PATCH] rs6000: Put back the 's' output modifier

2016-01-26 Thread David Edelsohn

On Mon, Jan 25, 2016 at 9:39 PM, Segher Boessenkool
 wrote:
> It turns out the 's' output modifier is used in some glibc math code,
> and is in an installed header even.  So let's put it back, it is much
> less of a burden supporting it a bit longer than to deal with the fallout.
> (It is also being fixed for glibc.)
>
> Tested on powerpc64-linux-gcc; is this okay for mainline?

Okay.

Thanks, David

Re: [PATCH], PowerPC IEEE 128-bit fp, #12 (default -mfloat128 on PowerPC-Linux)

2016-01-26 Thread David Edelsohn

On Thu, Jan 21, 2016 at 4:25 PM, Michael Meissner
 wrote:
> This is the final patch (at least so far) that turns on -mfloat128 by default
> for PowerPC Linux systems where the VSX instruction set is enabled.  As I
> mentioned in the last email, because we don't build the __float128 emulator on
> other systems, I didn't think it would be useful to make it the default.
>
> I did a bootstrap build/check with no regressions on a little endian power8
> system.  Are the patches ok to check in?
>
> [gcc]
> 2016-01-21   Michael Meissner  
>
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
> -mfloat128 by default on PowerPC Linux systems with the VSX
> instruction enabled.
>
> [gcc/testsuite]
> 2016-01-21   Michael Meissner  
>
> * gcc.target/powerpc/float128-1.c: New test for IEEE 128-bit
> floating point support.
> * gcc.target/powerpc/float128-2.c: Likewise.

No.  This is too risky a change during Stage 4.

- David

Re: [PATCH] Partial fix for PR target/68662

2016-01-26 Thread David Edelsohn

On Tue, Jan 26, 2016 at 2:15 PM, Jakub Jelinek  wrote:
> Hi!
>
> As Alan mentioned in the PR, there is some other issue still around, but
> by the time I've noticed that, I already had this patch being
> bootstrapped/regtested on powerpc64{,le}-linux (which just passed).
> Ok for trunk and deal with the rest incrementally?
>
> 2016-01-26  Jakub Jelinek  
>
> PR target/68662
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Initialize
> toc_label_name unconditionally.
> (rs6000_emit_load_toc_table): Call ggc_strdup on toc_label_name for
> SYMBOL_REF string.  Use toc_label_name instead of constructing
> LCTOC1.
> (rs6000_elf_declare_function_name): Use toc_label_name instead of
> constructing LCTOC1.

This is okay as an incremental fix.

Thanks, David

Re: [PATCH, 4.9, rs6000, testsuite] Fix PR69479

2016-01-26 Thread David Edelsohn

On Tue, Jan 26, 2016 at 4:46 PM, Bill Schmidt
 wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69479 notes that
> gcc.dg/and-1.c fails a scan-assembler-not test for nand, but the test
> does pass in subsequent releases.  The test author indicates in comment
> #1 that we can just remove this test for powerpc*-*-*, which this patch
> does.  Verified for 4.9 on powerpc64le-unknown-linux-gnu.  Ok to commit
> to that branch?
>
> Thanks,
> Bill
>
>
> 2016-01-26  Bill Schmidt  
>
> * gcc.dg/and-1.c: Remove nand test for powerpc*-*-*.

Please use an XFAIL, not removing the target from the test.

Thanks, David

Re: [PATCH, rs6000] Fix PR63354

2016-01-25 Thread David Edelsohn

On Sun, Jan 24, 2016 at 9:17 PM, Bill Schmidt
 wrote:

> Hi Jan, thanks for the report!  Patch below that should fix the problem.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu, no
> regressions.  David, is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-01-24  Bill Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled):  Add
> decl with __attribute__ ((unused)) annotation.

Okay.

Thanks, David

Re: [PATCH] Fix a typo in ppc libgcc (PR target/69444)

2016-01-25 Thread David Edelsohn

On Mon, Jan 25, 2016 at 3:34 PM, Jakub Jelinek  wrote:
> Hi!
>
> The soft-fp multilib of powerpc libgcc doesn't build because of a typo
> in the conditional - the guarded code uses inline asm that assumes hard
> float.
>
> Ok for trunk?
>
> 2016-01-25  Jakub Jelinek  
>
> PR target/69444
> * config/rs6000/sfp-machine.h: Fix a typo in #ifndef - __NO_FPRS__
> instead of ___NO_FPRS__.

Okay.

Thanks, David

Re: [PATCH, rs6000] Fix PR63354

2016-01-22 Thread David Edelsohn

On Fri, Jan 22, 2016 at 12:42 AM, Bill Schmidt
 wrote:
> Hi,
>
> On Thu, 2016-01-21 at 21:21 -0600, Bill Schmidt wrote:
>> The testcase will need a slight adjustment, as currently it fails on
>> powerpc64 with -m32 testing.  Working on a fix.
>>
>> Bill
>>
>
> This patch adjusts the gcc.target/powerpc/pr63354 test to require 64-bit
> code generation, and also restricts the test to Linux targets, as this
> is necessary for using -mprofile-kernel.  Tested on
> powerpc64-unknown-linux-gnu configured with --with-cpu=power7 and
> testing with -m32; the test is now correctly skipped there.  Is this
> okay for trunk?
>
> Thanks,
> Bill
>
>
> 2016-01-22  Bill Schmidt  
>
> * gcc.target/powerpc/pr63354.c: Restrict to Linux targets with
> 64-bit support.

Okay.

Thanks, David

Re: [PATCH, rs6000] Add Power9 asm entries

2016-01-21 Thread David Edelsohn

On Wed, Jan 20, 2016 at 4:21 PM, Pat Haugen  wrote:
> The following adds a couple missed Power9 assembler option entries.
> Bootstrapped on ppc64. Ok for trunk?
>
> -Pat
>
> 2016-01-20  Pat Haugen  
>
> * config/rs6000/aix71.h (ASM_CPU_SPEC): Add entry for Power9.
> * config/rs6000/driver-rs6000.c (struct asm_names): Likewise.

Okay.

We still need to find out the magic number that AIX will return for Power9.

Thanks, David

Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev4 (enable libgcc conversions)

2016-01-21 Thread David Edelsohn

On Wed, Jan 20, 2016 at 8:00 PM, Michael Meissner
 wrote:
> This is revision 4 of the IEEE 128-bit floating point libgcc support.
>
> Since revision 3, I have removed the gcc changes that broke AIX.  I rewrote
> the IBM extended double pack/unpack support to not use the builtin functions,
> but instead to use a union.  The libgcc code that I wrote tickles a bug in the
> pack function.  While I would like to fix the pack function bug, I will need
> to make sure I don't break AIX, so I didn't want to couple this library to
> getting those bugs fixed.
>
> I have also rewritten how the ifunc support is done, so that ifunc is only
> done if the target assembler supports ISA 3.0 instructions AND the compiler
> supports ifunc functions.  This is so that the compiler can build on 64-bit
> systems if --enable-gnu-indirect-function is not specified without the ifunc
> functions being flagged.
>
> I have done bootstraps on both a big endian power7 system and a little endian
> power8 with no regressions.  In addition, I have built a compiler explicitly
> disabling ifunc support, and it built and ran the ieee 128-bit floating point
> unit tests correctly.
>
> Can I install this into libgcc?
>
> Assuming I can install these changes, the one final change that I would like
> to make is to enable float128 automatically on VSX powerpc Linux systems, but
> not on other systems (AIX, *BSD, etc.) since those systems do not build
> float128 emulator functions.

What does "enable" mean?

> 2016-01-20  Michael Meissner  
> Steven Munroe 
> Tulio Magno Quites Machado Filho 
>
> * config/rs6000/float128-sed: New files to convert TF names to KF
> names for PowerPC IEEE 128-bit floating point support.
> * config/rs6000/float128-sed-hw: Likewise.
>
> * config/rs6000/float128-hw.c: New file for ISA 3.0 IEEE 128-bit
> floating point hardware support.
>
> * config/rs6000/float128-ifunc.c: New file to pick either IEEE
> 128-bit floating point software emulation or use ISA 3.0 hardware
> support if it is available.
>
> * config/rs6000/quad-float128.h: New file to support IEEE 128-bit
> floating point.
>
> * config/rs6000/extendkftf2-sw.c: New file, convert IEEE 128-bit
> floating point to IBM extended double.
>
> * config/rs6000/trunctfkf2-sw.c: New file, convert IBM extended
> double to IEEE 128-bit floating point.
>
> * config/rs6000/t-float128: New Makefile fragments to enable
> building __float128 emulation support.
> * config/rs6000/t-float128-hw: Likewise.
>
> * config/rs6000/sfp-exceptions.c: New file to provide exception
> support for IEEE 128-bit floating point.
>
> * config/rs6000/floattikf.c: New files for converting between IEEE
> 128-bit floating point and signed/unsigned 128-bit integers.
> * config/rs6000/fixunskfti.c: Likewise.
> * config/rs6000/fixkfti.c: Likewise.
> * config/rs6000/floatuntikf.c: Likewise.
>
> * config/rs6000/sfp-machine.h (_FP_W_TYPE_SIZE): Use 64-bit types
> when building on 64-bit systems, or when VSX is enabled.
> (_FP_W_TYPE): Likewise.
> (_FP_WS_TYPE): Likewise.
> (_FP_I_TYPE): Likewise.
> (TItype): Define on 64-bit systems.
> (UTItype): Likewise.
> (TI_BITS): Likewise.
> (_FP_MUL_MEAT_D): Add support for using 64-bit types.
> (_FP_MUL_MEAT_Q): Likewise.
> (_FP_DIV_MEAT_D): Likewise.
> (_FP_DIV_MEAT_Q): Likewise.
> (_FP_NANFRAC_D): Likewise.
> (_FP_NANFRAC_Q): Likewise.
> (ISA_BIT): Add exception support if we are being compiled on a
> machine with hardware floating point support to build the IEEE
> 128-bit emulation functions.
> (FP_EX_INVALID): Likewise.
> (FP_EX_OVERFLOW): Likewise.
> (FP_EX_UNDERFLOW): Likewise.
> (FP_EX_DIVZERO): Likewise.
> (FP_EX_INEXACT): Likewise.
> (FP_EX_ALL): Likewise.
> (__sfp_handle_exceptions): Likewise.
> (FP_HANDLE_EXCEPTIONS): Likewise.
> (FP_RND_NEAREST): Likewise.
> (FP_RND_ZERO): Likewise.
> (FP_RND_PINF): Likewise.
> (FP_RND_MINF): Likewise.
> (FP_RND_MASK): Likewise.
> (_FP_DECL_EX): Likewise.
> (FP_INIT_ROUNDMODE): Likewise.
> (FP_ROUNDMODE): Likewise.
>
> * libgcc/config.host (powerpc*-*-linux*): If compiler can compile
> VSX code, enable IEEE 128-bit floating point.  If the compiler can
> compile IEEE 128-bit floating point code with ISA 3.0 IEEE 128-bit
> floating point hardware instructions and it supports declaring
> functions with the ifunc attribute, enable ifunc functions to
> switch between software and hardware

[PATCH] Detangle gcc/configure for Darwin

2016-01-21 Thread David Edelsohn

A gcc/configure stanza to test for PowerPC mfcrf support became
tangled with Darwin test for .machine directive.  This patch detangles
and separates the two tests.

I don't have a Darwin system to test.

* configure.ac (gcc_cv_as_powerpc_mfcrf, gcc_cv_as_machine_directive): Detangle.

Okay?

Thanks, David

Index: configure.ac
===
--- configure.ac(revision 232675)
+++ configure.ac(working copy)
@@ -4172,10 +4172,8 @@
 ;;

   powerpc*-*-*)
+
 case $target in
-  *-*-aix*) conftest_s='   .machine "pwr5"
-   .csect .text[[PR]]
-   mfcr 3,128';;
   *-*-darwin*)
gcc_GAS_CHECK_FEATURE([.machine directive support],
  gcc_cv_as_machine_directive,,,
@@ -4185,7 +4183,14 @@
  echo you can get it from:
ftp://gcc.gnu.org/pub/gcc/infrastructure/cctools-528.5.dmg >&2
  test x$build = x$target && exit 1
fi
-   conftest_s='.text
+;;
+esac
+
+case $target in
+  *-*-aix*) conftest_s='   .machine "pwr5"
+   .csect .text[[PR]]
+   mfcr 3,128';;
+  *-*-darwin*) conftest_s='.text
mfcr r3,128';;
   *) conftest_s='  .machine power4
.text

Re: [PATCH, rs6000] Fix PR63354

2016-01-21 Thread David Edelsohn

On Thu, Jan 21, 2016 at 11:48 AM, Bill Schmidt
 wrote:
> Hi,
>
> Anton Blanchard proposed a fix to his own bug report in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63354, but never submitted
> the patch upstream.  I've added a formal test case and am submitting on
> his behalf.
>
> The patch simply ensures that we don't stack a frame for leaf procedures
> when called with -pg -mprofile-kernel.  The automatically generated
> calls to _mcount occur prior to the prolog and do not require us to
> stack a frame.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2016-01-21  Anton Blanchard  
> Bill Schmidt  
>
> PR target/63354
> * config/rs6000/linux64.h (TARGET_KEEP_LEAF_WHEN_PROFILED): New
> #define.
> * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): New
> function.
>
> [gcc/testsuite]
>
> 2016-01-21  Anton Blanchard  
> Bill Schmidt  
>
> PR target/63354
> * gcc.target/powerpc/pr63354.c:  New test.

Okay.

Thanks, David

Re: [PATCH] gcc/configure test for AIX DWARF

2016-01-21 Thread David Edelsohn

On Thu, Jan 21, 2016 at 12:47 PM, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 01/18/2016 08:30 PM, David Edelsohn wrote:
>>
>> Bootstrapped on powerpc-ibm-aix7.1.2.0 with and without the corrected
>> assembler.
>>
>> Okay?
>
>
> The changes seem to be in *-*-aix blocks, so as far as I'm concerned you are
> the maintainer and can check this in. One question though:
>
>> ;;
>>  esac
>> -;;
>>
>>mips*-*-*)
>>  gcc_GAS_CHECK_FEATURE([explicit relocation support],
>
>
> Did you intend to remove this line? This looks odd.

The ";;" were mis-matched in the patch I sent.  It is correct on trunk.

- David

Re: [PATCH, rs6000] Fix PR67489

2016-01-21 Thread David Edelsohn

On Thu, Jan 21, 2016 at 6:00 PM, Bill Schmidt
 wrote:
> Hi,
>
> The test case gcc.target/powerpc/p8vector-builtin-8.c needs to be
> restricted to targets that support the __int128 keyword.  This was
> wrongly being attempted with { dg-do compile { target int128 } } when
> what's really wanted is { dg-require-effective-target int128 }.  With
> this patch, the test no longer runs on 32-bit targets.
>
> Tested on powerpc64-unknown-linux-gnu using -m32.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-01-21  Bill Schmidt  
>
> PR testsuite/67489
> * gcc.target/powerpc/p8vector-builtin-8.c: Remove { target int128
> } from dg-do compile directive, and instead add {
> dg-require-effective-target int128 }.

Okay.

Thanks, David

[PATCH] PR target/68609 vector swsqrt

2016-01-20 Thread David Edelsohn

This patch finishes PR target/68609 to use reciprocal estimate for vector sqrt.

PR target/68609
* config/rs6000/rs6000.c (rs6000_emit_swsqrt): Add vector domain check.
* config/rs6000/vector.md (sqrt2): Call rs6000_emit_swsqrt for V4SFmode.

Thanks, David

Index: rs6000.c
===
--- rs6000.c (revision 232439)
+++ rs6000.c (working copy)
@@ -32904,10 +32904,19 @@
   if (!recip)
 {
   rtx zero = force_reg (mode, CONST0_RTX (mode));
-  rtx target = emit_conditional_move (e, GT, src, zero, mode,
-  e, zero, mode, 0);
-  if (target != e)
- emit_move_insn (e, target);
+
+  if (mode == SFmode)
+ {
+  rtx target = emit_conditional_move (e, GT, src, zero, mode,
+  e, zero, mode, 0);
+  if (target != e)
+emit_move_insn (e, target);
+ }
+  else
+ {
+  rtx cond = gen_rtx_GT (VOIDmode, e, zero);
+  rs6000_emit_vector_cond_expr (e, e, zero, cond, src, zero);
+ }
 }

   /* g = sqrt estimate.  */
Index: vector.md
===
--- vector.md (revision 232438)
+++ vector.md (working copy)
@@ -270,7 +270,16 @@
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
  (sqrt:VEC_F (match_operand:VEC_F 1 "vfloat_operand" "")))]
   "VECTOR_UNIT_VSX_P (mode)"
-  "")
+{
+  if (mode == V4SFmode
+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  rs6000_emit_swsqrt (operands[0], operands[1], 0);
+  DONE;
+}
+})

 (define_expand "rsqrte2"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")

Re: [PATCH, rs6000] Add support for __builtin_cpu_is() and __builtin_cpu_supports()

2016-01-20 Thread David Edelsohn

On Thu, Jan 14, 2016 at 10:50 PM, Peter Bergner  wrote:
> This patch adds support for __builtin_cpu_init(), __builtin_cpu_is() and
> __builtin_cpu_supports() builtins for PowerPC.  We use the same API as the
> x86* builtins of the same name.  These builtins use the new GLIBC 2.23
> feature where we store the AT_PLATFORM, AT_HWCAP and AT_HWCAP2 values in the
> Thread Control Block (TCB) which offers very fast access to these values.
>
> As part of the agreement with the GLIBC community, we always emit a reference
> to a special symbol exported by LIBCs that support the AT_PLATFORM/AT_HWCAP*
> values in the TCB, whenever we expand one of the CPU builtins.  We do this
> so that we will never attempt to access the TCB on old LIBCs.  Joseph also
> asked that we conditionalize the enabling of this code with a configure time
> check for GLIBC's version and that is included here.
>
> I'll note that since GLIBC initializes the TCB before the application gets
> control, we don't actually need __builtin_cpu_init(), but we have implemented
> it anyway, to keep the same API as x86.  It's just our init expands to 
> nothing.
>
> This passes bootstrap and regtesting with no errors.  Ok for mainline?
>
> Peter
>
>
> gcc/
> * config/rs6000/ppc-auxv.h: New file.
> * config/rs6000/rs6000-builtin.def (cpu_init): Add new builtin.
> (cpu_is): Likewise.
> (cpu_supports): Likewise.
> * config/rs6000/rs6000.c: include "ppc-auxv.h".
> (cpu_is_info): New variable.
> (cpu_supports_info): Likewise.
> (tcb_verification_symbol): Likewise.
> (cpu_builtin_p): Likewise.
> (cpu_expand_builtin): New function.
> (rs6000_expand_ternop_builtin): Add support for CPU builtin functions.
> (rs6000_init_builtins): Likewise.
> (rs6000_elf_file_end): Emit HWCAP in TCB verification symbol.
> * config/rs6000/rs6000.h (TLS_REGNUM): New define.
> * configure.ac (gcc_cv_libc_provides_hwcap_in_tcb): New test.
> * configure: Regenerate.
> * config.in: Likewise.
>
> gcc/testsuite/
> * gcc.target/powerpc/cpu-builtin-1.c: New test.

> * doc/extend.texi (PowerPC Built-in Functions): Document
> __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports.

This is okay.

Thanks, David
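
A rough usage sketch of the builtins described above, for illustration only.  The function name pick_impl and the preprocessor guard are assumptions; the strings "power8", "vsx" and "altivec" are taken from the GCC documentation for these builtins.  On non-PowerPC hosts the sketch simply falls through to the generic case.

```c
/* Hypothetical dispatcher: returns the name of the code path to use.  */
const char *
pick_impl (void)
{
#if defined(__powerpc__) && defined(__linux__) && defined(__GNUC__)
  /* Expands to nothing on PowerPC, kept for API parity with x86.  */
  __builtin_cpu_init ();

  /* Compares against the AT_PLATFORM value.  */
  if (__builtin_cpu_is ("power8"))
    return "power8";

  /* Test AT_HWCAP / AT_HWCAP2 feature bits.  */
  if (__builtin_cpu_supports ("vsx"))
    return "vsx";
  if (__builtin_cpu_supports ("altivec"))
    return "altivec";
#endif
  return "generic";
}
```

Whether the builtins are usable depends on the GLIBC version the compiler was configured against, as the patch description notes.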

Re: [PATCH] xfail one ppc testcase (PR tree-optimization/66612)

2016-01-20 Thread David Edelsohn

On Wed, Jan 20, 2016 at 9:28 AM, Jakub Jelinek  wrote:
> Hi!
>
> As per discussion in the PR, I'd like to xfail this test for GCC6 and
> change it to 7.0 milestone, because it is too late/too risky to change
> this for gcc 6 now.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-01-20  Jakub Jelinek  
>
> PR tree-optimization/66612
> * gcc.target/powerpc/20050830-1.c: Xfail the scan-assembler test
> for bdn instruction.

Okay with me.

Thanks, David

[PATCH] gcc/configure test for AIX DWARF

2016-01-18 Thread David Edelsohn

AIX7 has added support for DWARF to XCOFF, but complete and correct
support did not occur with a single update and the initial release of
AIX7.  The initial support defined a subset of common DWARF debug
sections.  A later update added most of the remaining sections for
location lists and frames, but the AIX Assembler did not correctly
handle references to labels generated by GCC.

This patch updates the gcc/configure test for the extended DWARF
support to ensure that the AIX toolchain correctly handles the label
reference.

Bootstrapped on powerpc-ibm-aix7.1.2.0 with and without the corrected assembler.

Okay?

Thanks, David

* configure.ac (gcc_cv_as_dwloc): Test support for debug frame section
label reference.
* configure: Regenerate.

Index: configure.ac
===
--- configure.ac(revision 232532)
+++ configure.ac(working copy)
@@ -4384,7 +4384,7 @@

 case $target in
   *-*-aix*)
-   gcc_GAS_CHECK_FEATURE([.ref support],
+   gcc_GAS_CHECK_FEATURE([AIX .ref support],
  gcc_cv_as_aix_ref, [2,21,0],,
  [ .csect stuff[[rw]]
 stuff:
@@ -4395,19 +4395,17 @@
  [AC_DEFINE(HAVE_AS_REF, 1,
[Define if your assembler supports .ref])])
;;
-esac

-case $target in
-  *-*-aix*)
-   gcc_GAS_CHECK_FEATURE([dwarf location lists section support],
+   gcc_GAS_CHECK_FEATURE([AIX DWARF location lists section support],
  gcc_cv_as_aix_dwloc, [2,21,0],,
- [ .dwsect 0xB
+ [ .dwsect 0xA
+   Lframe..0:
+   .vbyte 4,Lframe..0:
  ],,
  [AC_DEFINE(HAVE_XCOFF_DWARF_EXTRAS, 1,
-   [Define if your assembler supports .dwsect 0xB])])
+   [Define if your assembler supports AIX debug frame section
label reference.])])
;;
 esac
-;;

   mips*-*-*)
 gcc_GAS_CHECK_FEATURE([explicit relocation support],

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-17 Thread David Edelsohn

On Sun, Jan 17, 2016 at 3:21 PM, Torvald Riegel <trie...@redhat.com> wrote:
> On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote:
>> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
>> >> stage1 libstdc++ builds just fine.  the problem is stage2 configure
>> >> fails due to missing ITM_xxx symbols when configure tries to compile
>> >> and run conftest programs.
>> >
>> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
>> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on
>> > libitm.
>> >
>> > So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
>> > configure check needs to be used to determine if undefined weak symbols
>> > work the way libstdc++ needs them to.
>>
>> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c
>> based on  SUPPORTS_ONE_ONLY.  gcc/defaults.h defines SUPPORTS_ONE_ONLY
>> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics.
>> AIX weak correctly supports link-once semantics.  AIX also supports
>> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing
>> symbols with vague linkage in multiple translation units.
>>
>> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__
>> and __attribute__ ((weak)) for references to symbols that may not be
>> defined at link time or run time.  AIX does not allow undefined symbols
>> by default.  And the libstdc++ inference about the semantics of
>> __GXX_WEAK__ is different from the documentation.
>>
>> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of
>> __GXX_WEAK__.  AIX does not support extension of the meaning to
>> additional SVR4 semantics not specified in the documentation.
>
> I see, so we might be assuming that __GXX_WEAK__ means more than it
> actually does (I'm saying "might" because personally, I don't know; your
> information supports this is the case, but the initial info I got was
> that __GXX_WEAK__ would mean we could have weak decls without
> definitions).

I believe that libstdc++ must continue with the weak undefined
references to the symbols as designed, but protect them with a
different macro.  For example, __GXX_WEAK_REF__ or __GXX_WEAK_UNDEF__
defined in defaults.h based on configure test or simply overridden in
config/rs6000/aix.h.  Or the macro could be local to libstdc++ and
overridden in config/os/aix/os_defines.h.
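
A minimal sketch of what such a guard could look like, assuming a made-up macro HAVE_WEAK_UNDEF and a made-up symbol optional_tm_hook (neither exists in GCC or libstdc++).  The __ELF__ test is only a stand-in for a real configure check.

```c
#include <stddef.h>

/* Hypothetical macro: 1 when the target lets a weak reference stay
   undefined at link time (SVR4/ELF behaviour), 0 otherwise.  */
#ifndef HAVE_WEAK_UNDEF
# if defined(__ELF__)
#  define HAVE_WEAK_UNDEF 1
# else
#  define HAVE_WEAK_UNDEF 0
# endif
#endif

#if HAVE_WEAK_UNDEF
/* Weak reference: resolves to a null address if nothing defines it.  */
extern void optional_tm_hook (void) __attribute__ ((weak));
#else
/* No weak undefined references: provide a local dummy definition.  */
static void optional_tm_hook (void) { }
#endif

/* Report whether a real definition of the hook was linked in.  */
static int
hook_available (void)
{
#if HAVE_WEAK_UNDEF
  return optional_tm_hook != NULL;
#else
  return 0;
#endif
}

/* Call the hook only when it is really present; return 1 if called.  */
int
call_hook_if_present (void)
{
  if (!hook_available ())
    return 0;
  optional_tm_hook ();
  return 1;
}
```

The point of a separate macro is that link-once support (what __GXX_WEAK__ documents) and tolerance of undefined weak references are tested independently.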

Thanks, David

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread David Edelsohn

stage1 libstdc++ builds just fine.  the problem is stage2 configure
fails due to missing ITM_xxx symbols when configure tries to compile
and run conftest programs.

Thanks, David

On Sat, Jan 16, 2016 at 7:43 AM, Jonathan Wakely  wrote:
> What are the errors?
>
> I can build libstdc++ on gcc111.
>
> Does this patch help?
>
>

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread David Edelsohn

This patch broke bootstrap on AIX.  Not all targets support TM.  This
patch makes libstdc++ unconditionally refer to TM symbols.

Please fix.

- David

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread David Edelsohn

On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
>> stage1 libstdc++ builds just fine.  the problem is stage2 configure
>> fails due to missing ITM_xxx symbols when configure tries to compile
>> and run conftest programs.
>
> On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
> fine to load libstdc++ without libitm and libstdc++ doesn't depend on
> libitm.
>
> So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
> configure check needs to be used to determine if undefined weak symbols
> work the way libstdc++ needs them to.

__GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c
based on  SUPPORTS_ONE_ONLY.  gcc/defaults.h defines SUPPORTS_ONE_ONLY
if the target supports MAKE_DECL_ONE_ONLY and link-once semantics.
AIX weak correctly supports link-once semantics.  AIX also supports
the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing
symbols with vague linkage in multiple translation units.

libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__
and __attribute__ ((weak)) for references to symbols that may not be
defined at link time or run time.  AIX does not allow undefined symbols
by default.  And the libstdc++ inference about the semantics of
__GXX_WEAK__ is different from the documentation.

AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of
__GXX_WEAK__.  AIX does not support extension of the meaning to
additional SVR4 semantics not specified in the documentation.

Thanks, David

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread David Edelsohn

Torvald,

The error is a link failure in stage2 configure due to the missing
_ITM_xxx and related symbols.  I don't have the failed build any more.
Maybe Jonathan can reply with the specific failures.

There is an AIX system in the GNU Compile Farm: gcc111.

- David

On Sat, Jan 16, 2016 at 3:12 PM, Torvald Riegel <trie...@redhat.com> wrote:
> On Sat, 2016-01-16 at 14:35 +0100, Jakub Jelinek wrote:
>> On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
>> > stage1 libstdc++ builds just fine.  the problem is stage2 configure
>> > fails due to missing ITM_xxx symbols when configure tries to compile
>> > and run conftest programs.
>>
>> On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
>> fine to load libstdc++ without libitm and libstdc++ doesn't depend on
>> libitm.
>>
>> So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
>> configure check needs to be used to determine if undefined weak symbols
>> work the way libstdc++ needs them to.
>
> David, if you can tell me what AIX supports and whether it defines
> __GXX_WEAK__ with the semantics we assume here, I can see what a fix
> would be.  As Jakub says, the point of all what's below is to actually
> make it work when there's no TM support.
>
> Also, knowing the actual error that AIX fails with would be helpful.  I
> have no access to AIX, so can't check.  Thanks.
>
>> #if __GXX_WEAK__
>> // Declare all libitm symbols we rely on, but make them weak so that we do
>> // not depend on libitm.
>> extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
>> extern void _ZGTtdlPv (void* ptr) __attribute__((weak));
>> extern uint8_t _ITM_RU1(const uint8_t *p)
>>   ITM_REGPARM __attribute__((weak));
>> extern uint32_t _ITM_RU4(const uint32_t *p)
>>   ITM_REGPARM __attribute__((weak));
>> extern uint64_t _ITM_RU8(const uint64_t *p)
>>   ITM_REGPARM __attribute__((weak));
>> extern void _ITM_memcpyRtWn(void *, const void *, size_t)
>>   ITM_REGPARM __attribute__((weak));
>> extern void _ITM_memcpyRnWt(void *, const void *, size_t)
>>   ITM_REGPARM __attribute__((weak));
>> extern void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *)
>>   ITM_REGPARM __attribute__((weak));
>>
>> #else
>> // If there is no support for weak symbols, create dummies.  The exceptions
>> // will not be declared transaction_safe in this case.
>> void* _ZGTtnaX (size_t) { return NULL; }
>> void _ZGTtdlPv (void*) { }
>> uint8_t _ITM_RU1(const uint8_t *) { return 0; }
>> uint32_t _ITM_RU4(const uint32_t *) { return 0; }
>> uint64_t _ITM_RU8(const uint64_t *) { return 0; }
>> void _ITM_memcpyRtWn(void *, const void *, size_t) { }
>> void _ITM_memcpyRnWt(void *, const void *, size_t) { }
>> void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) { };
>> #endif
>>
>>   Jakub
>
>
>

PR68609

2016-01-15 Thread David Edelsohn

My initial implementation of software sqrt based on estimate was
fragile for denormal inputs.  This revised version converts both sqrt
and rsqrt to use Goldschmidt's Algorithm and calculates sqrt through
an iterative correction to a sqrt estimate.

Because software sqrt is profitable only with a single iteration, this
patch also restricts swsqrt to processors that generate a high
precision estimate.
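
As a scalar illustration of the recurrence (not the RTL the patch emits), the sqrt path can be sketched in C as follows.  The function names are made up, and rsqrt_estimate is an assumed stand-in for the hardware reciprocal square root estimate instruction, using a well-known bit-level approximation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware rsqrt estimate: a crude
   bit-level approximation of 1/sqrt(x), accurate to a few percent.  */
static double
rsqrt_estimate (double x)
{
  float f = (float) x;
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&f, &bits, sizeof f);
  return (double) f;
}

/* Goldschmidt-style sqrt: refine g ~ sqrt(src) and h ~ 1/(2*sqrt(src))
   together, using only multiplies and fused multiply-add shaped steps.  */
double
goldschmidt_sqrt (double src, int passes)
{
  /* Filter the estimate to zero for src == 0.0 so that sqrt(0.0) does
     not become infinity * 0 = NaN.  */
  double e = src > 0.0 ? rsqrt_estimate (src) : 0.0;
  double g = e * src;   /* sqrt estimate */
  double h = e * 0.5;   /* 1/(2*sqrt) estimate */

  for (int i = 0; i < passes; i++)
    {
      double t = 0.5 - g * h;   /* residual, shaped like nmsub */
      g = g + g * t;            /* correct sqrt estimate, like madd */
      h = h + h * t;            /* correct 1/(2*sqrt) estimate */
    }
  return g;
}
```

Each pass roughly squares the relative error, so a more precise starting estimate needs fewer passes.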

Bootstrapped on powerpc-ibm-aix7.1.0.0 and powerpc64le-linux.

Thanks, David

PR target/68609
* config/rs6000/rs6000.c (rs6000_emit_msub): Delete.
(rs6000_emit_swsqrt): Convert to Goldschmidt's Algorithm
* config/rs6000/rs6000.md (sqrt2): Limit swsqrt to high
precision estimate.

Index: rs6000.c
===
--- rs6000.c(revision 232326)
+++ rs6000.c(working copy)
@@ -32769,29 +32769,6 @@
 emit_move_insn (target, dst);
 }

-/* Generate a FMSUB instruction: dst = fma(m1, m2, -a).  */
-
-static void
-rs6000_emit_msub (rtx target, rtx m1, rtx m2, rtx a)
-{
-  machine_mode mode = GET_MODE (target);
-  rtx dst;
-
-  /* Altivec does not support fms directly;
- generate in terms of fma in that case.  */
-  if (optab_handler (fms_optab, mode) != CODE_FOR_nothing)
-dst = expand_ternary_op (mode, fms_optab, m1, m2, a, target, 0);
-  else
-{
-  a = expand_unop (mode, neg_optab, a, NULL_RTX, 0);
-  dst = expand_ternary_op (mode, fma_optab, m1, m2, a, target, 0);
-}
-  gcc_assert (dst != NULL);
-
-  if (dst != target)
-emit_move_insn (target, dst);
-}
-
 /* Generate a FNMSUB instruction: dst = -fma(m1, m2, -a).  */

 static void
@@ -32890,15 +32867,16 @@
 add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
 }

-/* Newton-Raphson approximation of single/double-precision floating point
-   rsqrt.  Assumes no trapping math and finite arguments.  */
+/* Goldschmidt's Algorithm for single/double-precision floating point
+   sqrt and rsqrt.  Assumes no trapping math and finite arguments.  */

 void
 rs6000_emit_swsqrt (rtx dst, rtx src, bool recip)
 {
   machine_mode mode = GET_MODE (src);
-  rtx x0 = gen_reg_rtx (mode);
-  rtx y = gen_reg_rtx (mode);
+  rtx e = gen_reg_rtx (mode);
+  rtx g = gen_reg_rtx (mode);
+  rtx h = gen_reg_rtx (mode);

   /* Low precision estimates guarantee 5 bits of accuracy.  High
  precision estimates guarantee 14 bits of accuracy.  SFmode
@@ -32909,55 +32887,68 @@
   if (mode == DFmode || mode == V2DFmode)
 passes++;

-  REAL_VALUE_TYPE dconst3_2;
   int i;
-  rtx halfthree;
+  rtx mhalf;
   enum insn_code code = optab_handler (smul_optab, mode);
   insn_gen_fn gen_mul = GEN_FCN (code);

   gcc_assert (code != CODE_FOR_nothing);

-  /* Load up the constant 1.5 either as a scalar, or as a vector.  */
-  real_from_integer (&dconst3_2, VOIDmode, 3, SIGNED);
-  SET_REAL_EXP (&dconst3_2, REAL_EXP (&dconst3_2) - 1);
+  mhalf = rs6000_load_constant_and_splat (mode, dconsthalf);
-  halfthree = rs6000_load_constant_and_splat (mode, dconst3_2);
+  /* e = rsqrt estimate */
+  emit_insn (gen_rtx_SET (e, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
+UNSPEC_RSQRT)));

-  /* x0 = rsqrt estimate */
-  emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
- UNSPEC_RSQRT)));
-
   /* If (src == 0.0) filter infinity to prevent NaN for sqrt(0.0).  */
   if (!recip)
 {
   rtx zero = force_reg (mode, CONST0_RTX (mode));
-  rtx target = emit_conditional_move (x0, GT, src, zero, mode,
- x0, zero, mode, 0);
-  if (target != x0)
-   emit_move_insn (x0, target);
+  rtx target = emit_conditional_move (e, GT, src, zero, mode,
+ e, zero, mode, 0);
+  if (target != e)
+   emit_move_insn (e, target);
 }

-  /* y = 0.5 * src = 1.5 * src - src -> fewer constants */
-  rs6000_emit_msub (y, src, halfthree, src);
+  /* g = sqrt estimate.  */
+  emit_insn (gen_mul (g, e, src));
+  /* h = 1/(2*sqrt) estimate.  */
+  emit_insn (gen_mul (h, e, mhalf));

-  for (i = 0; i < passes; i++)
+  if (recip)
 {
-  rtx x1 = gen_reg_rtx (mode);
-  rtx u = gen_reg_rtx (mode);
-  rtx v = gen_reg_rtx (mode);
+  if (passes == 1)
+   {
+ rtx t = gen_reg_rtx (mode);
+ rs6000_emit_nmsub (t, g, h, mhalf);
+ /* Apply correction directly to 1/rsqrt estimate.  */
+ rs6000_emit_madd (dst, e, t, e);
+   }
+  else
+   {
+ for (i = 0; i < passes; i++)
+   {
+ rtx t1 = gen_reg_rtx (mode);
+ rtx g1 = gen_reg_rtx (mode);
+ rtx h1 = gen_reg_rtx (mode);

-  /* x1 = x0 * (1.5 - y * (x0 * x0)) */
-  emit_insn (gen_mul (u, x0, x0));
-  rs6000_emit_nmsub (v, y, u, halfthree);
-  emit_insn (gen_mul (x1, x0, v));
-  x0 = x1;
+ rs6000_emit_nmsub (t1, g, h, mhalf);
+ rs6000_emit_madd

Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread David Edelsohn

On Tue, Jan 12, 2016 at 11:53 AM, Richard Henderson  wrote:
> The problem in this PR is that we never got around to fleshing out the vector
> support for transactions for anything but x86.  My goal here is to make this
> as generic as possible, so that it should Just Work with existing vector
> support in the backend.
>
> In addition, if I encounter other unexpected register types, I will now copy
> them to memory and use memcpy, rather than crash.
>
> The one piece of this that requires a tiny bit of extra work is enabling the
> vector entry points in libitm.
>
> For x86, we make sure to build the files with SSE or AVX support enabled.  For
> s390x, I do the same thing, enabling z13 support.  I suppose we might need to
> check for binutils support, but I'd rather do this only if necessary.
>
> For arm I'm less sure what to do, since I seem to recall that use of Neon sets
> a bit in the ELF header.  Which presumably means that the binary could no
> longer be run without neon, even though the entry points wouldn't be used.
>
> For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
> or indeed if that's the best thing to do.

VSX is an extension of Altivec (VMX) -- VSX always includes Altivec.
If VSX is enabled, Altivec will be enabled and available.

Thanks, David

Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev3 (enable libgcc conversions)

2016-01-12 Thread David Edelsohn

On Tue, Jan 12, 2016 at 6:47 PM, Joseph Myers  wrote:
> On Tue, 12 Jan 2016, Michael Meissner wrote:
>
>> On Tue, Jan 12, 2016 at 12:18:55AM +, Joseph Myers wrote:
>> > On Mon, 11 Jan 2016, Michael Meissner wrote:
>> >
>> > > I fixed the #ifdef to use __NO_FPRS__ (thanks for the heads up on that).
>> > > I also believe I fixed the various formatting issues.  These two patches
>> > > build on a big endian power7 host and little endian power8 host with no
>> > > regressions in the testsuite (the gcc patch is included here, but it
>> > > hasn't changed since the previous version of this patch).  Are they ok to
>> > > be checked in?
>> >
>> > Are you sure you sent the right patch version?  I don't see those fixes in
>> > this one.
>>
>> You are right.  I did not update the patches from the changes I had made in
>> the branch.
>
> I have no further comments on the patch.

If Joseph is satisfied, it's okay with me.

Thanks, David

Re: [PATCH, rs6000] Use lxvx and stxvx for 128-bit float, etc., with -mcpu=power9

2016-01-06 Thread David Edelsohn

On Wed, Jan 6, 2016 at 1:34 PM, Bill Schmidt
 wrote:
> Hi,
>
> I previously added POWER9 support for lxvx and stxvx to replace the
> load-swap and swap-store patterns for POWER8.  However, I missed the
> fact that we have different patterns for loads and stores of 128-bit
> floats and other scalars.  This patch expands the previous POWER9
> override to catch these cases, and disables those other patterns when P9
> vector support is available.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-01-06  Bill Schmidt  
>
> * config/rs6000/vsx.md (*p9_vecload_): Replace VSX_M
> mode iterator with VSX_M2.
> (*p9_vecstore_): Likewise.
> (*vsx_le_permute_): Restrict to !TARGET_P9_VECTOR.
> (*vsx_le_perm_load_ for VSX_LE_128): Likewise.
> (*vsx_le_perm_store_ for VSX_LE_128): Likewise.
> (define_split for VSX_LE128 stores): Likewise.
> (define_peephole2 for TImode LE swaps): Likewise.
> (define_split for VSX_LE128 post-reload stores): Likewise.
>
> [gcc/testsuite]
>
> 2015-01-06  Bill Schmidt  
>
> * gcc.target/powerpc/p9-lxvx-stxvx-3.c: New test.

Okay.

Thanks, David

Re: [PATCH, rs6000] Handle vector reductions in swap optimization

2016-01-06 Thread David Edelsohn

On Wed, Jan 6, 2016 at 5:37 PM, Bill Schmidt
 wrote:
> Hi,
>
> Swap optimization is missing some opportunities when vector reductions
> are present.  This patch adds logic to recognize vector-reduction
> patterns and mark them as swappable.  Some of these are very easy to
> recognize, given the presence of a specific unspec.  For V2DF
> reductions, we have to recognize a specific somewhat complex pattern.
>
> I've added test cases to verify swap optimization works correctly on
> reductions for V2DF and V4SF.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?  I would also like to backport this
> to GCC 5 after a settling period.
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2016-01-06  Bill Schmidt  
>
> * config/rs6000/rs6000.c (v2df_reduction_p): New function.
> (rtx_is_swappable_p): Reductions are swappable.
> (insn_is_swappable_p): V2DF reductions are swappable.
>
> [gcc/testsuite]
>
> 2016-01-06  Bill Schmidt  
>
> * gcc.target/powerpc/swaps-p8-23.c: New test.
> * gcc.target/powerpc/swaps-p8-24.c: Likewise.

This is okay for trunk, but not for GCC 5 branch because this is not a bug fix.

Thanks, David

Re: adjust fallback_frame_state for 32bits AIX 7.1

2016-01-05 Thread David Edelsohn

On Tue, Jan 5, 2016 at 6:15 AM, Olivier Hainque  wrote:
> Hello,
>
> This is a tiny change we have been using successfully for at least a couple
> of years now, improving exception propagation through signal handlers on
> 32bits AIX 7.1.
>
> While this isn't a complete generalization to all possible configurations
> (haven't had the time to converge on 64bits kernels at this stage,
> unfortunately), this is nevertheless an improvement on 32bits.

There are no 32 bit kernels for AIX 7.1.  This is the signal handler
path for 32 bit environment, but there are no 32 bit kernels.

>
> The patch is very short and in line with what is already there, and we thought
> it might be useful to others as well.
>
> OK to commit ?
>
> 2015-01-05  Olivier Hainque  
>
> libgcc/
> * config/rs6000/aix-unwind.h (ucontext_for): Handle AIX 7.1
> specificities.

Okay.

Thanks, David

Re: [PATCH], PowerPC, add ISA 3.0 xxperm (power9 patch #12)

2016-01-04 Thread David Edelsohn

On Thu, Dec 31, 2015 at 1:30 PM, Michael Meissner
 wrote:
> This patch adds support for the ISA 3.0 XXPERM instruction, which is like
> VPERM, except it can operate on any VSX register.  Since the instruction is a
> 3 operand instruction (RT and RA must be the same), I made it so VPERM was
> preferred.  I also added XXPERM fusion support where a XXLOR move instruction
> immediately before the XXPERM instruction is fused together.
>
> I have bootstrapped and done make check on a big endian power7 and a little
> endian power8 system.  In addition, I built all of Spec 2006 with power9
> support enabled, and all of the tests that previously built now build with
> XXPERM being generated (the OMNETPP benchmark currently does not build on
> little endian for either power8 or power9).  Are these patches ok to check in?
>
> [gcc]
> 2015-12-31  Michael Meissner  
>
> * config/rs6000/constraints.md (wo constraint): New constraint for
> ISA 3.0 (power9).
>
> * config/rs6000/rs6000.c (rs6000_debug_reg_global): Add support
> for wo constraint.
> (rs6000_init_hard_regno_mode_ok): Likewise.
>
> * config/rs6000/rs6000.h (r6000_reg_class_enum): Add support for
> wo constraint.
>
> * config/rs6000/altivec.md (altivec_vperm_): Clean up vperm
> expanders not to have constraints.  Add support for ISA 3.0 xxperm
> instruction.  Add support for fusing xxlor with xxperm.
> (altivec_vperm__internal): Likewise.
> (altivec_vperm_v8hiv16qi): Likewise.
> (altivec_vperm_v16q): Likewise.
> (altivec_vperm__uns): Likewise.
> (vperm_v8hiv4si): Likewise.
> (vperm_v16qiv8hi): Likewise.
>
> * doc/md.texi (RS/6000 constraints): Document wo constraint.
>
> [gcc/testsuite]
> 2015-12-31  Michael Meissner  
>
> * gcc.target/powerpc/p9-permute.c: New test for xxperm code
> generation.

This is okay.

Thanks, David
