[PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-08-18 Thread Przemyslaw Wirkus
Hi,

The problem is that operand 4 (in the original pattern
*cond_sub_any_const) is no longer the same as operand 1, so
the pattern doesn't match the split condition.

This patch splits pattern *cond_sub_any_const into two separate
patterns and drops a condition from both:
* Pattern *cond_sub_relaxed_const now matches const_int
  SVE_RELAXED_GP operand.
* Pattern *cond_sub_strict_const now matches const_int
  SVE_STRICT_GP operand.
* The aarch64_sve_pred_dominates_p condition is removed from both patterns.

Bootstrapped and tested on aarch64-none-linux-gnu.

OK for master?

Cheers,
Przemyslaw

gcc/ChangeLog:

PR target/96357
* config/aarch64/aarch64-sve.md
(*cond_sub_relaxed_const): Updated and renamed from
*cond_sub_any_const pattern.
(*cond_sub_strict_const): New pattern.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr96357.c: New test.


rb13393.patch


Re: [PATCH][testsuite, nvptx] Add effective target sync_int_long_stack

2020-08-18 Thread Mike Stump via Gcc-patches
On Aug 12, 2020, at 6:57 AM, Tom de Vries wrote:
> 
> The nvptx target currently doesn't support effective target sync_int_long,
> although it has support for 32-bit and 64-bit atomic.
> 
> When enabling sync_int_long for nvptx, we run into a failure in
> gcc.dg/pr86314.c:
> ...
> nvptx-run: error getting kernel result: operation not supported on \
>   global/shared address space
> ...
> due to a ptx restriction:  accesses to local memory are illegal, and the
> test-case does an atomic operation on a stack address, which is mapped to
> local memory.
> 
> Fix this by adding a target sync_int_long_stack, which returns false for nvptx,
> which can be used to mark test-cases that require sync_int_long support for
> stack address.
> 
> Build on nvptx and tested with make check-gcc.
> 
> OK for trunk?

Ok.
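
For reference, the kind of code that trips the PTX restriction can be sketched as follows. This is a hypothetical reduction, not the contents of gcc.dg/pr86314.c: the function name `bump_local` is made up, and the sketch assumes GCC's `__sync_fetch_and_add` builtin. The operand of the atomic is a local variable, so on nvptx its address would fall in local (stack) memory. The directive in the comment shows how a test would presumably request the new effective target.

```c
/* A test needing atomics on stack addresses would presumably carry:
   { dg-require-effective-target sync_int_long_stack }
   (effective-target name taken from the patch description).  */

/* Hypothetical reduction: atomic read-modify-write on a stack slot.  */
int
bump_local (void)
{
  int counter = 41;                    /* lives on the stack */
  __sync_fetch_and_add (&counter, 1);  /* atomic add on a stack address */
  return counter;
}
```

On targets without the restriction the function simply returns 42.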


Re: [PATCH] rs6000: Rename instruction xvcvbf16sp to xvcvbf16spn

2020-08-18 Thread Peter Bergner via Gcc-patches
On 8/18/20 1:34 PM, Segher Boessenkool wrote:
> On Tue, Aug 18, 2020 at 01:30:53PM -0500, Peter Bergner wrote:
>> The xvcvbf16sp mnemonic, which was just added in ISA 3.1 has been renamed
>> to xvcvbf16spn, to make it consistent with the other non-signaling conversion
>> instructions which all end with "n".  The only use of this instruction is in
>> an MMA conversion built-in function, so there is little to no compatibility
>> issue.
>>
>> I just pushed the patch that does the rename to binutils today.
>>
>> Ok for trunk and the GCC 10 branch after testing is clean?
> 
> Yes, okay everywhere.  Thanks!

Thanks.  Pushed to trunk and GCC 10 branch.  Thanks!

Peter




Re: [PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-18 Thread Peter Bergner via Gcc-patches
On 8/13/20 4:27 PM, Segher Boessenkool wrote:
> Anyway, okay for trunk and backports.  Thanks!

Ok, I committed this to trunk and waited a few days and then
pushed this to GCC 10 release branch too.  Thanks!

Peter



Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-18 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote:
>
> On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote:
> >
> > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > enabled since there's no corresponding instruction for general
> > registers.
> >
> > gcc/
> > PR target/88808
> > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > for mask registers.
> > (*movhi_internal): Ditto.
> > (*movqi_internal): Ditto.
> > (*anddi_1): Support mask register operations
> > (*and_1): Ditto.
> > (*andqi_1): Ditto.
> > (*andn_1): Ditto.
> > (*_1): Ditto.
> > (*qi_1): Ditto.
> > (*one_cmpl2_1): Ditto.
> > (*one_cmplsi2_1_zext): Ditto.
> > (*one_cmplqi2_1): Ditto.
> >
> > gcc/testsuite/
> > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
>
> index 74d207c3711..e8ad79d1b0a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2294,7 +2294,7 @@
>
>  (define_insn "*movsi_internal"
>[(set (match_operand:SI 0 "nonimmediate_operand"
> -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
>  (match_operand:SI 1 "general_operand"
>  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
>"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
>
> I'd rather see *k everywhere, also with *movqi_internal and
> *movhi_internal patterns. The "*" means that the allocator won't
> allocate a mask register by default, but it will be used to optimize
> moves. With the above change, you are risking that during integer
> register pressure, the register allocator will allocate zero to a mask
> register, and later "optimize" the move with a direct maskreg-intreg
> move.
>
> The current strategy is that only general registers get allocated for
> integer modes. Let's keep it this way for now.
>

Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and
gcc.target/i386/avx512f-pr88465.c, I think it's more reasonable not to
move zero into a mask register directly.

> Otherwise, the patchset LGTM, but please test the suggested changes and 
> repost.
>
> BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> used to distinguish mask operations, generated from builtins from
> generic operations, so I'd like to keep them for a while. The drawback
> is, that they are not combined with other operations, but at the end
> of the day, this is what the programmer asked for by using builtins.

Agree, I prefer to keep them.

>
> Uros.

Bootstrap is OK, and regression tests are OK for the i386/x86-64 backend (after
adjusting the testcases).

Impact on SPEC2017 on SKL:

500.perlbench_r 0.00%
502.gcc_r 1.59%
505.mcf_r 1.49%
520.omnetpp_r 1.91%
523.xalancbmk_r -1.22%
525.x264_r 0.00%
531.deepsjeng_r 0.00%
541.leela_r -0.22%
548.exchange2_r 2.27%
557.xz_r 0.63%
INT geomean 0.64%

503.bwaves_r 3.68%
507.cactuBSSN_r -0.62%
508.namd_r 0.51%
510.parest_r -0.16%
511.povray_r 0.57%
519.lbm_r 0.50%
521.wrf_r 0.00%
526.blender_r 0.00%
527.cam4_r 0.00%
538.imagick_r -0.41%
544.nab_r 0.00%
549.fotonik3d_r -0.20%
554.roms_r 4.19%
FP geomean 0.66%
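
The geomean rows can be cross-checked from the per-benchmark figures. The sketch below assumes the geomean is taken over the ratios (1 + gain/100), which is not stated in the mail; the function names are hypothetical, and the n-th root is found by bisection only to avoid depending on libm.

```c
#include <stddef.h>

/* Integer power by repeated multiplication (avoids needing libm).  */
static double
pow_uint (double base, size_t exp)
{
  double result = 1.0;
  for (size_t i = 0; i < exp; i++)
    result *= base;
  return result;
}

/* Geometric mean of per-benchmark gains given in percent, returned
   in percent.  Assumes n > 0 and every gain is above -100.  */
double
geomean_gain_percent (const double *gains, size_t n)
{
  double product = 1.0;
  for (size_t i = 0; i < n; i++)
    product *= 1.0 + gains[i] / 100.0;

  /* Bisection for the n-th root of the product of ratios.  */
  double lo = 0.0;
  double hi = product > 1.0 ? product : 1.0;
  for (int iter = 0; iter < 200; iter++)
    {
      double mid = (lo + hi) / 2.0;
      if (pow_uint (mid, n) < product)
        lo = mid;
      else
        hi = mid;
    }
  return (hi - 1.0) * 100.0;
}
```

Fed the ten INT figures above, the result lands close to the reported 0.64%.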

-- 
BR,
Hongtao
From e546516449ec4ed9301b83a063efdefbf0f7e75a Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 13 Aug 2020 14:20:43 +0800
Subject: [PATCH 4/4] Enable bitwise operation for type mask.

Enable operator or/xor/and/andn/not for mask register, kxnor is not
enabled since there's no corresponding instruction for general
registers.

gcc/
	PR target/88808
	* config/i386/i386.md: (*movsi_internal): Adjust constraints
	for mask registers.
	(*movhi_internal): Ditto.
	(*movqi_internal): Ditto.
	(*anddi_1): Support mask register operations
	(*and_1): Ditto.
	(*andqi_1): Ditto.
	(*andn_1): Ditto.
	(*_1): Ditto.
	(*qi_1): Ditto.
	(*one_cmpl2_1): Ditto.
	(*one_cmplsi2_1_zext): Ditto.
	(*one_cmplqi2_1): Ditto.

gcc/testsuite/
	* gcc.target/i386/bitwise_mask_op-1.c: New test.
	* gcc.target/i386/bitwise_mask_op-2.c: New test.
	* gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
	* gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
	* gcc.target/i386/avx512f-kmovw-5.c: Ditto.
	* gcc.target/i386/avx512bw-pr88465.c: Ditto.
	* gcc.target/i386/avx512f-pr88465.c: Ditto.
---
 gcc/config/i386/i386.md   | 260 +-
 .../gcc.target/i386/avx512bw-kunpckwd-1.c |   2 +-
 .../gcc.target/i386/avx512bw-kunpckwd-3.c |   2 +-
 .../gcc.target/i386/avx512dq-kmovb-5.c|   2 +-
 .../gcc.target/i386/avx512dq-pr88465.c|   4 +-
 .../gcc.target/i386/avx512f-kmovw-5.c |  

Re: [PATCH 2/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-18 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 17, 2020 at 5:34 PM Uros Bizjak wrote:
>
> On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu wrote:
> >
> >   Enable direct move between masks and gprs in pass_reload with
> > consideration of cost model.
> >
> > Changelog
> > gcc/
> > * config/i386/i386.c (inline_secondary_memory_needed):
> > No memory is needed between mask regs and gpr.
> > (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
> > mask regno.
> > * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
> > (REG_CLASS_NAMES): Ditto.
> > (REG_CLASS_CONTENTS): Ditto.
> > * config/i386/i386.md: Exclude mask register in
> > define_peephole2 which is available only for gpr.
> >
> > gcc/testsuites/
> > * gcc.target/i386/pr71453-1.c: New tests.
> > * gcc.target/i386/pr71453-2.c: Ditto.
> > * gcc.target/i386/pr71453-3.c: Ditto.
> > * gcc.target/i386/pr71453-4.c: Ditto.
>
> @@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode
> mode, reg_class_t class1,
>|| MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
>|| MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
>|| MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
> -  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
> -  || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
> -  || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
> +  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
>  {
>gcc_assert (!strict || lra_in_progress);
>return true;
>
> No, this is still needed, the reason is explained in the comment above

Remove my change here.

> inline_secondary_memory_needed:
>
>The function can't work reliably when one of the CLASSES is a class
>containing registers from multiple sets.  We avoid this by never combining
>different sets in a single alternative in the machine description.
>Ensure that this constraint holds to avoid unexpected surprises.
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b24a4557871..74d207c3711 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -15051,7 +15051,7 @@
> (parallel [(set (reg:CC FLAGS_REG)
> (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
>(clobber (match_dup 0))])]
> -  ""
> +  "!MASK_REGNO_P (REGNO (operands[0]))"
>[(set (reg:CC FLAGS_REG)
>  (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])
>
> @@ -15072,6 +15072,7 @@
> (label_ref (match_operand 5))
> (pc)))]
>"REGNO (operands[2]) == REGNO (operands[3])
> +   && !MASK_REGNO_P (REGNO (operands[1]))
> && peep2_reg_dead_p (3, operands[0])
> && peep2_reg_dead_p (3, operands[2])
> && peep2_regno_dead_p (4, FLAGS_REG)"
>

Changed for upper two define_peepholes.

> Actually, there are several (historic?) peephole2 patterns that assume

I didn't find those patterns.

I looked through i386.md, there are 3 cases
1. Since mask registers are only used for "mov/zero_extend/bitwise"
patterns, peephole2 patterns involving only other patterns are safe to
use general_registers.
2. For those peephole2 patterns containing both
mov/zero_extend/bitwise and other patterns, implicit conditions such
as match_dup used in other patterns will make them the same as case 1.
3. Peephole2 patterns are safe to use mask registers, since they would
be eliminated in output patterns.

> register_operand means only integer registers. Just change
> register_operand to general_reg_operand and eventually
> nonimmediate_operand to nonimmediate_gr_operand. Do not put additional
> predicates into insn predicate.



>
> Uros.

Update patch.

-- 
BR,
Hongtao
From 5a07403279622447cc6503a8dcc3c0cecb9ffcef Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 24 Oct 2019 11:13:00 +0800
Subject: [PATCH 3/4] According to instruction_tables.pdf

1. Set cost of movement inside mask registers a bit higher than gpr's.
2. Set cost of movement between mask register and gpr much higher than movement
   inside gpr, but still less than or equal to load/store.
3. Set cost of mask register load/store a bit higher than gpr load/store.
---
 gcc/config/i386/x86-tune-costs.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 256c84e364e..a782a9dd9e3 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1727,12 +1727,12 @@ struct processor_costs skylake_cost = {
   {8, 8, 8, 12, 24},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
   6, 6,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
-  {4, 4, 4},/* cost of loading mask register
+  4, 6,/* mask->integer and integer->mask moves */
+  {6, 6, 6},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
-  {6, 6, 6},/* cost if storing 

Re: [PATCH 2/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-18 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 19, 2020 at 10:17 AM Hongtao Liu wrote:
>
> On Mon, Aug 17, 2020 at 5:34 PM Uros Bizjak wrote:
> >
> > On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu wrote:
> > >
> > >   Enable direct move between masks and gprs in pass_reload with
> > > consideration of cost model.
> > >
> > > Changelog
> > > gcc/
> > > * config/i386/i386.c (inline_secondary_memory_needed):
> > > No memory is needed between mask regs and gpr.
> > > (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
> > > mask regno.
> > > * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
> > > (REG_CLASS_NAMES): Ditto.
> > > (REG_CLASS_CONTENTS): Ditto.
> > > * config/i386/i386.md: Exclude mask register in
> > > define_peephole2 which is available only for gpr.
> > >
> > > gcc/testsuites/
> > > * gcc.target/i386/pr71453-1.c: New tests.
> > > * gcc.target/i386/pr71453-2.c: Ditto.
> > > * gcc.target/i386/pr71453-3.c: Ditto.
> > > * gcc.target/i386/pr71453-4.c: Ditto.
> >
> > @@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode
> > mode, reg_class_t class1,
> >|| MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
> >|| MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
> >|| MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
> > -  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
> > -  || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
> > -  || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
> > +  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
> >  {
> >gcc_assert (!strict || lra_in_progress);
> >return true;
> >
> > No, this is still needed, the reason is explained in the comment above
>
> Remove my change here.
>
> > inline_secondary_memory_needed:
> >
> >The function can't work reliably when one of the CLASSES is a class
> > >containing registers from multiple sets.  We avoid this by never combining
> >different sets in a single alternative in the machine description.
> >Ensure that this constraint holds to avoid unexpected surprises.
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index b24a4557871..74d207c3711 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -15051,7 +15051,7 @@
> > (parallel [(set (reg:CC FLAGS_REG)
> > (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
> >(clobber (match_dup 0))])]
> > -  ""
> > +  "!MASK_REGNO_P (REGNO (operands[0]))"
> >[(set (reg:CC FLAGS_REG)
> >  (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])
> >
> > @@ -15072,6 +15072,7 @@
> > (label_ref (match_operand 5))
> > (pc)))]
> >"REGNO (operands[2]) == REGNO (operands[3])
> > +   && !MASK_REGNO_P (REGNO (operands[1]))
> > && peep2_reg_dead_p (3, operands[0])
> > && peep2_reg_dead_p (3, operands[2])
> > && peep2_regno_dead_p (4, FLAGS_REG)"
> >
>
> Changed for upper two define_peepholes.
>
> > Actually, there are several (historic?) peephole2 patterns that assume
>
> I didn't find those patterns.
>
> I looked through i386.md, there are 3 cases
> 1. Since mask registers are only used for "mov/zero_extend/bitwise"
> patterns, peephole2 patterns involving only other patterns are safe to
> use general_registers.
> 2. For those peephole2 patterns containing both
> mov/zero_extend/bitwise and other patterns, implicit conditions such
> as match_dup used in other patterns will make them the same as case 1.
> 3. Peephole2 patterns are safe to use mask registers, since they would
> be eliminated in output patterns.
>
> > register_operand means only integer registers. Just change
> > register_operand to general_reg_operand and eventually
> > nonimmediate_operand to nonimmediate_gr_operand. Do not put additional
> > predicates into insn predicate.
>
>
>
> >
> > Uros.
>
> Update patch.
>
> --
> BR,
> Hongtao

This patch.

-- 
BR,
Hongtao
From 388300c90b7b147d088ccc58a39fcec9556979b5 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 6 Aug 2020 13:48:38 +0800
Subject: [PATCH 2/4] Enable direct movement between gpr and mask registers in
 pass_reload.

Changelog
gcc/
	* config/i386/i386.c (inline_secondary_memory_needed):
	No memory is needed between mask regs and gpr.
	(ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
	mask regno.
	* config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
	(REG_CLASS_NAMES): Ditto.
	(REG_CLASS_CONTENTS): Ditto.
	* config/i386/i386.md: Exclude mask register in
	define_peephole2 which is available only for gpr.

gcc/testsuites/
	* gcc.target/i386/pr71453-1.c: New tests.
	* gcc.target/i386/pr71453-2.c: Ditto.
	* gcc.target/i386/pr71453-3.c: Ditto.
	* gcc.target/i386/pr71453-4.c: Ditto.
---
 gcc/config/i386/i386.c|  2 +-
 gcc/config/i386/i386.h|  3 +
 gcc/config/i386/i386.md   |  4 +-
 

Re: [PATCH 1/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-18 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 17, 2020 at 5:20 PM Uros Bizjak wrote:
>
> On Fri, Aug 14, 2020 at 10:22 AM Hongtao Liu wrote:
> >
> > Hi:
> >   First, since avx512 masks involve both vector isa and general part,
> > so i add both maintainers to the maillist.
> >
> >   I'm doing this in 4 steps:
> >   1 - Add cost model for operation of mask registers.
> >   2 - Introduce new cover class INT_MASK_REGS, this will enable direct
> > move between gpr and mask registers in pass_reload by consideration of
> > cost model, this is similar as INT_SSE_REGS.
> >   3 - Tune cost model.
> >   4 - Enable operator or/xor/and/andn/not for mask register. kxnor is
> > not enabled since there's no corresponding instruction for general
> > registers, 64bit mask op is not enabled for 32bit target.
> > kadd/kshift/ktest are not merged into general versions add/ashl/test
> > since i think it would be odd to use mask register for those
> > operations.
> >
> >   Bootstrap is ok, regression test is ok for i386/x86-64 result.
> >   There's some improvement for performance of SPEC2017 tested on SKL,
> > i observe there're many spills from integer to mask registers instead
> > of memory which is the reason for the improvement.
>
> +  if (MASK_CLASS_P (regclass))
> +{
> +  int index;
> +  switch (GET_MODE_SIZE (mode))
> +{
> +case 1:
> +  index = 0;
> +  break;
> +case 2:
> +  index = 1;
> +  break;
> +default:
> +  index = 3;
>
> Max index = 2!
>

Fix typo.

> +  break;
> +}
> +
> +  if (in == 2)
> +return MAX (ix86_cost->hard_register.mask_load[index],
> +ix86_cost->hard_register.mask_store[index]);
> +  return in ? ix86_cost->hard_register.mask_load[2]
> +: ix86_cost->hard_register.mask_store[2];
> +}
>
> Are DImode loads and stores assumed to cost the same as SImode? A
> comment would be nice here.
>

Yes, comment is added.

> Uros.

Update patch.
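
The review point about the index range can be illustrated outside of GCC. The following is a minimal sketch, not the code from i386.c: the struct and function names are hypothetical stand-ins for `ix86_cost->hard_register.mask_load` and `mask_store`, each of which has three slots (QImode, HImode, SImode), so the largest valid index is 2. The sketch uses `index` in the final return as well; that is an assumption of this sketch, whereas the quoted hunk reads slot 2 there.

```c
/* Three cost slots: QImode, HImode, SImode (DImode shares the last).  */
enum { SKETCH_MASK_SLOTS = 3 };

struct sketch_mask_costs
{
  int mask_load[SKETCH_MASK_SLOTS];
  int mask_store[SKETCH_MASK_SLOTS];
};

/* Map a mode size in bytes to a cost slot; never exceeds 2.  */
int
sketch_mask_cost_index (unsigned int mode_size)
{
  switch (mode_size)
    {
    case 1:
      return 0;
    case 2:
      return 1;
    default:
      /* SImode, and DImode assumed to cost the same as SImode.  */
      return 2;
    }
}

/* IN == 2 asks for the worse of load and store; otherwise nonzero IN
   means load and zero means store.  */
int
sketch_mask_memory_move_cost (const struct sketch_mask_costs *costs,
                              unsigned int mode_size, int in)
{
  int index = sketch_mask_cost_index (mode_size);
  int load = costs->mask_load[index];
  int store = costs->mask_store[index];

  if (in == 2)
    return load > store ? load : store;
  return in ? load : store;
}
```

With `default: index = 3` the lookup would read one element past the end of a three-element array, which is what the "Max index = 2!" remark is about.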

-- 
BR,
Hongtao
From 70e9e389d751c79caf957ef336dded34726f0533 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 3 Sep 2019 14:41:02 -0700
Subject: [PATCH 1/4] x86: Add cost model for operation of mask registers.

gcc/

	PR target/71453
	* config/i386/i386.h (struct processor_costs): Add member
	mask_to_integer, integer_to_mask, mask_load[3], mask_store[3],
	mask_move.
	* config/i386/x86-tune-costs.h (ix86_size_cost, i386_cost,
	i386_cost, pentium_cost, lakemont_cost, pentiumpro_cost,
	geode_cost, k6_cost, athlon_cost, k8_cost, amdfam10_cost,
	bdver_cost, znver1_cost, znver2_cost, skylake_cost,
	btver1_cost, btver2_cost, pentium4_cost, nocona_cost,
	atom_cost, slm_cost, intel_cost, generic_cost, core_cost):
	Initialize mask_load[3], mask_store[3], mask_move,
	integer_to_mask, mask_to_integer for all target costs.
	* config/i386/i386.c (ix86_register_move_cost): Using cost
	model of mask registers.
	(inline_memory_move_cost): Ditto.
	(ix86_register_move_cost): Ditto.
---
 gcc/config/i386/i386.c   |  34 
 gcc/config/i386/i386.h   |   7 ++
 gcc/config/i386/x86-tune-costs.h | 144 +++
 3 files changed, 185 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8ea6a4d7ea7..f5e824a16ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18769,6 +18769,29 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
   return in ? ix86_cost->hard_register.sse_load [index]
 		: ix86_cost->hard_register.sse_store [index];
 }
+  if (MASK_CLASS_P (regclass))
+{
+  int index;
+  switch (GET_MODE_SIZE (mode))
+	{
+	case 1:
+	  index = 0;
+	  break;
+	case 2:
+	  index = 1;
+	  break;
+	/* DImode loads and stores assumed to cost the same as SImode.  */
+	default:
+	  index = 2;
+	  break;
+	}
+
+  if (in == 2)
+	return MAX (ix86_cost->hard_register.mask_load[index],
+		ix86_cost->hard_register.mask_store[index]);
+  return in ? ix86_cost->hard_register.mask_load[2]
+		: ix86_cost->hard_register.mask_store[2];
+}
   if (MMX_CLASS_P (regclass))
 {
   int index;
@@ -18894,6 +18917,17 @@ ix86_register_move_cost (machine_mode mode, reg_class_t class1_i,
 	? ix86_cost->hard_register.sse_to_integer
 	: ix86_cost->hard_register.integer_to_sse);
 
+  /* Moves between mask register and GPR.  */
+  if (MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
+{
+  return (MASK_CLASS_P (class1)
+	  ? ix86_cost->hard_register.mask_to_integer
+	  : ix86_cost->hard_register.integer_to_mask);
+}
+  /* Moving between mask registers.  */
+  if (MASK_CLASS_P (class1) && MASK_CLASS_P (class2))
+return ix86_cost->hard_register.mask_move;
+
   if (MAYBE_FLOAT_CLASS_P (class1))
 return ix86_cost->hard_register.fp_move;
   if (MAYBE_SSE_CLASS_P (class1))
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 114967e49a3..e0af87450b8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -279,6 +279,13 @@ struct processor_costs {
    in SImode, 

[committed] analyzer: consider initializers for globals [PR96651]

2020-08-18 Thread David Malcolm via Gcc-patches
PR analyzer/96651 reports a false positive in which a global
that can't have been touched yet is checked in "main".  The analyzer
fails to reject code paths in which the initial value of the global
makes the path condition impossible.

This patch detects cases where the code path begins at the entrypoint
of "main", and extracts values from initializers for globals that
can't have been touched yet, rather than using a symbolic
"INIT_VAL(REG)", fixing the false positive.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2753-g400abebf48a90d0797718ab7c3864de331e85b70.
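
The "path begins at the entrypoint of main" test can be pictured with a small standalone model. This is only a sketch under assumptions: `sketch_frame` and `sketch_called_from_main_p` are hypothetical names, and the array stands in for the analyzer's stack of frames with index 0 as the oldest frame.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: just the function's name.  */
struct sketch_frame
{
  const char *function_name;
};

/* Return nonzero if the oldest frame (index 0) belongs to "main".
   An empty stack means no entrypoint is known, so answer no.  */
int
sketch_called_from_main_p (const struct sketch_frame *frames, size_t depth)
{
  if (depth == 0)
    return 0;
  return strcmp (frames[0].function_name, "main") == 0;
}
```

Only when this holds is it safe to read a global's value from its initializer, since nothing can have run before "main" to modify it.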

gcc/analyzer/ChangeLog:
PR analyzer/96651
* region-model.cc (region_model::called_from_main_p): New.
(region_model::get_store_value): Move handling for globals into...
(region_model::get_initial_value_for_global): ...this new
function, and add logic for extracting values from decl
initializers.
* region-model.h (decl_region::get_svalue_for_constructor): New
decl.
(decl_region::get_svalue_for_initializer): New decl.
(region_model::called_from_main_p): New decl.
(region_model::get_initial_value_for_global): New.
* region.cc (decl_region::maybe_get_constant_value): Move logic
for getting an svalue from a CONSTRUCTOR node to...
(decl_region::get_svalue_for_constructor): ...this new function.
(decl_region::get_svalue_for_initializer): New.
* store.cc (get_svalue_for_ctor_val): Rewrite in terms of
region_model::get_rvalue.
* store.h (binding_cluster::get_map): New accessor.

gcc/testsuite/ChangeLog:
PR analyzer/96651
* gcc.dg/analyzer/pr96651-1.c: New test.
* gcc.dg/analyzer/pr96651-2.c: New test.
---
 gcc/analyzer/region-model.cc  | 82 ---
 gcc/analyzer/region-model.h   |  6 ++
 gcc/analyzer/region.cc| 57 
 gcc/analyzer/store.cc | 12 +---
 gcc/analyzer/store.h  |  2 +
 gcc/testsuite/gcc.dg/analyzer/pr96651-1.c | 22 ++
 gcc/testsuite/gcc.dg/analyzer/pr96651-2.c | 72 
 7 files changed, 224 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96651-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96651-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index c3d9ca7f650..5b08e48e6e5 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1204,6 +1204,76 @@ region_model::get_rvalue (tree expr, region_model_context *ctxt)
   return get_rvalue (path_var (expr, get_stack_depth () - 1), ctxt);
 }
 
+/* Return true if this model is on a path with "main" as the entrypoint
+   (as opposed to one in which we're merely analyzing a subset of the
+   path through the code).  */
+
+bool
+region_model::called_from_main_p () const
+{
+  if (!m_current_frame)
+return false;
+  /* Determine if the oldest stack frame in this model is for "main".  */
+  const frame_region *frame0 = get_frame_at_index (0);
+  gcc_assert (frame0);
+  return id_equal (DECL_NAME (frame0->get_function ()->decl), "main");
+}
+
+/* Subroutine of region_model::get_store_value for when REG is (or is within)
+   a global variable that hasn't been touched since the start of this path
+   (or was implicitly touched due to a call to an unknown function).  */
+
+const svalue *
+region_model::get_initial_value_for_global (const region *reg) const
+{
+  /* Get the decl that REG is for (or is within).  */
+  const decl_region *base_reg
+= reg->get_base_region ()->dyn_cast_decl_region ();
+  gcc_assert (base_reg);
+  tree decl = base_reg->get_decl ();
+
+  /* Special-case: to avoid having to explicitly update all previously
+ untracked globals when calling an unknown fn, they implicitly have
+ an unknown value if an unknown call has occurred, unless this is
+ static to-this-TU and hasn't escaped.  Globals that have escaped
+ are explicitly tracked, so we shouldn't hit this case for them.  */
+  if (m_store.called_unknown_fn_p () && TREE_PUBLIC (decl))
+return m_mgr->get_or_create_unknown_svalue (reg->get_type ());
+
+  /* If we are on a path from the entrypoint from "main" and we have a
+ global decl defined in this TU that hasn't been touched yet, then
+ the initial value of REG can be taken from the initialization value
+ of the decl.  */
+  if (called_from_main_p () && !DECL_EXTERNAL (decl))
+{
+  /* Get the initializer value for base_reg.  */
+  const svalue *base_reg_init
+   = base_reg->get_svalue_for_initializer (m_mgr);
+  gcc_assert (base_reg_init);
+  if (reg == base_reg)
+   return base_reg_init;
+  else
+   {
+ /* Get the value for REG within base_reg_init.  */
+ binding_cluster c (base_reg);
+ c.bind (m_mgr->get_store_manager (), base_reg, base_reg_init,
+   

[committed] analyzer: fix ICE with negative bit offsets [PR96648]

2020-08-18 Thread David Malcolm via Gcc-patches
PR analyzer/96648 reports an ICE within get_field_at_bit_offset due
to a negative bit offset, arising due to pointer arithmetic.

This patch replaces an assertion with handling for this case, fixing the
ICE.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2753-g400abebf48a90d0797718ab7c3864de331e85b70.
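
The shape of the fix can be sketched without GCC's tree types. The code below is an assumption-laden model, not the analyzer's implementation: `sketch_field` and `sketch_field_at_bit_offset` are hypothetical, and fields are assumed to be sorted by ascending bit offset. A negative offset, as produced by pointer arithmetic such as `d2 - 1`, yields "no field" instead of tripping an assertion.

```c
#include <stddef.h>

/* Hypothetical stand-in for a record field.  */
struct sketch_field
{
  const char *name;
  long bit_offset;
};

/* Find the first field whose offset is greater than BIT_OFFSET and
   return the one preceding it.  FIELDS must be sorted by bit_offset.
   Returns NULL for negative offsets or when no field qualifies.  */
const struct sketch_field *
sketch_field_at_bit_offset (const struct sketch_field *fields, size_t n,
                            long bit_offset)
{
  if (bit_offset < 0)
    return NULL;

  const struct sketch_field *preceding = NULL;
  for (size_t i = 0; i < n; i++)
    {
      if (fields[i].bit_offset > bit_offset)
        break;
      preceding = &fields[i];
    }
  return preceding;
}
```

Callers then have to cope with a null result, which is the price of replacing the assertion.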

gcc/analyzer/ChangeLog:
PR analyzer/96648
* region.cc (get_field_at_bit_offset): Gracefully handle negative
values for bit_offset.

gcc/testsuite/ChangeLog:
PR analyzer/96648
* gcc.dg/analyzer/pr96648.c: New test.
---
 gcc/analyzer/region.cc  |  3 ++-
 gcc/testsuite/gcc.dg/analyzer/pr96648.c | 36 +
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96648.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index eab1f2771cf..770e2cb849e 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -226,7 +226,8 @@ static tree
 get_field_at_bit_offset (tree record_type, bit_offset_t bit_offset)
 {
   gcc_assert (TREE_CODE (record_type) == RECORD_TYPE);
-  gcc_assert (bit_offset >= 0);
+  if (bit_offset < 0)
+return NULL;
 
   /* Find the first field that has an offset > BIT_OFFSET,
  then return the one preceding it.
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96648.c b/gcc/testsuite/gcc.dg/analyzer/pr96648.c
new file mode 100644
index 000..a6b0c727287
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96648.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "-O1" } */
+
+struct vd {
+  struct vd *rs;
+};
+
+struct fh {
+  struct vd cl;
+};
+
+struct i3 {
+  struct fh *h4;
+};
+
+struct fh *
+gm (void);
+
+void
+j7 (struct vd *);
+
+inline void
+mb (struct vd *e7)
+{
+  j7 (e7->rs);
+}
+
+void
+po (struct i3 *d2)
+{
+  struct i3 *s2;
+
+  d2->h4 = gm ();
+  mb (&d2->h4->cl);
+  s2 = ({ d2 - 1; });
+  po (s2);
+}
-- 
2.26.2



Re: [PATCH] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-08-18 Thread Segher Boessenkool
Hi!

On Fri, Aug 14, 2020 at 07:54:23PM -0300, Raoni Fassina Firmino via Gcc-patches wrote:
> So, this patch adds new rs6000 expand optimizations for fegetround and
> for some calls to feclearexcept and feraiseexcept. All of them C99
> functions from fenv.h

And the fenv.h implementation can then use the builtins.

> To check the FE_* flags used in feclearexcept and feraiseexcept
> expands I decided copy verbatim the definitions from glibc instead of
> using the macros, which would means including fenv.h somewhere to get
> them.

Good plan :-)

> Still on feclearexcept and feraiseexcept I, I am not sure I used
> exact_log2_cint_operand correctly because on my tests it kept
> accepting feclearexcept(0) and it should not.

I am not sure what you mean.  If you pass a number not one of the four
allowed ones, the pattern FAILs anyway?

In fact, you could just use const_int_operand with "n"?

> In any case, because I
> decided to test for all valid flags, this is not a problem for correct
> generation, but I thought I should mention it.

Ah gotcha.  Yes, see above.

> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -115,6 +115,8 @@ static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
>  static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
>  static rtx expand_builtin_interclass_mathfn (tree, rtx);
>  static rtx expand_builtin_sincos (tree);
> +static rtx expand_builtin_fegetround (tree, rtx, machine_mode);
> +static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode, 
> optab);

That last line is too long, please break it?

> +/* Expand call EXP to the fegetround builtin (from C99 venv.h), returning the

"fenv.h"

> +  if (target == 0
> +  || GET_MODE (target) != target_mode
> +  || ! (*insn_data[icode].operand[0].predicate) (target, target_mode))
> +target = gen_reg_rtx (target_mode);
> +
> +  rtx pat = GEN_FCN (icode) (target);
> +  if (! pat)
> +return NULL_RTX;
> +  emit_insn (pat);

No space after unary operators (like !) please (the exception is those
written with alphabetics, like casts and sizeof).

I guess you copied this, so I don't know -- have to stop bad habits
somewhere I guess :-)

> +;; int __builtin_fegetround()
> +(define_expand "fegetroundsi"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT"
> +{
> +rtx tmp_df = gen_reg_rtx (DFmode);

Indentation should be just two spaces.

> +;; int feclearexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when excepts is a
> +;; constant know at compile time and specifying only one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It dosen't handle values out of range, and always returns 0.

(doesn't)

> +;; Note that FE_INVALID is unsuported because it maps to more than

(unsupported)

> +;; one bit on FPSCR register.

You cannot set or clear the VX bit directly, yes (you have to twiddle
the component VX* bits you care about).  Which we could do later
perhaps, but this is fine now :-)

> +;; Because this restrictions, this only expands on the desired cases.

(Because of these)

> +(define_expand "feclearexceptsi"
> +  [(use (match_operand:SI 1 "exact_log2_cint_operand" "N"))

So just  "const_int_operand" "n"  should work fine here, and make it
more obvious that it won't actually allow all numbers.

> +  switch (INTVAL (operands[1]))
> +{
> +case (1 << (31 - 6)): /* FE_INEXACT */

I would just write it as 0x02000000 etc.?  Much clearer, and you have
the comment demagicificating it anyway!

> +case (1 << (31 - 5)): /* FE_DIVBYZERO */
> +case (1 << (31 - 4)): /* FE_UNDERFLOW */
> +case (1 << (31 - 3)): /* FE_OVERFLOW */
> +  break;
> +default:
> +  FAIL;
> +}
> +
> +  rtx tmp = gen_rtx_CONST_INT (SImode, __builtin_clz (INTVAL(operands[1])));

Space after "INTVAL".

> +  emit_insn (gen_rs6000_mtfsb0 (tmp));
> +  emit_move_insn (operands[0], GEN_INT (0));
> +  DONE;
> +})

GEN_INT (0)  is just  const0_rtx  , please use that?
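
For reference, here is a minimal standalone sketch of how the shift forms, the hex masks and the bit number handed to mtfsb0 relate. It is not code from the patch: the MASK_* names and fpscr_bit_for_flag are made up for illustration, and it assumes a 32-bit unsigned int and the GCC/Clang builtin __builtin_clz (the same builtin the expander calls).

```cpp
#include <cstdint>

// Masks for the four flags the expander accepts.  Power numbers bits from
// the most significant end, so bit N of a 32-bit word is 1 << (31 - N).
constexpr std::uint32_t MASK_INEXACT   = 0x02000000u; // FE_INEXACT,   bit 6
constexpr std::uint32_t MASK_DIVBYZERO = 0x04000000u; // FE_DIVBYZERO, bit 5
constexpr std::uint32_t MASK_UNDERFLOW = 0x08000000u; // FE_UNDERFLOW, bit 4
constexpr std::uint32_t MASK_OVERFLOW  = 0x10000000u; // FE_OVERFLOW,  bit 3

// The hex constants and the shift expressions from the patch are the same.
static_assert (MASK_INEXACT   == (1u << (31 - 6)), "FE_INEXACT mask");
static_assert (MASK_DIVBYZERO == (1u << (31 - 5)), "FE_DIVBYZERO mask");
static_assert (MASK_UNDERFLOW == (1u << (31 - 4)), "FE_UNDERFLOW mask");
static_assert (MASK_OVERFLOW  == (1u << (31 - 3)), "FE_OVERFLOW mask");

// Hypothetical helper: return the big-endian bit number for one of the four
// supported single-bit masks, or -1 for anything else (the case where the
// expander would FAIL).  Counting leading zeros of a single-bit mask gives
// exactly that bit number.
constexpr int
fpscr_bit_for_flag (std::uint32_t mask)
{
  return (mask == MASK_INEXACT || mask == MASK_DIVBYZERO
	  || mask == MASK_UNDERFLOW || mask == MASK_OVERFLOW)
	 ? __builtin_clz (mask)
	 : -1;
}
```

With these definitions fpscr_bit_for_flag (MASK_INEXACT) evaluates to 6, and a mask outside the four listed values (for example 0x20000000) gives -1.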

> +(define_expand "feraiseexceptsi"
> +  [(use (match_operand:SI 1 "exact_log2_cint_operand" "N"))
> +   (set (match_operand:SI 0 "gpc_reg_operand")
> +(const_int 0))]

Indent by 8 spaces should be a tab (here and elsewhere).

> +OPTAB_D (fegetround_optab, "fegetround$a")
> +OPTAB_D (feclearexcept_optab, "feclearexcept$a")
> +OPTAB_D (feraiseexcept_optab, "feraiseexcept$a")

Should those be documented somewhere?  (In gcc/doc/ somewhere).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */

All files in gcc.target/powerpc/ are run for powerpc already; just
/* { dg-do run } */
please.

> +/* { dg-options "-lm -fno-builtin" } */

Does that work everywhere?  AIX, Darwin, other non-Linux systems, systems
without OS, etc.

> +#include 

That header does not exist everywhere.  You can just declare the things
you need (the FE_ constants?)

Or perhaps you want to 

Re: [PATCH] Fortran : rejected f0.d edit descriptor PR96436

2020-08-18 Thread Jerry DeLisle via Gcc-patches




On 8/17/20 12:31 AM, Mark Eggleston wrote:

Please find attached a patch for PR96436.

OK to commit?


Looks good to me.  Thanks for fixing this.

Regards,

Jerry


Re: [PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-18 Thread Richard Earnshaw
On 06/08/2020 14:04, Maciej W. Rozycki via Gcc-patches wrote:
> Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374, 
> and replace 
> `-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
> in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
> for the affected functions while not pulling the unwinder proper, which 
> is not required here.
> 
> Remove the ARM overrides accordingly, retaining the hook infrastructure 
> however, and make the ARM test case a generic one.
> 
> Beyond saving program space it fixes a RISC-V glibc build error due to 
> unsatisfied `malloc' and `free' references from the unwinder causing 
> link errors with `ld.so' where libgcc has been built at -O0.
> 
>   gcc/
>   * testsuite/gcc.target/arm/div64-unwinding.c: Rename to...
>   * testsuite/gcc.dg/div64-unwinding.c: ... this.
> 
>   libgcc/
>   * Makefile.in [!LIB2_DIVMOD_EXCEPTION_FLAGS]
>   (LIB2_DIVMOD_EXCEPTION_FLAGS): Replace `-fexceptions
>   -fnon-call-exceptions' with `-fasynchronous-unwind-tables'.
>   * config/arm/t-bpabi (LIB2_DIVMOD_EXCEPTION_FLAGS): Remove
>   variable.
>   * config/arm/t-netbsd-eabi (LIB2_DIVMOD_EXCEPTION_FLAGS):
>   Likewise.

From a quick glance, I'm not convinced this is right for Arm, since the
Arm unwind format does not support anything other than call-based
exceptions.  How did you test it?

R.

> ---
> Hi,
> 
>  I realised we still use handwritten ChangeLog entries (I got confused 
> with now different policies each of the various pieces of the GNU 
> toolchain has), so here's v2 of the change with a fix for that problem 
> being the only update.
> 
>  Also I have since run verification with the `riscv64-linux-gnu' target 
> and the ilp32d multilib as more representative for the change being made.
> No problems were observed, although the now enabled test case scored:
> 
> UNSUPPORTED: gcc.dg/div64-unwinding.c
> 
> of course with the target failing the `! *-*-linux*' condition.
> 
>  Given that for the `riscv64-linux-gnu' target and the ilp32d multilib 
> glibc currently fails to link against libgcc.a built at -O0 I first ran 
> reference testing with target libraries built at -O2, but comparing that 
> to change-under-test -O2 results revealed another issue with GCC target 
> libraries built at -O0 causing link failures across testsuites, namely 
> libgcov.a referring atomic primitives where libatomic.a has not been 
> linked in.  I haven't figured out yet if the issue is in libgcov, the 
> testsuite or the specs.  Examples of failures:
> 
> .../bin/riscv64-linux-gnu-ld: 
> .../gcc/testsuite/g++/../../lib32/ilp32d/libgcov.a(_gcov_indirect_call_profiler_v4.o):
>  in function `__gcov_topn_values_profiler_body': 
> .../libgcc/libgcov-profiler.c:116: undefined reference to 
> `__atomic_fetch_add_8'
> .../bin/riscv64-linux-gnu-ld: .../libgcc/libgcov-profiler.c:129: undefined 
> reference to `__atomic_fetch_add_8'
> .../bin/riscv64-linux-gnu-ld: .../libgcc/libgcov-profiler.c:150: undefined 
> reference to `__atomic_fetch_sub_8'
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
> FAIL: g++.dg/other/pr55650.C  -std=gnu++98 (test for excess errors)
> 
> There were some odd Fortran failures too, with test cases failing to link, 
> making the results difficult to interpret.  Therefore I decided to arrange 
> for a special build with first stage GCC built with its target libraries 
> at -O2, so that first stage glibc builds, and then second stage GCC built 
> with its target libraries at -O0 and second stage glibc omitted.  That 
> removed the extra Fortran failures regardless of whether this change has 
> been applied or not, but we may consider looking overall into why a full 
> `riscv64-linux-gnu' build at -O0 has regressions against -O2 at least in 
> the ilp32d multilib.
> 
>  Meanwhile, OK to apply?
> 
>   Maciej
> 
> Changes from v1:
> 
> - ChangeLog entries added.
> ---
>  gcc/testsuite/gcc.dg/div64-unwinding.c |   25 
> +
>  gcc/testsuite/gcc.target/arm/div64-unwinding.c |   25 
> -
>  libgcc/Makefile.in |2 +-
>  libgcc/config/arm/t-bpabi  |5 -
>  libgcc/config/arm/t-netbsd-eabi|5 -
>  5 files changed, 26 insertions(+), 36 deletions(-)
> 
> gcc-libgcc-divmod-asynchronous-unwind-tables.diff
> Index: gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
> ===
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
> @@ -0,0 +1,25 @@
> +/* Performing a 64-bit division should not pull in the unwinder.  */
> +
> +/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
> +/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
> +/* { dg-options "-O0" } */
> +
> +#include 
> +

[PATCH] i386: Add c99 runtime requirement to math optimisation tests

2020-08-18 Thread Pat Bernardi
A number of i386 math optimisation tests are looking for assembly instructions
that are only emitted when the compiler knows the target has a C99 libm
available. Since targets like *-elf may not have such a libm, a C99 runtime
requirement is added to these tests.

Tested on x86-elf and x86_64-elf hosted on x86_64-linux in addition to 
x86_64-pc-linux-gnu.

If approved, I'll need a maintainer to kindly commit on my behalf.

Thanks,

Pat Bernardi
Senior Software Engineer, AdaCore

2020-08-18  Pat Bernardi  

gcc/testsuite/ChangeLog

* gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
* gcc.target/i386/387-9.c: Likewise.
* gcc.target/i386/avx512bw-pr96246-1.c: Likewise.
* gcc.target/i386/avx512f-rint-sfix-vec-2.c: Likewise.
* gcc.target/i386/avx512f-rintf-sfix-vec-2.c: Likewise.
* gcc.target/i386/avx512vl-pr96246-1.c: Likewise.
* gcc.target/i386/pr61403.c: Likewise.
* gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/387-7.c| 1 +
 gcc/testsuite/gcc.target/i386/387-9.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c   | 1 +
 gcc/testsuite/gcc.target/i386/avx512f-rint-sfix-vec-2.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512f-rintf-sfix-vec-2.c | 1 +
 gcc/testsuite/gcc.target/i386/avx512vl-pr96246-1.c   | 1 +
 gcc/testsuite/gcc.target/i386/pr61403.c  | 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-ceil-sfix-vec.c | 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-ceilf-sfix-vec.c| 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-floor-sfix-vec.c| 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c   | 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-rint-sfix-vec.c | 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-rintf-sfix-vec.c| 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-round-sfix-vec.c| 1 +
 gcc/testsuite/gcc.target/i386/sse4_1-roundf-sfix-vec.c   | 1 +
 15 files changed, 15 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/387-7.c 
b/gcc/testsuite/gcc.target/i386/387-7.c
index e01ed2e0576..3c1ad606462 100644
--- a/gcc/testsuite/gcc.target/i386/387-7.c
+++ b/gcc/testsuite/gcc.target/i386/387-7.c
@@ -1,6 +1,7 @@
 /* Verify that 387 fsincos instruction is generated.  */
 /* { dg-do compile } */
 /* { dg-options "-O -ffast-math -mfpmath=387 -mfancy-math-387" } */
+/* { dg-require-effective-target c99_runtime } */
 /* { dg-final { scan-assembler "fsincos" } } */
 
 extern double sin (double);
diff --git a/gcc/testsuite/gcc.target/i386/387-9.c 
b/gcc/testsuite/gcc.target/i386/387-9.c
index 2667aa46872..469c635e479 100644
--- a/gcc/testsuite/gcc.target/i386/387-9.c
+++ b/gcc/testsuite/gcc.target/i386/387-9.c
@@ -1,6 +1,7 @@
 /* Verify that 387 fsincos instruction is generated.  */
 /* { dg-do compile } */
 /* { dg-options "-O -funsafe-math-optimizations -mfpmath=387 -mfancy-math-387" 
} */
+/* { dg-require-effective-target c99_runtime } */
 
 extern double sin (double);
 extern double cos (double);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c 
b/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c
index 2bfcc840a91..4aaa28866ca 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c
@@ -1,6 +1,7 @@
 /* PR target/96246 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -ftree-vectorize -mavx512bw" } */
+/* { dg-require-effective-target c99_runtime } */
 /* { dg-final { scan-assembler-times "vpblendm\[bwdq\]\[\t ]" 4 } } */
 /* { dg-final { scan-assembler-times "vblendmp\[sd\]\[\t ]" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-rint-sfix-vec-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-rint-sfix-vec-2.c
index c3f78ac3f25..2d2099f77cb 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-rint-sfix-vec-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-rint-sfix-vec-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512f" } */
+/* { dg-require-effective-target c99_runtime } */
 
 #include "avx512f-rint-sfix-vec-1.c"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-rintf-sfix-vec-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-rintf-sfix-vec-2.c
index c172e61f84a..fe473766c86 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-rintf-sfix-vec-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-rintf-sfix-vec-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512f" } */
+/* { dg-require-effective-target c99_runtime } */
 
 

[PATCH] testsuite: require c99 runtime for trigonometric optimisation tests

2020-08-18 Thread Pat Bernardi
A number of optimisations that simplify trigonometric expressions are only
performed when the compiler knows the target has a C99 libm available.
Since targets like *-elf may not have such a libm, a C99 runtime requirement
is added to these tests.
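
For readers unfamiliar with these tests: judging by the file names and the dg-final scans, the simplifications in question appear to be the following identities. This is a reading of the test names, not a statement taken from the patch.

```latex
\begin{align*}
\sin(\arctan x) &= \frac{x}{\sqrt{x^2 + 1}}, \\
\frac{\sinh x}{\cosh x} &= \tanh x, \\
\frac{\tanh x}{\sinh x} &= \frac{1}{\cosh x} \qquad (x \neq 0).
\end{align*}
```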

Tested on x86-elf and x86_64-elf hosted on x86_64-linux in addition to 
x86_64-pc-linux-gnu.

If approved, I'll need a maintainer to kindly commit on my behalf.

Thanks,

Pat Bernardi
Senior Software Engineer, AdaCore

2020-08-18  Pat Bernardi  

gcc/testsuite/ChangeLog

* gcc.dg/sinatan-2.c: Add dg-require-effective-target c99_runtime.
* gcc.dg/sinhovercosh-1.c: Likewise.
* gcc.dg/tanhbysinh.c: Likewise.
---
 gcc/testsuite/gcc.dg/sinatan-2.c  | 1 +
 gcc/testsuite/gcc.dg/sinhovercosh-1.c | 1 +
 gcc/testsuite/gcc.dg/tanhbysinh.c | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/sinatan-2.c b/gcc/testsuite/gcc.dg/sinatan-2.c
index 8e7ea3c90fc..64d6d301535 100644
--- a/gcc/testsuite/gcc.dg/sinatan-2.c
+++ b/gcc/testsuite/gcc.dg/sinatan-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-require-effective-target c99_runtime } */
 
 extern float sinf (float);
 extern float cosf (float);
diff --git a/gcc/testsuite/gcc.dg/sinhovercosh-1.c 
b/gcc/testsuite/gcc.dg/sinhovercosh-1.c
index d41093fa6de..564d3c51b3e 100644
--- a/gcc/testsuite/gcc.dg/sinhovercosh-1.c
+++ b/gcc/testsuite/gcc.dg/sinhovercosh-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-require-effective-target c99_runtime } */
 
 extern float sinhf (float);
 extern float coshf (float);
diff --git a/gcc/testsuite/gcc.dg/tanhbysinh.c 
b/gcc/testsuite/gcc.dg/tanhbysinh.c
index fde72c2f93b..9dbe133ec74 100644
--- a/gcc/testsuite/gcc.dg/tanhbysinh.c
+++ b/gcc/testsuite/gcc.dg/tanhbysinh.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-require-effective-target c99_runtime } */
 
 extern float sinhf (float);
 extern float tanhf (float);
@@ -37,4 +38,4 @@ tanhbysinhl_ (long double x)
 /* {dg-final { scan-tree-dump-not "tanhl " "optimized" }} */
 /* { dg-final { scan-tree-dump "cosh " "optimized" } } */
 /* { dg-final { scan-tree-dump "coshf " "optimized" } } */
-/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
\ No newline at end of file
+/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
-- 
2.27.0



[PATCH] i386: Cleanup i386/i386elf.h and align it's return convention with the SVR4 ABI

2020-08-18 Thread Pat Bernardi
As observed a number of years ago in the following thread, i386/i386elf.h has 
not been kept up to date:

https://gcc.gnu.org/pipermail/gcc/2013-August/209981.html

This patch does the following cleanup:

1. The return convention now follows the i386 and x86_64 SVR4 ABIs again. As 
discussed in the above thread, the current return convention does not match any 
other target or existing ABI, which is problematic since the current approach 
is inefficient (particularly on x86_64-elf) and confuses other tools like GDB 
(unfortunately that thread did not lead to any fix at the time). 

2. The default version of ASM_OUTPUT_ASCII from elfos.h is used. As mentioned 
in the cleanup of i386/sysv4.h [1], the ASM_OUTPUT_ASCII implementation then 
used by sysv4.h, and currently used by i386elf.h, has a significantly higher 
computational complexity than the default version provided by elfos.h.

The patch has been tested on i386-elf and x86_64-elf hosted on x86_64-linux, 
fixing a number of failing tests that were expecting the SVR4 ABI return 
convention. It has also been bootstrapped and tested on x86_64-pc-linux-gnu 
without regression.

If approved, I'll need a maintainer to kindly commit on my behalf.

Thanks,

Pat Bernardi
Senior Software Engineer, AdaCore

[1] https://gcc.gnu.org/pipermail/gcc-patches/2011-February/305559.html

2020-08-18  Pat Bernardi  

gcc/ChangeLog

* config/i386/i386elf.h (SUBTARGET_RETURN_IN_MEMORY): Remove.
(ASM_OUTPUT_ASCII): Likewise.
(DEFAULT_PCC_STRUCT_RETURN): Define.
* config/i386/i386.c (ix86_return_in_memory): Remove
SUBTARGET_RETURN_IN_MEMORY.

From fe617455561a4c8d898b4e231c447b16e5661e10 Mon Sep 17 00:00:00 2001
From: Pat Bernardi 
Date: Fri, 14 Aug 2020 17:34:38 -0400
Subject: [PATCH] i386: Cleanup i386/i386elf.h and align it's return convention
 with the SVR4 ABI

While i386elf.h was originally derived from sysv4.h it has not been kept
up to date with the development of the compiler. Two changes are made:

* The return convention now follows the i386 and x86_64 SVR4 ABIs again.

* The more efficient default version of ASM_OUTPUT_ASCII in elfos.h is used.

2020-08-18  Pat Bernardi  

gcc/ChangeLog

* config/i386/i386elf.h (SUBTARGET_RETURN_IN_MEMORY): Remove.
(ASM_OUTPUT_ASCII): Likewise.
(DEFAULT_PCC_STRUCT_RETURN): Define.
* config/i386/i386.c (ix86_return_in_memory): Remove
SUBTARGET_RETURN_IN_MEMORY.
---
 gcc/config/i386/i386.c|  4 ---
 gcc/config/i386/i386elf.h | 62 ---
 2 files changed, 6 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e9ecb94d174..65e87b41e80 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3796,9 +3796,6 @@ ix86_libcall_value (machine_mode mode)
 static bool
 ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 {
-#ifdef SUBTARGET_RETURN_IN_MEMORY
-  return SUBTARGET_RETURN_IN_MEMORY (type, fntype);
-#else
   const machine_mode mode = type_natural_mode (type, NULL, true);
   HOST_WIDE_INT size;
 
@@ -3879,7 +3876,6 @@ ix86_return_in_memory (const_tree type, const_tree fntype 
ATTRIBUTE_UNUSED)
 
   return false;
 }
-#endif
 }
 
 
diff --git a/gcc/config/i386/i386elf.h b/gcc/config/i386/i386elf.h
index eb2203cf323..05cee89f795 100644
--- a/gcc/config/i386/i386elf.h
+++ b/gcc/config/i386/i386elf.h
@@ -19,12 +19,12 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/* The ELF ABI for the i386 says that records and unions are returned
-   in memory.  */
-
-#define SUBTARGET_RETURN_IN_MEMORY(TYPE, FNTYPE) \
-   (TYPE_MODE (TYPE) == BLKmode \
-|| (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8))
+/* Define DEFAULT_PCC_STRUCT_RETURN to 1 because the i386 SVR4 ABI returns
+   records and unions in memory. ix86_option_override_internal will overide
+   this flag when compiling 64-bit code as we never do pcc_struct_return
+   scheme on x86-64.  */
+#undef DEFAULT_PCC_STRUCT_RETURN
+#define DEFAULT_PCC_STRUCT_RETURN 1
 
 #undef CPP_SPEC
 #define CPP_SPEC ""
@@ -40,56 +40,6 @@ along with GCC; see the file COPYING3.  If not see
 #define DBX_REGISTER_NUMBER(n) \
   (TARGET_64BIT ? dbx64_register_map[n] : svr4_dbx_register_map[n])
 
-/* The routine used to output sequences of byte values.  We use a special
-   version of this for most svr4 targets because doing so makes the
-   generated assembly code more compact (and thus faster to assemble)
-   as well as more readable.  Note that if we find subparts of the
-   character sequence which end with NUL (and which are shorter than
-   ELF_STRING_LIMIT) we output those using ASM_OUTPUT_LIMITED_STRING.  */
-
-#undef ASM_OUTPUT_ASCII
-#define ASM_OUTPUT_ASCII(FILE, STR, LENGTH)\
-  do  

c++: alias template template_info setting

2020-08-18 Thread Nathan Sidwell

During the construction of an alias template we can alter its
template_info.  This is really weird, because that's morally immutable
data.  In this case it's ok, but let's not create a duplicate
template_info, and add asserts to make sure it is changing in exactly
the way we expect.

Needless to say modules fell over the duplicated template_info and got 
very confused.
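
For context, the source-level construct involved is a member alias template of a class template. The snippet below is only an illustration of that construct, not a reproducer for the modules problem; Outer, Pair and UsesAlias are made-up names.

```cpp
#include <type_traits>
#include <utility>

template <typename T>
struct Outer
{
  // Member alias template of a class template.  Naming Outer<int>::Pair<char>
  // goes through a partially instantiated alias template (T already fixed,
  // U still open) rather than the most general one.
  template <typename U>
  using Pair = std::pair<T, U>;
};

// Hypothetical helper that reaches the alias through a dependent Outer<T>.
template <typename T, typename U>
struct UsesAlias
{
  using type = typename Outer<T>::template Pair<U>;
};

static_assert (std::is_same<Outer<int>::Pair<char>,
			    std::pair<int, char>>::value,
	       "alias names the expected type");
```

The compiler-side bookkeeping described above concerns which template the alias specialization's template_info points at in this kind of situation.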


gcc/cp/
* cp-tree.h (SET_TYPE_TEMPLATE_INFO): Do not deal with ALIAS 
templates.

* pt.c (lookup_template_class_1): Special-case alias template
template_info setting.


--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 04758574019..5f2c7e574c4 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -3485,13 +3485,12 @@ struct GTY(()) lang_decl {
? TYPE_ALIAS_TEMPLATE_INFO (NODE)	\
: TYPE_TEMPLATE_INFO (NODE))
 
-/* Set the template information for an ENUMERAL_, RECORD_, or
-   UNION_TYPE to VAL.  */
+/* Set the template information for a non-alias n ENUMERAL_, RECORD_,
+   or UNION_TYPE to VAL.  ALIAS's are dealt with separately.  */
 #define SET_TYPE_TEMPLATE_INFO(NODE, VAL)\
-  (TREE_CODE (NODE) == ENUMERAL_TYPE	\
-   || (CLASS_TYPE_P (NODE) && !TYPE_ALIAS_P (NODE))			\
-   ? (TYPE_LANG_SLOT_1 (NODE) = (VAL))\
-   : (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)) = (VAL)))
+  (gcc_checking_assert (TREE_CODE (NODE) == ENUMERAL_TYPE		\
+			|| (CLASS_TYPE_P (NODE) && !TYPE_ALIAS_P (NODE))), \
+   (TYPE_LANG_SLOT_1 (NODE) = (VAL)))	\
 
 #define TI_TEMPLATE(NODE) \
   ((struct tree_template_info*)TEMPLATE_INFO_CHECK (NODE))->tmpl
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index b80fe0a5cc5..ada0438f272 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -10088,8 +10088,26 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 	}
 	}
 
-  // Build template info for the new specialization.
-  SET_TYPE_TEMPLATE_INFO (t, build_template_info (found, arglist));
+  /* Build template info for the new specialization.  */
+  if (TYPE_ALIAS_P (t))
+	{
+	  /* This is constructed during instantiation of the alias
+	 decl.  But for member templates of template classes, that
+	 is not correct as we need to refer to the partially
+	 instantiated template, not the most general template.
+	 The incorrect knowledge will not have escaped this
+	 instantiation process, so we're good just updating the
+	 template_info we made then.  */
+	  tree ti = DECL_TEMPLATE_INFO (TYPE_NAME (t));
+	  gcc_checking_assert (template_args_equal (TI_ARGS (ti), arglist));
+	  if (TI_TEMPLATE (ti) != found)
+	{
+	  gcc_checking_assert (DECL_TI_TEMPLATE (found) == TI_TEMPLATE (ti));
+	  TI_TEMPLATE (ti) = found;
+	}
+	}
+  else
+	SET_TYPE_TEMPLATE_INFO (t, build_template_info (found, arglist));
 
   elt.spec = t;
   slot = type_specializations->find_slot_with_hash (&elt, hash, INSERT);


Re: [EXTERNAL] Re: [Patch 1/5] rs6000, Add 128-bit sign extension support

2020-08-18 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 06:53:56PM -0500, will schmidt wrote:
> On Thu, 2020-08-13 at 17:55 -0500, Segher Boessenkool wrote:
> > > As long as there are no issues defining the builtins for 3.0 here.
> > > AFAIK they are not documented in ISA 3.0.  This is a happy accident
> > > that these ISA 3.1 builtins can be implemented with existing
> > > support.
> > 
> > There are *no* builtins defined in the ISA!  The insns are just ISA
> > 3.0
> > instructions.
> 
> Ok. 
> 
> So then maybe just "Sign extend builtins" and leave off the ISA
> reference all together.   

Sure.  Or you can say "builtins for the instructions introduced in
Power ISA 3.1" or such.

If we ever get the builtins documentation updated quickly (and updated),
it should go on https://gcc.gnu.org/readings.html , and life will be
good.


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-18 Thread Segher Boessenkool
On Tue, Aug 18, 2020 at 07:14:19PM +, GT wrote:
> > That sounds like libmvec?
> >
> > I still don't know what this is.
> 
> Yes, it is libmvec.
> 
> Now look at what GCC does to the code in Examples 1 and 2 at this link:
> https://sourceware.org/glibc/wiki/libmvec
> 
> x86_64 added functionality to GCC so such code uses the new functions without 
> the user
> having to re-write the loops and explicitly call the new functions.
> 
> We are aiming to provide that same capability for PPC64 in GCC.

Great!  Please repost with what I already pointed out fixed, that
explanation added, and working links to the documentation?

Thanks in advance,


Segher


[PATCH 2/2] c++: Rewrite members for all deduction guides. [PR96199]

2020-08-18 Thread Jason Merrill via Gcc-patches
After the last patch, it occurred to me that we could run into the
specialization issue with non-alias deduction guides as well, so this patch
extends the rewriting to C++17 mode.

Doing this revealed that we weren't properly pushing into class scope for
normalization.
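
The scope handling is done with a small RAII guard. As a rough standalone sketch of the idea, with a made-up scope_stack and scope_guard standing in for the compiler's class-scope stack and for push_nested_class_guard:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler's stack of open class scopes.
static std::vector<std::string> scope_stack;

// Hypothetical guard modelled on the pattern in the patch: push on
// construction only when there is something to push, and pop on
// destruction only if we pushed.
struct scope_guard
{
  bool pushed;

  explicit scope_guard (const std::string &name)
    : pushed (!name.empty ())
  {
    if (pushed)
      scope_stack.push_back (name);
  }

  ~scope_guard ()
  {
    if (pushed)
      {
	assert (!scope_stack.empty ());
	scope_stack.pop_back ();
      }
  }

  // Copying would pop twice, so forbid it in this sketch.
  scope_guard (const scope_guard &) = delete;
  scope_guard &operator= (const scope_guard &) = delete;
};

// Returns the stack depth seen inside the guarded region.  The return
// value is computed before the guard's destructor runs, and the scope is
// popped on every exit path.
inline std::size_t
depth_inside (const std::string &name)
{
  scope_guard g (name);
  return scope_stack.size ();
}
```

Calling depth_inside with a non-empty name reports one open scope and leaves the stack empty afterwards; an empty name is the "not a class, do nothing" case.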

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/96199
* pt.c (tsubst_aggr_type): Rewrite in C++17, too.
(maybe_dependent_member_ref): Likewise.
(build_deduction_guide): Re-substitute template parms.
* cp-tree.h (struct push_nested_class_guard): New.
* constraint.cc (get_normalized_constraints_from_decl): Use it.

gcc/testsuite/ChangeLog:

PR c++/96199
* g++.dg/cpp1z/class-deduction-spec1.C: New test.
---
 gcc/cp/cp-tree.h  | 18 +
 gcc/cp/constraint.cc  |  2 +
 gcc/cp/pt.c   | 17 +++--
 .../g++.dg/cpp1z/class-deduction-spec1.C  | 38 +++
 4 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-spec1.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 04758574019..5ba82ee60db 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8137,6 +8137,24 @@ is_constrained_auto (const_tree t)
   return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS (t);
 }
 
+/* RAII class to push/pop class scope T; if T is not a class, do nothing.  */
+
+struct push_nested_class_guard
+{
+  bool push;
+  push_nested_class_guard (tree t)
+: push (t && CLASS_TYPE_P (t))
+  {
+if (push)
+  push_nested_class (t);
+  }
+  ~push_nested_class_guard ()
+  {
+if (push)
+  pop_nested_class ();
+  }
+};
+
 #if CHECKING_P
 namespace selftest {
   extern void run_cp_tests (void);
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index e4aace596e7..48d52ec5b7a 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -840,6 +840,8 @@ get_normalized_constraints_from_decl (tree d, bool diag = 
false)
 if (tree *p = hash_map_safe_get (normalized_map, tmpl))
   return *p;
 
+  push_nested_class_guard pncs (DECL_CONTEXT (d));
+
   tree args = generic_targs_for (tmpl);
   tree ci = get_constraints (decl);
   tree norm = get_normalized_constraints_from_info (ci, args, tmpl, diag);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 585d944542b..8ad91b37297 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13391,7 +13391,7 @@ tsubst_aggr_type (tree t,
 complain, in_decl);
  if (argvec == error_mark_node)
r = error_mark_node;
- else if (cxx_dialect >= cxx20 && dependent_scope_p (context))
+ else if (cxx_dialect >= cxx17 && dependent_scope_p (context))
{
  /* See maybe_dependent_member_ref.  */
  tree name = TYPE_IDENTIFIER (t);
@@ -16328,14 +16328,13 @@ tsubst_init (tree init, tree decl, tree args,
we are trying to refer to that member in a partial instantiation of C,
return a SCOPE_REF; otherwise, return NULL_TREE.
 
-   This can happen when forming a C++20 alias template deduction guide, as in
-   PR96199.  */
+   This can happen when forming a C++17 deduction guide, as in PR96199.  */
 
 static tree
 maybe_dependent_member_ref (tree t, tree args, tsubst_flags_t complain,
tree in_decl)
 {
-  if (cxx_dialect < cxx20)
+  if (cxx_dialect < cxx17)
 return NULL_TREE;
 
   tree ctx = context_for_name_lookup (t);
@@ -28370,6 +28369,16 @@ build_deduction_guide (tree type, tree ctor, tree 
outer_args, tsubst_flags_t com
  fargs = tsubst (fargs, tsubst_args, complain, ctor);
  current_template_parms = save_parms;
}
+  else
+   {
+ /* Substitute in the same arguments to rewrite class members into
+references to members of an unknown specialization.  */
+ cp_evaluated ev;
+ fparms = tsubst_arg_types (fparms, targs, NULL_TREE, complain, ctor);
+ fargs = tsubst (fargs, targs, complain, ctor);
+ if (ci)
+   ci = tsubst_constraint_info (ci, targs, complain, ctor);
+   }
 
   --processing_template_decl;
   if (!ok)
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-spec1.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction-spec1.C
new file mode 100644
index 000..fcdf746134b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-spec1.C
@@ -0,0 +1,38 @@
+// PR c++/96199
+// { dg-do compile { target c++17 } }
+
+template struct A1 { };
+template struct A2 { };
+template struct A3 { };
+
+int i;
+template struct B {
+  enum E { X };
+  B(A1, V) { }
+
+  constexpr static V& ir = i;
+  B(A2, V) { }
+
+  B(A3, V) { }
+};
+
+// template B(A1::X>,T) -> B;
+// template B(A2::ir>,T) -> B;
+// template B(A3::E>,T) -> B;
+
+int j;
+template <> struct B {
+  using V = int;
+
+  enum E { X = 1 };
+  B(A1, V) { }
+
+  constexpr static V& ir = j;
+  B(A2, V) { 

[PATCH 1/2] c++: Handle enumerator in C++20 alias CTAD. [PR96199]

2020-08-18 Thread Jason Merrill via Gcc-patches
To form a deduction guide for an alias template, we substitute the template
arguments from the pattern into the deduction guide for the underlying
class.  In the case of B(A1), that produces B(A1::X>) -> B.
But since an enumerator doesn't have its own template info, and B is a
dependent scope, trying to look up B::X fails and we crash.  So we need
to produce a SCOPE_REF instead.

And trying to use the members of the template class is wrong for other
members, as well, as it gives a nonsensical result if the class is
specialized.

Tested x86_64-pc-linux-gnu, applying to trunk and 10.

gcc/cp/ChangeLog:

PR c++/96199
* pt.c (maybe_dependent_member_ref): New.
(tsubst_copy) [CONST_DECL]: Use it.
[VAR_DECL]: Likewise.
(tsubst_aggr_type): Handle nested type.

gcc/testsuite/ChangeLog:

PR c++/96199
* g++.dg/cpp2a/class-deduction-alias4.C: New test.
---
 gcc/cp/pt.c   | 43 ++
 .../g++.dg/cpp2a/class-deduction-alias4.C | 44 +++
 2 files changed, 87 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias4.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b80fe0a5cc5..585d944542b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13391,6 +13391,17 @@ tsubst_aggr_type (tree t,
 complain, in_decl);
  if (argvec == error_mark_node)
r = error_mark_node;
+ else if (cxx_dialect >= cxx20 && dependent_scope_p (context))
+   {
+ /* See maybe_dependent_member_ref.  */
+ tree name = TYPE_IDENTIFIER (t);
+ tree fullname = name;
+ if (instantiates_primary_template_p (t))
+   fullname = build_nt (TEMPLATE_ID_EXPR, name,
+INNERMOST_TEMPLATE_ARGS (argvec));
+ return build_typename_type (context, name, fullname,
+ typename_type);
+   }
  else
{
  r = lookup_template_class (t, argvec, in_decl, context,
@@ -16313,6 +16324,32 @@ tsubst_init (tree init, tree decl, tree args,
   return init;
 }
 
+/* If T is a reference to a dependent member of the current instantiation C and
+   we are trying to refer to that member in a partial instantiation of C,
+   return a SCOPE_REF; otherwise, return NULL_TREE.
+
+   This can happen when forming a C++20 alias template deduction guide, as in
+   PR96199.  */
+
+static tree
+maybe_dependent_member_ref (tree t, tree args, tsubst_flags_t complain,
+   tree in_decl)
+{
+  if (cxx_dialect < cxx20)
+return NULL_TREE;
+
+  tree ctx = context_for_name_lookup (t);
+  if (!CLASS_TYPE_P (ctx))
+return NULL_TREE;
+
+  ctx = tsubst (ctx, args, complain, in_decl);
+  if (dependent_scope_p (ctx))
+return build_qualified_name (NULL_TREE, ctx, DECL_NAME (t),
+/*template_p=*/false);
+
+  return NULL_TREE;
+}
+
 /* Like tsubst, but deals with expressions.  This function just replaces
template parms; to finish processing the resultant expression, use
tsubst_copy_and_build or tsubst_expr.  */
@@ -16371,6 +16408,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
if (args == NULL_TREE)
  return scalar_constant_value (t);
 
+   if (tree ref = maybe_dependent_member_ref (t, args, complain, in_decl))
+ return ref;
+
/* Unfortunately, we cannot just call lookup_name here.
   Consider:
 
@@ -16421,6 +16461,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   return t;
 
 case VAR_DECL:
+  if (tree ref = maybe_dependent_member_ref (t, args, complain, in_decl))
+   return ref;
+  gcc_fallthrough();
 case FUNCTION_DECL:
   if (DECL_LANG_SPECIFIC (t) && DECL_TEMPLATE_INFO (t))
r = tsubst (t, args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias4.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias4.C
new file mode 100644
index 000..f2c3ffda85a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias4.C
@@ -0,0 +1,44 @@
+// PR c++/96199
+// { dg-do compile { target c++2a } }
+
+template struct A1 { };
+template struct A2 { };
+template struct A3 { };
+
+int i;
+template struct B {
+  enum E { X };
+  B(A1, V) { }
+
+  constexpr static V& ir = i;
+  B(A2, V) { }
+
+  B(A3, V);
+};
+
+// template B(A1::X>,T) -> B;
+// template B(A2::ir>,T) -> B;
+// template B(A3::E>,T) -> B;
+
+template  using U = B;
+
+// template B(A1::X>,T) -> B;
+// template B(A2::ir>,T) -> B;
+// template B(A3::E>,T) -> B;
+
+int j;
+template <> struct B {
+  using V = int;
+
+  enum E { X = 1 };
+  B(A1, V) { }
+
+  constexpr static V& ir = j;
+  B(A2, V) { }
+
+  B(A3, V);
+};
+
+U u1 { A1<1>(), 42 };
+U u2 { A2(), 42 };
+U u3 { A3::E>(), 42 };

base-commit: 3c04bd60e56da399a441f73ebb687b5039b9cf3f

[PATCH, committed] PR fortran/96613,96686 - Fix type/kind issues, temporaries evaluating MIN/MAX

2020-08-18 Thread Harald Anlauf
There was another issue (PR96686) with MIN/MAX for character arguments of
different kind.

Character arguments to MIN/MAX are a Fortran 2003 feature, so there is no
real reason to have a new GNU extension, and no related legacy code.
Instead of ICEing, we now unconditionally generate an error.

This was confirmed in PR96686 by Steve, who also approved the patch for PR96613.
Both patches were combined, regtested on x86_64-pc-linux-gnu, and committed.

Thanks,
Harald

Full commit message:

PR fortran/96613,96686 - Fix type/kind issues, temporaries evaluating MIN/MAX

When evaluating functions of the MIN/MAX variety inline, use a temporary
of appropriate type and kind, and convert to the result type at the end.
In the case of allowing for the GNU extensions to MIN/MAX, derive the
result kind consistently during simplification.

Furthermore, the Fortran standard requires all arguments to the MIN/MAX
intrinsics to have the same type and kind.  While a GNU extension accepts
kind differences for integer and real arguments, which seems to have been
used in legacy code, there is no reason to allow different character
kinds.  We now reject the latter unconditionally.
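
A minimal sketch of the selection step described above, written as
standalone C++ rather than GCC internals: the temporary takes the widest
precision found among the arguments instead of the precision of the result.
The function name and the plain integers standing in for TYPE_PRECISION
values are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: each element stands in for the precision (in bits)
// of one MIN/MAX argument's type.  Returns the precision the temporary
// should use, i.e. the largest one seen.
int widest_precision(const std::vector<int>& arg_precisions)
{
    assert(!arg_precisions.empty());
    int widest = arg_precisions[0];
    for (std::size_t i = 1; i < arg_precisions.size(); ++i)
        if (arg_precisions[i] > widest)
            widest = arg_precisions[i];
    return widest;
}
```

Evaluating in the widest type and converting once at the end avoids
truncating a wide argument before the comparison has taken place.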

gcc/fortran/ChangeLog:

* check.c (check_rest): Reject MIN/MAX character arguments of
different kind.
* simplify.c (min_max_choose): The simplification result shall
have the highest kind value of the arguments.
* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Choose type and
kind of intermediate by looking at all arguments, not the result.

gcc/testsuite/ChangeLog:

* gfortran.dg/minmax_char_3.f90: New test.
* gfortran.dg/min_max_kind.f90: New test.
* gfortran.dg/pr96613.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 74e5e448760..65b46cd3f85 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -3693,6 +3693,11 @@ check_rest (bt type, int kind, gfc_actual_arglist *arglist)
 	{
 	  if (x->ts.type == type)
 	{
+	  if (x->ts.type == BT_CHARACTER)
+		{
+		  gfc_error ("Different character kinds at %L", &x->where);
+		  return false;
+		}
 	  if (!gfc_notify_std (GFC_STD_GNU, "Different type "
     "kinds at %L", &x->where))
 		return false;
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index eb8b2afeb29..074b50c2e68 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4924,6 +4924,8 @@ min_max_choose (gfc_expr *arg, gfc_expr *extremum, int sign, bool back_val)
   switch (arg->ts.type)
 {
   case BT_INTEGER:
+	if (extremum->ts.kind < arg->ts.kind)
+	  extremum->ts.kind = arg->ts.kind;
 	ret = mpz_cmp (arg->value.integer,
 		   extremum->value.integer) * sign;
 	if (ret > 0)
@@ -4931,6 +4933,8 @@ min_max_choose (gfc_expr *arg, gfc_expr *extremum, int sign, bool back_val)
 	break;

   case BT_REAL:
+	if (extremum->ts.kind < arg->ts.kind)
+	  extremum->ts.kind = arg->ts.kind;
 	if (mpfr_nan_p (extremum->value.real))
 	  {
 	ret = 1;
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index fd8809902b7..2483f016d8e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -4073,6 +4073,7 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree val;
   tree *args;
   tree type;
+  tree argtype;
   gfc_actual_arglist *argexpr;
   unsigned int i, nargs;

@@ -4082,16 +4083,24 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_conv_intrinsic_function_args (se, expr, args, nargs);
   type = gfc_typenode_for_spec (&expr->ts);

-  argexpr = expr->value.function.actual;
-  if (TREE_TYPE (args[0]) != type)
-args[0] = convert (type, args[0]);
   /* Only evaluate the argument once.  */
   if (!VAR_P (args[0]) && !TREE_CONSTANT (args[0]))
 args[0] = gfc_evaluate_now (args[0], &se->pre);

-  mvar = gfc_create_var (type, "M");
-  gfc_add_modify (&se->pre, mvar, args[0]);
+  /* Determine suitable type of temporary, as a GNU extension allows
+ different argument kinds.  */
+  argtype = TREE_TYPE (args[0]);
+  argexpr = expr->value.function.actual;
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree tmptype = TREE_TYPE (args[i]);
+  if (TYPE_PRECISION (tmptype) > TYPE_PRECISION (argtype))
+	argtype = tmptype;
+}
+  mvar = gfc_create_var (argtype, "M");
+  gfc_add_modify (&se->pre, mvar, convert (argtype, args[0]));

+  argexpr = expr->value.function.actual;
   for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
 {
   tree cond = NULL_TREE;
@@ -4119,8 +4128,8 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 	 Also, there is no consensus among other tested compilers.  In
 	 short, it's a mess.  So lets just do whatever is fastest.  */
   tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
-  calc = fold_build2_loc (input_location, code, type,
-			  convert (type, 

[committed] [OG10] Backport OpenMP-related patches

2020-08-18 Thread Kwok Cheung Yeung

Hello

I have now backported a number of OpenMP-related patches from master to 
devel/omp/gcc-10. These are:


- Fortran: Fix character-kind=4 substring resolution (PR95837) (commit 
f48bffe70cba310461ec19ffcd07c573a6b86575)
- libgomp.fortran/struct-elem-map-1.f90: Add char kind=4 tests (commit 
e0685fadb6aa7c9cc895bc14cbbe2b9026fa3a94)
- OpenMP: Fixes for omp critical + hint (commit 
c7c24828cfa4983ebc6744be3f913d0da6ff7163)
- critical-hint-*.{c,f90}: Move from gcc/testsuite to libgomp/testsuite (commit 
ade6e7204ce4d179cd9fa4637ddee85ba1fa12d9)
- openmp: Handle clauses with gimple sequences in convert_nonlocal_omp_clauses 
properly (commit 676b5525e8333005bdc1c596ed086f1da27a450f)
- Fortran/OpenMP: Fix detecting not perfectly nested loops (commit 
57dd9f3bfca8bb752c630431dc033c761e2ad382)


Kwok
From 1e3e1fb54ace591926e80fccbf39d518a9dd7ca6 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 18 Aug 2020 04:35:43 -0700
Subject: [PATCH 1/6] Fortran: Fix character-kind=4 substring resolution
 (PR95837)

This is a backport from master of commit
f48bffe70cba310461ec19ffcd07c573a6b86575.

Testing showed that it is always set and that its value always matches
ts->kind (if available) or, otherwise, if it is a variable,
sym->ts.kind.

gcc/fortran/ChangeLog:

PR fortran/95837
* resolve.c (gfc_resolve_substring_charlen): Remove
bogus ts.kind setting for the expression.

gcc/testsuite/ChangeLog:

PR fortran/95837
* gfortran.dg/char4-subscript.f90: New test.
---
 gcc/fortran/ChangeLog.omp |  9 
 gcc/fortran/resolve.c |  3 ---
 gcc/testsuite/ChangeLog.omp   |  8 +++
 gcc/testsuite/gfortran.dg/char4-subscript.f90 | 30 +++
 4 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/char4-subscript.f90

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 00c6be0..a6b7452 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,12 @@
+2020-08-18  Kwok Cheung Yeung  
+
+   Backport from mainline
+   2020-06-25  Tobias Burnus  
+
+   PR fortran/95837
+   * resolve.c (gfc_resolve_substring_charlen): Remove
+   bogus ts.kind setting for the expression.
+
 2020-08-14  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 3166cc3..c05be55 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -5140,9 +5140,6 @@ gfc_resolve_substring_charlen (gfc_expr *e)
return;
 }
 
-  e->ts.type = BT_CHARACTER;
-  e->ts.kind = gfc_default_character_kind;
-
   if (!e->ts.u.cl)
 e->ts.u.cl = gfc_new_charlen (gfc_current_ns, NULL);
 
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 8f652f4..e9d589f 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,11 @@
+2020-08-18  Kwok Cheung Yeung  
+
+   Backport from mainline
+   2020-06-25  Tobias Burnus  
+
+   PR fortran/95837
+   * gfortran.dg/char4-subscript.f90: New test.
+
 2020-08-14  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/testsuite/gfortran.dg/char4-subscript.f90 b/gcc/testsuite/gfortran.dg/char4-subscript.f90
new file mode 100644
index 000..f1f915c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/char4-subscript.f90
@@ -0,0 +1,30 @@
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/95837
+!
+type t
+  character(len=:, kind=4), pointer :: str2
+end type t
+type(t) :: var
+
+allocate(character(len=5, kind=4) :: var%str2)
+
+var%str2(1:1) = 4_"d"
+var%str2(2:3) = 4_"ef"
+var%str2(4:4) = achar(int(Z'1F600'), kind=4)
+var%str2(5:5) = achar(int(Z'1F608'), kind=4)
+
+if (var%str2(1:3) /= 4_"def") stop 1
+if (ichar(var%str2(4:4)) /= int(Z'1F600')) stop 2
+if (ichar(var%str2(5:5)) /= int(Z'1F608')) stop 2
+
+deallocate(var%str2)
+end
+
+! Note: the last '\x00' is regarded as string terminator, hence, the tailing 
\0 byte is not in the dump
+
+! { dg-final { scan-tree-dump "  \\(\\*var\\.str2\\)\\\[1\\\]{lb: 1 sz: 4} = 
.dx00x00.\\\[1\\\]{lb: 1 sz: 4};" "original" } }
+! { dg-final { scan-tree-dump "  __builtin_memmove \\(\\(void \\*\\) 
&\\(\\*var.str2\\)\\\[2\\\]{lb: 1 sz: 4}, \\(void \\*\\) 
&.ex00x00x00fx00x00.\\\[1\\\]{lb: 1 sz: 4}, 8\\);" 
"original" } }
+! { dg-final { scan-tree-dump "  \\(\\*var.str2\\)\\\[4\\\]{lb: 1 sz: 4} = 
.x00xf6x01.\\\[1\\\]{lb: 1 sz: 4};" "original" } }
+! { dg-final { scan-tree-dump "  \\(\\*var.str2\\)\\\[5\\\]{lb: 1 sz: 4} = 
.bxf6x01.\\\[1\\\]{lb: 1 sz: 4};" "original" } }
-- 
2.8.1

From 4120e9973c6989ae7787776371aa1b3aff856d03 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 18 Aug 2020 04:39:49 -0700
Subject: [PATCH 2/6] libgomp.fortran/struct-elem-map-1.f90: Add char kind=4
 tests

This is a backport from master of commit

[committed] rs6000: unaligned VSX in memcpy/memmove expansion

2020-08-18 Thread Aaron Sawdey via Gcc-patches
I've modified slightly per Will & Segher's comments, re-regstrapped and
posting what I've actually committed.

  Aaron

This patch adds a few new instructions to inline expansion of
memcpy/memmove. Generation of all of these is controlled by
the option -mblock-ops-unaligned-vsx, which is on by default if the
target has TARGET_EFFICIENT_UNALIGNED_VSX.
 * unaligned vsx load/store (V2DImode)
 * unaligned vsx pair load/store (POImode) which is also controlled
   by -mblock-ops-vector-pair in case it is not wanted at some point.
   The default for -mblock-ops-vector-pair is for it to be on if the
   target has TARGET_MMA and TARGET_EFFICIENT_UNALIGNED_VSX. This is
   redundant, but nice for the future to clearly specify what is
   required.
 * unaligned vsx lxvl/stxvl but generally only to do the remainder
   of a copy/move we started with some vsx loads/stores, and also prefer
   to use lb/lh/lw/ld if the remainder is 1/2/4/8 bytes.

Testing of this is actually accomplished by gcc.dg/memcmp-1.c which does
two memcpy() for each memcmp(). If the memcpy() calls don't do the right
thing then the memcmp() will fail unexpectedly.
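
The chunk-size choice described in the list above can be sketched as a
small planner. This is only an illustration in standalone C++, not the
code in rs6000-string.c: it assumes every target flag named in the patch is
enabled and every alignment check passes, it leaves out the Altivec case,
and the final 8/4/2/1 GPR fallback is an assumption about the part of
expand_block_move that the patch does not change. The function name is
hypothetical.

```cpp
#include <cassert>
#include <vector>

// Returns the sequence of move sizes (in bytes) chosen for a copy of
// total_bytes, under the assumptions stated in the lead-in.
std::vector<int> plan_block_move(int total_bytes)
{
    std::vector<int> chunks;
    const int orig_bytes = total_bytes;
    int bytes = total_bytes;
    while (bytes > 0)
    {
        const bool gpr_sized =
            bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8;
        int move_bytes;
        if (bytes >= 32)
            move_bytes = 32;     // paired VSX load/store
        else if (bytes >= 16)
            move_bytes = 16;     // single unaligned VSX load/store
        else if (orig_bytes > 16 && !gpr_sized)
            move_bytes = bytes;  // lxvl/stxvl covers the whole remainder
        else if (bytes >= 8)
            move_bytes = 8;      // assumed GPR fallback from here down
        else if (bytes >= 4)
            move_bytes = 4;
        else if (bytes >= 2)
            move_bytes = 2;
        else
            move_bytes = 1;
        chunks.push_back(move_bytes);
        bytes -= move_bytes;
    }
    return chunks;
}
```

Under these assumptions a 55-byte copy is planned as 32 + 16 + 7, with the
7-byte tail going to lxvl/stxvl, while a 40-byte copy is planned as 32 + 8
because an 8-byte remainder is left to a single GPR load/store.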

gcc/ChangeLog:

* config/rs6000/rs6000-string.c (gen_lxvl_stxvl_move):
Helper function.
(expand_block_move): Add lxvl/stxvl, vector pair, and
unaligned VSX.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Default value for -mblock-ops-vector-pair.
* config/rs6000/rs6000.opt: Add -mblock-ops-vector-pair.
---
 gcc/config/rs6000/rs6000-string.c | 103 ++
 gcc/config/rs6000/rs6000.c|  14 +++-
 gcc/config/rs6000/rs6000.opt  |   4 ++
 3 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-string.c b/gcc/config/rs6000/rs6000-string.c
index c35d93180ca..82cc24ecdda 100644
--- a/gcc/config/rs6000/rs6000-string.c
+++ b/gcc/config/rs6000/rs6000-string.c
@@ -2708,6 +2708,32 @@ gen_lvx_v4si_move (rtx dest, rtx src)
 return gen_altivec_lvx_v4si_internal (dest, src);
 }
 
+static rtx
+gen_lxvl_stxvl_move (rtx dest, rtx src, int length)
+{
+  gcc_assert (MEM_P (dest) ^ MEM_P (src));
+  gcc_assert (GET_MODE (dest) == V16QImode && GET_MODE (src) == V16QImode);
+  gcc_assert (length <= 16);
+
+  bool is_store = MEM_P (dest);
+  rtx addr;
+
+  /* If the address form is not a simple register, make it so.  */
+  if (is_store)
+addr = XEXP (dest, 0);
+  else
+addr = XEXP (src, 0);
+
+  if (!REG_P (addr))
+addr = force_reg (Pmode, addr);
+
+  rtx len = force_reg (DImode, gen_int_mode (length, DImode));
+  if (is_store)
+return gen_stxvl (src, addr, len);
+  else
+return gen_lxvl (dest, addr, len);
+}
+
 /* Expand a block move operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.
 
@@ -2750,18 +2776,56 @@ expand_block_move (rtx operands[], bool might_overlap)
   if (bytes > rs6000_block_move_inline_limit)
 return 0;
 
+  int orig_bytes = bytes;
   for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
 {
   union {
-   rtx (*movmemsi) (rtx, rtx, rtx, rtx);
rtx (*mov) (rtx, rtx);
+   rtx (*movlen) (rtx, rtx, int);
   } gen_func;
   machine_mode mode = BLKmode;
   rtx src, dest;
-
-  /* Altivec first, since it will be faster than a string move
-when it applies, and usually not significantly larger.  */
-  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
+  bool move_with_length = false;
+
+  /* Use POImode for paired vsx load/store.  Use V2DI for single
+unaligned vsx load/store, for consistency with what other
+expansions (compare) already do, and so we can use lxvd2x on
+p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
+with length < 16 (if allowed), then gpr load/store.  */
+
+  if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
+ && TARGET_BLOCK_OPS_VECTOR_PAIR
+ && bytes >= 32
+ && (align >= 256 || !STRICT_ALIGNMENT))
+   {
+ move_bytes = 32;
+ mode = POImode;
+ gen_func.mov = gen_movpoi;
+   }
+  else if (TARGET_POWERPC64 && TARGET_BLOCK_OPS_UNALIGNED_VSX
+  && VECTOR_MEM_VSX_P (V2DImode)
+  && bytes >= 16 && (align >= 128 || !STRICT_ALIGNMENT))
+   {
+ move_bytes = 16;
+ mode = V2DImode;
+ gen_func.mov = gen_vsx_movv2di_64bit;
+   }
+  else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
+  && TARGET_POWER10 && bytes < 16
+  && orig_bytes > 16
+  && !(bytes == 1 || bytes == 2
+   || bytes == 4 || bytes == 8)
+  && (align >= 128 || !STRICT_ALIGNMENT))
+   {
+ /* Only use lxvl/stxvl if it could replace multiple ordinary
+loads+stores.  Also don't use it unless we likely already
+did one vsx copy so we aren't mixing gpr and vsx.  */
+ move_bytes = bytes;
+

Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-18 Thread GT via Gcc-patches


‐‐‐ Original Message ‐‐‐
On Monday, August 17, 2020 5:28 PM, Segher Boessenkool 
 wrote:

> On Mon, Aug 17, 2020 at 05:44:46PM +, GT wrote:
>
> > > This is about the Power binding to some OpenMP API, right? It has
> > > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > > we have ABIs already, more than enough of each.
> > > It is very very VERY hard to review this without being told the proper
> > > setting here.
> >
> > What this is about:
> > David Edelsohn wanted to have new library functions, one for each of these
> > 6 single-precision functions:
> > sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision
> > functions:
> > sin, cos, sincos, exp, log, and pow.
> > For the single-precision functions, the corresponding new functions would
> > compute 4 results simultaneously. For the double-precision functions, the
> > new ones would compute 2 results simultaneously.
> > x86_64 has already done something very similar so I thought I would adapt
> > as much of their documentation and implementation as I could for PPC64.
> > Let's start with that. Comments so far?
>
> That sounds like libmvec?
>
> I still don't know what this is.
>

Yes, it is libmvec.

Now look at what GCC does to the code in Examples 1 and 2 at this link:
https://sourceware.org/glibc/wiki/libmvec

x86_64 added functionality to GCC so such code uses the new functions
without the user having to re-write the loops and explicitly call the new
functions.

We are aiming to provide that same capability for PPC64 in GCC.
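
For illustration, here is a loop of the general shape in question, as a
standalone C++ sketch. The function name is hypothetical. Whether a
compiler actually replaces the scalar call with a vector routine depends on
the target, the optimization flags, and the installed math library; none of
that is shown or guaranteed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// One independent scalar sine call per element.  A loop like this is a
// candidate for the compiler to vectorize by calling a routine that
// computes several results at once.
void fill_sines(const float* in, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(in[i]);
}
```

The source stays as ordinary scalar code; any use of vector math functions
would be a compiler decision, not something the user writes by hand.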

Bert.


[committed] use byte_representation instead of string_constant (PR 96670)

2020-08-18 Thread Martin Sebor via Gcc-patches

The recent enhancement to memchr/memcmp folding introduced two bugs
(that I know of).  The attached patch fixes the one where a call to
the string_constant function that would previously be guaranteed to
succeed now fails as a result of the function only handling strings
and not other types.  The unexpected failure triggers an ICE down
the line.  I have committed the bootstrapped/regtested one-line
patch in r11-2742.
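
A minimal standalone sketch of the kind of input involved, in C++ and
modelled loosely on the new tests: the object being searched is a
zero-initialized struct rather than a string. The names below are
hypothetical, and the sketch only shows the expected run-time results of
memchr on such an object; it does not exercise the folding code itself.

```cpp
#include <cassert>
#include <cstring>

struct Pair { int i, j; };

// Static storage plus an empty initializer: every byte is zero.
static const Pair zeroed = {};

// A search for '\0' should succeed at the very first byte.
bool first_zero_byte_is_at_start()
{
    const char* p = reinterpret_cast<const char*>(&zeroed);
    return std::memchr(p, '\0', sizeof zeroed) == p;
}

// A search for a nonzero byte should fail.
bool no_byte_with_value_five()
{
    const char* p = reinterpret_cast<const char*>(&zeroed);
    return std::memchr(p, '\5', sizeof zeroed) == nullptr;
}
```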

Martin
PR tree-optimization/96670 - ICE on memchr with an empty initializer

gcc/ChangeLog:

	PR tree-optimization/96670
	PR middle-end/78257
	* gimple-fold.c (gimple_fold_builtin_memchr): Call byte_representation
	to get it, not string_constant.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96670
	* gcc.dg/memchr-2.c: New test.
	* gcc.dg/memcmp-6.c: New test.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index db56cb6aa47..dcc1b56a273 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2670,7 +2670,7 @@ gimple_fold_builtin_memchr (gimple_stmt_iterator *gsi)
   if (r == NULL)
 	{
 	  tree mem_size, offset_node;
-	  string_constant (arg1, &offset_node, &mem_size, NULL);
+	  byte_representation (arg1, &offset_node, &mem_size, NULL);
 	  unsigned HOST_WIDE_INT offset = (offset_node == NULL_TREE)
 	  ? 0 : tree_to_uhwi (offset_node);
 	  /* MEM_SIZE is the size of the array the string literal
diff --git a/gcc/testsuite/gcc.dg/memchr-2.c b/gcc/testsuite/gcc.dg/memchr-2.c
new file mode 100644
index 000..61357f96d12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/memchr-2.c
@@ -0,0 +1,41 @@
+/* PR tree-optimization/96670 - ICE on memchr with an empty initializer
+   { dg-do compile }
+   { dg-options "-O -Wall -fdump-tree-optimized" } */
+
+struct {
+  int i, j;
+} const s = { };
+
+void memchr_success_unused (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  __builtin_memchr (p, '\0', n);
+}
+
+void memchr_success_used (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  if ( != __builtin_memchr (p, '\0', n))
+__builtin_abort ();
+}
+
+void memchr_fail_unused (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  __builtin_memchr (p, '\5', n);
+}
+
+void memchr_fail_used (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  if (__builtin_memchr (p, '\5', n))
+__builtin_abort ();
+}
+
+/* { dg-prune-output "\\\[-Wunused-value" }
+   { dg-final { scan-tree-dump-not "abort" "optimized" } }
+   { dg-final { scan-tree-dump-not "memcmp \\(" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/memcmp-6.c b/gcc/testsuite/gcc.dg/memcmp-6.c
new file mode 100644
index 000..d57352616cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/memcmp-6.c
@@ -0,0 +1,47 @@
+/* PR tree-optimization/96670 - ICE on memchr with an empty initializer
+   { dg-do compile }
+   { dg-options "-O -Wall -fdump-tree-optimized" } */
+
+struct {
+  int i, j;
+} const s = { };
+
+const char a[sizeof s] = { };
+
+void memcmp_success_unused (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  __builtin_memcmp (p, a, n);
+  __builtin_memcmp (a, p, n);
+}
+
+void memcmp_success_used (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  if (__builtin_memcmp (p, a, n)
+  || __builtin_memcmp (a, p, n))
+__builtin_abort ();
+}
+
+void memcmp_fail_unused (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  __builtin_memcmp (p, a, n);
+  __builtin_memcmp (a, p, n);
+}
+
+void memcmp_fail_used (void)
+{
+  int n = (char *) - (char *)
+  char *p = (char *)
+  if (__builtin_memcmp (p, a, n)
+  || __builtin_memcmp (a, p, n))
+__builtin_abort ();
+}
+
+/* { dg-prune-output "\\\[-Wunused-value" }
+   { dg-final { scan-tree-dump-not "abort" "optimized" } }
+   { dg-final { scan-tree-dump-not "memcmp \\\(" "optimized" } } */


Re: [PATCH] rs6000: Rename instruction xvcvbf16sp to xvcvbf16spn

2020-08-18 Thread Segher Boessenkool
On Tue, Aug 18, 2020 at 01:30:53PM -0500, Peter Bergner wrote:
> The xvcvbf16sp mnemonic, which was just added in ISA 3.1 has been renamed
> to xvcvbf16spn, to make it consistent with the other non-signaling conversion
> instructions which all end with "n".  The only use of this instruction is in
> an MMA conversion built-in function, so there is little to no compatibility
> issue.
> 
> I just pushed the patch that does the rename to binutils today.
> 
> Ok for trunk and the GCC 10 branch after testing is clean?

Yes, okay everywhere.  Thanks!


Segher


> gcc/
>   * config/rs6000/rs6000-builtin.def (BU_VSX_1): Rename xvcvbf16sp to
>   xvcvbf16spn.
>   * config/rs6000/rs6000-call.c (builtin_function_type): Likewise.
>   * config/rs6000/vsx.md: Likewise.
>   * doc/extend.texi: Likewise.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/mma-builtin-3.c: Rename xvcvbf16sp to xvcvbf16spn.


[PATCH] rs6000: Rename instruction xvcvbf16sp to xvcvbf16spn

2020-08-18 Thread Peter Bergner via Gcc-patches
The xvcvbf16sp mnemonic, which was just added in ISA 3.1 has been renamed
to xvcvbf16spn, to make it consistent with the other non-signaling conversion
instructions which all end with "n".  The only use of this instruction is in
an MMA conversion built-in function, so there is little to no compatibility
issue.

I just pushed the patch that does the rename to binutils today.

Ok for trunk and the GCC 10 branch after testing is clean?

Peter


gcc/
* config/rs6000/rs6000-builtin.def (BU_VSX_1): Rename xvcvbf16sp to
xvcvbf16spn.
* config/rs6000/rs6000-call.c (builtin_function_type): Likewise.
* config/rs6000/vsx.md: Likewise.
* doc/extend.texi: Likewise.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-3.c: Rename xvcvbf16sp to xvcvbf16spn.

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index f9f0fece549..03c234ffa98 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2998,7 +2998,7 @@ BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
  RS6000_BTC_MISC)
 
 /* POWER10 MMA builtins.  */
-BU_VSX_1 (XVCVBF16SP,  "xvcvbf16sp",   MISC, vsx_xvcvbf16sp)
+BU_VSX_1 (XVCVBF16SPN, "xvcvbf16spn",  MISC, vsx_xvcvbf16spn)
 BU_VSX_1 (XVCVSPBF16,  "xvcvspbf16",   MISC, vsx_xvcvspbf16)
 
 BU_MMA_1 (XXMFACC, "xxmfacc",  QUAD, mma_xxmfacc)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e39cfcf672b..3a23f1980ce 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14037,7 +14037,7 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
 case MISC_BUILTIN_CDTBCD:
 case MISC_BUILTIN_CBCDTD:
 case VSX_BUILTIN_XVCVSPBF16:
-case VSX_BUILTIN_XVCVBF16SP:
+case VSX_BUILTIN_XVCVBF16SPN:
 case P10_BUILTIN_MTVSRBM:
 case P10_BUILTIN_MTVSRHM:
 case P10_BUILTIN_MTVSRWM:
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index dd750210758..54da54c43dc 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -300,7 +300,7 @@
UNSPEC_VSX_DIVUD
UNSPEC_VSX_MULSD
UNSPEC_VSX_SIGN_EXTEND
-   UNSPEC_VSX_XVCVBF16SP
+   UNSPEC_VSX_XVCVBF16SPN
UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVSPSXDS
UNSPEC_VSX_XVCVSPHP
@@ -364,10 +364,10 @@
   ])
 
 (define_int_iterator XVCVBF16  [UNSPEC_VSX_XVCVSPBF16
-UNSPEC_VSX_XVCVBF16SP])
+UNSPEC_VSX_XVCVBF16SPN])
 
 (define_int_attr xvcvbf16   [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
-(UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
+(UNSPEC_VSX_XVCVBF16SPN "xvcvbf16spn")])
 
 ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
 (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79833171c5a..bcc251481ca 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21624,7 +21624,7 @@ void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
 void __builtin_mma_disassemble_pair (void *, __vector_pair *);
 
 vec_t __builtin_vsx_xvcvspbf16 (vec_t);
-vec_t __builtin_vsx_xvcvbf16sp (vec_t);
+vec_t __builtin_vsx_xvcvbf16spn (vec_t);
 @end smallexample
 
 @node RISC-V Built-in Functions
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
index 29eb2754999..9bec78d333f 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
@@ -18,7 +18,7 @@ void
 foo1 (vec_t *vec)
 {
   vec[1] = __builtin_vsx_xvcvspbf16 (vec[0]);
-  vec[3] = __builtin_vsx_xvcvbf16sp (vec[2]);
+  vec[3] = __builtin_vsx_xvcvbf16spn (vec[2]);
 }
 
 /* { dg-final { scan-assembler-times {\mxxmtacc\M} 1 } } */
@@ -28,4 +28,4 @@ foo1 (vec_t *vec)
 /* { dg-final { scan-assembler-not {\mlxvp\M} } } */
 /* { dg-final { scan-assembler-not {\mstxvp\M} } } */
 /* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mxvcvbf16sp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvcvbf16spn\M} 1 } } */


Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

2020-08-18 Thread Jonathan Wakely via Gcc-patches

On 18/08/20 11:20 -0400, Lewis Hyatt wrote:

On Tue, Aug 18, 2020 at 09:43:31AM +0100, Jonathan Wakely wrote:

On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:
> Hello-
>
> The attached patch was discussed briefly on PR 54185 here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
> The test case for this PR sometimes fails due to random failures in
> pthread_create() that are not related to the original PR. This patch fixes
> it up by ignoring those failures. The test case was designed to repeat the
> same test 1000 times to attempt to reproduce a race condition, so I think it
> is OK if some of those iterations are simply skipped.
>
> Thanks for taking a look at it; I can commit it if it makes sense.
>
> -Lewis

> libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
>
> The test for this PR calls pthread_create() many times in a row, which may
> fail with EAGAIN sometimes. Avoid generating a test failure in this case.
>
> libstdc++-v3/ChangeLog:
>
>PR libstdc++/54185
>* testsuite/30_threads/condition_variable/54185.cc: Make test robust
>to random pthread_create() failures.

Thanks for the patch. It certainly looks reasonable, but I wonder if
the attached version wouldn't be (very slightly) better. The
difference is that instead of just giving up at the first EAGAIN we
keep trying. This way we might be able to create a few more threads
before the loop finishes. If we still keep failing, it works the same.

I've also added a check that the failures are due to EAGAIN, and we'll
still terminate if there's some other problem. I'm assuming that your
failures are EAGAIN. Do you know why that's happening? Does your
system have a low value for RLIMIT_NPROC or something?



Right, good point to check for EAGAIN. Yes, that's the error I get. I don't
understand why it happens. It's not related to libstdc++, I can reproduce it
with the below:

==
#include <pthread.h>
void* do_nothing (void*)
{
  return nullptr;
}
int main () {
  for (int i = 0; i != 1000; ++i)
    {
      for (int j = 0; j != 10; ++j)
        {
          pthread_t thread;
          const int err = pthread_create (&thread, nullptr, do_nothing, nullptr);
          if (err) return 1;
          pthread_join (thread, nullptr);
        }
    }
}
==

If I run this just once at a time, it never fails. But if I run it twice at
a time, it fails about 30% of the time, like:
root@host:/home/lewis# (./pthread_fail || echo ERR) & \
  (./pthread_fail || echo ERR) & wait
[1] 25041
[2] 25042
ERR
ERR

All the rlimits are infinite or as high as possible, but I dug around a bit
and it seems this is a systemd thing, this system had systemd-logind
disabled (perhaps not in the correct way) and something about the
configuration led to the issue. Enabling systemd-logind resolves it for
me. So perhaps this was mostly specific to me. Sorry if I wasted your
time... if you still think it's worth doing something here I am happy to
help.


I don't think it's a waste of time. Adding the 'notified' variable to
the test to prevent spurious wakeups is an improvement if nothing else.


FWIW, regarding your extension to the patch, in case there are some
legitimate thread creation problems, one thing to keep in mind is that the
retrying after failure makes certain things worse. For instance, (with my
system in the previous state), what would happen is the 54185.cc hit the
pthread_create failure, then prior to this patch it just bailed out. With
either of these patches it tries more times, which can worsen issues in
unrelated test cases running in parallel, that may see random failures in
their own forks or thread creations. This test case is trying hard to
reproduce the race condition by running 1000 iterations, which seems
worthwhile given it's still failing on some systems like AIX, but on the


On AIX it fails even with one iteration (not 1000) i.e. you simply
can't destroy a pthread_cond_t while there are threads still waiting
on it. We don't need 1000 iterations to hit that bug, it happens right
away.


other hand it's possible doing 50 instead of 1000 would work too, and be
less prone to unrelated resource issues.


Maybe. I'm a bit concerned that if the test started consistently
breaking out of the inner loop after one or two threads on everybody's
systems it would never actually try to delete a condition_variable
that is being waited on. And so it would never exercise the problem
case, and we'd never know. The test would PASS with no indication of
problems, but wouldn't actually test anything.

So I think I'd like it to keep trying to create threads. The meaning
of EAGAIN is "try again" after all :-)
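
The "keep trying on EAGAIN, stop on anything else" policy can be sketched
generically. This is a standalone C++ illustration under stated
assumptions, not the code in the revised 54185.cc test: the helper name and
the bounded attempt count are hypothetical, and the callable stands in for
whatever actually creates the thread.

```cpp
#include <cassert>
#include <cerrno>

// Calls create() until it returns something other than EAGAIN or the
// attempt budget is used up.  Returns the last error code seen, which is
// 0 on success and EAGAIN if every attempt failed that way (or if
// max_attempts is not positive).
template <typename Create>
int retry_on_eagain(Create create, int max_attempts)
{
    int err = EAGAIN;
    for (int attempt = 0; attempt < max_attempts && err == EAGAIN; ++attempt)
        err = create();
    return err;
}
```

With POSIX threads the callable could be a lambda wrapping
pthread_create (&thread, nullptr, fn, nullptr); any error other than EAGAIN
is returned immediately so the caller can still treat it as a real failure.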


Thanks for taking a look at this.




[PING][PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-18 Thread Maciej W. Rozycki via Gcc-patches
On Thu, 6 Aug 2020, Maciej W. Rozycki wrote:

> Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374, 
> , and replace 
> `-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
> in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
> for the affected functions while not pulling the unwinder proper, which 
> is not required here.
> 
> Remove the ARM overrides accordingly, retaining the hook infrastructure 
> however, and make the ARM test case a generic one.
> 
> Beyond saving program space it fixes a RISC-V glibc build error due to 
> unsatisfied `malloc' and `free' references from the unwinder causing 
> link errors with `ld.so' where libgcc has been built at -O0.

 Ping for: 



  Maciej


Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-18 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic
> memory operand for the destination, which could lead to an unrecognised
> builtin when expanding vst1q* intrinsics. This change fixes the patterns to
> only accept MVE memory operands.

This is OK though I suspect this needs a PR and a backport request for GCC 10.


regards
Ramana

>
> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
> memory operand is accepted.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (uint16_t * addr, uint16x8_t value)
>  {
>vst1q 

Re: [Patch] Fortran: Add 'device_type' clause to OpenMP's declare target

2020-08-18 Thread Andre Vehreschild
Hi Tobias,

I am not deep in OMP dev, i.e., not at all, but this does not make sense to me:

@@ -2397,6 +2401,22 @@ mio_symbol_attribute (symbol_attribute *attr)
  == OMP_REQ_ATOMIC_MEM_ORDER_RELAXED)
MIO_NAME (ab_attribute) (AB_OMP_REQ_MEM_ORDER_RELAXED, attr_bits);
}
+  switch (attr->omp_device_type)
+   {
+   case OMP_DEVICE_TYPE_UNSET:
+ break;
+   case OMP_DEVICE_TYPE_HOST:
+ MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_NOHOST, attr_bits);

^
Why also NOHOST here? If this is intentional, please comment.

+ break;
+   case OMP_DEVICE_TYPE_NOHOST:
+ MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_NOHOST, attr_bits);
+ break;
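
For comparison, a mapping in which every device type writes its own bit might look like the sketch below. The enum and function names are hypothetical stand-ins chosen for illustration, not the identifiers used in the patch, and the sketch makes no claim about what the patch should emit.

```cpp
// Hypothetical stand-ins for the device-type and attribute-bit enums.
enum device_type { DEVICE_TYPE_UNSET, DEVICE_TYPE_HOST,
                   DEVICE_TYPE_NOHOST, DEVICE_TYPE_ANY };
enum attr_bit { BIT_NONE, BIT_DEVICE_TYPE_HOST,
                BIT_DEVICE_TYPE_NOHOST, BIT_DEVICE_TYPE_ANY };

// Map each device type to a distinct bit, so that a reader of the
// written data can tell HOST and NOHOST apart.
attr_bit bit_for_device_type(device_type t)
{
  switch (t)
    {
    case DEVICE_TYPE_UNSET:
      return BIT_NONE;
    case DEVICE_TYPE_HOST:
      return BIT_DEVICE_TYPE_HOST;
    case DEVICE_TYPE_NOHOST:
      return BIT_DEVICE_TYPE_NOHOST;
    case DEVICE_TYPE_ANY:
      return BIT_DEVICE_TYPE_ANY;
    }
  return BIT_NONE;
}
```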


diff --git a/gcc/fortran/trans-common.c b/gcc/fortran/trans-common.c
index c6383fc2352..1be5e51b67d 100644
--- a/gcc/fortran/trans-common.c
+++ b/gcc/fortran/trans-common.c
@@ -426,6 +426,8 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
   /* If there is no backend_decl for the common block, build it.  */
   if (decl == NULL_TREE)
 {
+  tree clauses = NULL_TREE;

Would you mind using "omp_clauses" or the like here?

The remainder looks good to my omp-inexperienced eye.

Regards,
Andre

On Fri, 7 Aug 2020 17:03:34 +0200
Tobias Burnus  wrote:

> This patch adds the device_type(any|nohost|host)
> clause for 'omp declare target' to Fortran.
> 
> In OpenMP 5.0, it has no effect on variables but
> only on procedures – in TR8 (and later), it also
> affects variables.
> 
> This patch adds this clause to either – except that
> the middle end does not seem to like 'target link'
> with that clause – for normal variables, common
> blocks are accepted. (In line with OpenMP 5, the
> middle end ignores the clause for variables.)
> 
> OK?
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Patch, fortran, coarray] Fix obvious typo in co_broadcast's argument assembly

2020-08-18 Thread Andre Vehreschild
Hi Tobias,

On Tue, 18 Aug 2020 19:14:30 +0200
Tobias Burnus  wrote:

> On 8/18/20 7:04 PM, Andre Vehreschild wrote:
> 
> > attached patch fixes an obvious typo in the routine gathering arguments for
> > co_broadcast().  See pr94958 for a detailed analysis, please.  
> 
> LGTM – except that I do not like the ChangeLog entry.
> 
> It sounds like a misspelling in terms of a comment or
> error message. How about "Using the correct variable."
> – or something like that?

That's a good idea. Will use that.

> You could also consider adding a libcaf_single test case,
> given that you wrote one (see PR)...

Well, the test case in the PR does not test the issue; only with additional
modifications of trans-array can one see an impact in the pseudo code.
Alternatively, one has to do a lot more code generation, aggregating the
results of the broadcasts of the different components.  Given this is not
defined in the standard, I am not sure what to do here, and therefore just
wanted to correct the "mis-assignment" to allow correct code generation in
the future.

Regards,
Andre
> 
> Thanks for the patch!
> 
> Tobias
> 
> > gcc/fortran/ChangeLog:
> >
> > 2020-08-18  Andre Vehreschild
> >
> >   PR fortran/94958
> >   * trans-array.c (gfc_bcast_alloc_comp): Fix typo.
> >
> >
> > pr94958.patch
> >
> > diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
> > index 7a1b2fc74c9..73a45cd2dcf 100644
> > --- a/gcc/fortran/trans-array.c
> > +++ b/gcc/fortran/trans-array.c
> > @@ -9732,7 +9732,7 @@ gfc_bcast_alloc_comp (gfc_symbol *derived, gfc_expr
> > *expr, int rank, args.image_index = image_index;
> > args.stat = stat;
> > args.errmsg = errmsg;
> > -  args.errmsg = errmsg_len;
> > +  args.errmsg_len = errmsg_len;
> >
> > if (rank == 0)
> >   {  


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Patch, fortran, coarray] Fix obvious typo in co_broadcast's argument assembly

2020-08-18 Thread Tobias Burnus

On 8/18/20 7:04 PM, Andre Vehreschild wrote:


attached patch fixes an obvious typo in the routine gathering arguments for
co_broadcast().  See pr94958 for a detailed analysis, please.


LGTM – except that I do not like the ChangeLog entry.

It sounds like a misspelling in terms of a comment or
error message. How about "Using the correct variable."
– or something like that?

You could also consider adding a libcaf_single test case,
given that you wrote one (see PR)...

Thanks for the patch!

Tobias


gcc/fortran/ChangeLog:

2020-08-18  Andre Vehreschild

  PR fortran/94958
  * trans-array.c (gfc_bcast_alloc_comp): Fix typo.


pr94958.patch

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 7a1b2fc74c9..73a45cd2dcf 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -9732,7 +9732,7 @@ gfc_bcast_alloc_comp (gfc_symbol *derived, gfc_expr 
*expr, int rank,
args.image_index = image_index;
args.stat = stat;
args.errmsg = errmsg;
-  args.errmsg = errmsg_len;
+  args.errmsg_len = errmsg_len;

if (rank == 0)
  {



Re: PING: Fwd: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-18 Thread Andrew MacLeod via Gcc-patches

On 8/18/20 12:38 PM, Aldy Hernandez wrote:

And here's the patch without the sanity check.

Aldy


That diff was difficult to read. I had to apply the patch to really
follow it :-P


Anyway, yeah, this looks better.  Effectively, you have
  1) left the merging of the input range "vr" in adjust_range_with_scev,
  2) adjusted for the fact that the code extracted into
"bounds_of_var_in_loop" now returns a properly set min/max, which
sometimes includes basic symbolic expressions on which the ranger can
simply invoke range-ops,
  3) the only functional difference now is that we still fully call
bounds_of_var_in_loop when "vr" is an anti-range, whereas before we
bailed early.  But you added comments that we may be able to utilize
that under some circumstances.


This is OK.





diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index fe51a6faeb8..9b21441dff3 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1006,7 +1006,7 @@ vr_values::extract_range_from_comparison 
(value_range_equiv *vr,
 overflow.  */
  
  static bool

-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
  enum tree_code subcode, tree type,
  tree op0, tree op1, bool *ovf)
  {
@@ -1736,42 +1736,40 @@ compare_range_with_value (enum tree_code comp, const 
value_range *vr,
  
gcc_unreachable ();

  }
-/* Given a range VR, a LOOP and a variable VAR, determine whether it
-   would be profitable to adjust VR using scalar evolution information
-   for VAR.  If so, update VR with the new limits.  */
  
-void

-vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
-  gimple *stmt, tree var)
+/* Given a VAR in STMT within LOOP, determine the bounds of the
+   variable and store it in MIN/MAX and return TRUE.  If no bounds
+   could be determined, return FALSE.  */
+
+bool
+bounds_of_var_in_loop (tree *min, tree *max, range_query *query,
+  class loop *loop, gimple *stmt, tree var)
  {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, type = TREE_TYPE (var);
enum ev_direction dir;
  
-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide

- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-return;
-
chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, var));
  
/* Like in PR19590, scev can return a constant function.  */

if (is_gimple_min_invariant (chrec))
  {
-  vr->set (chrec);
-  return;
+  *min = *max = chrec;
+  return true;
  }
  
if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)

-return;
+return false;
  
init = initial_condition_in_loop_num (chrec, loop->num);

-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-init = tem;
step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-step = tem;
+
+  /* If INIT is an SSA with a singleton range, set INIT to said
+ singleton, otherwise leave INIT alone.  */
+  if (TREE_CODE (init) == SSA_NAME)
+query->get_value_range (init, stmt)->singleton_p ();
+  /* Likewise for step.  */
+  if (TREE_CODE (step) == SSA_NAME)
+query->get_value_range (step, stmt)->singleton_p ();
  
/* If STEP is symbolic, we can't know whether INIT will be the

   minimum or maximum value in the range.  Also, unless INIT is
@@ -1780,7 +1778,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
if (step == NULL_TREE
|| !is_gimple_min_invariant (step)
|| !valid_value_p (init))
-return;
+return false;
  
dir = scev_direction (chrec);

if (/* Do not adjust ranges if we do not know whether the iv increases
@@ -1789,9 +1787,8 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
/* ... or if it may wrap.  */
|| scev_probably_wraps_p (NULL_TREE, init, step, stmt,
get_chrec_loop (chrec), true))
-return;
+return false;
  
-  type = TREE_TYPE (var);

if (POINTER_TYPE_P (type) || !TYPE_MIN_VALUE (type))
  tmin = lower_bound_in_type (type, type);
else
@@ -1806,7 +1803,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
if (TREE_CODE (step) == INTEGER_CST
&& is_gimple_val (init)
&& (TREE_CODE (init) != SSA_NAME
- || get_value_range (init, stmt)->kind () == VR_RANGE))
+ || query->get_value_range (init, stmt)->kind () == VR_RANGE))
  {
widest_int nit;
  
@@ -1829,21 +1826,29 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,

  && (sgn == UNSIGNED
  || wi::gts_p (wtmp, 0) == wi::gts_p (wi::to_wide (step), 0)))
{
- value_range_equiv maxvr;
-   

[Patch, fortran, coarray] Fix obvious typo in co_broadcast's argument assembly

2020-08-18 Thread Andre Vehreschild
Hi all,

attached patch fixes an obvious typo in the routine gathering arguments for
co_broadcast().  See pr94958 for a detailed analysis, please.
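
A slip like this is easy to miss because it compiles cleanly when both members have the same type, which presumably is the case here. A minimal sketch, using hypothetical stand-in types and names rather than the real trans-array.c structures:

```cpp
// Stand-in for a generic handle type shared by all members.
using node = const void*;

// Hypothetical argument bundle; every member has the same type, so
// assigning to the wrong member is not diagnosed by the compiler.
struct bcast_args
{
  node stat = nullptr;
  node errmsg = nullptr;
  node errmsg_len = nullptr;
};

bcast_args gather_args(node stat, node errmsg, node errmsg_len)
{
  bcast_args args;
  args.stat = stat;
  args.errmsg = errmsg;
  // Writing "args.errmsg = errmsg_len;" here would also compile, but it
  // would overwrite errmsg and leave errmsg_len unset.
  args.errmsg_len = errmsg_len;
  return args;
}
```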

Regtests ok on FC31/x86_64. Will commit as obvious on Thursday, if no one
objects.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
gcc/fortran/ChangeLog:

2020-08-18  Andre Vehreschild  

PR fortran/94958
* trans-array.c (gfc_bcast_alloc_comp): Fix typo.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 7a1b2fc74c9..73a45cd2dcf 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -9732,7 +9732,7 @@ gfc_bcast_alloc_comp (gfc_symbol *derived, gfc_expr *expr, int rank,
   args.image_index = image_index;
   args.stat = stat;
   args.errmsg = errmsg;
-  args.errmsg = errmsg_len;
+  args.errmsg_len = errmsg_len;

   if (rank == 0)
 {


Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-18 Thread Richard Sandiford
qiaopeixin  writes:
> Hi Richard,
>
> Thanks for the review and explanation.
>
> The previous fix, which added a TARGET_FLOAT condition, does crash glibc-2.29.
>
> I checked the history of the function aarch64_init_cumulative_args and did
> not find the reason why Alan Lawrence added TREE_PUBLIC (fndecl) as one
> condition for entering the function type check. Maybe Alan could clarify?
> I tried deleting TREE_PUBLIC (fndecl), which turns out to solve both the
> glibc problem and the previous ICE. A new fix follows; it passed bootstrap
> and the DejaGnu tests. I believe this fix is reasonable, since the function
> type should be checked whether or not the function has external linkage.
>
> The function aarch64_init_cumulative_args checks the function types and
> should catch the error that "-mgeneral-regs-only" is incompatible with the
> use of SIMD/FP registers. In the test case for PR96479, the function myfunc2
> returns a vector of 4 integers but is declared static, so
> TREE_PUBLIC (fndecl) is false, which prevents the if statement from being
> entered and the function types from being checked. With
> "TREE_PUBLIC (fndecl)" deleted, gcc catches the error in
> aarch64_init_cumulative_args, and the test case for PR96479 now reports a
> diagnostic instead of an ICE. The patch for the fix is as follows:
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b7f5bc76f1b..9ce83dce131 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6017,7 +6017,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
>  
>if (!silent_p
>&& !TARGET_FLOAT
> -  && fndecl && TREE_PUBLIC (fndecl)
> +  && fndecl
>&& fntype && fntype != error_mark_node)
>  {
>const_tree type = TREE_TYPE (fntype);

I think the fndecl test is problematic too though.  E.g. consider:

typedef int v4si __attribute__((vector_size(16)));
v4si (*foo) ();
void f (v4si *ptr) { *ptr = foo (); }

which ICEs for me even with the above.

I suggest we just remove the line and see whether anything breaks.

Thanks,
Richard


Re: [PATCH] doc: add return type for functions in gimple.texi

2020-08-18 Thread Richard Sandiford
Hu Jiangping  writes:
> This patch adds the return type for some functions in gimple.texi,
> to make the entries consistent. OK for trunk?

LGTM, thanks.  Pushed to master.

Richard

>
> Tested on x86_64.
>
> Regards!
> Hujp
>
> ---
>  gcc/doc/gimple.texi | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
> index 5e0fc2e0dc5..f01c3083835 100644
> --- a/gcc/doc/gimple.texi
> +++ b/gcc/doc/gimple.texi
> @@ -831,17 +831,17 @@ expression to a variable.
>  Return true if g is any of the OpenMP codes.
>  @end deftypefn
>  
> -@deftypefn {GIMPLE function} gimple_debug_begin_stmt_p (gimple g)
> +@deftypefn {GIMPLE function} bool gimple_debug_begin_stmt_p (gimple g)
>  Return true if g is a @code{GIMPLE_DEBUG} that marks the beginning of
>  a source statement.
>  @end deftypefn
>  
> -@deftypefn {GIMPLE function} gimple_debug_inline_entry_p (gimple g)
> +@deftypefn {GIMPLE function} bool gimple_debug_inline_entry_p (gimple g)
>  Return true if g is a @code{GIMPLE_DEBUG} that marks the entry
>  point of an inlined function.
>  @end deftypefn
>  
> -@deftypefn {GIMPLE function} gimple_debug_nonbind_marker_p (gimple g)
> +@deftypefn {GIMPLE function} bool gimple_debug_nonbind_marker_p (gimple g)
>  Return true if g is a @code{GIMPLE_DEBUG} that marks a program location,
>  without any variable binding.
>  @end deftypefn


Re: PING: Fwd: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-18 Thread Aldy Hernandez via Gcc-patches

And here's the patch without the sanity check.

Aldy
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index fe51a6faeb8..9b21441dff3 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1006,7 +1006,7 @@ vr_values::extract_range_from_comparison (value_range_equiv *vr,
overflow.  */
 
 static bool
-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
 			  enum tree_code subcode, tree type,
 			  tree op0, tree op1, bool *ovf)
 {
@@ -1736,42 +1736,40 @@ compare_range_with_value (enum tree_code comp, const value_range *vr,
 
   gcc_unreachable ();
 }
-/* Given a range VR, a LOOP and a variable VAR, determine whether it
-   would be profitable to adjust VR using scalar evolution information
-   for VAR.  If so, update VR with the new limits.  */
 
-void
-vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
-   gimple *stmt, tree var)
+/* Given a VAR in STMT within LOOP, determine the bounds of the
+   variable and store it in MIN/MAX and return TRUE.  If no bounds
+   could be determined, return FALSE.  */
+
+bool
+bounds_of_var_in_loop (tree *min, tree *max, range_query *query,
+		   class loop *loop, gimple *stmt, tree var)
 {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, type = TREE_TYPE (var);
   enum ev_direction dir;
 
-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-return;
-
   chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, var));
 
   /* Like in PR19590, scev can return a constant function.  */
   if (is_gimple_min_invariant (chrec))
 {
-  vr->set (chrec);
-  return;
+  *min = *max = chrec;
+  return true;
 }
 
   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
-return;
+return false;
 
   init = initial_condition_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-init = tem;
   step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-step = tem;
+
+  /* If INIT is an SSA with a singleton range, set INIT to said
+ singleton, otherwise leave INIT alone.  */
+  if (TREE_CODE (init) == SSA_NAME)
+query->get_value_range (init, stmt)->singleton_p ();
+  /* Likewise for step.  */
+  if (TREE_CODE (step) == SSA_NAME)
+query->get_value_range (step, stmt)->singleton_p ();
 
   /* If STEP is symbolic, we can't know whether INIT will be the
  minimum or maximum value in the range.  Also, unless INIT is
@@ -1780,7 +1778,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
   if (step == NULL_TREE
   || !is_gimple_min_invariant (step)
   || !valid_value_p (init))
-return;
+return false;
 
   dir = scev_direction (chrec);
   if (/* Do not adjust ranges if we do not know whether the iv increases
@@ -1789,9 +1787,8 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
   /* ... or if it may wrap.  */
   || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
 get_chrec_loop (chrec), true))
-return;
+return false;
 
-  type = TREE_TYPE (var);
   if (POINTER_TYPE_P (type) || !TYPE_MIN_VALUE (type))
 tmin = lower_bound_in_type (type, type);
   else
@@ -1806,7 +1803,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
   if (TREE_CODE (step) == INTEGER_CST
   && is_gimple_val (init)
   && (TREE_CODE (init) != SSA_NAME
-	  || get_value_range (init, stmt)->kind () == VR_RANGE))
+	  || query->get_value_range (init, stmt)->kind () == VR_RANGE))
 {
   widest_int nit;
 
@@ -1829,21 +1826,29 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
 	  && (sgn == UNSIGNED
 		  || wi::gts_p (wtmp, 0) == wi::gts_p (wi::to_wide (step), 0)))
 	{
-	  value_range_equiv maxvr;
-	  tem = wide_int_to_tree (TREE_TYPE (init), wtmp);
-	  extract_range_from_binary_expr (, PLUS_EXPR,
-	  TREE_TYPE (init), init, tem);
+	  value_range maxvr, vr0, vr1;
+	  if (TREE_CODE (init) == SSA_NAME)
+		vr0 = *(query->get_value_range (init, stmt));
+	  else if (is_gimple_min_invariant (init))
+		vr0.set (init);
+	  else
+		vr0.set_varying (TREE_TYPE (init));
+	  tree tem = wide_int_to_tree (TREE_TYPE (init), wtmp);
+	  vr1.set (tem, tem);
+	  range_fold_binary_expr (, PLUS_EXPR,
+  TREE_TYPE (init), , );
+
 	  /* Likewise if the addition did.  */
 	  if (maxvr.kind () == VR_RANGE)
 		{
 		  value_range initvr;
 
 		  if (TREE_CODE (init) == SSA_NAME)
-		initvr = *(get_value_range (init, stmt));
+		initvr = *(query->get_value_range (init, stmt));
 		  else if (is_gimple_min_invariant (init))
 		initvr.set (init);
 		  else
-		return;
+		return false;
 
 		 

Re: [PATCH] improve memcmp and memchr constant folding (PR 78257)

2020-08-18 Thread Martin Sebor via Gcc-patches

On 8/15/20 8:19 AM, Christophe Lyon wrote:

Hi Martin,


On Sat, 15 Aug 2020 at 01:14, Martin Sebor via Gcc-patches
 wrote:


On 8/13/20 11:44 AM, Martin Sebor wrote:

On 8/13/20 10:21 AM, Jeff Law wrote:

On Fri, 2020-07-31 at 17:55 -0600, Martin Sebor via Gcc-patches wrote:

The folders for these functions (and some others) call c_getstr
which relies on string_constant to return the representation of
constant strings.  Because the function doesn't handle constants
of other types, including aggregates, memcmp or memchr calls
involving those are not folded when they could be.

The attached patch extends the algorithm used by string_constant
to also handle constant aggregates involving elements or members
of the same types as native_encode_expr.  (The change restores
the empty initializer optimization inadvertently disabled in
the fix for pr96058.)

To avoid accidentally misusing either string_constant or c_getstr
with non-strings I have introduced a pair of new functions to get
the representation of those: byte_representation and getbyterep.
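
As an illustration of the kind of call this targets, the sketch below uses memcmp and memchr on constant aggregates that are not strings. The names are made up for this example and are not taken from the new tests; whether a particular GCC version actually folds these calls at compile time is not guaranteed, but the run-time results are the same either way.

```cpp
#include <cstring>

// Constant arrays of a non-character type.
static const int a[] = { 1, 2, 3, 4 };
static const int b[] = { 1, 2, 3, 4 };

// A constant struct whose members are all single bytes (no padding).
struct quad { char a, b, c, d; };
static const quad q = { 1, 2, 3, 4 };

// Candidate for folding to a constant: both operands are constant arrays.
bool arrays_equal()
{
  return std::memcmp(a, b, sizeof a) == 0;
}

// Byte offset of the first byte equal to 3 within q, or -1 if absent.
long offset_of_three()
{
  const void* p = std::memchr(&q, 3, sizeof q);
  return p ? static_cast<const char*>(p)
               - reinterpret_cast<const char*>(&q)
           : -1;
}
```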

Tested on x86_64-linux.

Martin



PR tree-optimization/78257 - missing memcmp optimization with
constant arrays

gcc/ChangeLog:

 PR middle-end/78257
 * builtins.c (expand_builtin_memory_copy_args): Rename called
function.
 (expand_builtin_stpcpy_1): Remove argument from call.
 (expand_builtin_memcmp): Rename called function.
 (inline_expand_builtin_bytecmp): Same.
 * expr.c (convert_to_bytes): New function.
 (constant_byte_string): New function (formerly string_constant).
 (string_constant): Call constant_byte_string.
 (byte_representation): New function.
 * expr.h (byte_representation): Declare.
 * fold-const-call.c (fold_const_call): Rename called function.
 * fold-const.c (c_getstr): Remove an argument.
 (getbyterep): Define a new function.
 * fold-const.h (c_getstr): Remove an argument.
 (getbyterep): Declare a new function.
 * gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee.
 (gimple_fold_builtin_string_compare): Same.
 (gimple_fold_builtin_memchr): Same.

gcc/testsuite/ChangeLog:

 PR middle-end/78257
 * gcc.dg/memchr.c: New test.
 * gcc.dg/memcmp-2.c: New test.
 * gcc.dg/memcmp-3.c: New test.
 * gcc.dg/memcmp-4.c: New test.

diff --git a/gcc/expr.c b/gcc/expr.c
index a150fa0d3b5..a124df54655 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11594,15 +11594,103 @@ is_aligning_offset (const_tree offset,
const_tree exp)
 /* This must now be the address of EXP.  */
 return TREE_CODE (offset) == ADDR_EXPR && TREE_OPERAND (offset,
0) == exp;
   }
-
-/* Return the tree node if an ARG corresponds to a string constant
or zero
-   if it doesn't.  If we return nonzero, set *PTR_OFFSET to the
(possibly
-   non-constant) offset in bytes within the string that ARG is
accessing.
-   If MEM_SIZE is non-zero the storage size of the memory is returned.
-   If DECL is non-zero the constant declaration is returned if
available.  */
-tree
-string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
*decl)
+/* If EXPR is a constant initializer (either an expression or
CONSTRUCTOR),
+   attempt to obtain its native representation as an array of
nonzero BYTES.
+   Return true on success and false on failure (the latter without
modifying
+   BYTES).  */
+
+static bool
+convert_to_bytes (tree type, tree expr, vec *bytes)
+{
+  if (TREE_CODE (expr) == CONSTRUCTOR)
+{
+  /* Set to the size of the CONSTRUCTOR elements.  */
+  unsigned HOST_WIDE_INT ctor_size = bytes->length ();
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+{
+  tree val, idx;
+  tree eltype = TREE_TYPE (type);
+  unsigned HOST_WIDE_INT elsize =
+tree_to_uhwi (TYPE_SIZE_UNIT (eltype));
+  unsigned HOST_WIDE_INT i, last_idx = HOST_WIDE_INT_M1U;
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, idx, val)
+{
+  /* Append zeros for elements with no initializers.  */
+  if (!tree_fits_uhwi_p (idx))
+return false;
+  unsigned HOST_WIDE_INT cur_idx = tree_to_uhwi (idx);
+  if (unsigned HOST_WIDE_INT size = cur_idx - (last_idx + 1))
+{
+  size = size * elsize + bytes->length ();
+  bytes->safe_grow_cleared (size);

^^^


+}
+
+  if (!convert_to_bytes (eltype, val, bytes))
+return false;
+
+  last_idx = cur_idx;
+}
+}
+  else if (TREE_CODE (type) == RECORD_TYPE)
+{
+  tree val, fld;
+  unsigned HOST_WIDE_INT i;
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, fld, val)
+{
+  /* Append zeros for members with no initializers and
+ any padding.  */
+  unsigned HOST_WIDE_INT cur_off = int_byte_position (fld);
+  if (bytes->length () < cur_off)
+bytes->safe_grow_cleared (cur_off);

  ^^

+

Re: PING: Fwd: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-18 Thread Aldy Hernandez via Gcc-patches



On 8/17/20 5:59 PM, Andrew MacLeod wrote:

On 8/17/20 6:04 AM, Aldy Hernandez wrote:



On 8/14/20 7:16 PM, Andrew MacLeod wrote:

On 8/14/20 12:05 PM, Aldy Hernandez wrote:

I made some minor changes to the function comments.

gcc/ChangeLog:

* vr-values.c (check_for_binary_op_overflow): Change type of store
to range_query.
(vr_values::adjust_range_with_scev): Abstract most of the code...
(range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
(simplify_using_ranges::simplify_using_ranges): Change type of store
to range_query.
* vr-values.h (class range_query): New.
(class simplify_using_ranges): Use range_query.
(class vr_values): Add OVERRIDE to get_value_range.
(range_of_var_in_loop): New.
---
 gcc/vr-values.c | 150 ++--
 gcc/vr-values.h |  23 ++--
 2 files changed, 88 insertions(+), 85 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9002d87c14b..5b7bae3bfb7 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison 
(value_range_equiv *vr,

    overflow.  */

 static bool
-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
   enum tree_code subcode, tree type,
   tree op0, tree op1, bool *ovf)
 {
@@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code 
comp, const value_range *vr,


   gcc_unreachable ();
 }
-/* Given a range VR, a LOOP and a variable VAR, determine whether it
-   would be profitable to adjust VR using scalar evolution information
-   for VAR.  If so, update VR with the new limits.  */
+
+/* Given a VAR in STMT within LOOP, determine the range of the
+   variable and store it in VR.  If no range can be determined, the
+   resulting range will be set to VARYING.  */

 void
-vr_values::adjust_range_with_scev (value_range_equiv *vr, class 
loop *loop,

-   gimple *stmt, tree var)
+range_of_var_in_loop (irange *vr, range_query *query,
+  class loop *loop, gimple *stmt, tree var)
 {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, min, max, type;
   enum ev_direction dir;

-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-    return;
-


IIUC, you've switched to using the new API, so the bounds calls will 
basically turn an ANTI range into a varying one, making [lbound, ubound] 
be [MIN, MAX]?
So it's effectively a no-op, except we will no longer punt on getting a 
range when VR is an anti-range... so that's goodness...


Yes.




chrec = instantiate_parameters (loop, analyze_scalar_evolution 
(loop, var));


   /* Like in PR19590, scev can return a constant function. */
@@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev 
(value_range_equiv *vr, class loop *loop,

 }

   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
-    return;
+    {
+  vr->set_varying (TREE_TYPE (var));
+  return;
+    }


I'm seeing a lot of this pattern...
Maybe we should set vr to varying upon entry to the function as the 
default return value... then we can just return like it did before in 
all those places.


Better yet, since this routine doesn't "update" anymore and simply 
returns a range, maybe it could instead return a boolean if it finds 
a range rather than the current behaviour...

then those simply become

+    return false;

We won't have to intersect at the caller if we don't need to, and it's 
useful information at other points to know a range was calculated 
without having to see if varying_p () came back from the call.

i.e., the usage pattern would then be

value_range_equiv r;
if (range_of_var_in_loop (&r, this, loop, stmt, var))
    vr->intersect (&r);

This is the pattern we use throughout the ranger.
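
A minimal sketch of the "return a bool, intersect only on success" pattern described above. The types and names here (ToyRange, toy_range_of_var_in_loop, refine) are simplified stand-ins invented for illustration; they are not GCC's irange or range_query API, and the range computation is a toy one that only handles a counter with a positive step.

```cpp
#include <algorithm>
#include <cassert>

// Toy closed interval standing in for a compiler value range (hypothetical).
struct ToyRange {
  long lo, hi;

  // Narrow this range to its overlap with OTHER; the two must overlap.
  void intersect(const ToyRange &other) {
    assert(std::max(lo, other.lo) <= std::min(hi, other.hi));
    lo = std::max(lo, other.lo);
    hi = std::min(hi, other.hi);
  }
};

// Hypothetical query: returns true and fills *OUT only when a range was
// actually computed. Here that means a counter with a positive step.
bool toy_range_of_var_in_loop(ToyRange *out, long init, long step,
                              long iterations) {
  if (step <= 0 || iterations <= 0)
    return false;  // nothing computed; the caller keeps what it had
  out->lo = init;
  out->hi = init + step * (iterations - 1);
  return true;
}

// Caller-side pattern: intersect only when the query succeeded.
ToyRange refine(ToyRange current, long init, long step, long iterations) {
  ToyRange r{0, 0};
  if (toy_range_of_var_in_loop(&r, init, step, iterations))
    current.intersect(r);
  return current;
}
```

With this shape the caller never has to test for a varying result: a false return simply means "no new information".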


Done.






   init = initial_condition_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-    init = tem;
+  if (TREE_CODE (init) == SSA_NAME)
+    query->get_value_range (init, stmt)->singleton_p ();
   step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-    step = tem;
+  if (TREE_CODE (step) == SSA_NAME)
+    query->get_value_range (step, stmt)->singleton_p ();


If I read this correctly, we get values for init and step... and if 
they are SSA_NAMES, then we query ranges, otherwise use what we got 
back.. So that would seem to be the same behaviour as before then..

Perhaps a comment is warranted? I had to read it a few times :-)


Indeed.  I am trying to do too much in one line.  I've added a comment.






   /* If STEP is symbolic, we can't know whether INIT will be the
  minimum or maximum value in the range.  Also, unless INIT is
@@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev 

Re: [PATCH] arm: Add +nomve and +nomve.fp options to -mcpu=cortex-m55

2020-08-18 Thread Richard Earnshaw
On 10/08/2020 15:21, Joe Ramsay wrote:
> From: Joe Ramsay 
> 
> Hi,
> 
> This patch rearranges feature bits for MVE and FP to implement the
> following flags for -mcpu=cortex-m55.
> 
>   - +nomve:    equivalent to armv8.1-m.main+fp.dp+dsp.
>   - +nomve.fp: equivalent to armv8.1-m.main+mve+fp.dp (+dsp is implied by
>                +mve).
>   - +nofp:     equivalent to armv8.1-m.main+mve (+dsp is implied by +mve).
>   - +nodsp:    equivalent to armv8.1-m.main+fp.dp.
> 
> Combinations of the above:
> 
>   - +nomve+nofp: equivalent to armv8.1-m.main+dsp.
>   - +nodsp+nofp: equivalent to armv8.1-m.main.
> 
> Due to MVE and FP sharing vfp_base, some new syntax was required in the CPU
> description to implement the concept of 'implied bits'. These are non-named
> features added to the ISA late, depending on whether one or more features
> which depend on them are present. This means vfp_base can be present when
> only one of MVE and FP is removed, but absent when both are removed.
> 
> Bootstrapped and tested on arm-none-eabi. OK for master?
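
The "implied bits" idea in the description above can be sketched roughly as follows. This is only an illustration of the rule "vfp_base is present if at least one of MVE and FP is present"; the bit names and values are hypothetical and do not reflect the layout generated from arm-cpus.in or the code in arm_configure_build_target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; the names mirror the email, the values are made up.
enum : std::uint32_t {
  FEAT_FP       = 1u << 0,
  FEAT_MVE      = 1u << 1,
  FEAT_DSP      = 1u << 2,
  FEAT_VFP_BASE = 1u << 3,  // implied bit: never named directly by the user
  FEAT_ALL      = FEAT_FP | FEAT_MVE | FEAT_DSP | FEAT_VFP_BASE
};

// Add implied bits late, after every +no<feature> option has been applied.
std::uint32_t add_implied_bits(std::uint32_t isa) {
  assert((isa & ~static_cast<std::uint32_t>(FEAT_ALL)) == 0);  // known bits only
  isa &= ~static_cast<std::uint32_t>(FEAT_VFP_BASE);
  if (isa & (FEAT_FP | FEAT_MVE))  // any feature that depends on vfp_base
    isa |= FEAT_VFP_BASE;
  return isa;
}
```

Removing only MVE or only FP leaves vfp_base set; removing both clears it, which matches the behaviour described for +nomve+nofp.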

This all looks OK.  But we need some additional tests in the multilib
selection tests (testsuite/gcc.target/arm/multilib.exp) to make sure
that the correct libraries are being picked when the new functionality
is being required.

R.

> 
> Thanks all!
> Joe
> 
> gcc/ChangeLog:
> 
> 2020-07-31  Joe Ramsay  
> 
>   * config/arm/arm-cpus.in:
>   (ALL_FPU_INTERNAL): Remove vfp_base.
>   (VFPv2): Remove vfp_base.
>   (MVE): Remove vfp_base.
>   (vfp_base): Redefine as implied bit dependent on MVE or FP
>   (cortex-m55): Add flags to disable MVE, MVE FP, FP and DSP extensions.
>   * config/arm/arm.c (arm_configure_build_target): Add implied bits to 
> ISA.
>   * config/arm/parsecpu.awk:
>   (gen_isa): Print implied bits and their dependencies to ISA header.
>   (gen_data): Add parsing for implied feature bits.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-07-31  Joe Ramsay  
> 
>   * gcc.target/arm/cortex-m55-nodsp-flag.c: New test.
>   * gcc.target/arm/cortex-m55-nodsp-nofp-flag.c: New test.
>   * gcc.target/arm/cortex-m55-nofp-flag.c: New test.
>   * gcc.target/arm/cortex-m55-nofp-nomve-flag.c: New test.
>   * gcc.target/arm/cortex-m55-nomve-flag.c: New test.
>   * gcc.target/arm/cortex-m55-nomve.fp-flag.c: New test.
> ---
>  gcc/config/arm/arm-cpus.in | 26 ---
>  gcc/config/arm/arm.c   | 14 ++
>  gcc/config/arm/parsecpu.awk| 51 
> ++
>  .../gcc.target/arm/cortex-m55-nodsp-flag-hard.c| 15 +++
>  .../gcc.target/arm/cortex-m55-nodsp-flag-softfp.c  | 15 +++
>  .../arm/cortex-m55-nodsp-nofp-flag-softfp.c| 15 +++
>  .../gcc.target/arm/cortex-m55-nofp-flag-hard.c | 15 +++
>  .../gcc.target/arm/cortex-m55-nofp-flag-softfp.c   | 15 +++
>  .../arm/cortex-m55-nofp-nomve-flag-softfp.c| 15 +++
>  .../gcc.target/arm/cortex-m55-nomve-flag-hard.c| 15 +++
>  .../gcc.target/arm/cortex-m55-nomve-flag-softfp.c  | 15 +++
>  .../gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c | 15 +++
>  .../arm/cortex-m55-nomve.fp-flag-softfp.c  | 15 +++
>  13 files changed, 234 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-flag-hard.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-flag-softfp.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nofp-flag-hard.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nofp-flag-softfp.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nomve-flag-hard.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nomve-flag-softfp.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c
> 
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index c98f8ed..5083028 100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -135,10 +135,6 @@ define feature armv8_1m_main
>  # Floating point and Neon extensions.
>  # VFPv1 is not supported in GCC.
>  
> -# This feature bit is enabled for all VFP, MVE and
> -# MVE with floating point extensions.
> -define feature vfp_base
> -
>  # Vector floating point v2.
>  define feature vfpv2
>  
> @@ -251,7 +247,7 @@ define fgroup ALL_SIMDALL_SIMD_INTERNAL 
> ALL_SIMD_EXTERNAL
>  
>  # List of all FPU bits to strip out if -mfpu is used to override the
>  # default.  fp16 is deliberately missing from this list.
> -define fgroup ALL_FPU_INTERNAL   vfp_base vfpv2 vfpv3 vfpv4 fpv5 
> fp16conv fp_dbl ALL_SIMD_INTERNAL
> +define fgroup ALL_FPU_INTERNAL   

[committed] update native_encode_expr description

2020-08-18 Thread Martin Sebor via Gcc-patches

I committed the change below updating and completing the description
of the native_encode_expr function.

Martin

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 9fc4c2a06fb..7c4d1eff215 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7875,11 +7875,12 @@ native_encode_string (const_tree expr, unsigned 
char *ptr, int len, int off)

 }


-/* Subroutine of fold_view_convert_expr.  Encode the INTEGER_CST,
-   REAL_CST, COMPLEX_CST or VECTOR_CST specified by EXPR into the
-   buffer PTR of length LEN bytes.  If PTR is NULL, don't actually store
-   anything, just do a dry run.  If OFF is not -1 then start
-   the encoding at byte offset OFF and encode at most LEN bytes.
+/* Subroutine of fold_view_convert_expr.  Encode the INTEGER_CST, REAL_CST,
+   FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into
+   the buffer PTR of size LEN bytes.  If PTR is NULL, don't actually store
+   anything, just do a dry run.  Fail either if OFF is -1 and LEN isn't
+   sufficient to encode the entire EXPR, or if OFF is out of bounds.
+   Otherwise, start at byte offset OFF and encode at most LEN bytes.
Return the number of bytes placed in the buffer, or zero upon 
failure.  */


 int
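
To make the updated comment concrete, here is a toy model of the documented contract (dry run when PTR is null, whole-object mode when OFF is -1, partial encoding otherwise). It is a sketch under assumptions: toy_native_encode and its byte-vector argument are invented for illustration and only copy bytes; the real native_encode_expr works on trees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Toy model of the contract; SRC stands in for the bytes of EXPR.
// Returns the number of bytes placed in the buffer, or zero upon failure.
int toy_native_encode(const std::vector<unsigned char> &src,
                      unsigned char *ptr, int len, int off) {
  assert(len >= 0 && off >= -1);
  const int total = static_cast<int>(src.size());

  // Fail if the whole object was requested but does not fit in LEN bytes,
  // or if OFF is out of bounds.
  if ((off == -1 && total > len) || off >= total)
    return 0;
  if (off == -1)
    off = 0;

  // Otherwise encode at most LEN bytes starting at byte offset OFF.
  const int n = std::min(len, total - off);
  if (ptr != nullptr && n > 0)  // a null PTR means dry run: count only
    std::memcpy(ptr, src.data() + off, static_cast<std::size_t>(n));
  return n;
}
```

A caller can therefore pass a null PTR first to learn how many bytes a real call would produce.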


Re: [PATCH][GCC-10 Backport] arm: Enable no-writeback vldr.16/vstr.16.

2020-08-18 Thread Richard Earnshaw
On 05/08/2020 10:28, Joe Ramsay wrote:
> From: Joe Ramsay 
> 
> Hi,
> 
> There was previously no way to specify that a register operand cannot
> have any writeback modifiers, and as a result the argument to vldr.16
> and vstr.16 could be erroneously output with post-increment. This
> change adds a constraint which forbids all writeback, and
> selects it in the relevant case for vldr.16 and vstr.16
> 
> Bootstrapped on arm-none-eabi. Patch backports cleanly onto gcc-10
> branch with no regressions. OK for gcc-10 branch?
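
The three writeback levels that the patch introduces can be summarised with a small sketch. The enum and function below are hypothetical simplifications of the check in arm_coproc_mem_operand_wb: they only model which auto-modify address forms each level accepts, and they assume a plain (non-modifying) address is always acceptable.

```cpp
#include <cassert>

// Hypothetical, simplified view of the address forms involved.
enum class AddrMode { Plain, PostInc, PreDec, PreInc, PostDec };

// wb_level: 0 = no writeback, 1 = limited (POST_INC and PRE_DEC), 2 = full.
bool writeback_mode_allowed(AddrMode mode, int wb_level) {
  assert(wb_level == 0 || wb_level == 1 || wb_level == 2);
  switch (mode) {
  case AddrMode::Plain:
    return true;            // assumption: no writeback involved at all
  case AddrMode::PostInc:
  case AddrMode::PreDec:
    return wb_level > 0;    // needs at least limited writeback
  case AddrMode::PreInc:
  case AddrMode::PostDec:
    return wb_level > 1;    // needs full writeback
  }
  return false;
}
```

Level 0 is what the new constraint for vldr.16 and vstr.16 corresponds to: every auto-modify form is rejected.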
> 

OK.

R.

> Thanks,
> Joe
> 
> gcc/ChangeLog:
> 
> 2020-08-04  Joe Ramsay  
> 
>   Backported from master
>   2020-05-20  Joe Ramsay  
> 
>   * config/arm/arm-protos.h (arm_coproc_mem_operand_no_writeback):
>   Declare prototype.
>   (arm_mve_mode_and_operands_type_check): Declare prototype.
>   * config/arm/arm.c (arm_coproc_mem_operand): Refactor to use
>   _arm_coproc_mem_operand.
>   (arm_coproc_mem_operand_wb): New function to cover full, limited
>   and no writeback.
>   (arm_coproc_mem_operand_no_writeback): New constraint for memory
>   operand with no writeback.
>   (arm_print_operand): Extend 'E' specifier for memory operand
>   that does not support writeback.
>   (arm_mve_mode_and_operands_type_check): New constraint check for
>   MVE memory operands.
>   * config/arm/constraints.md: Add Uj constraint for VFP vldr.16
>   and vstr.16.
>   * config/arm/vfp.md (*mov_load_vfp_hf16): New pattern for
>   vldr.16.
>   (*mov_store_vfp_hf16): New pattern for vstr.16.
>   (*mov_vfp_16): Remove MVE moves.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-04  Joe Ramsay  
> 
>   Backported from master
>   2020-05-20  Joe Ramsay  
> 
>   * gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: New test.
> ---
>  gcc/config/arm/arm-protos.h|  3 +
>  gcc/config/arm/arm.c   | 74 
> ++
>  gcc/config/arm/constraints.md  |  7 ++
>  gcc/config/arm/vfp.md  | 26 +---
>  .../arm/mve/intrinsics/mve-vldstr16-no-writeback.c | 17 +
>  5 files changed, 105 insertions(+), 22 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 33d162c..e811da4 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -115,8 +115,11 @@ extern enum reg_class coproc_secondary_reload_class 
> (machine_mode, rtx,
>  extern bool arm_tls_referenced_p (rtx);
>  
>  extern int arm_coproc_mem_operand (rtx, bool);
> +extern int arm_coproc_mem_operand_no_writeback (rtx);
> +extern int arm_coproc_mem_operand_wb (rtx, int);
>  extern int neon_vector_mem_operand (rtx, int, bool);
>  extern int mve_vector_mem_operand (machine_mode, rtx, bool);
> +bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
>  extern int neon_struct_mem_operand (rtx);
>  
>  extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index a8825ee..d8da167 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13192,13 +13192,14 @@ neon_element_bits (machine_mode mode)
>  /* Predicates for `match_operand' and `match_operator'.  */
>  
>  /* Return TRUE if OP is a valid coprocessor memory address pattern.
> -   WB is true if full writeback address modes are allowed and is false
> +   WB level is 2 if full writeback address modes are allowed, 1
> if limited writeback address modes (POST_INC and PRE_DEC) are
> -   allowed.  */
> +   allowed and 0 if no writeback at all is supported.  */
>  
>  int
> -arm_coproc_mem_operand (rtx op, bool wb)
> +arm_coproc_mem_operand_wb (rtx op, int wb_level)
>  {
> +  gcc_assert (wb_level == 0 || wb_level == 1 || wb_level == 2);
>rtx ind;
>  
>/* Reject eliminable registers.  */
> @@ -13231,16 +13232,18 @@ arm_coproc_mem_operand (rtx op, bool wb)
>  
>/* Autoincremment addressing modes.  POST_INC and PRE_DEC are
>   acceptable in any case (subject to verification by
> - arm_address_register_rtx_p).  We need WB to be true to accept
> + arm_address_register_rtx_p).  We need full writeback to accept
> + PRE_INC and POST_DEC, and at least restricted writeback for
>   PRE_INC and POST_DEC.  */
> -  if (GET_CODE (ind) == POST_INC
> -  || GET_CODE (ind) == PRE_DEC
> -  || (wb
> -   && (GET_CODE (ind) == PRE_INC
> -   || GET_CODE (ind) == POST_DEC)))
> +  if (wb_level > 0
> +  && (GET_CODE (ind) == POST_INC
> +   || GET_CODE (ind) == PRE_DEC
> +   || (wb_level > 1
> +   && (GET_CODE (ind) == PRE_INC
> +   || GET_CODE (ind) == POST_DEC
>  return arm_address_register_rtx_p (XEXP (ind, 0), 0);
>  
> -  if (wb
> +  if (wb_level > 1
>&& (GET_CODE (ind) == POST_MODIFY || 

Re: [PATCH] middle-end: Fix PR middle-end/85811: Introduce tree_expr_maybe_nan_p et al.

2020-08-18 Thread Joseph Myers
On Mon, 17 Aug 2020, Segher Boessenkool wrote:

> Ah, so "When both arguments are NaNs, the return value should be a qNaN"
> means the QNaN corresponding to either x or y.  I see, thanks!

Yes.  (The precise choice of NaN result given a NaN input is the subject 
of various "should"s, in 6.2.3 NaN propagation, with the choice between 
multiple input NaNs to provide the payload unspecified.  E.g. RISC-V 
doesn't propagate NaN payloads at all.  On x86, the rules for choosing a 
NaN result with more than one NaN input are different between x87 and SSE 
arithmetic.  The compiler can ignore these issues, as long as it gets the 
choice between quiet and signaling NaN correct.)

The IEEE 754-2008 min/max operations also do not specify the precise 
choice of result when the arguments are +0 and -0 in some order, or when 
they are equal decimal floating-point values with different quantum 
exponent.  The new operations in IEEE 754-2019 treat -0 as less than +0, 
but still leave the quantum exponent case unspecified.
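
As a small illustration of the points above, here is a sketch using the C library's fmin, which follows the IEEE 754-2008 style of treating a single quiet NaN as missing data. The helper names are invented for this example, and it assumes default floating-point semantics (no -ffast-math). Signaling NaNs and NaN payloads are deliberately not exercised.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when fmin of two quiet NaNs is itself a NaN.
bool fmin_of_two_nans_is_nan() {
  const double qnan = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(std::fmin(qnan, qnan));
}

// With exactly one quiet NaN operand, fmin returns the other operand.
double fmin_with_one_nan(double x) {
  assert(!std::isnan(x));
  return std::fmin(std::numeric_limits<double>::quiet_NaN(), x);
}

// Sign of fmin (+0, -0). Either answer is permitted, so portable code
// must not depend on the value this returns.
bool fmin_of_zeros_is_negative() {
  return std::signbit(std::fmin(0.0, -0.0));
}
```

The last helper is there only to show how one would observe the unspecified choice between +0 and -0 on a given implementation.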

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [committed] i386: Rewrite restore_stack_nonlocal expander [PR96536].

2020-08-18 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 18, 2020 at 5:42 PM Uros Bizjak  wrote:
>
> -fcf-protection code in restore_stack_nonlocal uses a branch based on
> a clobber result.  The patch adds missing compare and completely
> rewrites the expander to use high-level functions in RTL construction.

Backported gcc-10 version introduces only minimal change to the code.

2020-08-18  Uroš Bizjak  

gcc/ChangeLog:

PR target/96536
* config/i386/i386.md (restore_stack_nonlocal):
Add missing compare RTX.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 292de142e90..44dbb79d008 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18696,8 +18696,11 @@
   emit_insn (tmp);
 
   /* Compare and jump over adjustment code.  */
-  noadj_label = gen_label_rtx ();
+  tmp = gen_rtx_COMPARE (CCZmode, reg_ssp, const0_rtx);
   flags = gen_rtx_REG (CCZmode, FLAGS_REG);
+  emit_insn (gen_rtx_SET (flags, tmp));
+
+  noadj_label = gen_label_rtx ();
   tmp = gen_rtx_EQ (VOIDmode, flags, const0_rtx);
   tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
  gen_rtx_LABEL_REF (VOIDmode, noadj_label),


[committed] i386: Rewrite restore_stack_nonlocal expander [PR96536].

2020-08-18 Thread Uros Bizjak via Gcc-patches
-fcf-protection code in restore_stack_nonlocal uses a branch based on
a clobber result.  The patch adds missing compare and completely
rewrites the expander to use high-level functions in RTL construction.
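
The arithmetic that the expander encodes in RTL (subtract the saved shadow stack pointer from the current one, skip the adjustment when the difference is zero, otherwise negate and shift right by 2 or 3) can be sketched in plain C++. The function name and the use of 64-bit integers are assumptions made for illustration; this is not the RTL the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Number of shadow-stack frames between the current and the saved pointer.
// WORD_BYTES is 4 or 8, giving a shift of 2 or 3 as in the expander.
std::uint64_t frames_to_adjust(std::uint64_t current_ssp,
                               std::uint64_t saved_ssp,
                               unsigned word_bytes) {
  assert(word_bytes == 4 || word_bytes == 8);
  const std::uint64_t diff = current_ssp - saved_ssp;  // the MINUS step
  if (diff == 0)
    return 0;                                          // jump over adjustment
  const unsigned shift = (word_bytes == 4) ? 2 : 3;
  return (0 - diff) >> shift;                          // negate, logical shift
}
```

The point of the fix is that the "difference is zero" test must come from an explicit compare rather than from flags that the subtraction pattern merely clobbers.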

2020-08-18  Uroš Bizjak  

gcc/ChangeLog:

PR target/96536
* config/i386/i386.md (restore_stack_nonlocal): Add missing compare
RTX.  Rewrite expander to use high-level functions in RTL construction.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Also
bootstrapped and regtested by Hongtao on CET enabled target (thanks).

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3985c771d00..05af8639dbc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19133,15 +19133,17 @@
   ""
 {
   rtx stack_slot;
-  if ((flag_cf_protection & CF_RETURN))
+
+  if (flag_cf_protection & CF_RETURN)
 {
-  /* Copy shadow stack pointer to the first slot and stack ppointer
-to the second slot.  */
+  /* Copy shadow stack pointer to the first slot
+and stack pointer to the second slot.  */
   rtx ssp_slot = adjust_address (operands[0], word_mode, 0);
   stack_slot = adjust_address (operands[0], Pmode, UNITS_PER_WORD);
-  rtx ssp = force_reg (word_mode, const0_rtx);
-  emit_insn (gen_rdssp (word_mode, ssp, ssp));
-  emit_move_insn (ssp_slot, ssp);
+
+  rtx reg_ssp = force_reg (word_mode, const0_rtx);
+  emit_insn (gen_rdssp (word_mode, reg_ssp, reg_ssp));
+  emit_move_insn (ssp_slot, reg_ssp);
 }
   else
 stack_slot = adjust_address (operands[0], Pmode, 0);
@@ -19155,95 +19157,64 @@
   ""
 {
   rtx stack_slot;
-  if ((flag_cf_protection & CF_RETURN))
+
+  if (flag_cf_protection & CF_RETURN)
 {
-  /* Restore shadow stack pointer from the first slot and stack
-pointer from the second slot.  */
+  /* Restore shadow stack pointer from the first slot
+and stack pointer from the second slot.  */
   rtx ssp_slot = adjust_address (operands[1], word_mode, 0);
   stack_slot = adjust_address (operands[1], Pmode, UNITS_PER_WORD);
 
-  rtx flags, jump, noadj_label, inc_label, loop_label;
-  rtx reg_adj, reg_ssp, tmp, clob;
-
   /* Get the current shadow stack pointer.  The code below will check if
 SHSTK feature is enabled.  If it is not enabled the RDSSP instruction
 is a NOP.  */
-  reg_ssp = force_reg (word_mode, const0_rtx);
+  rtx reg_ssp = force_reg (word_mode, const0_rtx);
   emit_insn (gen_rdssp (word_mode, reg_ssp, reg_ssp));
 
-  /* Compare through substraction the saved and the current ssp to decide
-if ssp has to be adjusted.  */
-  tmp = gen_rtx_SET (reg_ssp, gen_rtx_MINUS (word_mode, reg_ssp,
-ssp_slot));
-  clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
-  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, tmp, clob));
-  emit_insn (tmp);
+  /* Compare through subtraction the saved and the current ssp
+to decide if ssp has to be adjusted.  */
+  reg_ssp = expand_simple_binop (word_mode, MINUS,
+reg_ssp, ssp_slot,
+reg_ssp, 1, OPTAB_DIRECT);
 
   /* Compare and jump over adjustment code.  */
-  noadj_label = gen_label_rtx ();
-  flags = gen_rtx_REG (CCZmode, FLAGS_REG);
-  tmp = gen_rtx_EQ (VOIDmode, flags, const0_rtx);
-  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
- gen_rtx_LABEL_REF (VOIDmode, noadj_label),
- pc_rtx);
-  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
-  JUMP_LABEL (jump) = noadj_label;
-
-  /* Compute the numebr of frames to adjust.  */
-  reg_adj = gen_lowpart (ptr_mode, reg_ssp);
-  tmp = gen_rtx_SET (reg_adj,
-gen_rtx_LSHIFTRT (ptr_mode,
-  negate_rtx (ptr_mode, reg_adj),
-  GEN_INT ((word_mode == SImode)
-   ? 2
-   : 3)));
-  clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
-  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, tmp, clob));
-  emit_insn (tmp);
+  rtx noadj_label = gen_label_rtx ();
+  emit_cmp_and_jump_insns (reg_ssp, const0_rtx, EQ, NULL_RTX,
+  word_mode, 1, noadj_label);
 
-  /* Check if number of frames <= 255 so no loop is needed.  */
-  tmp = gen_rtx_COMPARE (CCmode, reg_adj, GEN_INT (255));
-  flags = gen_rtx_REG (CCmode, FLAGS_REG);
-  emit_insn (gen_rtx_SET (flags, tmp));
+  /* Compute the number of frames to adjust.  */
+  rtx reg_adj = gen_lowpart (ptr_mode, reg_ssp);
+  rtx reg_adj_neg = expand_simple_unop (ptr_mode, NEG, reg_adj,
+   NULL_RTX, 1);
 
-  


Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

2020-08-18 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 18, 2020 at 09:43:31AM +0100, Jonathan Wakely wrote:
> On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:
> > Hello-
> > 
> > The attached patch was discussed briefly on PR 54185 here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
> > The test case for this PR sometimes fails due to random failures in
> > pthread_create() that are not related to the original PR. This patch fixes
> > it up by ignoring those failures. The test case was designed to repeat the
> > same test 1000 times to attempt to reproduce a race condition, so I think is
> > OK if some of those iterations are simply skipped.
> > 
> > Thanks for taking a look at it; I can commit it if it makes sense.
> > 
> > -Lewis
> 
> > libstdc++: testsuite: Address random failure in pthread_create() [PR54185]
> > 
> > The test for this PR calls pthread_create() many times in a row, which may
> > fail with EAGAIN sometimes. Avoid generating a test failure in this case.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > PR libstdc++/54185
> > * testsuite/30_threads/condition_variable/54185.cc: Make test robust
> > to random pthread_create() failures.
> 
> Thanks for the patch. It certainly looks reasonable, but I wonder if
> the attached version wouldn't be (very slightly) better. The
> difference is that instead of just giving up at the first EAGAIN we
> keep trying. This way we might be able to create a few more threads
> before the loop finishes. If we still keep failing, it works the same.
>
> I've also added a check that the failures are due to EAGAIN, and we'll
> still terminate if there's some other problem. I'm assuming that your
> failures are EAGAIN. Do you know why that's happening? Does your
> system have a low value for RLIMIT_NPROC or something?
>

Right, good point to check for EAGAIN. Yes, that's the error I get. I don't
understand why it happens. It's not related to libstdc++, I can reproduce it
with the below:

==
#include <pthread.h>

// Thread body that returns immediately.
void* do_nothing (void*)
{
  return nullptr;
}

int main ()
{
  for (int i = 0; i != 1000; ++i)
    {
      for (int j = 0; j != 10; ++j)
        {
          pthread_t thread;
          const int err = pthread_create (&thread, nullptr, do_nothing,
                                          nullptr);
          if (err) return 1;  // e.g. EAGAIN
          pthread_join (thread, nullptr);
        }
    }
}
==

If I run this just once at a time, it never fails. But if I run it twice at
a time, it fails about 30% of the time, like:
root@host:/home/lewis# (./pthread_fail || echo ERR) & \
   (./pthread_fail || echo ERR) & wait
[1] 25041
[2] 25042
ERR
ERR

All the rlimits are infinite or as high as possible, but I dug around a bit
and it seems this is a systemd thing: this system had systemd-logind
disabled (perhaps not in the correct way) and something about the
configuration led to the issue. Enabling systemd-logind resolves it for
me. So perhaps this was mostly specific to me. Sorry if I wasted your
time... if you still think it's worth doing something here I am happy to
help.

FWIW, regarding your extension to the patch, in case there are some
legitimate thread creation problems, one thing to keep in mind is that the
retrying after failure makes certain things worse. For instance, (with my
system in the previous state), what would happen is that 54185.cc hit the
pthread_create failure, and prior to this patch it just bailed out. With
either of these patches it tries more times, which can worsen issues in
unrelated test cases running in parallel, that may see random failures in
their own forks or thread creations. This test case is trying hard to
reproduce the race condition by running 1000 iterations, which seems
worthwhile given it's still failing on some systems like AIX, but on the
other hand it's possible doing 50 instead of 1000 would work too, and be
less prone to unrelated resource issues.
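
For reference, the "skip EAGAIN, still stop on anything else" behaviour discussed above can be sketched independently of threads. In this illustration the callable stands in for a pthread_create() call and returns an errno-style code; run_tolerating_eagain is a made-up name, not code from either version of the patch.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Run OP up to ATTEMPTS times. Results equal to EAGAIN are skipped, any
// other non-zero result stops immediately. Returns the number of
// successful calls, or -1 if an unexpected error was seen.
int run_tolerating_eagain(const std::function<int()> &op, int attempts) {
  assert(attempts >= 0);
  int succeeded = 0;
  for (int i = 0; i < attempts; ++i) {
    const int err = op();
    if (err == 0)
      ++succeeded;
    else if (err != EAGAIN)
      return -1;  // a real problem: do not hide it
  }
  return succeeded;
}
```

Lowering the attempt count is the knob that trades coverage of the race against pressure on a resource-starved system.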

Thanks for taking a look at this.

-Lewis

> The failures for that testcase on AIX appear to be different. It just
> segfaults after destroying the condition_variable, which probably
> means there's a POSIX conformance issue in AIX's pthread_cond_t.
> 
> 

> diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc 
> b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> index ea0d5bb8740..8ccb79e6de6 100644
> --- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> +++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
> @@ -31,31 +31,48 @@
>  std::condition_variable* cond = nullptr;
>  std::mutex mx;
>  int started = 0;
> +bool notified = false;
>  int constexpr NUM_THREADS = 10;
>  
> -void do_thread_a()
> +void do_thread_a(bool wait)
>  {
>std::unique_lock lock(mx);
> -  if(++started >= NUM_THREADS)
> +  if (++started >= NUM_THREADS)
>{
> +notified = true;
>  cond->notify_all();
>  delete cond;
>  cond = nullptr;
>}
> -  else
> -cond->wait(lock);
> +  else if (wait)
> +cond->wait(lock, [] { return notified; });
>  }
>  
> -int main(){
> 

Re: [PATCH] Add cold attribute to one time construction APIs

2020-08-18 Thread Jonathan Wakely via Gcc-patches

On 18/08/20 15:35 +0100, Jonathan Wakely wrote:

On 17/08/20 18:15 +, Aditya K via Libstdc++ wrote:

This would help compiler optimize local static objects.

Added changelog.


Please don't :-)

GCC patch policies always said NOT to change the ChangeLog in the
patch, because the ChangeLog files change so rapidly that it means by
the time you've sent the patch, it no longer applies.

Current GCC policies are that NOBODY changes the ChangeLog files, they
are autogenerated from Git commit logs once per day.

So please just include the proposed ChangeLog entry as the Git commit
message in the body of your email.

Patches for libstdc++ need to go to both the libstdc++ list and the
gcc-patches list, in the same email. Not sent once to each list as
separate emails.



```
From c6cba40e0434147db89d3af05b598782cde651e3 Mon Sep 17 00:00:00 2001
From: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
Date: Thu, 13 Aug 2020 09:41:34 -0700
Subject: [PATCH] Add cold attribute to one time construction APIs

__cxa_guard_acquire is used for only one purpose,
namely guarding local static variable initialization,
and since that purpose is definitionally cold, it should be attributed as cold.
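
For context, a minimal sketch of the kind of code these guard functions protect. On targets that follow the Itanium C++ ABI, the compiler wraps the dynamic initialization of a function-local static in calls to __cxa_guard_acquire and __cxa_guard_release, so the initializer runs once no matter how often the function is called. The names below (make_value, get_value, init_count) are made up for illustration.

```cpp
#include <cassert>

int init_count = 0;  // counts how often the initializer actually runs

// Dynamic initializer, so the local static below needs a guard.
int make_value() {
  ++init_count;
  return 42;
}

int get_value() {
  // The guarded, run-once initialization happens on the first call only.
  static int value = make_value();
  assert(init_count == 1);
  return value;
}
```

Every call after the first takes the fast path that skips the guard calls, which is why the guard functions themselves are rarely executed.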


Definitionally isn't a word. Attributed is a word, but it doesn't mean
marked with an attribute.


Similarly for __cxa_guard_release and __cxa_guard_abort
---
libstdc++-v3/ChangeLog  | 5 +
libstdc++-v3/include/bits/c++config | 5 +
libstdc++-v3/libsupc++/cxxabi.h | 6 +++---
3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index fe6884bf3..86b707ac7 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-17  Aditya Kumar  
+   * libstdc++-v3/include/bits/c++config: Add _GLIBCXX_NOTHROW attribute
+   * libstdc++-v3/libsupc++/cxxabi.h (__cxa_guard_acquire, 
__cxa_guard_release,
+   __cxa_guard_abort): Add _GLIBCXX_NOTHROW attribute.


The changelog format is wrong. There should be a blank line after the
date+author line, and you're adding _GLIBCXX_COLD not
_GLIBCXX_NOTHROW. But it shouldn't be here at all as explained above.


2020-08-14  Lewis Hyatt  

* testsuite/lib/libstdc++.exp: Use the new option
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b1fad59d4..f6f954eef 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -42,6 +42,7 @@
//   _GLIBCXX_NORETURN
//   _GLIBCXX_NOTHROW
//   _GLIBCXX_VISIBILITY
+//   _GLIBCXX_COLD
#ifndef _GLIBCXX_PURE
# define _GLIBCXX_PURE __attribute__ ((__pure__))
#endif
@@ -74,6 +75,10 @@
# define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
#endif

+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((cold))
+#endif


"cold" is not a reserved name so this needs to be __cold__.


I've just pushed the attached patch which ensures we don't use the
non-reserved form of the attribute.
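
A short sketch of why the reserved spelling matters. A user program is allowed to define a macro named cold, and the __cold__ spelling is unaffected by it. SKETCH_COLD and rarely_called are hypothetical names used only here, and the attribute is guarded so that the example still compiles on compilers without GNU attributes.

```cpp
#include <cassert>

// A user program may legally define this before including library headers.
#define cold 1

// Reserved-name spelling: the user's macro above cannot interfere with it.
#if defined(__GNUC__)
# define SKETCH_COLD __attribute__ ((__cold__))
#else
# define SKETCH_COLD
#endif

// Marked as unlikely to be executed.
SKETCH_COLD int rarely_called(int x) {
  assert(x >= 0);
  return x + 1;
}
```

Had the macro used the plain spelling, the preprocessor would have turned it into __attribute__ ((1)), which does not compile.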

Tested x86_64-linux, committed to trunk.

commit 6c1a58b7fbdaa8ac00957fccfb379af163309311
Author: Jonathan Wakely 
Date:   Tue Aug 18 15:37:14 2020

libstdc++: Add "cold" to tests for reserved attribute names

libstdc++-v3/ChangeLog:

* testsuite/17_intro/headers/c++1998/all_attributes.cc: Check
"cold" isn't used in the library. Also check .
* testsuite/17_intro/headers/c++2011/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2014/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2017/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2020/all_attributes.cc:
Likewise.

diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
index 73a20e346e4..dd28429e1d5 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
@@ -21,6 +21,7 @@
 // Ensure the library only uses the __name__ form for attributes.
 // Don't test 'const' because it is reserved anyway.
 #define abi_tag 1
+#define cold 1
 #ifndef __APPLE__
 // darwin headers use these, see PR 64883
 # define always_inline 1
@@ -36,6 +37,7 @@
 #endif
 
 #include 
+#include 
 
 int
 main()
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc b/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
index c0a79e06ddc..db00a33f6a3 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
@@ -21,6 +21,7 @@
 // Ensure the library only uses the __name__ form for attributes.
 // Don't test 'const' and 'noreturn' because they are reserved anyway.
 #define abi_tag 1
+#define cold 1
 #ifndef __APPLE__
 // darwin headers use these, see PR 64883
 # define always_inline 1
@@ 

Re: [PATCH] Add cold attribute to one time construction APIs

2020-08-18 Thread Jonathan Wakely via Gcc-patches

On 17/08/20 18:15 +, Aditya K via Libstdc++ wrote:

This would help compiler optimize local static objects.

Added changelog.


Please don't :-)

GCC patch policies always said NOT to change the ChangeLog in the
patch, because the ChangeLog files change so rapidly that it means by
the time you've sent the patch, it no longer applies.

Current GCC policies are that NOBODY changes the ChangeLog files, they
are autogenerated from Git commit logs once per day.

So please just include the proposed ChangeLog entry as the Git commit
message in the body of your email.

Patches for libstdc++ need to go to both the libstdc++ list and the
gcc-patches list, in the same email. Not sent once to each list as
separate emails.



```
From c6cba40e0434147db89d3af05b598782cde651e3 Mon Sep 17 00:00:00 2001
From: Aditya Kumar <1894981+hiradi...@users.noreply.github.com>
Date: Thu, 13 Aug 2020 09:41:34 -0700
Subject: [PATCH] Add cold attribute to one time construction APIs

__cxa_guard_acquire is used for only one purpose,
namely guarding local static variable initialization,
and since that purpose is definitionally cold, it should be attributed as cold.


Definitionally isn't a word. Attributed is a word, but it doesn't mean
marked with an attribute.


Similarly for __cxa_guard_release and __cxa_guard_abort
---
libstdc++-v3/ChangeLog  | 5 +
libstdc++-v3/include/bits/c++config | 5 +
libstdc++-v3/libsupc++/cxxabi.h | 6 +++---
3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index fe6884bf3..86b707ac7 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-17  Aditya Kumar  
+   * libstdc++-v3/include/bits/c++config: Add _GLIBCXX_NOTHROW attribute
+   * libstdc++-v3/libsupc++/cxxabi.h (__cxa_guard_acquire, 
__cxa_guard_release,
+   __cxa_guard_abort): Add _GLIBCXX_NOTHROW attribute.


The changelog format is wrong. There should be a blank line after the
date+author line, and you're adding _GLIBCXX_COLD not
_GLIBCXX_NOTHROW. But it shouldn't be here at all as explained above.


2020-08-14  Lewis Hyatt  

* testsuite/lib/libstdc++.exp: Use the new option
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b1fad59d4..f6f954eef 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -42,6 +42,7 @@
//   _GLIBCXX_NORETURN
//   _GLIBCXX_NOTHROW
//   _GLIBCXX_VISIBILITY
+//   _GLIBCXX_COLD
#ifndef _GLIBCXX_PURE
# define _GLIBCXX_PURE __attribute__ ((__pure__))
#endif
@@ -74,6 +75,10 @@
# define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
#endif

+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((cold))
+#endif


"cold" is not a reserved name so this needs to be __cold__.

I'm not sure we really need it in  if we only use it
in one file, but maybe we'll find more uses for it later.
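
The reserved-name point can be illustrated with a small sketch. It assumes a
user translation unit that defines `cold` as a macro, which is what the
all_attributes.cc tests simulate. `MY_COLD` and `slow_path` are hypothetical
names used only for illustration, not libstdc++ ones.

```cpp
// A user program may define 'cold' as a macro, because it is not a
// reserved name.  The testsuite simulates that with this line.
#define cold 1

// Hypothetical stand-in for _GLIBCXX_COLD.  The double-underscore
// spelling is reserved for the implementation, so the macro above
// cannot affect it.  Writing __attribute__ ((cold)) here would expand
// to __attribute__ ((1)) and fail to compile.
#ifndef MY_COLD
# define MY_COLD __attribute__ ((__cold__))
#endif

// The declaration carries the attribute, as in the cxxabi.h hunk.
int slow_path (int) MY_COLD;

int
slow_path (int x)
{
  return x + 1;
}

// Keep the illustrative macro from leaking into later code.
#undef cold
```

Changing the macro body to `__attribute__ ((cold))` should make this fail to
compile for as long as `cold` is defined as a macro.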


diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 000713ecd..24c1366e2 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -115,13 +115,13 @@ namespace __cxxabiv1
void (*__dealloc) (void*, size_t));

  int
-  __cxa_guard_acquire(__guard*);
+  __cxa_guard_acquire(__guard*) _GLIBCXX_COLD;


The GCC manual says that functions marked cold will be optimized for
size not for speed. Is that really what we want here?

It makes sense to put them in a cold section, but I think we still
want them to be optimized for speed, don't we?
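
For context, here is a rough model of the code a compiler emits around a
function-local static, and where the guard functions sit in it. This is a
single-threaded sketch under stated assumptions: `guard_acquire`,
`guard_release`, `expensive_init` and `get_value` are hypothetical names, and
the real `__cxa_guard_*` functions are thread-safe and considerably more
involved.

```cpp
#include <cstdint>

// Hypothetical single-threaded stand-ins for __cxa_guard_acquire and
// __cxa_guard_release.  Marked cold because they run at most once per
// guarded object.
__attribute__ ((__cold__)) int
guard_acquire (std::uint8_t *guard)
{
  // Nonzero means "the caller must run the initializer".
  return *guard == 0;
}

__attribute__ ((__cold__)) void
guard_release (std::uint8_t *guard)
{
  *guard = 1;  // Mark the object as initialized.
}

static int init_calls = 0;

static int
expensive_init ()
{
  ++init_calls;
  return 42;
}

// Roughly what "static int value = expensive_init ();" turns into.
int
get_value ()
{
  static std::uint8_t guard = 0;
  static int value;

  // Hot path: an inline check of the guard byte, with no call at all.
  if (__builtin_expect (guard == 0, 0))
    {
      // Cold path: only reached until the object has been initialized.
      if (guard_acquire (&guard))
        {
          value = expensive_init ();
          guard_release (&guard);
        }
    }
  return value;
}

int
init_call_count ()
{
  return init_calls;
}
```

In this model the attribute only affects the two out-of-line helpers; the
inline guard check in `get_value` stays on the hot path either way.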

I've attached a patch addressing the issues above, but I'd like to
know whether the change to how the functions are optimized is
desirable, or even noticeable.

commit fadd79179f93c82c2935fdfe17a2ab1586b4e70f
Author: Aditya Kumar 
Date:   Tue Aug 18 15:22:24 2020

libstdc++: Add cold attribute to one time construction APIs

__cxa_guard_acquire is used for only one purpose, namely guarding local
static variable initialization. Since that purpose happens rarely, it
should be marked with the 'cold' attribute.

Similarly for __cxa_guard_release and __cxa_guard_abort.

libstdc++-v3/ChangeLog:

2020-08-18  Aditya Kumar  

* include/bits/c++config (_GLIBCXX_COLD): Define.
* libsupc++/cxxabi.h (__cxa_guard_acquire, __cxa_guard_release)
(__cxa_guard_abort): Add _GLIBCXX_COLD.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b1fad59d4b3..a135de00a50 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -42,6 +42,7 @@
 //   _GLIBCXX_NORETURN
 //   _GLIBCXX_NOTHROW
 //   _GLIBCXX_VISIBILITY
+//   _GLIBCXX_COLD
 #ifndef _GLIBCXX_PURE
 # define _GLIBCXX_PURE __attribute__ ((__pure__))
 #endif
@@ -74,6 +75,10 @@
 # define _GLIBCXX_VISIBILITY(V) _GLIBCXX_PSEUDO_VISIBILITY(V)
 #endif
 
+#ifndef _GLIBCXX_COLD
+# define _GLIBCXX_COLD __attribute__ ((__cold__))

Re: [PATCH v2] C-SKY: Support -mfloat-abi=hard.

2020-08-18 Thread Jojo R
Hi,

Good points :)

Jojo
On Aug 18, 2020, 10:40 AM +0800, Cooper Qu wrote:
> Hi Jojo,
>
> Nowhere is this rule directly stated. But there are indent options
> shown in
> https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting
> corresponding to the recommended C formatting style, which uses the
> default tab width of 8 columns.
>
>
> On 8/18/20 9:42 AM, Jojo R wrote:
> > Hi,
> >
> > Is there a coding rule for this?
> >
> > I cannot find it in
> > https://www.gnu.org/prep/standards/html_node/index.html
> > and https://gcc.gnu.org/codingconventions.html
> >
> > Could you give me any hints?
> >
> > Thanks.
> >
> > Jojo
> > On Aug 17, 2020, 11:05 PM +0800, Xianmiao Qu wrote:
> > > Hi Jojo,
> > >
> > >
> > > On 8/17/20 7:09 PM, Jojo R wrote:
> > > > diff --git a/gcc/config/csky/csky.c b/gcc/config/csky/csky.c
> > > > index 7ba3ed3..b71291a 100644
> > > > --- a/gcc/config/csky/csky.c
> > > > +++ b/gcc/config/csky/csky.c
> > > > @@ -328,6 +328,16 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
> > > > {
> > > > builtin_define ("__csky_hard_float__");
> > > > builtin_define ("__CSKY_HARD_FLOAT__");
> > > > + if (TARGET_HARD_FLOAT_ABI)
> > > > + {
> > > > + builtin_define ("__csky_hard_float_abi__");
> > > > + builtin_define ("__CSKY_HARD_FLOAT_ABI__");
> > > > + }
> > > > + if (TARGET_SINGLE_FPU)
> > > > + {
> > > > + builtin_define ("__csky_hard_float_fpu_sf__");
> > > > + builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
> > > > + }
> > > > }
> > > There is one more thing you should pay attention to: if the number of
> > > spaces at the beginning of a line reaches 8, use a tab instead of 8 spaces.
> > >
> > >
> > > Thanks,
> > >
> > > Xianmiao
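
The indentation rule discussed in this thread can be sketched as a small
helper. This is only an illustration: `retab_leading` is a hypothetical name,
and the sketch assumes the 8-column tab width mentioned above and only touches
leading spaces.

```cpp
#include <cstddef>
#include <string>

// Replace every full run of 8 leading spaces with one tab, leaving any
// remainder (fewer than 8 spaces) and the rest of the line untouched.
std::string
retab_leading (const std::string &line)
{
  const std::size_t tab_width = 8;

  // Count the leading spaces.
  std::size_t spaces = 0;
  while (spaces < line.size () && line[spaces] == ' ')
    ++spaces;

  // One tab per full group of 8, then the leftover spaces.
  std::string out (spaces / tab_width, '\t');
  out.append (spaces % tab_width, ' ');

  // Copy everything after the leading spaces unchanged.
  out.append (line, spaces, std::string::npos);
  return out;
}
```

A line indented by 10 spaces comes back as one tab followed by 2 spaces, which
matches the layout the reviewer asked for.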


[committed] libstdc++: Remove redundant copying of std::async arguments [PR 69724]

2020-08-18 Thread Jonathan Wakely via Gcc-patches
As was previously done for std::thread, this removes an unnecessary copy
of an rvalue of type thread::_Invoker. Instead of creating the rvalue
and then moving that into the shared state, the member of the shared
state is initialized directly from the forwarded callable and bound
arguments.

This also slightly simplifies std::thread creation to remove the
_S_make_state helper function.
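
The difference between the two shapes can be shown with a small model that
counts move constructions. It is not the libstdc++ code: `Counted`, `Wrapper`,
`StateFromTemporary` and `StateDirect` are hypothetical stand-ins for the
callable, the call wrapper, and the old and new shared-state constructors.

```cpp
#include <utility>

// Counts move constructions so the two approaches can be compared.
struct Counted
{
  static int moves;
  Counted () = default;
  Counted (const Counted &) = default;
  Counted (Counted &&) noexcept { ++moves; }
};
int Counted::moves = 0;

// Hypothetical stand-in for the call wrapper stored in the shared state.
template<typename Fn>
  struct Wrapper
  {
    Fn fn;
  };

// Old shape: the caller builds a Wrapper rvalue, then the state moves it.
template<typename Fn>
  struct StateFromTemporary
  {
    explicit
    StateFromTemporary (Wrapper<Fn> &&w) : wrapper (std::move (w)) { }

    Wrapper<Fn> wrapper;
  };

// New shape: the state builds its member directly from forwarded arguments.
template<typename Fn>
  struct StateDirect
  {
    template<typename... Args>
      explicit
      StateDirect (Args &&...args)
      : wrapper{std::forward<Args> (args)...}
      { }

    Wrapper<Fn> wrapper;
  };

int
moves_via_temporary ()
{
  Counted::moves = 0;
  Counted callable;
  // One move into the temporary wrapper, one more into the state.
  StateFromTemporary<Counted> state (Wrapper<Counted>{std::move (callable)});
  (void) state;
  return Counted::moves;
}

int
moves_direct ()
{
  Counted::moves = 0;
  Counted callable;
  // A single move, straight into the state's member.
  StateDirect<Counted> state (std::move (callable));
  (void) state;
  return Counted::moves;
}
```

In this model the temporary-based constructor costs two moves and the
forwarding constructor costs one; passing an lvalue instead would trade each
first move for a copy.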

libstdc++-v3/ChangeLog:

PR libstdc++/69724
* include/std/future (__future_base::_S_make_deferred_state)
(__future_base::_S_make_async_state): Remove.
(__future_base::_Deferred_state): Change constructor to accept a
parameter pack of arguments and forward them to the call
wrapper.
(__future_base::_Async_state_impl): Likewise. Replace lambda
expression with a named member function.
(async): Construct state object directly from the arguments,
instead of using thread::__make_invoker, _S_make_deferred_state
and _S_make_async_state. Move shared state into the returned
future.
* include/std/thread (thread::_Call_wrapper): New alias
template for use by constructor and std::async.
(thread::thread(Callable&&, Args&&...)): Create state object
directly instead of using _S_make_state.
(thread::__make_invoker, thread::__decayed_tuple)
(thread::_S_make_state): Remove.
* testsuite/30_threads/async/69724.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit bb1b7f087bdd028000fd8f84e74b20adccc9d5bb
Author: Jonathan Wakely 
Date:   Tue Aug 18 14:23:19 2020

libstdc++: Remove redundant copying of std::async arguments [PR 69724]

As was previously done for std::thread, this removes an unnecessary copy
of an rvalue of type thread::_Invoker. Instead of creating the rvalue
and then moving that into the shared state, the member of the shared
state is initialized directly from the forwarded callable and bound
arguments.

This also slightly simplifies std::thread creation to remove the
_S_make_state helper function.

libstdc++-v3/ChangeLog:

PR libstdc++/69724
* include/std/future (__future_base::_S_make_deferred_state)
(__future_base::_S_make_async_state): Remove.
(__future_base::_Deferred_state): Change constructor to accept a
parameter pack of arguments and forward them to the call
wrapper.
(__future_base::_Async_state_impl): Likewise. Replace lambda
expression with a named member function.
(async): Construct state object directly from the arguments,
instead of using thread::__make_invoker, _S_make_deferred_state
and _S_make_async_state. Move shared state into the returned
future.
* include/std/thread (thread::_Call_wrapper): New alias
template for use by constructor and std::async.
(thread::thread(Callable&&, Args&&...)): Create state object
directly instead of using _S_make_state.
(thread::__make_invoker, thread::__decayed_tuple)
(thread::_S_make_state): Remove.
* testsuite/30_threads/async/69724.cc: New test.

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index bdf4a75d694..e0816c2f5e1 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -605,14 +605,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
   class _Task_state;
 
-template
-  static std::shared_ptr<_State_base>
-  _S_make_deferred_state(_BoundFn&& __fn);
-
-template
-  static std::shared_ptr<_State_base>
-  _S_make_async_state(_BoundFn&& __fn);
-
 template
   struct _Task_setter;
@@ -1614,10 +1606,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __future_base::_State_base
 {
 public:
-  explicit
-  _Deferred_state(_BoundFn&& __fn)
-  : _M_result(new _Result<_Res>()), _M_fn(std::move(__fn))
-  { }
+  template
+   explicit
+   _Deferred_state(_Args&&... __args)
+   : _M_result(new _Result<_Res>()),
+ _M_fn{{std::forward<_Args>(__args)...}}
+   { }
 
 private:
   typedef __future_base::_Ptr<_Result<_Res>> _Ptr_type;
@@ -1679,69 +1673,63 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __future_base::_Async_state_commonV2
 {
 public:
-  explicit
-  _Async_state_impl(_BoundFn&& __fn)
-  : _M_result(new _Result<_Res>()), _M_fn(std::move(__fn))
-  {
-   _M_thread = std::thread{ [this] {
-   __try
- {
-   _M_set_result(_S_task_setter(_M_result, _M_fn));
- }
-   __catch (const __cxxabiv1::__forced_unwind&)
- {
-   // make the shared state ready on thread cancellation
-   if (static_cast(_M_result))
- this->_M_break_promise(std::move(_M_result));
- 

c++: Move hidden-lambda entity lookup checking

2020-08-18 Thread Nathan Sidwell

Hidden lambda entities only occur in block and class scopes.  There's
no need to check for them on every lookup.  So moving that particular
piece of validation to lookup_name_1, which cares.  Also reordered the
namespace and type checking, as that is also simpler.
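
The reordered check can be modelled outside the compiler as follows. This is a
sketch under assumptions: the `Want` flag values are guesses at the layout of
`LOOK_want` (with `TYPE_NAMESPACE` being `TYPE | NAMESPACE`), and `Kind` is a
hypothetical replacement for the tree codes that the real function inspects.

```cpp
// Assumed flag values; the real LOOK_want enumerators may differ.
enum class Want : unsigned
{
  NORMAL = 0,
  TYPE = 1u << 1,
  NAMESPACE = 1u << 2,
  TYPE_NAMESPACE = TYPE | NAMESPACE
};

constexpr Want
operator& (Want a, Want b)
{
  return Want (unsigned (a) & unsigned (b));
}

// Simplified stand-in for what a binding can be.
enum class Kind { NONE, VALUE, TYPE, NAMESPACE };

bool
qualifies (Kind val, Want want)
{
  if (val == Kind::NONE)
    return false;

  // A type satisfies any request that includes TYPE.
  if (bool (want & Want::TYPE) && val == Kind::TYPE)
    return true;

  // If any type/namespace restriction was requested and the type check
  // did not succeed, only a namespace still qualifies.
  if (bool (want & Want::TYPE_NAMESPACE))
    return val == Kind::NAMESPACE;

  // No restriction requested: any non-null binding qualifies.
  return true;
}
```

The hidden-lambda filtering is deliberately absent from this model, since the
patch moves it into the caller.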

gcc/cp/
* name-lookup.c (qualify_lookup): Drop lambda checking here.
Reorder namespace & type checking.
(lookup_name_1): Do hidden lambda checking here.

pushed
--
Nathan Sidwell
diff --git i/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index ad9c92da254..c68ea09e610 100644
--- i/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -5221,24 +5221,16 @@ qualify_lookup (tree val, LOOK_want want)
   if (val == NULL_TREE)
 return false;
 
-  if (bool (want & LOOK_want::NAMESPACE) && TREE_CODE (val) == NAMESPACE_DECL)
-return true;
-
   if (bool (want & LOOK_want::TYPE))
 {
   tree target_val = strip_using_decl (val);
 
-  if (TREE_CODE (target_val) == TYPE_DECL
-	  || TREE_CODE (target_val) == TEMPLATE_DECL)
+  if (TREE_CODE (STRIP_TEMPLATE (target_val)) == TYPE_DECL)
 	return true;
 }
 
   if (bool (want & LOOK_want::TYPE_NAMESPACE))
-return false;
-
-  /* Look through lambda things that we shouldn't be able to see.  */
-  if (!bool (want & LOOK_want::HIDDEN_LAMBDA) && is_lambda_ignored_entity (val))
-return false;
+return TREE_CODE (val) == NAMESPACE_DECL;
 
   return true;
 }
@@ -6430,7 +6422,10 @@ lookup_name_1 (tree name, LOOK_where where, LOOK_want want)
   tree val = NULL_TREE;
 
   gcc_checking_assert (unsigned (where) != 0);
-
+  /* If we're looking for hidden lambda things, we shouldn't be
+ looking in namespace scope.  */
+  gcc_checking_assert (!bool (want & LOOK_want::HIDDEN_LAMBDA)
+		   || !bool (where & LOOK_where::NAMESPACE));
   query_oracle (name);
 
   /* Conversion operators are handled specially because ordinary
@@ -6481,7 +6476,10 @@ lookup_name_1 (tree name, LOOK_where where, LOOK_want want)
 	  continue;
 
 	/* If this is the kind of thing we're looking for, we're done.  */
-	if (qualify_lookup (iter->value, want))
+	if (iter->value
+	&& (bool (want & LOOK_want::HIDDEN_LAMBDA)
+		|| !is_lambda_ignored_entity (iter->value))
+	&& qualify_lookup (iter->value, want))
 	  binding = iter->value;
 	else if (bool (want & LOOK_want::TYPE)
 		 && qualify_lookup (iter->type, want))


[PATCH] RX add control register PC

2020-08-18 Thread Darius Galis

Hello,

The following patch is adding the PC control register.
It also updates the __builtin_rx_mvtc() function, since
according to the documentation, the PC register cannot be specified
as dest.
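
The intended behaviour can be sketched independently of the RTL expander. The
register numbers are taken from the rx.md hunk in this patch;
`mvtc_destination` is a hypothetical helper, not a function in rx.c.

```cpp
// Control register numbers as listed in the rx.md hunk.
enum ctrlreg
{
  CTRLREG_PSW = 0,
  CTRLREG_PC = 1,
  CTRLREG_USP = 2,
  CTRLREG_FPSW = 3,
  CTRLREG_BPSW = 8
};

// Hypothetical helper: return the register mvtc should actually write,
// and report through *WARNED whether a diagnostic would be issued.
int
mvtc_destination (int requested, bool *warned)
{
  *warned = false;
  if (requested == CTRLREG_PC)
    {
      // PC cannot be a destination; fall back to PSW as the patch does.
      *warned = true;
      return CTRLREG_PSW;
    }
  return requested;
}
```

Comparing against the named constant rather than a literal 1 keeps the check
tied to the register table.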

The regression was tested with the following command:

make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

There were no additional failures.

Please let me know if this is OK, Thank you!
Darius Galis

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b834a2c..3436c25 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-08-17  Darius Galis  
+
+   * config/rx/rx.md (CTRLREG_PC): Add.
+   * config/rx/rx.c (CTRLREG_PC): Add
+   (rx_expand_builtin_mvtc): Add warning: PC register cannot
+   be used as dest.
+
 2020-08-03  Jonathan Wakely  
 
 	* doc/cpp.texi (Variadic Macros): Use the exact ... token in

diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index 151ad39..1cc88d9 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -639,6 +639,7 @@ rx_print_operand (FILE * file, rtx op, int letter)
   switch (INTVAL (op))
{
case CTRLREG_PSW:   fprintf (file, "psw"); break;
+   case CTRLREG_PC:fprintf (file, "pc"); break;
case CTRLREG_USP:   fprintf (file, "usp"); break;
case CTRLREG_FPSW:  fprintf (file, "fpsw"); break;
case CTRLREG_BPSW:  fprintf (file, "bpsw"); break;
@@ -2474,6 +2475,14 @@ rx_expand_builtin_mvtc (tree exp)
   if (! REG_P (arg2))
 arg2 = force_reg (SImode, arg2);
 
+  if (INTVAL(arg1) == 1/*PC*/)

+  {
+   warning (0, "invalid control register for mvtc : %d - using 'psw'",
+   (int) INTVAL (arg1));
+   arg1 = const0_rtx;
+  }
+
+
   emit_insn (gen_mvtc (arg1, arg2));
 
   return NULL_RTX;

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index df08a9e..0739f58 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -77,6 +77,7 @@
(UNSPEC_PID_ADDR   52)
 
(CTRLREG_PSW		0)

+   (CTRLREG_PC 1)
(CTRLREG_USP2)
(CTRLREG_FPSW   3)
(CTRLREG_BPSW   8)

--
Ing. Darius Galiș
Software Engineer - CyberTHOR Studios Ltd.



Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-18 Thread Olivier Hainque
Hi Iain,

> On 18 Aug 2020, at 13:45, Iain Buclaw  wrote:
> 
> Attached is the change as per your proposal.
> 
>   * config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Replace -nostdinc
>   with -fself-tests.
> #undef VXWORKS_ADDITIONAL_CPP_SPEC
> #define VXWORKS_ADDITIONAL_CPP_SPEC \
> - "%{!nostdinc:  \
> + "%{!fself-test=*:  \
> %{isystem*} \
> %{mrtp: -idirafter %:getenv(VSB_DIR /h) \
> -idirafter %:getenv(VSB_DIR /share/h)   \
> @@ -55,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
> 
> #undef VXWORKS_ADDITIONAL_CPP_SPEC
> #define VXWORKS_ADDITIONAL_CPP_SPEC   \
> - "%{!nostdinc:   \
> + "%{!fself-test=*:   \

Thanks for the updated proposal.

Sorry, I have been unclear: If I'm reading the spec of
-nostdinc correctly, I think we should still prevent those
CPP switches from being added if the option is provided.

Can you please amend just this part to prevent the addition
of the following switches if either -nostdinc or -fself-test
is provided ?
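
One possible shape for this, as a sketch only: nest the two negated conditions
so that the body is dropped when either switch is present. The macro name
`VXWORKS_CPP_SPEC_SKETCH` is hypothetical, the body is shortened to two of the
original entries, and the nesting has not been tested against a real VxWorks
configuration.

```cpp
#include <cstring>

// In spec strings, %{!S:X} substitutes X only when switch S is absent,
// so nesting two of them requires both -nostdinc and -fself-test=...
// to be absent before the include switches are added.
#define VXWORKS_CPP_SPEC_SKETCH \
  "%{!nostdinc:%{!fself-test=*:" \
  " %{isystem*}" \
  " %{mrtp: -idirafter %:getenv(VSB_DIR /h)}" \
  "}}"

// Small accessor so the string can be inspected from ordinary code.
const char *
vxworks_cpp_spec_sketch ()
{
  return VXWORKS_CPP_SPEC_SKETCH;
}
```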

Ok, for me with this change, assuming -nostdinc in
SELFTEST_FLAGS didn't have other uses than the one documented
in the attached comment (I'm not familiar enough with the
self-tests to know for sure).

Thanks!

Olivier



Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-18 Thread Christophe Lyon via Gcc-patches
On Tue, 18 Aug 2020 at 05:38, qiaopeixin  wrote:
>
> Hi Richard,
>
> Thanks for the review and explanation.
>
> The previous fix adding if condition of TARGET_FLOAT does crash glibc-2.29.
>
> I checked the past log of writing the function aarch64_init_cumulative_args, 
> and did not find the reason why Alan Lawrence added TREE_PUBLIC (fndecl) as 
> one condition for entering the function type check. Maybe Alan could clarify? 
> I tried to delete TREE_PUBLIC (fndecl), which turns out could solve both the 
> glibc problem and the previous ICE problem. A new fix is made as following, 
> passed bootstrap and deja test. I believe this fix is reasonable, since the 
> function type should be checked no matter if it has external linkage or not.
>
> The function aarch64_init_cumulative_args checks the function types and
> should catch the error that "-mgeneral-regs-only" is incompatible with the
> use of SIMD/FP registers. In the test case on PR96479, the function myfunc2
> returns a vector of 4 integers but is declared static, so
> TREE_PUBLIC (fndecl) is false, which prevents the if statement from being
> entered and the function types from being checked. I deleted
> "TREE_PUBLIC (fndecl)" so that gcc can now catch the error in
> aarch64_init_cumulative_args. With this fix, the test case on PR96479
> reports a diagnostic error instead of an ICE. The patch for the fix is
> as follows:
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b7f5bc76f1b..9ce83dce131 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6017,7 +6017,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
>
>if (!silent_p
>&& !TARGET_FLOAT
> -  && fndecl && TREE_PUBLIC (fndecl)
> +  && fndecl
>&& fntype && fntype != error_mark_node)
>  {
>const_tree type = TREE_TYPE (fntype);
>
> Christophe, thanks for your tests on glibc-2.29. With the above fix, I built 
> glibc-2.29, and the previous error does not show up now. Could you please 
> check if this fix works?

Hi,

I confirm this works OK for my testing (aarch64-linux-gnu and aarch64-elf)

Thanks,

Christophe
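
For readers without the PR at hand, the kind of function involved looks
roughly like the sketch below. It is not the PR96479 test case itself; only
the name `myfunc2` and the description (a static function returning a vector
of 4 integers) come from this thread, and `first_lane` is a hypothetical
caller. It relies on the GNU vector extension.

```cpp
// GNU vector extension: four 32-bit ints in one 16-byte vector.
typedef int v4si __attribute__ ((vector_size (16)));

// Internal linkage, so TREE_PUBLIC would be false for this declaration.
static v4si
myfunc2 (int x)
{
  v4si v = { x, x, x, x };
  return v;
}

// Hypothetical caller that forces the vector return value to be used.
int
first_lane (int x)
{
  v4si v = myfunc2 (x);
  return v[0];
}
```

With the revised check, compiling something of this shape for AArch64 with
-mgeneral-regs-only is expected to produce a diagnostic rather than an ICE,
according to the discussion above.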

>
> Do you have any suggestions on this fix?
>
> All the best,
> Peixin
>
>
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, August 13, 2020 8:19 PM
> To: Christophe Lyon 
> Cc: qiaopeixin ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
> [PR96479]
>
> Christophe Lyon  writes:
> > On Thu, 13 Aug 2020 at 03:54, qiaopeixin  wrote:
> >>
> >> Thanks for the review and commit.
> >>
> >> All the best,
> >> Peixin
> >>
> >> -Original Message-
> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> Sent: August 13, 2020 0:25
> >> To: qiaopeixin 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] AArch64: Add if condition in
> >> aarch64_function_value [PR96479]
> >>
> >> qiaopeixin  writes:
> >> > Hi,
> >> >
> >> > The test case vector-subscript-2.c in the gcc testsuite will report an
> >> > ICE in the expand pass since '-mgeneral-regs-only' is incompatible with 
> >> > the use of V4SI mode. I propose to report the diagnostic information 
> >> > instead of ICE, and the problem has been discussed on PR 96479.
> >> >
> >> > I attached the patch to solve the problem. Bootstrapped and tested on 
> >> > aarch64-linux-gnu. Any suggestions?
> >>
> >> Thanks, pushed.  I was initially sceptical because raising an error here 
> >> and in aarch64_layout_arg is a hack.  Both functions are just query 
> >> functions and shouldn't have any side effects.
> >>
> >> The approach we took for FP modes seemed better: we define the FP move 
> >> patterns unconditionally, and raise an error if we try to emit an FP move 
> >> with !TARGET_FLOAT.  This defers any error reporting until we actually try 
> >> to generate code that depends on TARGET_FLOAT.
> >>
> >> But I guess SIMD stuff is different.  There's no reason in principle why 
> >> you can't use:
> >>
> >>   unsigned short __attribute__((vector_size(8)))
> >>
> >> *within* a function with -mgeneral-regs-only.  It would just need to be 
> >> emulated, in the same way as for:
> >>
> >>   unsigned short __attribute__((vector_size(4)))
> >>
> >> So it would be wrong to define the SIMD move patterns unconditionally and 
> >> raise an error there.
> >>
> >> So all in all, I agree this is the best we can do given the current 
> >> infrastructure.
> >>
> >
> > Since this patch was committed my buildbot is broken for
> > aarch64-linux-gnu because it now fails to build glibc-2.29:
> > ../stdlib/bits/stdlib-float.h: In function 'atof':
> > ../stdlib/bits/stdlib-float.h:26:1: error: '-mgeneral-regs-only' is
> > incompatible with the use of floating-point types
>
> Thanks for the heads-up.  I've reverted the patch for now.
>
> Looking more closely, it seems like aarch64_init_cumulative_args already 
> tries to catch 

Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-18 Thread Iain Buclaw via Gcc-patches
Excerpts from Olivier Hainque's message of August 18, 2020 10:01 am:
> Hello Iain,
> 
>> On 17 Aug 2020, at 10:08, Iain Buclaw  wrote:
>> 
>> Hi,
>> 
>> Currently when building a cross-compiler targeting arm-wrs-vxworks7, the
>> selftests fail unless the VSB_DIR environment variable is set.
> 
>> The same !nostdinc condition is used for VXWORKS_ADDITIONAL_CPP_SPEC.
>> 
>> OK for mainline?
> 
> Thanks for proposing this patch.
> 
> I'd rather remove the self-tests -> -nostdinc thing, apparently
> introduced explicitly for VxWorks and this dependency on environment
> variables in specs.
> 
> Can you please just test for fself-test instead, in both cases, with
> a comment like
> 
> "Self-tests may be run in contexts where the VxWorks environment
> isn't available.  Prevent attempts at designating the location of
> runtime header files, libraries or startfiles, which would fail
> on unset environment variables and aren't needed for such tests."
> 

Hi Olivier,

Attached is the change as per your proposal.

Iain.

---
gcc/ChangeLog:

* Makefile.in (SELFTEST_FLAGS): Remove -nostdinc.
* config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Replace -nostdinc
with -fself-tests.
(STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if -fself-tests is used.

---
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..a71585239da 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1995,8 +1995,6 @@ rest.cross: specs
 
 # GCC's selftests.
 # Specify a dummy input file to placate the driver.
-# Specify -nostdinc to work around missing WIND_BASE environment variable
-# required for *-wrs-vxworks-* targets.
 # Specify -o /dev/null so the output of -S is discarded. More importantly
 # It does not try to create a file with the name "null.s" on POSIX and
 # "nul.s" on Windows. Because on Windows "nul" is a reserved file name.
@@ -2005,7 +2003,7 @@ rest.cross: specs
 # Specify the path to gcc/testsuite/selftests within the srcdir
 # as an argument to -fself-test.
 DEVNULL=$(if $(findstring mingw,$(build)),nul,/dev/null)
-SELFTEST_FLAGS = -nostdinc $(DEVNULL) -S -o $(DEVNULL) \
+SELFTEST_FLAGS = $(DEVNULL) -S -o $(DEVNULL) \
-fself-test=$(srcdir)/testsuite/selftests
 
 # Run the selftests during the build once we have a driver and the frontend,
diff --git a/gcc/config/vxworks.h b/gcc/config/vxworks.h
index d648d2f23cb..4aa1e320743 100644
--- a/gcc/config/vxworks.h
+++ b/gcc/config/vxworks.h
@@ -36,11 +36,16 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Since we provide a default -isystem, expand -isystem on the command
line early.  */
+
+/* Self-tests may be run in contexts where the VxWorks environment isn't
+   available.  Prevent attempts at designating the location of runtime header
+   files, libraries or startfiles, which would fail on unset environment
+   variables and aren't needed for such tests.  */
 #if TARGET_VXWORKS7
 
 #undef VXWORKS_ADDITIONAL_CPP_SPEC
 #define VXWORKS_ADDITIONAL_CPP_SPEC \
- "%{!nostdinc:  \
+ "%{!fself-test=*:  \
 %{isystem*} \
 %{mrtp: -idirafter %:getenv(VSB_DIR /h) \
 -idirafter %:getenv(VSB_DIR /share/h)   \
@@ -55,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef VXWORKS_ADDITIONAL_CPP_SPEC
 #define VXWORKS_ADDITIONAL_CPP_SPEC\
- "%{!nostdinc: \
+ "%{!fself-test=*: \
 %{isystem*}\
 %{mrtp: -idirafter %:getenv(WIND_USR /h)   \
-idirafter %:getenv(WIND_USR /h/wrn/coreip) \
@@ -108,7 +113,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #if TARGET_VXWORKS7
 #undef  STARTFILE_PREFIX_SPEC
-#define STARTFILE_PREFIX_SPEC "%:getenv(VSB_DIR /usr/lib/common)"
+#define STARTFILE_PREFIX_SPEC \
+  "%{!fself-test=*:%:getenv(VSB_DIR /usr/lib/common)}"
 #define TLS_SYM "-u __tls__"
 #else
 #define TLS_SYM ""



Re: [PATCH v2] options: Make --help= to emit values post-overrided

2020-08-18 Thread Richard Sandiford
"Kewen.Lin"  writes:
>   * opts-global.c (decode_options): Call target_option_override_hook
>   before it prints for --help=*.

OK, thanks.

Richard

> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index b1a8429dc3c..69fe2b4f3b1 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -327,8 +327,14 @@ decode_options (struct gcc_options *opts, struct 
> gcc_options *opts_set,
>unsigned i;
>const char *arg;
>  
> -  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
> -print_help (opts, lang_mask, arg);
> +  if (!help_option_arguments.is_empty ())
> +{
> +  /* Make sure --help=* see the overridden values.  */
> +  target_option_override_hook ();
> +
> +  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
> + print_help (opts, lang_mask, arg);
> +}
>  }
>  
>  /* Hold command-line options associated with stack limitation.  */


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-18 Thread Iain Sandoe via Gcc-patches

Richard Sandiford  wrote:


Alex Coplan  writes:

Note that an obvious omission here is that this patch does not touch the
mult patterns such as *add__mult_. I found
that I couldn't hit these patterns with C code since multiplications by
powers of two always get turned into shifts by earlier RTL passes. If
there's a way to reliably hit these patterns, then perhaps these should
be updated as well.
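
As a source-level illustration of why the mult forms are hard to reach, the
two functions below compute the same extended-and-scaled sum. Both names are
hypothetical, and the claim that a compiler will canonicalize the first into
the second before pattern matching is the one made in this thread, not
something this sketch verifies.

```cpp
#include <cstdint>

// Address-style computation written with a multiplication.
std::uint64_t
add_scaled_mult (std::uint64_t base, std::uint32_t index)
{
  // Zero-extend the index, then scale by a power of two.
  return base + std::uint64_t (index) * 4;
}

// The same computation written with a shift.
std::uint64_t
add_scaled_shift (std::uint64_t base, std::uint32_t index)
{
  return base + (std::uint64_t (index) << 2);
}
```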


Hmm.  Feels like we should either update them or delete them.  E.g.:

 *adds__multp2
 *subs__multp2


FWIW add_extvdi_multp2 seems to fire for me, building libstdc++ (c++11
cow-wstring-inst) on my [very experimental] initial attempts at a Darwin
port.

(I see these failures too because the platform assembler is based off the
LLVM backend which complains)

Iain


were added alongside the adds3.c and subs3.c tests that you're updating,
so if the tests don't/no longer need the multp2 patterns to pass,
there's a good chance that the patterns are redundant.

For reasons I never understood, the canonical representation is to use
(mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
So perhaps the patterns were originally for address calculations that had
been moved outside of a (mem …) and not updated to shifts instead of mults.

AFAICT the full list of affected patterns is:

 *adds__multp2
 *subs__multp2
 *add__mult_
 *add_uxt_multp2
 *sub_uxt_multp2

Is that right?  If so, I think we should consider a follow-on patch
to delete them.


Testing:
* New test which checks for the correct syntax in all updated
  patterns (fails before and passes after the aarch64.md change).
* New test can be assembled by both GAS and llvm-mc following the
  change.
* Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for master?


OK as-is if paired with a follow-on patch to delete the patterns above
(preapproved if it passes testing).  Also OK without a follow-on patch
if the fix is extended to the patterns above too (but the first option
is better :-)).

Thanks for taking the time to find a test for each pattern.

Richard





[PATCH v2] options: Make --help= to emit values post-overrided

2020-08-18 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2020/8/15 6:01 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Aug 14, 2020 at 01:42:24PM +0800, Kewen.Lin wrote:
>>> I think personally I'd prefer an option (3): call
>>> target_option_override_hook directly in decode_options,
>>> if help_option_arguments is nonempty.  Like you say,
>>> decode_options appears to be the only caller of print_help.
>>
>> Good idea!  The related patch is attached, different from opts_alt{1,2}
>> it could still call target_option_override_hook even if we won't call
>> print_specific_help eventually for some special cases like lang_mask is
>> CL_DRIVER or include_flags is empty.  But I think it's fine.
> 
>> --- a/gcc/opts-global.c
>> +++ b/gcc/opts-global.c
>> @@ -327,8 +327,14 @@ decode_options (struct gcc_options *opts, struct 
>> gcc_options *opts_set,
>>unsigned i;
>>const char *arg;
>>  
>> -  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
>> -print_help (opts, lang_mask, arg);
>> +  if (!help_option_arguments.is_empty ())
>> +{
>> +  /* Consider post-overrided values for --help=*.  */
> 
> I'd say
>   /* Make sure --help=* see the overridden values.  */
> 

This looks better, thanks for polishing!  The updated one is attached.
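
The ordering this patch establishes can be sketched with a toy model. All of
the names below (`options`, `unroll_limit`, `target_override`, `format_help`,
`decode_and_print_help`) are hypothetical stand-ins and not GCC interfaces;
the point is only that the override runs once, before any help text is
produced, and only when help was requested.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified option state.
struct options
{
  int unroll_limit = 4;  // Generic default.
};

// Stand-in for target_option_override_hook: the target adjusts defaults.
void
target_override (options &opts)
{
  opts.unroll_limit = 8;
}

// Stand-in for print_help: report the value the user would see.
std::string
format_help (const options &opts, const std::string &arg)
{
  return arg + ": unroll_limit=" + std::to_string (opts.unroll_limit);
}

std::vector<std::string>
decode_and_print_help (options &opts,
                       const std::vector<std::string> &help_args)
{
  std::vector<std::string> out;
  if (!help_args.empty ())
    {
      // Apply the target override first so the help output reflects it.
      target_override (opts);
      for (const std::string &arg : help_args)
        out.push_back (format_help (opts, arg));
    }
  return out;
}
```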

BR,
Kewen
-
gcc/ChangeLog:

* opts-global.c (decode_options): Call target_option_override_hook
before it prints for --help=*.
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..69fe2b4f3b1 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -327,8 +327,14 @@ decode_options (struct gcc_options *opts, struct gcc_options *opts_set,
   unsigned i;
   const char *arg;
 
-  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-print_help (opts, lang_mask, arg);
+  if (!help_option_arguments.is_empty ())
+{
+  /* Make sure --help=* see the overridden values.  */
+  target_option_override_hook ();
+
+  FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
+   print_help (opts, lang_mask, arg);
+}
 }
 
 /* Hold command-line options associated with stack limitation.  */


[PATCH 3/4 v2] ivopts: Consider cost_step on different forms during unrolling

2020-08-18 Thread Kewen.Lin via Gcc-patches
Hi Bin,

> I see, it's similar to the auto-increment case where cost should be
> recorded only once.  So this is okay given 1) fine predicting
> rtl-unroll is likely impossible here; 2) the patch has very limited
> impact.
> 
Really appreciate your time and patience!

I extended the previous version to address Richard S.'s comments on
candidates with the same base/step but different offsets here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547014.html.

The previous version only allows the candidate derived from the group
of interest, this updated patch extends it to those ones which have the
same bases/steps and same/different offsets but in the acceptable range
by considering unrolling.

For one particular case like: 

for (i = 0; i < SIZE; i++)
  y[i] = a * x[i] + z[i];

we will mark reg_offset_p for IV candidates on x as below:
   - (unsigned long) (x_18(D) + 8)// only mark this before.
   - x_18(D) + 8
   - (unsigned long) (x_18(D) + 24)
   - (unsigned long) ((vector(2) double *) (x_18(D) + 8) + 18446744073709551600)
   ...
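
The validity test behind reg_offset_p can be sketched as below. This is a
model under assumptions, not the patch's code: the displacement range is made
up (real targets answer through addr_offset_valid_p), and the formula for the
largest extra offset, step times (unroll factor - 1), is an assumption about
how the unrolled copies are laid out.

```cpp
#include <cstdint>

// Hypothetical displacement range for a reg+offset addressing mode.
constexpr std::int64_t MIN_DISP = -32768;
constexpr std::int64_t MAX_DISP = 32767;

bool
offset_valid (std::int64_t off)
{
  return off >= MIN_DISP && off <= MAX_DISP;
}

// Assumed largest extra offset added by unrolling: the last of
// UNROLL_FACTOR copies sits (UNROLL_FACTOR - 1) steps past the first.
// UNROLL_FACTOR is expected to be at least 1.
std::int64_t
max_increment (std::int64_t step, unsigned unroll_factor)
{
  return step * std::int64_t (unroll_factor - 1);
}

// Both the first and the last unrolled copy must be addressable.
bool
valid_reg_offset (std::int64_t off, std::int64_t step, unsigned unroll_factor)
{
  return offset_valid (off)
         && offset_valid (off + max_increment (step, unroll_factor));
}
```

With a step of 8 and an unroll factor of 4, an offset of 8 stays in range,
while an offset of 32760 does not because the last copy would need 32784.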

Do you mind to have a review again?  Thanks in advance!

Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P9.

SPEC2017 P9 performance run has no remarkable degradations/improvements.

BR,
Kewen
-
gcc/ChangeLog:

* tree-ssa-loop-ivopts.c (struct iv_cand): New field reg_offset_p.
(struct ivopts_data): New field consider_reg_offset_for_unroll_p.
(mark_reg_offset_candidates): New function.
(add_candidate_1): Set reg_offset_p to false for new candidate.
(set_group_iv_cost): Scale up group cost with estimate_unroll_factor if
consider_reg_offset_for_unroll_p.
(determine_iv_cost): Increase step cost with estimate_unroll_factor if
consider_reg_offset_for_unroll_p.
(tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update
consider_reg_offset_for_unroll_p.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d2697ae1ba..5a19b53c8d5 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -473,6 +473,9 @@ struct iv_cand
   struct iv *orig_iv;  /* The original iv if this cand is added from biv with
   smaller type.  */
   bool doloop_p;   /* Whether this is a doloop candidate.  */
+  bool reg_offset_p;   /* Whether this is available for an address type group
+  where its all uses are valid to adopt reg_offset
+  addressing mode even considering unrolling.  */
 };
 
 /* Hashtable entry for common candidate derived from iv uses.  */
@@ -653,6 +656,10 @@ struct ivopts_data
 
   /* Whether the loop has doloop comparison use.  */
   bool doloop_use_p;
+
+  /* Whether need to consider register offset addressing mode for the loop with
+ upcoming unrolling by estimated unroll factor.  */
+  bool consider_reg_offset_for_unroll_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -2731,6 +2738,112 @@ split_address_groups (struct ivopts_data *data)
 }
 }
 
+/* For each address type group, it finds the address-based IV candidates with
+   the same base and step, for those that are available to be used for the
+   whole group with reg_offset addressing mode by considering the address offset
+   difference and increased offset with unrolling factor estimation, mark them
+   as reg_offset_p.  */
+
+static void
+mark_reg_offset_candidates (struct ivopts_data *data)
+{
+  class loop *loop = data->current_loop;
+  gcc_assert (data->current_loop->estimated_unroll > 1);
+  bool any_reg_offset_p = false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, ":\n");
+
+  auto valid_reg_offset_p
+= [] (struct iv_use *use, poly_uint64 off, poly_uint64 max_inc) {
+   if (!addr_offset_valid_p (use, off))
+ return false;
+   if (!addr_offset_valid_p (use, off + max_inc))
+ return false;
+   return true;
+  };
+
+  for (unsigned i = 0; i < data->vgroups.length (); i++)
+{
+  struct iv_group *group = data->vgroups[i];
+
+  if (address_p (group->type))
+   {
+ struct iv_use *head_use = group->vuses[0];
+ if (!tree_fits_poly_int64_p (head_use->iv->step))
+   continue;
+
+ poly_int64 step = tree_to_poly_int64 (head_use->iv->step);
+ /* Max extra offset to be added due to unrolling.  */
+ poly_int64 max_increase = (loop->estimated_unroll - 1) * step;
+
+ tree use_base = head_use->addr_base;
+ STRIP_NOPS (use_base);
+
+ struct iv_use *last_use = NULL;
+ unsigned group_size = group->vuses.length ();
+ gcc_assert (group_size >= 1);
+ if (maybe_ne (head_use->addr_offset,
+   group->vuses[group_size - 1]->addr_offset))
+   last_use = group->vuses[group_size - 1];
+
+ unsigned j;
+ bitmap_iterator bi;
+ EXECUTE_IF_SET_IN_BITMAP (group->related_cands, 

Re: [PATCH] libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

2020-08-18 Thread Jonathan Wakely via Gcc-patches

On 13/08/20 18:15 -0400, Lewis Hyatt via Libstdc++ wrote:

Hello-

The attached patch was discussed briefly on PR 54185 here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54185#c14
The test case for this PR sometimes fails due to random failures in
pthread_create() that are not related to the original PR. This patch fixes
it up by ignoring those failures. The test case was designed to repeat the
same test 1000 times to attempt to reproduce a race condition, so I think it is
OK if some of those iterations are simply skipped.

Thanks for taking a look at it; I can commit it if it makes sense.

-Lewis



libstdc++: testsuite: Address random failure in pthread_create() [PR54185]

The test for this PR calls pthread_create() many times in a row, which may fail
with EAGAIN sometimes. Avoid generating a test failure in this case.

libstdc++-v3/ChangeLog:

PR libstdc++/54185
* testsuite/30_threads/condition_variable/54185.cc: Make test robust
to random pthread_create() failures.


Thanks for the patch. It certainly looks reasonable, but I wonder if
the attached version wouldn't be (very slightly) better. The
difference is that instead of just giving up at the first EAGAIN we
keep trying. This way we might be able to create a few more threads
before the loop finishes. If we still keep failing, it works the same.

I've also added a check that the failures are due to EAGAIN, and we'll
still terminate if there's some other problem. I'm assuming that your
failures are EAGAIN. Do you know why that's happening? Does your
system have a low value for RLIMIT_NPROC or something?
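
The error-code check can be shown in isolation.  This is only a sketch of the
idea, with the thread creation abstracted behind a callable so that it does
not depend on the test file; spawn_or_run_inline and its parameters are names
made up for the example.

```cpp
#include <system_error>

// SPAWN is any callable that starts TASK asynchronously and may throw
// std::system_error, as the std::thread constructor does.
// Returns true if TASK was handed off, false if it had to run inline.
template <typename Spawn, typename Task>
bool
spawn_or_run_inline (Spawn spawn, Task task)
{
  try
    {
      spawn (task);
      return true;
    }
  catch (const std::system_error &e)
    {
      if (e.code () == std::errc::resource_unavailable_try_again)
        {
          // EAGAIN: out of resources for now, so do the work here instead.
          task ();
          return false;
        }
      // Anything else is a real failure and must still propagate.
      throw;
    }
}
```

Comparing e.code() against std::errc::resource_unavailable_try_again keeps the
fallback limited to EAGAIN, so unrelated failures still terminate the test.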

The failures for that testcase on AIX appear to be different. It just
segfaults after destroying the condition_variable, which probably
means there's a POSIX conformance issue in AIX's pthread_cond_t.


diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
index ea0d5bb8740..8ccb79e6de6 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
@@ -31,31 +31,48 @@
 std::condition_variable* cond = nullptr;
 std::mutex mx;
 int started = 0;
+bool notified = false;
 int constexpr NUM_THREADS = 10;
 
-void do_thread_a()
+void do_thread_a(bool wait)
 {
   std::unique_lock<std::mutex> lock(mx);
-  if(++started >= NUM_THREADS)
+  if (++started >= NUM_THREADS)
   {
+notified = true;
 cond->notify_all();
 delete cond;
 cond = nullptr;
   }
-  else
-cond->wait(lock);
+  else if (wait)
+cond->wait(lock, [] { return notified; });
 }
 
-int main(){
+int main()
+{
   std::vector<std::thread> vec;
-  for(int j = 0; j < 1000; ++j)
+  for (int j = 0; j < 1000; ++j)
   {
 started = 0;
+notified = false;
 cond = new std::condition_variable;
 for (int i = 0; i < NUM_THREADS; ++i)
-  vec.emplace_back(&do_thread_a);
-for (int i = 0; i < NUM_THREADS; ++i)
-  vec[i].join();
+  {
+	try
+	  {
+	vec.emplace_back(&do_thread_a, true);
+	  }
+	catch(const std::system_error& e)
+	  {
+	if (e.code() == std::errc::resource_unavailable_try_again)
+	  // Thread creation may fail due to resource limits; run serially.
+	  do_thread_a(false);
+	else
+	  throw;
+	  }
+  }
+for (auto& thread : vec)
+  thread.join();
 vec.clear();
   }
 }


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-18 Thread Richard Sandiford
Alex Coplan  writes:
> Note that an obvious omission here is that this patch does not touch the
> mult patterns such as *add__mult_. I found
> that I couldn't hit these patterns with C code since multiplications by
> powers of two always get turned into shifts by earlier RTL passes. If
> there's a way to reliably hit these patterns, then perhaps these should
> be updated as well.

Hmm.  Feels like we should either update them or delete them.  E.g.:

  *adds__multp2
  *subs__multp2

were added alongside the adds3.c and subs3.c tests that you're updating,
so if the tests don't/no longer need the multp2 patterns to pass,
there's a good chance that the patterns are redundant.

For reasons I never understood, the canonical representation is to use
(mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
So perhaps the patterns were originally for address calculations that had
been moved outside of a (mem …) and not updated to shifts instead of mults.

AFAICT the full list of affected patterns is:

  *adds__multp2
  *subs__multp2
  *add__mult_
  *add_uxt_multp2
  *sub_uxt_multp2

Is that right?  If so, I think we should consider a follow-on patch
to delete them.

> Testing:
>  * New test which checks for the correct syntax in all updated
>patterns (fails before and passes after the aarch64.md change).
>  * New test can be assembled by both GAS and llvm-mc following the
>change.
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
>
> OK for master?

OK as-is if paired with a follow-on patch to delete the patterns above
(preapproved if it passes testing).  Also OK without a follow-on patch
if the fix is extended to the patterns above too (but the first option
is better :-)).

Thanks for taking the time to find a test for each pattern.

Richard


Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-18 Thread Olivier Hainque
Hello Iain,

> On 17 Aug 2020, at 10:08, Iain Buclaw  wrote:
> 
> Hi,
> 
> Currently when building a cross-compiler targeting arm-wrs-vxworks7, the
> selftests fail unless the VSB_DIR environment variable is set.

> The same !nostdinc condition is used for VXWORKS_ADDITIONAL_CPP_SPEC.
> 
> OK for mainline?

Thanks for proposing this patch.

I'd rather remove the self-tests -> -nostdinc thing, apparently
introduced explicitly for VxWorks and this dependency on environment
variables in specs.

Can you please just test for fself-test instead, in both cases, with
a comment like

"Self-tests may be run in contexts where the VxWorks environment
isn't available.  Prevent attempts at designating the location of
runtime header files, libraries or startfiles, which would fail
on unset environment variables and aren't needed for such tests."

?

Thanks in advance,

Best Regards,

Olivier

> Iain.
> 
> ---
> gcc/ChangeLog:
> 
>   * config/vxworks.h (STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if
>   -nostdinc is used.
> ---
> gcc/config/vxworks.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/vxworks.h b/gcc/config/vxworks.h
> index d648d2f23cb..065c9e12b88 100644
> --- a/gcc/config/vxworks.h
> +++ b/gcc/config/vxworks.h
> @@ -108,7 +108,7 @@ along with GCC; see the file COPYING3.  If not see
> 
> #if TARGET_VXWORKS7
> #undef  STARTFILE_PREFIX_SPEC
> -#define STARTFILE_PREFIX_SPEC "%:getenv(VSB_DIR /usr/lib/common)"
> +#define STARTFILE_PREFIX_SPEC "%{!nostdinc:%:getenv(VSB_DIR /usr/lib/common)}"
> #define TLS_SYM "-u __tls__"
> #else
> #define TLS_SYM ""
> -- 
> 2.20.1
> 



[committed] d: Fix ICE Segmentation fault during RTL pass: expand on armhf/armel/s390x

2020-08-18 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE that occurred when RTL was expanding D front-end
generated code.  Now DECL_BY_REFERENCE is only ever set if the return
type is TREE_ADDRESSABLE.
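
For readers more at home in C++ than D, a loose analogy (not the D front-end
logic itself, and only an approximation of the C++ ABI rule): types that are
trivial to copy and destroy can be returned by value in the usual way, while
types with a non-trivial copy constructor or destructor are returned through
a hidden reference.  The trait name below is made up for the illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Same shape as the struct in pr96301a.d: plain data, trivial to copy.
struct Type
{
  std::size_t length;
  int *ptr;
};

// A user-provided destructor makes the type non-trivial.
struct Owner
{
  int *ptr;
  ~Owner () {}
};

// Hypothetical trait approximating "must be returned by invisible
// reference": true when copying or destroying the type is non-trivial.
template <typename T>
struct needs_invisible_reference
  : std::integral_constant<bool,
                           !(std::is_trivially_copy_constructible<T>::value
                             && std::is_trivially_destructible<T>::value)>
{
};
```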

Regstrapped on x86_64-linux-gnu{-m64,-m32,-mx32} configurations,
committed to mainline and backported to releases/gcc-10.

Regards
Iain.

---
gcc/d/ChangeLog:

PR d/96301
* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Only return
non-trivial structs by invisible reference.

gcc/testsuite/ChangeLog:

PR d/96301
* gdc.dg/pr96301a.d: New test.
* gdc.dg/pr96301b.d: New test.
* gdc.dg/pr96301c.d: New test.
---
 gcc/d/decl.cc   | 17 +++--
 gcc/testsuite/gdc.dg/pr96301a.d | 31 +++
 gcc/testsuite/gdc.dg/pr96301b.d | 25 +
 gcc/testsuite/gdc.dg/pr96301c.d | 25 +
 4 files changed, 92 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr96301a.d
 create mode 100644 gcc/testsuite/gdc.dg/pr96301b.d
 create mode 100644 gcc/testsuite/gdc.dg/pr96301c.d

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 72c8a8cff06..295f780170a 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -958,11 +958,14 @@ public:
   {
tree resdecl = DECL_RESULT (fndecl);
 
-   TREE_TYPE (resdecl)
- = build_reference_type (TREE_TYPE (resdecl));
-   DECL_BY_REFERENCE (resdecl) = 1;
-   TREE_ADDRESSABLE (resdecl) = 0;
-   relayout_decl (resdecl);
+   /* Return non-trivial structs by invisible reference.  */
+   if (TREE_ADDRESSABLE (TREE_TYPE (resdecl)))
+ {
+   TREE_TYPE (resdecl) = build_reference_type (TREE_TYPE (resdecl));
+   DECL_BY_REFERENCE (resdecl) = 1;
+   TREE_ADDRESSABLE (resdecl) = 0;
+   relayout_decl (resdecl);
+ }
 
if (d->nrvo_var)
  {
@@ -972,7 +975,9 @@ public:
DECL_NAME (resdecl) = DECL_NAME (var);
/* Don't forget that we take its address.  */
TREE_ADDRESSABLE (var) = 1;
-   resdecl = build_deref (resdecl);
+
+   if (DECL_BY_REFERENCE (resdecl))
+ resdecl = build_deref (resdecl);
 
SET_DECL_VALUE_EXPR (var, resdecl);
DECL_HAS_VALUE_EXPR_P (var) = 1;
diff --git a/gcc/testsuite/gdc.dg/pr96301a.d b/gcc/testsuite/gdc.dg/pr96301a.d
new file mode 100644
index 000..314904bbd06
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96301a.d
@@ -0,0 +1,31 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96301
+// { dg-additional-options "-fPIC" { target fpic } }
+// { dg-do compile }
+struct Type
+{
+size_t length;
+int* ptr;
+}
+
+class Container
+{
+Type children;
+
+void remove(void* data)
+{
+Type remove(Type range)
+{
+auto result = range;
+if (result.front)
+return result;
+assert(0);
+}
+if (data)
+remove(children);
+}
+}
+
+int front(Type a)
+{
+return a.ptr[0];
+}
diff --git a/gcc/testsuite/gdc.dg/pr96301b.d b/gcc/testsuite/gdc.dg/pr96301b.d
new file mode 100644
index 000..3d978beebc8
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96301b.d
@@ -0,0 +1,25 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96301
+// { dg-additional-options "-fPIC" { target fpic } }
+// { dg-do compile }
+class Container
+{
+int[100] children;
+
+void remove(void* data)
+{
+int[100] remove(int[100] range)
+{
+auto result = range;
+if (result.front)
+return result;
+assert(0);
+}
+if (data)
+remove(children);
+}
+}
+
+int front(int[100] a)
+{
+return a.ptr[0];
+}
diff --git a/gcc/testsuite/gdc.dg/pr96301c.d b/gcc/testsuite/gdc.dg/pr96301c.d
new file mode 100644
index 000..c9094625016
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96301c.d
@@ -0,0 +1,25 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96301
+// { dg-additional-options "-fPIC" { target fpic } }
+// { dg-do compile }
+class Container
+{
+int[] children;
+
+void remove(void* data)
+{
+int[] remove(int[] range)
+{
+auto result = range;
+if (result.front)
+return result;
+assert(0);
+}
+if (data)
+remove(children);
+}
+}
+
+int front(int[] a)
+{
+return a.ptr[0];
+}
-- 
2.25.1



[PATCH]Adjust testcase.

2020-08-18 Thread Hongtao Liu via Gcc-patches
Hi:
  Rewrite the testcases as C++ source files so that the comparison operator
can be applied directly to vectors; this avoids any dependence on the
vectorizer.
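
A minimal sketch of the construct the converted tests rely on, assuming a
compiler that implements the GNU vector extensions in C++ (the v4si typedef
and the function name are chosen for this example; the real tests use
64-byte vector types):

```cpp
// Four 32-bit ints packed into a 16-byte GNU vector type.
typedef int v4si __attribute__ ((vector_size (16)));

// Element-wise select: each lane takes c where a > b holds and d otherwise.
// No loop is involved, so the result does not depend on the vectorizer.
v4si
select_gt (v4si a, v4si b, v4si c, v4si d)
{
  return a > b ? c : d;
}
```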

gcc/testsuite/ChangeLog:
PR target/96667
* gcc.target/i386/avx512bw-pr96246-1.c: Moved to...
* g++.target/i386/avx512bw-pr96246-1.C: ...here.
* gcc.target/i386/avx512bw-pr96246-2.c: Moved to...
* g++.target/i386/avx512bw-pr96246-2.C: ...here.
* gcc.target/i386/avx512vl-pr96246-1.c: Moved to...
* g++.target/i386/avx512vl-pr96246-1.C: ...here.
* gcc.target/i386/avx512vl-pr96246-2.c: Moved to...
* g++.target/i386/avx512vl-pr96246-2.C: ...here.

-- 
BR,
Hongtao
From 51395137a777c1f9562ac7b0258acf5edf9d360d Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 18 Aug 2020 15:06:01 +0800
Subject: [PATCH] Adjust testcase.

gcc/testsuite/ChangeLog:
	PR target/96667
	* gcc.target/i386/avx512bw-pr96246-1.c: Moved to...
	* g++.target/i386/avx512bw-pr96246-1.C: ...here.
	* gcc.target/i386/avx512bw-pr96246-2.c: Moved to...
	* g++.target/i386/avx512bw-pr96246-2.C: ...here.
	* gcc.target/i386/avx512vl-pr96246-1.c: Moved to...
	* g++.target/i386/avx512vl-pr96246-1.C: ...here.
	* gcc.target/i386/avx512vl-pr96246-2.c: Moved to...
	* g++.target/i386/avx512vl-pr96246-2.C: ...here.
---
 .../i386/avx512bw-pr96246-1.C}| 11 --
 .../i386/avx512bw-pr96246-2.C}| 20 +--
 .../i386/avx512vl-pr96246-1.C}| 11 --
 .../i386/avx512vl-pr96246-2.C}| 20 +--
 4 files changed, 18 insertions(+), 44 deletions(-)
 rename gcc/testsuite/{gcc.target/i386/avx512bw-pr96246-1.c => g++.target/i386/avx512bw-pr96246-1.C} (68%)
 rename gcc/testsuite/{gcc.target/i386/avx512bw-pr96246-2.c => g++.target/i386/avx512bw-pr96246-2.C} (74%)
 rename gcc/testsuite/{gcc.target/i386/avx512vl-pr96246-1.c => g++.target/i386/avx512vl-pr96246-1.C} (73%)
 rename gcc/testsuite/{gcc.target/i386/avx512vl-pr96246-2.c => g++.target/i386/avx512vl-pr96246-2.C} (76%)

diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c b/gcc/testsuite/g++.target/i386/avx512bw-pr96246-1.C
similarity index 68%
rename from gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c
rename to gcc/testsuite/g++.target/i386/avx512bw-pr96246-1.C
index 2bfcc840a91..eec844460f1 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c
+++ b/gcc/testsuite/g++.target/i386/avx512bw-pr96246-1.C
@@ -1,8 +1,8 @@
 /* PR target/96246 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mavx512bw" } */
-/* { dg-final { scan-assembler-times "vpblendm\[bwdq\]\[\t ]" 4 } } */
-/* { dg-final { scan-assembler-times "vblendmp\[sd\]\[\t ]" 2 } } */
+/* { dg-options "-O2 -std=c++14 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "vpblendm\[bwdq\]\[\t \]" 4 } } */
+/* { dg-final { scan-assembler-times "vblendmp\[sd\]\[\t \]" 2 } } */
 
 typedef char v64qi __attribute__((vector_size (64)));
 typedef short v32hi __attribute__((vector_size (64)));
@@ -16,10 +16,7 @@ typedef double v8df __attribute__((vector_size (64)));
   __attribute__ ((noipa))\
   foo_##vtype (vtype a, vtype b, vtype c, vtype d)	\
   {			\
-vtype e;		\
-for (int i = 0; i != num; i++)			\
-  e[i] = a[i] > b[i] ? c[i] : d[i];			\
-return e;		\
+return a > b ? c : d;\
   }
 
 COMPILE_TEST (v64qi, 64);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-2.c b/gcc/testsuite/g++.target/i386/avx512bw-pr96246-2.C
similarity index 74%
rename from gcc/testsuite/gcc.target/i386/avx512bw-pr96246-2.c
rename to gcc/testsuite/g++.target/i386/avx512bw-pr96246-2.C
index 422fcfe4ea8..b96b7c7c932 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-2.c
+++ b/gcc/testsuite/g++.target/i386/avx512bw-pr96246-2.C
@@ -1,19 +1,9 @@
 /* PR target/96246 */
 /* { dg-do run } */
 /* { dg-require-effective-target avx512bw } */
-/* { dg-options "-Ofast -mavx512bw" } */
+/* { dg-options "-O2 -std=c++14 -mavx512bw" } */
 
-#ifndef CHECK
-#define CHECK "avx512f-helper.h"
-#endif
-
-#include CHECK
-
-#ifndef TEST
-#define TEST avx512bw_test
-#endif
-
-#include "avx512bw-pr96246-1.c"
+#include "avx512bw-pr96246-1.C"
 
 #define RUNTIME_TEST(vtype, num)			\
   do			\
@@ -34,9 +24,8 @@
 }			\
   while (0)
 
-static void
-__attribute__ ((optimize (0)))
-TEST (void)
+int
+main (void)
 {
   RUNTIME_TEST (v64qi, 64);
   RUNTIME_TEST (v32hi, 32);
@@ -44,4 +33,5 @@ TEST (void)
   RUNTIME_TEST (v8di, 8);
   RUNTIME_TEST (v16sf, 16);
   RUNTIME_TEST (v8df, 8);
+  return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr96246-1.c b/gcc/testsuite/g++.target/i386/avx512vl-pr96246-1.C
similarity index 73%
rename from gcc/testsuite/gcc.target/i386/avx512vl-pr96246-1.c
rename to gcc/testsuite/g++.target/i386/avx512vl-pr96246-1.C
index 95357d6fc84..66eb9d25f1e 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-pr96246-1.c
+++ b/gcc/testsuite/g++.target/i386/avx512vl-pr96246-1.C
@@ -1,8 

*PING* / Re: [Patch] Fortran: Add 'device_type' clause to OpenMP's declare target

2020-08-18 Thread Tobias Burnus

On 8/7/20 5:03 PM, Tobias Burnus wrote:

This patch adds the device_type(any|nohost|host)
clause for 'omp declare target' to Fortran.

In OpenMP 5.0, it has no effect on variables but
only on procedures – in TR8 (and later), it also
affects variables.

This patch adds this clause to either – except that
the middle end does not seem to like 'target link'
with that clause – for normal variables, common
blocks are accepted. (In line with OpenMP 5, the
middle end ignores the clause for variables.)
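
For comparison, the C/C++ spelling of the same OpenMP 5.0 clause looks
roughly like this (a sketch only; the function names are invented, and when
OpenMP is not enabled the pragmas are simply ignored):

```cpp
// Ordinary functions; the directives below only affect which versions the
// compiler generates when offloading is enabled.
int
host_helper (int x)
{
  return x + 1;
}

int
shared_helper (int x)
{
  return x * 2;
}

// Only a host version of host_helper is required.
#pragma omp declare target to(host_helper) device_type(host)

// Both host and device versions of shared_helper are required
// (device_type(any) is also the default).
#pragma omp declare target to(shared_helper) device_type(any)
```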

OK?

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Re: [PATCH 3/3] Power10: Add tests for PCREL_OPT support.

2020-08-18 Thread Michael Meissner via Gcc-patches
[PATCH 3/3] Power10: Add tests for PCREL_OPT support.

These are the tests for the PCREL_OPT support.

gcc/testsuite/
2020-08-18  Michael Meissner  

* gcc.target/powerpc/pcrel-opt-inc-di.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-df.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-di.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-si.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-df.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-di.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-hi.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-qi.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-sf.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-si.c: New PCREL_OPT test.
* gcc.target/powerpc/pcrel-opt-st-vector.c: New PCREL_OPT test.
---
 .../gcc.target/powerpc/pcrel-opt-inc-di.c  | 18 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c | 36 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c | 43 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c | 42 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c | 42 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c | 42 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c | 41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c   | 36 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c | 36 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c | 43 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c | 42 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c | 42 +
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c | 36 ++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c | 41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c   | 36 ++
 15 files changed, 576 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
new file mode 100644
index 000..f165068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE   unsigned int
+
+/* Test whether using an external variable twice (doing an increment) prevents
+   the PCREL_OPT optimization.  */
+extern TYPE ext;
+
+void
+inc (void)
+{
+  ext++;   /* No PCREL_OPT (uses address twice).  */
+}
+
+/* { dg-final { scan-assembler-not "R_PPC64_PCREL_OPT" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
new file mode 100644
index 000..d35862f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE   double
+#define LARGE  0x2
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   double.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];   /* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];   /* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];   /* No PCREL_OPT (load 

Re: [PATCH 2/3] Power10: Add PCREL_OPT store support.

2020-08-18 Thread Michael Meissner via Gcc-patches
[PATCH 2/3] Power10: Add PCREL_OPT store support.

This patch adds support for optimizing power10 stores to an external variable
to eliminate loading the address of the variable, and then doing a subsequent
store using that address.

I have built compilers with and without this set of 3 patches, doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-08-18  Michael Meissner  

* config/rs6000/pcrel-opt.c (counters): Add fields to count number
of PCREL_OPT stores that were processed.
(do_pcrel_opt_store): New function to do PCREL_OPT stores.
(do_pcrel_opt_addr): Add support to optimize PCREL_OPT stores.
(do_pcrel_opt_pass): Print out statistics for PCREL_OPT stores.
* config/rs6000/pcrel-opt.md (UNSPEC_PCREL_OPT_ST_ADDR): New
unspec.
(UNSPEC_PCREL_OPT_ST_RELOC): New unspec.
(pcrel_opt_st_addr): New insns for PCREL_OPT stores.
(pcrel_opt_st): New insns for QI/HI/SI PCREL_OPT stores.
(pcrel_opt_stdi): New insn to optimize DI PCREL_OPT stores.
(pcrel_opt_stsf): New insn to optimize SF PCREL_OPT stores.
(pcrel_opt_stdf): New insn to optimize DF PCREL_OPT stores.
(pcrel_opt_st): New insns to optimize vector PCREL_OPT
stores.
* config/rs6000/rs6000.c (rs6000_delegitimize_address): Add
support to de-legitimize PCREL_OPT stores.
---
 gcc/config/rs6000/pcrel-opt.c  | 259 +++--
 gcc/config/rs6000/pcrel-opt.md | 115 +-
 gcc/config/rs6000/rs6000.c |   3 +-
 3 files changed, 367 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
index 10b4bc4..61dce67 100644
--- a/gcc/config/rs6000/pcrel-opt.c
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -53,6 +53,43 @@
 
We only look for a single usage in the basic block where the external
address is loaded.  Multiple uses or references in another basic block will
+   force us to not use the PCREL_OPT relocation.
+
+   We also optimize stores to the address of an external variable using the
+   PCREL_GOT relocation and a single store that uses that external address.  If
+   that is found we create the PCREL_OPT relocation to possibly convert:
+
+   pld addr_reg,var@pcrel@got(0),1
+
+   
+
+   stw data_reg,0(addr_reg)
+
+   into:
+
+   pstw data_reg,var@pcrel(0),1
+
+   
+
+   nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker put the address in the .got section and
+   do:
+
+   .section .got
+   .Lvar_got:
+   .dword var
+
+   .section .text
+   pld addr_reg,.Lvar_got@pcrel(0),1
+
+   
+
+   stw data_reg,0(addr_reg)
+
+   We only look for a single usage in the basic block where the external
+   address is loaded.  Multiple uses or references in another basic block will
force us to not use the PCREL_OPT relocation.  */
 
 #define IN_TARGET_CODE 1
@@ -82,11 +119,11 @@
 #include "insn-codes.h"
 
 
-// Maximum number of insns to scan between the load address and the load that
-// uses that address.  This can be bumped up if desired.  If the insns are far
-// enough away, the PCREL_OPT optimization probably does not help, since the
-// load of the external address has probably completed by the time we do the
-// load of the variable at that address.
+// Maximum number of insns to scan between the load address and the load or
+// store that uses that address.  This can be bumped up if desired.  If the
+// insns are far enough away, the PCREL_OPT optimization probably does not
+// help, since the load of the external address has probably completed by the
+// time we do the load or store of the variable at that address.
 const int MAX_PCREL_OPT_INSNS  = 10;
 
 /* Next PCREL_OPT label number.  */
@@ -97,6 +134,8 @@ static struct {
   unsigned long extern_addrs;
   unsigned long loads;
   unsigned long load_separation[MAX_PCREL_OPT_INSNS+1];
+  unsigned long stores;
+  unsigned long store_separation[MAX_PCREL_OPT_INSNS+1];
 } counters;
 
 
@@ -306,6 +345,156 @@ do_pcrel_opt_load (rtx_insn *addr_insn,   // insn loading address
 }
 
 
+// Optimize a PC-relative load address to be used in a store.
+
+// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no
+// additional references to the address register, the address register dies at
+// the load, and no references to the load), convert insns of the form:
+//
+// (set (reg:DI addr)
+//  (symbol_ref:DI "ext_symbol"))
+//
+// ...
+//
+// (set (mem: (reg:DI addr))
+//  (reg: value))
+//
+// into:
+//
+// (parallel [(set (reg:DI addr)
+// 

Re: [PATCH 1/3] Power10: Add PCREL_OPT load support

2020-08-18 Thread Michael Meissner via Gcc-patches
[PATCH 1/3] Power10: Add PCREL_OPT load support.

This patch adds support for optimizing power10 loads of an external variable to
eliminate loading the address of the variable, and then doing a subsequent load
using that address.

I have built compilers with and without this set of 3 patches, doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-08-18  Michael Meissner  

* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
(rs6000*-*-*): Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md (d_form_memory): New predicate.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mpcrel-opt.
(POWERPC_MASKS): Add -mpcrel-opt.
* config/rs6000/rs6000-passes.def: Add PCREL_OPT pass.
* config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New
declaration.
(make_pass_pcrel_opt): New declaration.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
support for -mpcrel-opt.
(rs6000_delegitimize_address): Add support for PCREL_OPT
addresses.
(print_operand, 'r' case): New operand for PCREL_OPT.
(rs6000_opt_masks): Add -mpcrel-opt.
(rs6000_asm_output_opcode): Reset flag to emit the initial 'p'
after use.
* config/rs6000/rs6000.md (loads_extern_addr attribute): New
attribute.
(isa attribute): Add pcrel_opt sub-case.
(enabled attribute): Add support for pcrel_opt.
(pcrel_extern_addr): Set loads_extern_addr attribute.
(toplevel): Include pcrel-opt.md.
* config/rs6000/rs6000.opt (-mpcrel-opt): New debug option.
* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rule.
(MD_INCLUDES): Add pcrel-opt.md.
---
 gcc/config.gcc  |   6 +-
 gcc/config/rs6000/pcrel-opt.c   | 656 
 gcc/config/rs6000/pcrel-opt.md  | 248 ++
 gcc/config/rs6000/predicates.md |  23 ++
 gcc/config/rs6000/rs6000-cpus.def   |   2 +
 gcc/config/rs6000/rs6000-passes.def |   8 +
 gcc/config/rs6000/rs6000-protos.h   |   2 +
 gcc/config/rs6000/rs6000.c  |  40 ++-
 gcc/config/rs6000/rs6000.md |  14 +-
 gcc/config/rs6000/rs6000.opt|   4 +
 gcc/config/rs6000/t-rs6000  |   7 +-
 11 files changed, 1001 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2370368..605d743 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -505,7 +505,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -520,6 +520,7 @@ powerpc*-*-*)
esac
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 pru-*-*)
cpu_type=pru
@@ -531,8 +532,9 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 sparc*-*-*)
cpu_type=sparc
diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
new file mode 100644
index 000..10b4bc4
--- /dev/null
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -0,0 +1,656 @@
+/* Subroutines used to support the pc-relative linker optimization.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or 

[PATCH 0/3] Power10 PCREL_OPT support

2020-08-18 Thread Michael Meissner via Gcc-patches
The ELF-v2 ISA 3.1 support for Power10 has relocations to optimize cases where
the code references an external variable in only one location.  This patch
is similar to the optimizations that the linker already does for TOC
accesses.

I will be submitting 3 patches as follow-ups to this message:

* The first patch adds support for PCREL_OPT loads;
* The second patch adds support for PCREL_OPT stores; (and)
* The third patch adds the tests.

If the program is compiled to be the main program, and the variable is defined
in the main program, these relocations let the linker replace the two-step
sequence (load the address of the external variable, then do a load or store
through that address) with a single prefixed load or store, and convert the
second instruction into a NOP.

For example, consider the following program:

extern int ext_variable;

int ret_var (void)
{
  return ext_variable;
}

void store_var (int i)
{
  ext_variable = i;
}

Currently on power10, the compiler compiles this as:

ret_var:
pld 9,ext_variable@got@pcrel
lwa 3,0(9)
blr

store_var:
pld 9,ext_variable@got@pcrel
stw 3,0(9)
blr

That is, it loads up the address of 'ext_variable' from the GOT table into
register r9, and then uses r9 as a base register to reference the actual
variable.

The linker already optimizes the case where you are compiling the main program
and the variable is also defined in the main program, producing:

ret_var:
pla 9,ext_variable,1
lwa 3,0(9)
blr

store_var:
pla 9,ext_variable,1
stw 3,0(9)
blr

These patches generate:

ret_var:
pld 9,ext_variable@got@pcrel
.Lpcrel1:
.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
lwa 3,0(9)
blr

store_var:
pld 9,ext_variable@got@pcrel
.Lpcrel2:
.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
stw 3,0(9)
blr

Note, the label for locating the PLD occurs after the PLD and not before it.
This is so that if the assembler adds a NOP in front of the PLD to align it,
the relocations will still work.
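
The address arithmetic behind the two .reloc expressions can be sketched in C.
This is only an illustration: the helper names are hypothetical, and the only
facts assumed are that a prefixed instruction such as PLD occupies 8 bytes and
that '.' is the location at which the directive is assembled.

```c
#include <stdint.h>

/* Size in bytes of a prefixed instruction such as PLD.  */
#define TOY_PREFIXED_INSN_SIZE 8

/* Address of the PLD, given the label emitted immediately after it.
   This mirrors the ".Lpcrel1-8" expression.  Hypothetical helper.  */
uint64_t
toy_pld_address (uint64_t label_after_pld)
{
  return label_after_pld - TOY_PREFIXED_INSN_SIZE;
}

/* Value of ".-(.Lpcrel1-8)": the distance from the start of the PLD
   to DOT, the location where the .reloc directive is assembled.
   Hypothetical helper.  */
uint64_t
toy_reloc_addend (uint64_t dot, uint64_t label_after_pld)
{
  return dot - toy_pld_address (label_after_pld);
}
```

Because the label follows the PLD, toy_pld_address gives the start of the PLD
itself even when the assembler has placed an alignment NOP in front of it.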

If the linker can, it will convert the code into:

ret_var:
plwa 3,ext_variable,1
nop
blr

store_var:
pstw 3,ext_variable,1
nop
blr

These patches allow the load of the address to not be physically adjacent to
the actual load or store, which should allow for better code.

For loads, there must be no references to the register that is being loaded
between the PLD and the actual load.

For stores, it becomes a little trickier, in that the register being stored
must be live at the time the PLD instruction is done, and it must continue to
be live and unmodified between the PLD and the store.

For both loads and stores, there must be only one reference to the address
being loaded into a base register, and that base register must die at the point
of the load/store.
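
The conditions above can be sketched as a check over a toy instruction list.
This is not the code in pcrel-opt.c: the struct, the field names, and the
convention that use[0] is the base register (and use[1] the stored value) are
assumptions made for the illustration, and the base_dies flag stands in for
real liveness information.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an instruction; hypothetical, not GCC's RTL.  */
enum toy_kind { TOY_PLD_GOT, TOY_LOAD, TOY_STORE, TOY_OTHER };

struct toy_insn
{
  enum toy_kind kind;
  int def;        /* Register written, or -1.  */
  int use[2];     /* Registers read, or -1.  use[0] is the base register
                     of a load/store; use[1] is the value being stored.  */
  bool base_dies; /* Load/store only: base register is dead afterwards.  */
};

static bool
toy_reads (const struct toy_insn *insn, int reg)
{
  return insn->use[0] == reg || insn->use[1] == reg;
}

/* Return true if the address load at index PLD and the memory access
   at index MEM satisfy the conditions described above.  */
bool
toy_pcrel_opt_ok (const struct toy_insn *insns, size_t pld, size_t mem)
{
  const struct toy_insn *p = &insns[pld];
  const struct toy_insn *m = &insns[mem];

  if (pld >= mem || p->kind != TOY_PLD_GOT)
    return false;
  if (m->kind != TOY_LOAD && m->kind != TOY_STORE)
    return false;

  int base = p->def;
  if (m->use[0] != base || !m->base_dies)
    return false;

  int data = m->kind == TOY_LOAD ? m->def : m->use[1];
  if (m->kind == TOY_STORE && data == base)
    return false;

  for (size_t i = pld + 1; i < mem; i++)
    {
      const struct toy_insn *x = &insns[i];

      /* Only one reference to the base register is allowed.  */
      if (x->def == base || toy_reads (x, base))
        return false;
      /* The data register must not be modified in between.  */
      if (x->def == data)
        return false;
      /* For loads, the destination must not be read either.  */
      if (m->kind == TOY_LOAD && toy_reads (x, data))
        return false;
    }
  return true;
}
```

In this sketch a store whose source register is unmodified between the PLD and
the store necessarily holds the same value at the PLD, which is how the "live
at the time the PLD is done" requirement is modelled.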

In order to do this, the pass that converts the load address and load/store
must occur late in the compilation cycle.  In particular, it must run after the
second scheduler pass, which would otherwise duplicate and optimize some of the
references and produce an invalid program.  In the past, Segher has said that
we should be able to move
it earlier.  I have my doubts whether that is feasible.  What I would like to
do is put these patches into GCC 11, which will enable many of the cases that
we want to optimize.

Then somebody else can take a swing at allowing this optimization to be done
earlier.  That way, even if we can't get the super optimized code to work, we
will at least get the majority of cases to work.

For reference, here is what the current compiler generates for a medium code
model system targeting power9 with the TOC support:

.section ".toc","aw"
.LC0:
.quad   ext_variable
.section ".text"

ret_var:
.LCF0:
0:  addis   2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
.localentry ret_var,.-ret_var
addis   9,2,.LC0@toc@ha
ld  9,.LC0@toc@l(9)
lwa 3,0(9)
blr

.section ".toc","aw"
.set .LC1,.LC0

.section ".text"
store_var:
.LCF1:
0:  addis   2,12,.TOC.-.LCF1@ha
addi 2,2,.TOC.-.LCF1@l
.localentry store_var,.-store_var
addis   9,2,.LC1@toc@ha
ld  9,.LC1@toc@l(9)
  

Re: [PATCH]Don't use pinsr for struct initialization.

2020-08-18 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 18, 2020 at 4:23 AM Hongtao Liu  wrote:
>
> On Fri, Aug 14, 2020 at 5:57 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 14, 2020 at 8:03 AM Hongtao Liu  wrote:
> > >
> > > Hi:
> > >   For struct initialization, when it fits in a TImode, gcc will use
> > > pinsr insn which causes poor codegen described in PR93897 and PR96562.
> >
> > You should probably remove TImode handling also from ix86_expand_pextr.
> >
>
> Yes, but I failed to construct a testcase to cover this part.
> Anyway, the regression test for i386/x86-64 backend is ok, bootstrap is ok.
> I also ran the patch on SPEC2017, no big impact.
>
> > Uros.
> >
> > >   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> > >   Ok for trunk?
> > >
> > > ChangeLog
> > > gcc/
> > > PR target/96562
> > > PR target/93897
> > > * config/i386/i386-expand.c (ix86_expand_pinsr): Don't use
> > > pinsr for TImode.
> > >
> > > gcc/testsuite/
> > > * gcc.target/i386/pr96562-1.c: New test.
> > >
> > > --
> > > BR,
> > > Hongtao
>
> Update patch.

OK for mainline and backports.

Thanks,
Uros.