Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-12 Thread Feng Xue OS via Gcc-patches
> Hello,
> with Martin we spent some time looking into exchange2 and my
> understanding of the problem is the following:
> 
> There is the self-recursive function digits_2 with the property that it
> has 10 nested loops and calls itself from the innermost.
> Now we do not do an amazing job of guessing the profile since it is quite
> atypical. First observation is that the callback frequency needs to be
> less than 1, otherwise the program never terminates; however, with 10
> nested loops one needs to predict every loop to iterate just a few times
> and the conditionals guarding them as not very likely. For that we added
> PRED_LOOP_GUARD_WITH_RECURSION some time ago and I fixed it yesterday
> (causing a regression in exchange2 since the bad profile turned out to
> disable some harmful vectorization), and I also now added a cap to the
> self-recursive frequency so things do not get mispropagated by ipa-cp.
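
A minimal C++ sketch of the shape being described (the real digits_2 is
Fortran; names, types and bounds here are only illustrative):

static bool solved;

static void
digits_2_shape (int depth)
{
  if (solved || depth > 10)   /* the bound GCC does not know about */
    return;
  for (int a = 0; a < 9; a++)
    for (int b = 0; b < 9; b++)
      /* ...eight more nested loops in the real function...  */
      digits_2_shape (depth + 1);   /* self-call from the innermost loop */
}

For the estimated profile to describe a terminating program, the product of
the guessed loop iteration counts and the guessed probability of the guarded
self-call must come out below 1 per invocation, which is what the predictor
and the new cap are trying to ensure.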

With the default setting of PRED_LOOP_GUARD_WITH_RECURSION, static profile
estimation for exchange2 is far from accurate; the hottest recursive function
is predicted as infrequent. However, this low execution estimate works fine
with IRA. I've tried to tweak the likelihood of the predictor, same as you;
performance degraded as the estimated profile increased. This regression was
also found to be correlated with IRA, which produces many more register
spills than by default. In the presence of deep loops and high register
pressure, IRA behaves more sensitively to the profile estimation, and this
exhibits an unwanted property of the current IRA algorithm. I've described it
in a tracker (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90174).

Feng

> 
> Now if ipa-cp decides to duplicate digits_2 a few times we have a new
> problem.  The tree of recursion is organized in a way that the depth is
> bounded by 10 (which GCC does not know) and moreover most time is not
> spent on very deep levels of recursion.
> 
> For that you have the patch which increases frequencies of recursively
> cloned nodes, however it still seems to me a very specific hack for
> exchange2: I do not see how to guess where most of the time is spent.
> Even for very regular trees, by the master theorem, it depends on very
> small differences in the estimates of recursion frequency whether most
> of the time is spent at the top of the tree, at the bottom, or balanced.
> 
> With algorithms doing backtracking, like exchange2, the likelihood of
> recursion reduces with deeper recursion levels, but we do not know how
> quickly, nor what the level is.
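
(To make the master-theorem point above concrete, under simplifying
assumptions: if each invocation is estimated to make b recursive calls on
average, the estimated work at recursion depth d scales like b^d, so b < 1
concentrates time near the top of the recursion tree, b > 1 near the bottom,
and b close to 1 balances it; a small error in the estimated recursion
frequency b therefore flips which levels dominate.)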
> 
>> From: Xiong Hu Luo 
>> 
>>  For SPEC2017 exchange2, there is a large recursive function digits_2
>>  (function size 1300) which generates specialized nodes digits_2.1 to
>>  digits_2.8 with the added build options:
>> 
>>  --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80
>> 
>>  The ipa-inline pass will consider inlining these nodes called only once,
>>  but these large functions inlined too deeply will cause serious register
>>  spills and performance degradation, as follows.
>> 
>>  inlineA: brute (inline digits_2.1, 2.2, 2.3, 2.4) -> digits_2.5 (inline 2.6, 2.7, 2.8)
>>  inlineB: digits_2.1 (inline digits_2.2, 2.3) -> call digits_2.4 (inline digits_2.5, 2.6) -> call digits_2.7 (inline 2.8)
>>  inlineC: brute (inline digits_2) -> call 2.1 -> 2.2 (inline 2.3) -> 2.4 -> 2.5 -> 2.6 (inline 2.7) -> 2.8
>>  inlineD: brute -> call digits_2 -> call 2.1 -> call 2.2 -> 2.3 -> 2.4 -> 2.5 -> 2.6 -> 2.7 -> 2.8
>> 
>>  Performance diff:
>>  inlineB is ~25% faster than inlineA;
>>  inlineC is ~20% faster than inlineB;
>>  inlineD is ~30% faster than inlineC.
>> 
>>  The master GCC code now generates an inline sequence like inlineB; this patch
>>  makes the ipa-inline pass behave like inlineD by:
>>   1) accumulating the growth for recursive calls, adding the growth data
>>  to the edge when the edge's caller is inlined into another function, to avoid
>>  inlining too deeply;
>>   2) and, if caller and callee are both specialized from the same node, also
>>  considering the edge to be a recursive edge.
>> 
>>  SPEC2017 test shows GEOMEAN improve +2.75% in total (+0.56% without
>>  exchange2).
>>  Any comments?  Thanks.
>> 
>>  523.xalancbmk_r +1.32%
>>  541.leela_r +1.51%
>>  548.exchange2_r +31.87%
>>  507.cactuBSSN_r +0.80%
>>  526.blender_r   +1.25%
>>  538.imagick_r   +1.82%
>> 
>>  gcc/ChangeLog:
>> 
>>  2020-08-12  Xionghu Luo  
>> 
>>* cgraph.h (cgraph_edge::recursive_p): Return true if caller and
>>callee are specialized from the same node.
>>* ipa-inline-analysis.c (do_estimate_growth_1): Add caller's
>>inlined_to growth to edge whose caller is inlined.
>>  ---
>>   gcc/cgraph.h  | 2 ++
>>   gcc/ipa-inline-analysis.c | 3 +++
>>   2 files changed, 5 insertions(+)
>> 
>>  diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>  index 0211f08964f..11903ac1960 100644
>>  --- a/gcc/cgraph.h
>>  +++ b/gcc/cgraph.h
>>  @@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
>> cgraph_node *c = callee->ultimate_alias_target ();

gcc.dg/pr94600-5.c .. -8.c: Align struct t0 explicitly, as a type, PR middle-end/94600

2020-08-12 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.

The bitfield-struct t0 in gcc.dg/pr94600-1.c ..-4.c is assigned through
a pointer that is a (volatile-and-pointer-)cast literal, so gcc doesn't
need to be otherwise told that the address is aligned.  But, variants
pr94600-5.c ..-8.c are assigned through a "volatile t0 *", and rely on
the *type* being naturally aligned, or that the machine has
non-strict-alignment moves.

Unfortunately, systems exist (for some definitions of "exist") where
such structs aren't always naturally aligned, for example if they
contain only (small) bitfields, even though the size is a naturally
accessible size.  Specifically, the mmix-knuth-mmixware port has only
*byte* alignment for this struct.  (If an int is added to the struct,
alignment is promoted.)  IOW, a prerequisite of the test is false: the
struct doesn't have the same alignment as an integer of the same size.
The effect is assignment in byte-sized pieces, and the test fails.
(For a non-volatile assignment, memcpy is called.)  That's easily
fixable by defining the type as having a specific alignment.  This is
also closer to the type in the original code, and since the first
variants aren't affected, no second thought or re-visit of a pre-fixed
compiler is needed.  I don't plan to back-port this to the gcc-10
branch, however.  I did sanity-check that the tests still pass on
ppc64le-linux.
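
A hedged illustration (C++; the assertions just encode the assumption the
tests were making, on a typical target with 32-bit int):

typedef struct {
  unsigned int f0 : 4;
  unsigned int f1 : 11;
  unsigned int f2 : 10;
  unsigned int f3 : 7;
} t0 __attribute__((__aligned__(4)));

static_assert (sizeof (t0) == 4, "32 bits of bitfields fill one word");
static_assert (alignof (t0) == 4, "alignment now explicit, even on mmix");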

gcc/testsuite:

PR middle-end/94600
* gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c,
gcc.dg/pr94600-8.c: Align t0 to 4-byte boundary.

--- gcc/gcc/testsuite/gcc.dg/pr94600-5.c.orig   Mon Jul 13 21:02:59 2020
+++ gcc/gcc/testsuite/gcc.dg/pr94600-5.c       Sun Aug  9 05:03:32 2020
@@ -9,7 +9,7 @@ typedef struct {
   unsigned int f1 : 11;
   unsigned int f2 : 10;
   unsigned int f3 : 7;
-} t0;
+} t0 __attribute__((__aligned__(4)));
 
 static t0 a0[] = {
  { .f0 = 7, .f1 = 99, .f3 = 1, },
--- gcc/gcc/testsuite/gcc.dg/pr94600-6.c.orig   Mon Jul 13 21:02:59 2020
+++ gcc/gcc/testsuite/gcc.dg/pr94600-6.c       Sun Aug  9 05:05:36 2020
@@ -9,7 +9,7 @@ typedef struct {
   unsigned int f1 : 11;
   unsigned int f2 : 10;
   unsigned int f3 : 7;
-} t0;
+} t0 __attribute__((__aligned__(4)));
 
 void
 bar(volatile t0 *b)
--- gcc/gcc/testsuite/gcc.dg/pr94600-7.c.orig   Mon Jul 13 21:02:59 2020
+++ gcc/gcc/testsuite/gcc.dg/pr94600-7.c       Sun Aug  9 05:05:47 2020
@@ -9,7 +9,7 @@ typedef struct {
   unsigned int f1 : 11;
   unsigned int f2 : 10;
   unsigned int f3 : 7;
-} t0;
+} t0 __attribute__((__aligned__(4)));
 
 static t0 a0[] = {
  { .f0 = 7, .f1 = 99, .f3 = 1, },
--- gcc/gcc/testsuite/gcc.dg/pr94600-8.c.orig   Mon Jul 13 21:02:59 2020
+++ gcc/gcc/testsuite/gcc.dg/pr94600-8.c       Sun Aug  9 05:05:54 2020
@@ -9,7 +9,7 @@ typedef struct {
   unsigned int f1 : 11;
   unsigned int f2 : 10;
   unsigned int f3 : 7;
-} t0;
+} t0 __attribute__((__aligned__(4)));
 
 void
 bar(volatile t0 *b)


Re: [Patch, fortran] PR93671 - gfortran 8-10 ICE on intrinsic assignment to allocatable derived-type component of coarray

2020-08-12 Thread Jerry DeLisle via Gcc-patches

This looks good; OK to commit.

Thanks,

Jerry

On 8/10/20 8:03 AM, Andre Vehreschild wrote:

Hi folks,

long time, no see.  I was asked by Damian to do some Coarray stuff in gfortran,
so here is the first step in fixing a bug. The issue at hand is that the
coarray handling is not propagated correctly, and later on the coarray token
is not generated/retrieved from the correct position, leading coarray programs
to crash/hang. This patch fixes at least the misbehavior reported in the PR.
More to come.

Regtests ok on FC31.x86_64. Ok for trunk?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de




Re: [PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-12 Thread Peter Bergner via Gcc-patches
On 8/12/20 8:00 PM, Segher Boessenkool wrote:
> On Wed, Aug 12, 2020 at 03:32:18PM -0500, Peter Bergner wrote:
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -6444,8 +6444,26 @@ machine_mode
>>  rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
>>machine_mode mode,
>>int *punsignedp ATTRIBUTE_UNUSED,
>> -  const_tree, int)
>> +  const_tree, int for_return)
>>  {
>> +  /* Warning: this is a static local variable and not always NULL!  */
>> +  static struct function *fn = NULL;
> 
> It may be just me that always misses "static" on locals, heh.  But
> please comment what this is *for*: to warn only once per function.  You
> could choose a better variable name to say that, too.

Ok, how about this comment then?

@@ -6444,8 +6444,30 @@ machine_mode
 rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
                              machine_mode mode,
                              int *punsignedp ATTRIBUTE_UNUSED,
-                             const_tree, int)
+                             const_tree, int for_return)
 {
+  /* Warning: this is a static local variable and not always NULL!
+     This function is called multiple times for the same function
+     and return value.  PREV_FUNC is used to keep track of the
+     first time we encounter a function's return value in order
+     to not report an error with that return value multiple times.  */
+  static struct function *prev_func = NULL;
+
+  /* We do not allow MMA types being used as return values.  Only report
+     the invalid return value usage the first time we encounter it.  */
+  if (for_return
+      && prev_func != cfun
+      && (mode == POImode || mode == PXImode))
+    {
+      /* Record we have now handled function CFUN, so the next time we
+         are called, we do not re-report the same error.  */
+      prev_func = cfun;
+      if (TYPE_CANONICAL (type) != NULL_TREE)
+        type = TYPE_CANONICAL (type);
+      error ("invalid use of MMA type %qs as a function return value",
+             IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+    }
+
   PROMOTE_MODE (mode, *punsignedp, type);
 
   return mode;

Peter


RE: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-12 Thread qiaopeixin
Thanks for the review and commit.

All the best,
Peixin

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: 13 August 2020 0:25
To: qiaopeixin 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

qiaopeixin  writes:
> Hi,
>
> The test case vector-subscript-2.c in the gcc testsuite will report an ICE in 
> the expand pass since '-mgeneral-regs-only' is incompatible with the use of 
> V4SI mode. I propose to report the diagnostic information instead of ICE, and 
> the problem has been discussed in PR 96479.
>
> I attached the patch to solve the problem. Bootstrapped and tested on 
> aarch64-linux-gnu. Any suggestions?

Thanks, pushed.  I was initially sceptical because raising an error here and in 
aarch64_layout_arg is a hack.  Both functions are just query functions and 
shouldn't have any side effects.

The approach we took for FP modes seemed better: we define the FP move patterns 
unconditionally, and raise an error if we try to emit an FP move with 
!TARGET_FLOAT.  This defers any error reporting until we actually try to 
generate code that depends on TARGET_FLOAT.

But I guess SIMD stuff is different.  There's no reason in principle why you 
can't use:

  unsigned short __attribute__((vector_size(8)))

*within* a function with -mgeneral-regs-only.  It would just need to be 
emulated, in the same way as for:

  unsigned short __attribute__((vector_size(4)))

So it would be wrong to define the SIMD move patterns unconditionally and raise 
an error there.

So all in all, I agree this is the best we can do given the current 
infrastructure.
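
A minimal sketch of that point (function name illustrative): this is fine
with -mgeneral-regs-only as long as the vector never hits the parameter or
return registers, because GCC can emulate the element-wise operation in
general registers; it is only values crossing the PCS boundary that PR96479
now diagnoses.

typedef unsigned short v4hi __attribute__((vector_size(8)));

unsigned short
sum_lanes (const unsigned short *p)
{
  v4hi v = { p[0], p[1], p[2], p[3] };  /* lives entirely inside the function */
  v = v + v;                            /* element-wise add, emulated if need be */
  return v[0] + v[1] + v[2] + v[3];
}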

Thanks,
Richard



Re: [PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-12 Thread Segher Boessenkool
Hi!

On Wed, Aug 12, 2020 at 03:32:18PM -0500, Peter Bergner wrote:
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -6444,8 +6444,26 @@ machine_mode
>  rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
> machine_mode mode,
> int *punsignedp ATTRIBUTE_UNUSED,
> -   const_tree, int)
> +   const_tree, int for_return)
>  {
> +  /* Warning: this is a static local variable and not always NULL!  */
> +  static struct function *fn = NULL;

It may be just me that always misses "static" on locals, heh.  But
please comment what this is *for*: to warn only once per function.  You
could choose a better variable name to say that, too.

"struct function" is GTY, will this work this way, btw?

So I am worried about that; other than that, this is just fine (if you
tune the comment a bit).

Thanks,


Segher


Re: [PATCH] rs6000: ICE when using an MMA type as a function param

2020-08-12 Thread Segher Boessenkool
On Wed, Aug 12, 2020 at 02:24:33PM -0500, Peter Bergner wrote:
> On 8/11/20 9:00 PM, Segher Boessenkool wrote:
> > On Sun, Aug 09, 2020 at 10:03:35PM -0500, Peter Bergner wrote:
> >> +/* { dg-options "-mdejagnu-cpu=power10 -O2 -w" } */
> > 
> > Do you need -w or could a less heavy hammer work as well?
> 
> So adding:
> 
> extern void bar0(); etc. was enough to get rid of the warnings, so
> I'll add that and remove the use of -w.

Great.

> >> It's a static local variable, so how is it always zero and unused?
> > 
> > Oh, trickiness with it being called a second time.  Ouch!
> > 
> > This needs a H U G E comment then...  Or better, get rid of that?
> 
> We cannot really get rid of it.

"Implement that some other way".  Expecting the compiler to only call
the param handling thing for the same function eight times in order,
never interleaved with something else, is asking for trouble.

But okay; just document the static :-)

> I'll modify the test case and add a comment here and then resend the patch.

Thanks!


Segher


Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-12 Thread Segher Boessenkool
Hi!

On Wed, Aug 12, 2020 at 09:03:35PM +0200, Richard Biener wrote:
> On August 12, 2020 7:53:07 PM GMT+02:00, Jan Hubicka  wrote:
> >> From: Xiong Hu Luo 
> >> 523.xalancbmk_r +1.32%
> >> 541.leela_r +1.51%
> >> 548.exchange2_r +31.87%
> >> 507.cactuBSSN_r +0.80%
> >> 526.blender_r   +1.25%
> >> 538.imagick_r   +1.82%

> >> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> >> index 0211f08964f..11903ac1960 100644
> >> --- a/gcc/cgraph.h
> >> +++ b/gcc/cgraph.h
> >> @@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
> >>cgraph_node *c = callee->ultimate_alias_target ();
> >>if (caller->inlined_to)
> >>  return caller->inlined_to->decl == c->decl;
> >> +  else if (caller->clone_of && c->clone_of)
> >> +return caller->clone_of->decl == c->clone_of->decl;
> >>else
> >>  return caller->decl == c->decl;
> >
> >If you clone the function so it is no longer self-recursive, it does not
> >make much sense to lie to optimizers that the function is still recursive.

Like Richard says below (if I understand him right, sorry if not), the
function still *is* recursive in its group of clones.

> >The inlining would be harmful even if the programmer did the cloning by
> >hand.  I guess the main problem is the extreme register pressure issue,
> >combining loop depth of 10 in the caller with loop depth of 10 in the
> >callee just because the function is called once.
> >
> >The negative effect is most likely also due to the wrong profile estimate
> >which drives IRA to optimize the wrong spot.  But I wonder if we simply
> >don't want to teach inlining of functions called once not to construct
> >large loop depths?  Something like: do not inline if the caller loop depth
> >is over 3 or so?
> 
> I don't think that's good by itself (consider leaf functions and the x86 xmm
> reg ABI across calls).  Even with large loop depth, abstraction penalty
> removal can make inlining worth it.  For the testcase the recursiveness is
> what looks special (recursion from a deeper loop nest level).

Yes, the loop stuff / register pressure issues might help for the
exchange result, but what about the other five above?


Segher


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-12 Thread Martin Sebor via Gcc-patches

On 8/12/20 6:28 AM, Martin Liška wrote:

On 8/11/20 4:58 PM, Martin Sebor wrote:

On 8/11/20 5:36 AM, Martin Liška wrote:

Hello.

All right, I did it in 3 steps:
1) a new exact argument is added (no default value) - I tested this on
x86_64-linux-gnu and I built all cross targets.
2) set the default value of exact = false
3) change places which calculate their own growth to use the default
argument


The usual intent of a default argument is to supply a value the function
is most commonly invoked with.  But in this case it looks like it's
the opposite: most of the callers (hundreds) provide the non-default
value (true) and only a handful make use of the default.  I feel I must
be missing something.  What is it?


You are right, but Richi wanted to make this transformation in a more
defensive way.
I'm eventually planning to drop the explicit 'true' argument for most of
the places, except selective scheduling and LTO streaming.


If it's just transitional on the way toward the usual approach
then that seems fine (although even then I can't say I understand
the rationale for going this circuitous route).

Thanks
Martin



I guess Richi can defend his strategy for this ;) ?

Martin



Martin



I would like to install first 1) and then wait some time before the 
rest is installed.


Thoughts?
Martin


Re: [PATCH 2/2] PowerPC: Add power10 IEEE 128-bit min/max/cmove.

2020-08-12 Thread Segher Boessenkool
On Tue, Aug 11, 2020 at 11:32:32PM -0400, Michael Meissner wrote:
> > > +  /* See if we can use the ISA 3.1 min/max/compare instructions for IEEE
> > > + 128-bit floating point.  At present, don't worry about doing conditional
> > > + moves with different types for the comparison and movement (unlike SF/DF,
> > > + where you can do a conditional test between double and use float as the
> > > + if/then parts).  */
> > 
> > Why is that?  That makes the code quite different too (harder to
> > review), but also, it could just use existing code more.
> 
> It is a combinatorial expansion problem.

Yes, we now have twice the code!

> To grow this for IEEE 128, you would need two iterators, each with 4 elements
> (DF, SF, KF, and optionally TF).  Thus you would need 16 patterns to represent
> all of the combinations.  I can do this, I just didn't think it was worth it.

No, you need *one* pattern instead of two!

(It doesn't matter much what happens behind the scenes we already have
many thousands of patterns, a few more or less doesn't matter).

> In addition, it becomes more involved due to constraints.  Currently for SF/DF
> you essentially want to use any vector register at this point (when I added it
> in 2016, there was still the support for limiting whether SF/DF could use the
> Altivec registers).  But for IEEE 128-bit fp types, you want "v" for the
> registers used for comparison.  You might want "v" for the conditional move
> result, or you might want "wa".

We handle this same problem with attributes usually.

> I looked at combining the SF/DF and IEEE 128-bit cases, but it was becoming
> too complex to describe these cases.  It is doable, but it makes the diffs
> even harder to read.

(The end code is what matters, not the diffs)
I don't see why.  You could be right of course, but please convince me.

> > So why do we want the asymmetrical xsmincqp instead of xsminqp?  This
> > should be documented at the very least.  (We also should have min/max
> > that work correctly without -ffast-math, of course :-( ).
> 
> We don't have an xsminqp or xsmaxqp instruction in power10.  We only have
> xsmincqp and xsmaxcqp instructions.

Oh wow, I hadn't noticed that before (or I had pushed it all the way
back, like any other bad dream).  Eww.

So we are limited to only generating this insn with -ffast-math.  Bah.
(smin/smax on float cannot be used without fast math).
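
A hedged sketch of why: the natural C spelling of a floating-point minimum
below is not an IEEE minimum once NaNs are involved (a NaN in a makes the
comparison false and silently selects b), which matches the "compare"-style
semantics these instructions provide; without -ffast-math GCC therefore
cannot express it as a plain smin.

double
my_fmin (double a, double b)
{
  return a < b ? a : b;  /* my_fmin (NaN, 1.0) yields 1.0: the NaN is dropped */
}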


Segher


Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-12 Thread Joseph Myers
On Fri, 7 Aug 2020, Martin Sebor via Gcc-patches wrote:

> > I don't see anything in the tests in this patch to cover this sort of case
> > (arrays of pointers, including arrays of pointers to arrays etc.).
> 
> I've added a few test cases and reworked the declarator parsing
> (get_parm_array_spec) a bit, fixing some bugs.

I don't think get_parm_array_spec is yet logically right (and I don't see 
tests of the sort of cases I'm concerned about, such as arrays of pointers 
to arrays, or pointers with attributes applied to them).

You have logic

+  if (pd->kind == cdk_pointer
+ && (!next || next->kind == cdk_id))
+   {
+ /* Do nothing for the common case of a pointer.  The fact that
+the parameter is one can be deduced from the absence of
+an arg spec for it.  */
+ return attrs;
+   }

which is correct as far as it goes (when it returns with nothing done, 
it's correct to do so, because the argument is indeed a pointer), but 
incomplete:

* Maybe cdk_pointer is followed by cdk_attrs before cdk_id.  In this case 
the code won't return.

* Maybe the code is correct to continue because we're in the case of an 
array of pointers (cdk_array follows).  But as I understand it, the intent 
is to set up an "arg spec" that describes only the (multidimensional) 
array that is the parameter itself - not any array pointed to.  And it 
looks to me like, in the case of an array of pointers to arrays, both sets 
of array bounds would end up in the spec constructed.

What I think is correct is for both cdk_pointer and cdk_function to result 
in the spec built up so far being cleared (regardless of what follows 
cdk_pointer or cdk_function), rather than early return, so that the spec 
present at the end is for the innermost sequence of array declarators 
(possibly with attributes involved as well).  (cdk_function shouldn't 
actually be an issue, since functions can't return arrays or functions, 
but logically it seems appropriate to treat it like cdk_pointer.)
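
Hedged examples of the parameter shapes in question (names illustrative);
the intent is that the "arg spec" describes only the array that is the
parameter itself, never an array merely pointed to:

void f (int a[5][3]);     /* multidimensional array: spec covers [5][3] */
void g (int *a[5]);       /* array of pointers: spec covers only [5] */
void h (int (*a[4])[5]);  /* array of pointers to arrays: only [4]; the
                             [5] belongs to the pointed-to type */
void k (int (*p)[5]);     /* pointer to an array: no spec at all */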

Then, the code

+  if (pd->kind == cdk_id)
+   {
+ /* Extract the upper bound from a parameter of an array type.  */

also seems misplaced.  If the type specifiers for the parameter are a 
typedef for an array type, that array type should be processed *before* 
the declarator to get the correct semantics (as if the bounds from those 
type specifiers were given in the declarator), not at the end which gets 
that type out of order with respect to array declarators.  (Processing 
before the declarator also means clearing the results of that processing 
if a pointer declarator is encountered at any point, because in that case 
the array type in the type specifiers is irrelevant.)

The logic

+ /* Skip all constant bounds except the most significant one.
+The interior ones are included in the array type.  */
+ if (next && (next->kind == cdk_array || next->kind == cdk_pointer))
+   continue;

is another example of code that fails to look past cdk_attrs.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Committed] PR target/96558: Only call ix86_expand_clear with GENERAL_REGS.

2020-08-12 Thread Roger Sayle

The following patch tightens the predicates of the peephole2 from my recent
"Integer min/max improvements patch" to only hoist clearing a register when
that register is a general register.  Calling ix86_expand_clear with regs
other than GENERAL_REGS is not supported.

The following patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures, and fixes
the new test case.  Committed as obvious to fix the immediate regression.
An additional patch (for a supplementary fix) is in preparation.

2020-08-12  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
PR target/96558
* config/i386/i386.md (peephole2): Only reorder register clearing
instructions to allow use of xor for general registers.

gcc/testsuite/ChangeLog
PR target/96558
* gcc.dg/pr96558.c: New test.


Sorry for the breakage.
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f3799ac..9d4e669 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18938,7 +18938,7 @@
 ;; i.e. prefer "xorl %eax,%eax; test/cmp" over "test/cmp; movl $0, %eax".
 (define_peephole2
   [(set (reg FLAGS_REG) (match_operand 0))
-   (set (match_operand:SWI 1 "register_operand") (const_int 0))]
+   (set (match_operand:SWI 1 "general_reg_operand") (const_int 0))]
   "peep2_regno_dead_p (0, FLAGS_REG)
&& !reg_overlap_mentioned_p (operands[1], operands[0])"
[(set (match_dup 2) (match_dup 0))]
diff --git a/gcc/testsuite/gcc.dg/pr96558.c b/gcc/testsuite/gcc.dg/pr96558.c
new file mode 100644
index 000..2f5739e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96558.c
@@ -0,0 +1,32 @@
+/* PR target/96558 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fno-expensive-optimizations -fno-gcse" } */
+
+int ky;
+long int h1;
+__int128 f1;
+
+int
+sd (void);
+
+int __attribute__ ((simd))
+i8 (void)
+{
+  __int128 vh;
+
+  if (sd () == 0)
+    h1 = 0;
+
+  do
+    {
+      long int lf = (long int) f1 ? h1 : 0;
+
+      ky += lf;
+      vh = lf | f1;
+      f1 = 1;
+    }
+  while (vh < (f1 ^ 2));
+
+  return 0;
+}
+


[PATCH v2] rs6000: ICE when using an MMA type as a function param or return value [PR96506]

2020-08-12 Thread Peter Bergner via Gcc-patches
rs6000: ICE when using an MMA type as a function param or return value [PR96506]

PR96506 shows a problem where we ICE on illegal usage, namely using MMA
types for function arguments and return values.  The solution is to flag
these illegal usages as errors early, before we ICE.

The patch below is functionally identical to the previous patch.
The differences are that I've added more comments around the use of the
static local variable and I added prototypes for the test case's extern
bar* functions which allowed me to remove the now unneeded -w option.

Ok for trunk now?  Ok for GCC 10 after some bake in?

Peter


gcc/
PR target/96506
* config/rs6000/rs6000-call.c (rs6000_promote_function_mode): Disallow
MMA types as return values.
(rs6000_function_arg): Disallow MMA types as function arguments.

gcc/testsuite/
PR target/96506
* gcc.target/powerpc/pr96506.c: New test.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 189497efb45..869e4973a16 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -6444,8 +6444,26 @@ machine_mode
 rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
                              machine_mode mode,
                              int *punsignedp ATTRIBUTE_UNUSED,
-                             const_tree, int)
+                             const_tree, int for_return)
 {
+  /* Warning: this is a static local variable and not always NULL!  */
+  static struct function *fn = NULL;
+
+  /* We do not allow MMA types being used as return values.  Only report
+     the invalid return value usage the first time we encounter it.  */
+  if (for_return
+      && fn != cfun
+      && (mode == POImode || mode == PXImode))
+    {
+      /* Record we have now handled function CFUN, so the next time we
+         are called, we do not re-report the same error.  */
+      fn = cfun;
+      if (TYPE_CANONICAL (type) != NULL_TREE)
+        type = TYPE_CANONICAL (type);
+      error ("invalid use of MMA type %qs as a function return value",
+             IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+    }
+
   PROMOTE_MODE (mode, *punsignedp, type);
 
   return mode;
@@ -7396,6 +7414,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   machine_mode elt_mode;
   int n_elts;
 
+  /* We do not allow MMA types being used as function arguments.  */
+  if (mode == POImode || mode == PXImode)
+    {
+      if (TYPE_CANONICAL (type) != NULL_TREE)
+        type = TYPE_CANONICAL (type);
+      error ("invalid use of MMA operand of type %qs as a function parameter",
+             IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+      return NULL_RTX;
+    }
+
   /* Return a marker to indicate whether CR1 needs to set or clear the
      bit that V.4 uses to say fp args were passed in registers.
      Assume that we don't need the marker for software floating point,
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96506.c b/gcc/testsuite/gcc.target/powerpc/pr96506.c
new file mode 100644
index 000..b1b40c5a5c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96506.c
@@ -0,0 +1,66 @@
+/* PR target/96506 */
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+extern void bar0();
+extern void bar1();
+extern void bar2();
+extern void bar3();
+
+typedef __vector_pair vpair_t;
+typedef __vector_quad vquad_t;
+
+/* Verify we flag errors on the following.  */
+
+void
+foo0 (void)
+{
+  __vector_pair v;
+  bar0 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. as a function parameter" } */
+}
+
+void
+foo1 (void)
+{
+  vpair_t v;
+  bar1 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. as a function parameter" } */
+}
+
+void
+foo2 (void)
+{
+  __vector_quad v;
+  bar2 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. as a function parameter" } */
+}
+
+void
+foo3 (void)
+{
+  vquad_t v;
+  bar3 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. as a function parameter" } */
+}
+
+__vector_pair
+foo4 (__vector_pair *src) /* { dg-error "invalid use of MMA type .__vector_pair. as a function return value" } */
+{
+  return *src;
+}
+
+vpair_t
+foo5 (vpair_t *src) /* { dg-error "invalid use of MMA type .__vector_pair. as a function return value" } */
+{
+  return *src;
+}
+
+__vector_quad
+foo6 (__vector_quad *src) /* { dg-error "invalid use of MMA type .__vector_quad. as a function return value" } */
+{
+  return *src;
+}
+
+vquad_t
+foo7 (vquad_t *src) /* { dg-error "invalid use of MMA type .__vector_quad. as a function return value" } */
+{
+  return *src;
+}



[committed] [OG10] Add nvptx support for subword compare-and-swap

2020-08-12 Thread Kwok Cheung Yeung

Hello

I have committed the patch previously posted at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550291.html to support 
atomic compare-and-swap operations on 8-bit and 16-bit types on nvptx to the 
devel/omp/gcc-10 branch only.
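
A hedged sketch of the standard emulation technique such a routine can use
(an illustration only, not the committed config/nvptx/atomic.c; little-endian
byte order and a working 32-bit CAS are assumed): perform the compare-and-swap
on the enclosing aligned word and retry while unrelated bytes change under us.

#include <cstdint>

static unsigned char
emulated_val_cas_u8 (volatile unsigned char *p, unsigned char expected,
                     unsigned char desired)
{
  volatile std::uint32_t *wp
    = (volatile std::uint32_t *) ((std::uintptr_t) p & ~(std::uintptr_t) 3);
  unsigned int shift = ((std::uintptr_t) p & 3) * 8;
  std::uint32_t mask = (std::uint32_t) 0xff << shift;

  for (;;)
    {
      std::uint32_t old_word = *wp;
      unsigned char cur = (old_word & mask) >> shift;
      if (cur != expected)
        return cur;  /* subword value differs: fail, returning old value */
      std::uint32_t new_word
        = (old_word & ~mask) | ((std::uint32_t) desired << shift);
      /* The word-sized CAS succeeds only if no byte of the word moved.  */
      if (__sync_bool_compare_and_swap (wp, old_word, new_word))
        return expected;
      /* An unrelated byte changed concurrently: reload and retry.  */
    }
}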


Kwok
commit 9dc77fbd268ea138797ecc340cf6d9ddc13795c8
Author: Kwok Cheung Yeung 
Date:   Wed Aug 12 12:37:20 2020 -0700

nvptx: Add support for subword compare-and-swap

This adds support for __sync_val_compare_and_swap and
__sync_bool_compare_and_swap for 1-byte and 2-byte long
values, which are not natively supported on nvptx.

2020-08-12  Kwok Cheung Yeung  

libgcc/
* config/nvptx/atomic.c: New.
* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.

gcc/testsuite/
* gcc.target/nvptx/sync.c: New.

libgomp/
* testsuite/libgomp.c-c++-common/reduction-16.c: New.

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index e03cd1b..aba7d39 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,7 @@
+2020-08-12  Kwok Cheung Yeung  
+
+   * gcc.target/nvptx/sync.c: New.
+
 2020-07-28  Kwok Cheung Yeung  
 
* c-c++-common/goacc/routine-4.c (seq, vector, worker, gang): Revert
diff --git a/gcc/testsuite/gcc.target/nvptx/sync.c b/gcc/testsuite/gcc.target/nvptx/sync.c
new file mode 100644
index 000..a573824
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sync.c
@@ -0,0 +1,143 @@
+/* { dg-do run } */
+
+/* Test basic functionality of the intrinsics.  */
+
+/* This is a copy of gcc.dg/ia64-sync-2.c, extended to test 8-bit and 16-bit
+   values as well.  */
+
+/* Ideally this test should require sync_char_short and sync_int_long, but we
+   only support a subset at the moment.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+static char AC[4];
+static char init_qi[4] = { -30,-30,-50,-50 };
+static char test_qi[4] = { -115,-115,25,25 };
+
+static void
+do_qi (void)
+{
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+    abort ();
+}
+
+static short AS[4];
+static short init_hi[4] = { -30,-30,-50,-50 };
+static short test_hi[4] = { -115,-115,25,25 };
+
+static void
+do_hi (void)
+{
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+    abort ();
+}
+
+static int AI[4];
+static int init_si[4] = { -30,-30,-50,-50 };
+static int test_si[4] = { -115,-115,25,25 };
+
+static void
+do_si (void)
+{
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AI+2, AI[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AI+2, AI[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+3, AI[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+3, AI[3], 25) != 1)
+    abort ();
+}
+
+static long AL[4];
+static long init_di[4] = { -30,-30,-50,-50 };
+static long test_di[4] = { -115,-115,25,25 };
+
+static void
+do_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AL+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AL+2, AL[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AL+2, AL[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+3, AL[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+3, AL[3], 25) != 1)
+    abort ();
+}
+
+int main()
+{
+  

Re: [PATCH] c++: Fixing the wording of () aggregate-init [PR92812]

2020-08-12 Thread Jason Merrill via Gcc-patches

On 8/11/20 7:34 PM, Marek Polacek wrote:

P1975R0 tweaks the static_cast wording: it says that "An expression e can be
explicitly converted to a type T if [...] T is an aggregate type having a first
element x and there is an implicit conversion sequence from e to the type of
x."  This already works for classes, e.g.:

   struct Aggr { int x; int y; };
   Aggr a = static_cast<Aggr>(1);

for which we create a TARGET_EXPR.

The proposal also mentions "If T is ``array of unknown bound of U'',
this direct-initialization defines the type of the expression as U[1]", which
suggests that this should work for arrays (they're aggregates too, after all):

   int (&&r)[3] = static_cast<int[3]>(42);
   int (&&r2)[1] = static_cast<int[]>(42);

So I handled that specifically in build_static_cast_1: wrap the
expression in { } and initialize from that.  For the 'r' case above
this creates a TARGET_EXPR.

There are multiple things in play, as usual, so the tests test brace
elision, narrowing, explicit constructors, and lifetime extension too.
I think it's in line with what we discussed on the core reflector.
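
A hedged usage sketch (compiled with -std=c++20; behavior inferred from the
tests rather than quoted from the patch):

struct Aggr { int x; int y; };

int
main ()
{
  Aggr a = static_cast<Aggr>(1);          // a.x == 1, a.y value-initialized
  int (&&r)[1] = static_cast<int[]>(42);  // "array of unknown bound" becomes U[1]
  return (a.x == 1 && a.y == 0 && r[0] == 42) ? 0 : 1;  // r lifetime-extends
}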

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/92812
* typeck.c (build_static_cast_1): Implement P1975R0 by allowing
static_cast to aggregate type.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp2a/paren-init27.C: New test.
* g++.dg/cpp2a/paren-init28.C: New test.
* g++.dg/cpp2a/paren-init29.C: New test.
* g++.dg/cpp2a/paren-init30.C: New test.
* g++.dg/cpp2a/paren-init31.C: New test.
* g++.dg/cpp2a/paren-init32.C: New test.
---
  gcc/cp/typeck.c   | 14 +
  gcc/testsuite/g++.dg/cpp2a/paren-init27.C | 24 +++
  gcc/testsuite/g++.dg/cpp2a/paren-init28.C | 15 ++
  gcc/testsuite/g++.dg/cpp2a/paren-init29.C | 15 ++
  gcc/testsuite/g++.dg/cpp2a/paren-init30.C | 23 ++
  gcc/testsuite/g++.dg/cpp2a/paren-init31.C | 10 ++
  gcc/testsuite/g++.dg/cpp2a/paren-init32.C | 21 
  7 files changed, 122 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init27.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init28.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init29.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init30.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init31.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init32.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a557f3439a8..9166156a5d5 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7480,6 +7480,20 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p,
 	 t.  */
   result = perform_direct_initialization_if_possible (type, expr,
 						      c_cast_p, complain);
+  /* P1975 allows static_cast<Aggr>(42), as well as static_cast<T[5]>(42),
+     which initialize the first element of the aggregate.  We need to handle
+     the array case specifically.  */
+  if (result == NULL_TREE
+      && cxx_dialect >= cxx20
+      && TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* Create { EXPR } and perform direct-initialization from it.  */
+      tree e = build_constructor_single (init_list_type_node, NULL_TREE, expr);
+      CONSTRUCTOR_IS_DIRECT_INIT (e) = true;
+      CONSTRUCTOR_IS_PAREN_INIT (e) = true;
+      result = perform_direct_initialization_if_possible (type, e, c_cast_p,
+							  complain);
+    }
   if (result)
     {
       if (processing_template_decl)
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init27.C b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
new file mode 100644
index 000..0b8cbe33b69
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
@@ -0,0 +1,24 @@
+// PR c++/92812
+// P1975R0
+// { dg-do run { target c++20 } }
+// { dg-options "-Wall -Wextra" }
+
+struct Aggr { int x; int y; };
+struct Base { int i; Base(int i_) : i{i_} { } };
+struct BaseAggr : Base { };
+struct X { };
+struct AggrSDM { static X x; int i; int j; };
+
+int
+main ()
+{
+  Aggr a = static_cast<Aggr>(42); // { dg-warning "missing initializer" }
+  if (a.x != 42 || a.y != 0)
+    __builtin_abort ();
+  BaseAggr b = static_cast<BaseAggr>(42);
+  if (b.i != 42)
+    __builtin_abort ();
+  AggrSDM s = static_cast<AggrSDM>(42); // { dg-warning "missing initializer" }
+  if (s.i != 42 || s.j != 0)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init28.C b/gcc/testsuite/g++.dg/cpp2a/paren-init28.C
new file mode 100644
index 000..8c57dc8e155
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init28.C
@@ -0,0 +1,15 @@
+// PR c++/92812
+// P1975R0
+// { dg-do compile { target c++20 } }
+
+// In both cases the reference declarations lifetime-extend the array
+// temporary.
+int (&&r)[3] = static_cast<int[3]>(42);
+int (&&r2)[1] = static_cast<int[]>(42);
+
+// Make sure we've 

Re: [committed] libstdc++: Make self-move well-defined for containers [PR 85828]

2020-08-12 Thread Jonathan Wakely via Gcc-patches

On 12/08/20 20:37 +0100, Jonathan Wakely wrote:

The C++ LWG recently confirmed that self-move assignment should not have
undefined behaviour for standard containers (see the proposed resolution
of LWG 2839). The result should be a valid but unspecified value, just
like other times when a container is moved from.

Our std::list, std::__cxx11::basic_string and unordered containers all
have bugs which result in undefined behaviour.

For std::list the problem is that we clear the previous contents using
_M_clear() instead of clear(). This means the _M_next, _M_prev and
_M_size members are not zeroed, and so after we "update" them (with
their existing values), we are left with dangling pointers and a
non-zero size, but no elements.

For the unordered containers the problem is similar. _Hashtable first
deallocates the existing contents, then takes ownership of the pointers
from the RHS object (which has just had its contents deallocated so the
pointers are dangling).

For std::basic_string it's a little more subtle. When the string is
local (i.e. fits in the SSO buffer) we use char_traits::copy to copy the
contents from this->data() to __rhs.data(). When &__rhs == this that


Oops, that's backwards. We copy from __rhs.data() to this->data().



[committed] libstdc++: Make self-move well-defined for containers [PR 85828]

2020-08-12 Thread Jonathan Wakely via Gcc-patches
The C++ LWG recently confirmed that self-move assignment should not have
undefined behaviour for standard containers (see the proposed resolution
of LWG 2839). The result should be a valid but unspecified value, just
like other times when a container is moved from.

Our std::list, std::__cxx11::basic_string and unordered containers all
have bugs which result in undefined behaviour.

For std::list the problem is that we clear the previous contents using
_M_clear() instead of clear(). This means the _M_next, _M_prev and
_M_size members are not zeroed, and so after we "update" them (with
their existing values), we are left with dangling pointers and a
non-zero size, but no elements.

For the unordered containers the problem is similar. _Hashtable first
deallocates the existing contents, then takes ownership of the pointers
from the RHS object (which has just had its contents deallocated so the
pointers are dangling).

For std::basic_string it's a little more subtle. When the string is
local (i.e. fits in the SSO buffer) we use char_traits::copy to copy the
contents from this->data() to __rhs.data(). When &__rhs == this that
copy violates the precondition that the ranges don't overlap. We only
need to check for self-move for this case where it's local, because the
only other case that can be true for self-move is that it's non-local
but the allocators compare equal. In that case the data pointer is
neither deallocated nor leaked, so the result is well-defined.

This patch also makes a small optimization for std::deque move
assignment, to use the efficient move when is_always_equal is false, but
the allocators compare equal at runtime.

Finally, we need to remove all the Debug Mode checks which abort the
program when a self-move is detected, because it's not undefined to do
that.

Before PR 85828 can be closed we should also look into fixing
std::shuffle so it doesn't do any redundant self-swaps.
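
A hedged sketch of the guarantee being implemented (any affected container
works; std::list is shown because its _M_clear()/clear() bug is described
above):

#include <list>
#include <utility>

int
main ()
{
  std::list<int> l{1, 2, 3};
  l = std::move(l);  // self-move: valid but unspecified value afterwards
  l.clear();         // must still be usable, with no dangling internals
  l.push_back(42);
  return l.back() == 42 ? 0 : 1;
}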

libstdc++-v3/ChangeLog:

PR libstdc++/85828
* include/bits/basic_string.h (operator=(basic_string&&)): Check
for self-move before copying with char_traits::copy.
* include/bits/hashtable.h (operator=(_Hashtable&&)): Check for
self-move.
* include/bits/stl_deque.h (_M_move_assign1(deque&&, false_type)):
Check for equal allocators.
* include/bits/stl_list.h (_M_move_assign(list&&, true_type)):
Call clear() instead of _M_clear().
* include/debug/formatter.h (__msg_self_move_assign): Change
comment.
* include/debug/macros.h (__glibcxx_check_self_move_assign):
(_GLIBCXX_DEBUG_VERIFY): Remove.
* include/debug/safe_container.h (operator=(_Safe_container&&)):
Remove assertion check for safe move and make it well-defined.
* include/debug/safe_iterator.h (operator=(_Safe_iterator&&)):
Remove assertion check for self-move.
* include/debug/safe_local_iterator.h
(operator=(_Safe_local_iterator&&)): Likewise.
* testsuite/21_strings/basic_string/cons/char/self_move.cc: New test.
* testsuite/23_containers/deque/cons/self_move.cc: New test.
* testsuite/23_containers/forward_list/cons/self_move.cc: New test.
* testsuite/23_containers/list/cons/self_move.cc: New test.
* testsuite/23_containers/set/cons/self_move.cc: New test.
* testsuite/23_containers/unordered_set/cons/self_move.cc: New test.
* testsuite/23_containers/vector/cons/self_move.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit c2fb0a1a2e7a0fb15cf3cf876f621902ccd273f0
Author: Jonathan Wakely 
Date:   Wed Aug 12 20:36:00 2020

libstdc++: Make self-move well-defined for containers [PR 85828]

The C++ LWG recently confirmed that self-move assignment should not have
undefined behaviour for standard containers (see the proposed resolution
of LWG 2839). The result should be a valid but unspecified value, just
like other times when a container is moved from.

Our std::list, std::__cxx11::basic_string and unordered containers all
have bugs which result in undefined behaviour.

For std::list the problem is that we clear the previous contents using
_M_clear() instead of clear(). This means the _M_next, _M_prev and
_M_size members are not zeroed, and so after we "update" them (with
their existing values), we are left with dangling pointers and a
non-zero size, but no elements.

For the unordered containers the problem is similar. _Hashtable first
deallocates the existing contents, then takes ownership of the pointers
from the RHS object (which has just had its contents deallocated so the
pointers are dangling).

For std::basic_string it's a little more subtle. When the string is
local (i.e. fits in the SSO buffer) we use char_traits::copy to copy the
contents from this->data() to __rhs.data(). When &__rhs == this that
copy violates the precondition that the ranges 

Re: [PATCH] rs6000: ICE when using an MMA type as a function param

2020-08-12 Thread Peter Bergner via Gcc-patches
On 8/11/20 9:00 PM, Segher Boessenkool wrote:
> On Sun, Aug 09, 2020 at 10:03:35PM -0500, Peter Bergner wrote:
>> +/* { dg-options "-mdejagnu-cpu=power10 -O2 -w" } */
> 
> Do you need -w or could a less heavy hammer work as well?

So adding:

extern void bar0(); etc. was enough to get rid of the warnings, so
I'll add that and remove the use of -w.


>> It's a static local variable, so how is it always zero and unused?
> 
> Oh, trickiness with it being called a second time.  Ouch!
> 
> This needs a H U G E comment then...  Or better, get rid of that?

We cannot really get rid of it.  With the code as is, we see for the
following test:

__vector_quad
foo (__vector_quad *src)
{
  return *src;
}

bug.i: In function ‘foo’:
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
 2 | foo (__vector_quad *src)
   | ^~~

versus without the fn != cfun condition:

bug.i: In function ‘foo’:
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
 2 | foo (__vector_quad *src)
   | ^~~
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:2:1: error: invalid use of MMA type ‘__vector_quad’ as a function return value
bug.i:4:10: error: invalid use of MMA type ‘__vector_quad’ as a function return value
 4 |   return *src;
   |          ^~~~


I'll modify the test case and add a comment here and then resend the patch.

Peter



Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-12 Thread Jose E. Marchesi via Gcc-patches


> On 8/12/20 8:19 PM, Jose E. Marchesi wrote:
>> Hi Martin.
>> 
 I left the changelog entry dates of the original commits untouched,
 and added `(cherry-pick from commit XXX)' lines to the commit
 messages.  Hope that is ok... please let me know otherwise!
>>>
>>> Hello.
>>>
>>> For creating a backport please use contrib/git-backport.py script.
>>> The script basically runs 'git cherry-pick -x' and reverts all modifications
>>> of ChangeLog files.
>>>
>>> So your line:
>>>
>>> (cherry pick of commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)
>>>
>>> should be:
>>> (cherry picked from commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)
>>>
>>> You can then check your commits with git gcc-verify hook (from
>>> contrib/gcc-git-customization.sh).
>> I checked each commit with git gcc-verify, but since I didn't know
>> about
>> git-backport.py I used cherry-pick -n and reverted the ChangeLog entries
>> by hand.
>
> I see!
>
> Can you please paste content of the 'git gcc-verify -p HEAD~xxx..HEAD' for
> the backport? Can you see the 'Backported from master:' lines in there?

See below.  There are no such lines in the generated ChangeLog entries.

I looked in contrib/gcc-changelog/git_commit.py, and I can guess this is
due to:

1) CHERRY_PICK_PREFIX = '(cherry picked from commit ' and I used
   a slightly different wording.

2) If I am not mistaken while reading the script, the CHERRY_PICK line
   should be part of the ChangeLog entries (indented, etc) and I did put
   it before the ChangeLog entries instead.


Checking d8f4f1903eac5853b51e56b8e80743bc66adbb3d: OK
-- gcc/ChangeLog -- 
2020-08-07  Jose E. Marchesi  

* config/bpf/bpf.md: Remove trailing whitespaces.
* config/bpf/constraints.md: Likewise.
* config/bpf/predicates.md: Likewise.
-- gcc/testsuite/ChangeLog -- 
2020-08-07  Jose E. Marchesi  

* gcc.target/bpf/diag-funargs-2.c: Remove trailing whitespaces.
* gcc.target/bpf/skb-ancestor-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-meta.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-head.c: Likewise.
* gcc.target/bpf/helper-tcp-check-syncookie.c: Likewise.
* gcc.target/bpf/helper-sock-ops-cb-flags-set.c: Likewise.
* gcc.target/bpf/helper-sysctl-set-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-name.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-current-value.c: Likewise.
* gcc.target/bpf/helper-strtoul.c: Likewise.
* gcc.target/bpf/helper-strtol.c: Likewise.
* gcc.target/bpf/helper-sock-map-update.c: Likewise.
* gcc.target/bpf/helper-sk-storage-get.c: Likewise.
* gcc.target/bpf/helper-sk-storage-delete.c: Likewise.
* gcc.target/bpf/helper-sk-select-reuseport.c: Likewise.
* gcc.target/bpf/helper-sk-release.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-map.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-upd.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/diag-funargs-3.c: Likewise.
Checking 9248c12b2915a07e1e258ff2226b2241663312d1: OK
-- gcc/ChangeLog -- 
2020-08-06  Jose E. Marchesi  

* config/bpf/bpf-helpers.h (KERNEL_HELPER): Define.
(KERNEL_VERSION): Remove.
* config/bpf/bpf-helpers.def: Delete.
* config/bpf/bpf.c (bpf_handle_fndecl_attribute): New 

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-12 Thread Richard Biener via Gcc-patches
On August 12, 2020 7:53:07 PM GMT+02:00, Jan Hubicka  wrote:
>Hello,
>with Martin we spent some time looking into exchange2 and my
>understanding of the problem is the following:
>
>There is the self-recursive function digits_2 with the property that it
>has 10 nested loops and calls itself from the innermost.
>Now we do not do an amazing job of guessing the profile since it is quite
>atypical. First observation is that the callback frequency needs to be
>less than 1, otherwise the program never terminates; however, with 10
>nested loops one needs to predict every loop to iterate just a few times
>and the conditionals guarding them as not very likely. For that we added
>PRED_LOOP_GUARD_WITH_RECURSION some time ago and I fixed it yesterday
>(causing a regression in exchange2 since the bad profile turned out to
>disable some harmful vectorization) and I also now added a cap to the
>self-recursive frequency so things do not get mispropagated by ipa-cp.
>
>Now if ipa-cp decides to duplicate digits_2 a few times we have a new
>problem.  The tree of recursion is organized in a way that the depth is
>bounded by 10 (which GCC does not know) and moreover most time is not
>spent on very deep levels of recursion.
>
>For that you have the patch which increases frequencies of recursively
>cloned nodes, however it still seems to me a very specific hack for
>exchange2: I do not see how to guess where most of the time is spent.
>Even for very regular trees, by the master theorem, it depends on very
>small differences in the estimates of recursion frequency whether most
>of the time is spent at the top of the tree, at the bottom, or balanced.
>
>With algorithms doing backtracking, like exchange2, the likelihood of
>recursion reduces with deeper recursion levels, but we do not know how
>quickly, nor what the level is.
>
>> From: Xiong Hu Luo 
>> 
>> For SPEC2017 exchange2, there is a large recursive function digits_2
>> (function size 1300) which generates specialized nodes digits_2.1 to
>> digits_2.8 with the added build options:
>> 
>> --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80
>> 
>> The ipa-inline pass will consider inlining these nodes called only once,
>> but these large functions inlined too deeply will cause serious register
>> spills and performance degradation, as follows.
>> 
>> inlineA: brute (inline digits_2.1, 2.2, 2.3, 2.4) -> digits_2.5 (inline 2.6, 2.7, 2.8)
>> inlineB: digits_2.1 (inline digits_2.2, 2.3) -> call digits_2.4 (inline digits_2.5, 2.6) -> call digits_2.7 (inline 2.8)
>> inlineC: brute (inline digits_2) -> call 2.1 -> 2.2 (inline 2.3) -> 2.4 -> 2.5 -> 2.6 (inline 2.7) -> 2.8
>> inlineD: brute -> call digits_2 -> call 2.1 -> call 2.2 -> 2.3 -> 2.4 -> 2.5 -> 2.6 -> 2.7 -> 2.8
>> 
>> Performance diff:
>> inlineB is ~25% faster than inlineA;
>> inlineC is ~20% faster than inlineB;
>> inlineD is ~30% faster than inlineC.
>> 
>> The master GCC code now generates an inline sequence like inlineB; this
>> patch makes the ipa-inline pass behave like inlineD by:
>>  1) accumulating the growth for recursive calls, adding the growth data
>> to the edge when the edge's caller is inlined into another function, to
>> avoid inlining too deeply;
>>  2) and, if caller and callee are both specialized from the same node,
>> also considering the edge to be a recursive edge.
>> 
>> SPEC2017 test shows GEOMEAN improve +2.75% in total (+0.56% without
>> exchange2).
>> Any comments?  Thanks.
>> 
>> 523.xalancbmk_r +1.32%
>> 541.leela_r +1.51%
>> 548.exchange2_r +31.87%
>> 507.cactuBSSN_r +0.80%
>> 526.blender_r   +1.25%
>> 538.imagick_r   +1.82%
>> 
>> gcc/ChangeLog:
>> 
>> 2020-08-12  Xionghu Luo  
>> 
>>  * cgraph.h (cgraph_edge::recursive_p): Return true if caller and
>>  callee are specialized from the same node.
>>  * ipa-inline-analysis.c (do_estimate_growth_1): Add caller's
>>  inlined_to growth to the edge whose caller is inlined.
>> ---
>>  gcc/cgraph.h  | 2 ++
>>  gcc/ipa-inline-analysis.c | 3 +++
>>  2 files changed, 5 insertions(+)
>> 
>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index 0211f08964f..11903ac1960 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
>>cgraph_node *c = callee->ultimate_alias_target ();
>>if (caller->inlined_to)
>>  return caller->inlined_to->decl == c->decl;
>> +  else if (caller->clone_of && c->clone_of)
>> +return caller->clone_of->decl == c->clone_of->decl;
>>else
>>  return caller->decl == c->decl;
>
>If you clone the function so it is no longer self-recursive, it does not
>make much sense to lie to the optimizers that the function is still
>recursive.
>
>The inlining would be harmful even if the programmer did the cloning by hand.
>I guess the main problem is the extreme register pressure issue of combining
>loop depth 10 in the caller with loop depth 10 in the callee just because
>the function is called once.
>
>The negative effect is most likely also due to the wrong profile estimate
>which drives IRA to optimize the wrong spot.

Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-12 Thread Martin Liška

On 8/12/20 8:19 PM, Jose E. Marchesi wrote:


Hi Martin.


I left the changelog entry dates of the original commits untouched,
and added `(cherry-pick from commit XXX)' lines to the commit
messages.  Hope that is ok... please let me know otherwise!


Hello.

For creating a backport, please use the contrib/git-backport.py script.
The script basically runs 'git cherry-pick -x' and reverts all modifications
of ChangeLog files.

So your line:

(cherry pick of commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)

should be:
(cherry picked from commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)

You can then check your commits with the git gcc-verify hook (from
contrib/gcc-git-customization.sh).


I checked each commit with git gcc-verify, but since I didn't know about
git-backport.py I used cherry-pick -n and reverted the ChangeLog entries
by hand.


I see!

Can you please paste the content of 'git gcc-verify -p HEAD~xxx..HEAD' for
the backport? Can you see the 'Backported from master:' lines in there?

Thanks,
Martin



Now I know better for the next time :)
Thanks for the info!





[no subject]

2020-08-12 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot correctly handles AIX FAT library
creation.  The previous patch wasn't working every time.  Especially
when AR had "-X32_64", the new .so would replace the default one
instead of just being added.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
de93b8d5bfb3d2652e5e166f3e4db1c25a3a2c57
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 08daa1a5924..e443282d0e8 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-e08f1d7d1bc14c0a29eb9ee17980f14fa2397239
+fe5d94c5792f7f990004c3dee0ea501835512200
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 88ea2728bc3..1112ee27df6 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -1244,8 +1244,15 @@ endif
 all-local: $(ALL_LOCAL_DEPS)
 
 MAJOR=$(firstword $(subst :, ,$(libtool_VERSION)))
+
+# If we want to use "AR -r" when creating AIX FAT archives,
+# AR must be stripped of all its -X flags.
+# Otherwise, if AR was defined with -X32_64, the replace option would
+# erase the default .so when adding the extra one. There is no
+# order priority within -X flags.
 add-aix-fat-library: all-multi
 	@if test "$(MULTIBUILDTOP)" = ""; then \
-	  ${AR} -X$(AIX_DEFAULT_ARCH) rc .libs/$(PACKAGE).a ../ppc$(AIX_DEFAULT_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
-	  ${AR} -X$(AIX_DEFAULT_ARCH) rc ../pthread/$(PACKAGE)/.libs/$(PACKAGE).a ../pthread/ppc$(AIX_DEFAULT_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
+	  arx=`echo $(AR) | sed -e 's/-X[^ ]*//g'`; \
+	  $${arx} -X$(AIX_EXTRA_ARCH) rc .libs/$(PACKAGE).a ../ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
+	  $${arx} -X$(AIX_EXTRA_ARCH) rc ../pthread/$(PACKAGE)/.libs/$(PACKAGE).a ../pthread/ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
 	fi
diff --git a/libgo/configure.ac b/libgo/configure.ac
index db5848e36ad..abc58b87b53 100644
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -38,12 +38,12 @@ case ${host} in
 GOCFLAGS="$GOCFLAGS -fno-section-anchors"
 
 # Check default architecture for FAT library creation
-if test -z "`$(CC) -x c -E /dev/null -g3 -o - | grep 64BIT`" ; then
-AIX_DEFAULT_ARCH='64'
+if test -z "`$CC -x c -E /dev/null -g3 -o - | grep 64BIT`" ; then
+AIX_EXTRA_ARCH='64'
 else
-AIX_DEFAULT_ARCH='32'
+AIX_EXTRA_ARCH='32'
 fi
-AC_SUBST(AIX_DEFAULT_ARCH)
+AC_SUBST(AIX_EXTRA_ARCH)
 ;;
 esac
 


Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-12 Thread Jose E. Marchesi via Gcc-patches


Hi Martin.

>> I left the changelog entry dates of the original commits untouched,
>> and added `(cherry-pick from commit XXX)' lines to the commit
>> messages.  Hope that is ok... please let me know otherwise!
>
> Hello.
>
> For creating a backport, please use the contrib/git-backport.py script.
> The script basically runs 'git cherry-pick -x' and reverts all modifications
> of ChangeLog files.
>
> So your line:
>
> (cherry pick of commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)
>
> should be:
> (cherry picked from commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)
>
> You can then check your commits with the git gcc-verify hook (from
> contrib/gcc-git-customization.sh).

I checked each commit with git gcc-verify, but since I didn't know about
git-backport.py I used cherry-pick -n and reverted the ChangeLog entries
by hand.

Now I know better for the next time :)
Thanks for the info!


Re: [COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-12 Thread Martin Liška

On 8/12/20 5:06 PM, Jose E. Marchesi via Gcc-patches wrote:

I left the changelog entry dates of the original commits untouched,
and added `(cherry-pick from commit XXX)' lines to the commit
messages.  Hope that is ok... please let me know otherwise!


Hello.

For creating a backport, please use the contrib/git-backport.py script.
The script basically runs 'git cherry-pick -x' and reverts all modifications
of ChangeLog files.

So your line:

(cherry pick of commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)

should be:
(cherry picked from commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)

You can then check your commits with the git gcc-verify hook (from
contrib/gcc-git-customization.sh).

Thanks,
Martin




Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-12 Thread Jan Hubicka
Hello,
with Martin we spent some time looking into exchange2 and my
understanding of the problem is the following:

There is the self-recursive function digits_2 with the property that it
has 10 nested loops and calls itself from the innermost one.
Now we do not do an amazing job of guessing the profile since it is quite
atypical. The first observation is that the callback frequency needs to be
less than 1, otherwise the program never terminates; however, with 10
nested loops one needs to predict every loop to iterate just a few times
and the conditionals guarding them as not very likely. For that we added
PRED_LOOP_GUARD_WITH_RECURSION some time ago and I fixed it yesterday
(causing a regression in exchange since the bad profile turned out to
disable some harmful vectorization), and I also now added a cap to the
self-recursive frequency so things do not get mispropagated by ipa-cp.

Now if ipa-cp decides to duplicate digits a few times we have a new
problem.  The tree of recursion is organized in a way that the depth is
bounded by 10 (which GCC does not know) and moreover most time is not
spent on very deep levels of recursion.

For that you have the patch which increases frequencies of recursively
cloned nodes, however it still seems to me a very specific hack for
exchange: I do not see how to guess where most of the time is spent.
Even for very regular trees, by the master theorem, it depends on very
small differences in the estimates of recursion frequency whether most
of the time is spent on the top of the tree, the bottom, or things are balanced.
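
For instance, in a regular recursion tree where every node does unit
work and recurses with frequency p, level k contributes on the order of
p^k of the total, so the sum over k of p^k is dominated by the top of
the tree for p < 1 and by the deepest levels for p > 1; a small error
in the estimate of p near 1 flips which end dominates.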

With algorithms doing backtracking, like exchange, the likelihood of
recursion reduces with deeper recursion level, but we do not know how
quickly and what the level is.
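
A minimal C sketch of the shape in question (illustrative only, not the
actual Fortran digits_2; check and feasible are made-up helpers):

extern int check (int *);			/* hypothetical leaf test */
extern int feasible (int *, int, int, int);	/* hypothetical pruning */

/* Self-recursion from the innermost of nested loops, with
   data-dependent backtracking; the depth bound (10) is a property
   of the problem that the compiler cannot see.  */
int
solve (int level, int *state)
{
  if (level == 10)
    return check (state);
  int found = 0;
  for (int i = 0; i < 9 && !found; i++)
    for (int j = 0; j < 9 && !found; j++)
      {
        if (!feasible (state, level, i, j))
          continue;	/* backtrack and try the next candidate */
        found = solve (level + 1, state);
      }
  return found;
}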

> From: Xiong Hu Luo 
> 
> For SPEC2017 exchange2, there is a large recursive function digits_2
> (function size 1300) that generates specialized nodes digits_2.1 to
> digits_2.8 with the added build options:
> 
> --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80
> 
> The ipa-inline pass will consider inlining these nodes since they are
> called only once, but these large functions inlined too deeply cause
> serious register spills and performance degradation, as follows.
> 
> inlineA: brute (inline digits_2.1, 2.2, 2.3, 2.4) -> digits_2.5 (inline 2.6, 2.7, 2.8)
> inlineB: digits_2.1 (inline digits_2.2, 2.3) -> call digits_2.4 (inline digits_2.5, 2.6) -> call digits_2.7 (inline 2.8)
> inlineC: brute (inline digits_2) -> call 2.1 -> 2.2 (inline 2.3) -> 2.4 -> 2.5 -> 2.6 (inline 2.7) -> 2.8
> inlineD: brute -> call digits_2 -> call 2.1 -> call 2.2 -> 2.3 -> 2.4 -> 2.5 -> 2.6 -> 2.7 -> 2.8
> 
> Performance diff:
> inlineB is ~25% faster than inlineA;
> inlineC is ~20% faster than inlineB;
> inlineD is ~30% faster than inlineC.
> 
> The master GCC code now generates an inline sequence like inlineB; this patch
> makes the ipa-inline pass behave like inlineD by:
>  1) accumulating the growth for recursive calls, adding the growth data
> to the edge when the edge's caller is inlined into another function, to
> avoid inlining too deeply;
>  2) treating the edge as a recursive edge also when caller and callee
> are both specialized from the same node.
> 
> SPEC2017 testing shows a GEOMEAN improvement of +2.75% in total (+0.56% without exchange2).
> Any comments?  Thanks.
> 
> 523.xalancbmk_r +1.32%
> 541.leela_r +1.51%
> 548.exchange2_r +31.87%
> 507.cactuBSSN_r +0.80%
> 526.blender_r   +1.25%
> 538.imagick_r   +1.82%
> 
> gcc/ChangeLog:
> 
> 2020-08-12  Xionghu Luo  
> 
>   * cgraph.h (cgraph_edge::recursive_p): Return true if caller and
>   callee are specialized from the same node.
>   * ipa-inline-analysis.c (do_estimate_growth_1): Add caller's
>   inlined_to growth to the edge whose caller is inlined.
> ---
>  gcc/cgraph.h  | 2 ++
>  gcc/ipa-inline-analysis.c | 3 +++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..11903ac1960 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
>cgraph_node *c = callee->ultimate_alias_target ();
>if (caller->inlined_to)
>  return caller->inlined_to->decl == c->decl;
> +  else if (caller->clone_of && c->clone_of)
> +return caller->clone_of->decl == c->clone_of->decl;
>else
>  return caller->decl == c->decl;

If you clone the function so it is no longer self-recursive, it does not
make much sense to lie to the optimizers that the function is still
recursive.

The inlining would be harmful even if the programmer did the cloning by hand.
I guess the main problem is the extreme register pressure issue of combining
loop depth 10 in the caller with loop depth 10 in the callee just because
the function is called once.

The negative effect is most likely also due to the wrong profile estimate
which drives IRA to optimize the wrong spot.  But I wonder if we simply
don't want to teach inlining of functions called once not to construct large
loop depths?  Something like: do not inline if the caller loop depth
is

Re: [PATCH] ipa: fix bit CPP when combined with IPA bit CP

2020-08-12 Thread Martin Liška

There's an updated version of the patch that is approved by Honza.

I'm going to install it (and I'll backport it as well).

Martin
From 906bc8b5bc1815471fea4f79b9f48a78ad9d6592 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 12 Aug 2020 09:21:51 +0200
Subject: [PATCH] ipa: fix bit CPP when combined with IPA bit CP

As mentioned in the PR, let's consider the following example:

int
__attribute__((noinline))
foo(int arg)
{
  if (arg == 3)
return 1;
  if (arg == 4)
return 123;

  __builtin_unreachable ();
}

during WPA we find all calls of the function
(yes the call with value 5 is UBSAN):

  Node: foo/0:
param [0]: 5 [loc_time: 4, loc_size: 2, prop_time: 0, prop_size: 0]
   3 [loc_time: 3, loc_size: 3, prop_time: 0, prop_size: 0]
 ctxs: VARIABLE
 Bits: value = 0x5, mask = 0x6

in LTRANS we have the following VRP info:

  # RANGE [3, 3] NONZERO 3

when we AND the masks in get_default_value we end up with 6 & 3 = 2 (0b010).
That means only the second least significant bit is unknown, and a
value of 5 (0b101) with that mask describes either 7 (0b111) or 5 (0b101).

That's why if (arg_2(D) == 3) gets optimized to false.
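
A tiny standalone model of the arithmetic above (not the GCC widest_int
code, just an illustration of why the pre-fix lattice went wrong):

#include <stdio.h>

int
main (void)
{
  unsigned value = 0x5, mask = 0x6;	/* from IPA bit CP above */
  unsigned nonzero = 0x3;		/* from RANGE [3, 3] NONZERO 3 */
  unsigned m = mask & nonzero;		/* 6 & 3 = 2: only bit 1 unknown */

  /* The known bits now claim the value is 0b101 or 0b111, i.e. 5 or 7;
     the real value 3 is no longer representable.  */
  printf ("%u or %u\n", value & ~m, (value & ~m) | m);
  return 0;
}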

gcc/ChangeLog:

	PR ipa/96482
	* ipa-cp.c (ipcp_bits_lattice::meet_with_1): Drop value bits
	for bits that are unknown.
	(ipcp_bits_lattice::set_to_constant): Likewise.
	* tree-ssa-ccp.c (get_default_value): Add sanity check that
	IPA CP bit info has all bits set to zero in bits that
	are unknown.

gcc/testsuite/ChangeLog:

	PR ipa/96482
	* gcc.dg/ipa/pr96482.c: New test.
---
 gcc/ipa-cp.c   |  3 +-
 gcc/testsuite/gcc.dg/ipa/pr96482.c | 44 ++
 gcc/tree-ssa-ccp.c |  3 ++
 3 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr96482.c

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 945a69977f3..2b21280d919 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1011,7 +1011,7 @@ ipcp_bits_lattice::set_to_constant (widest_int value, widest_int mask)
 {
   gcc_assert (top_p ());
   m_lattice_val = IPA_BITS_CONSTANT;
-  m_value = value;
+  m_value = wi::bit_and (wi::bit_not (mask), value);
   m_mask = mask;
   return true;
 }
@@ -1048,6 +1048,7 @@ ipcp_bits_lattice::meet_with_1 (widest_int value, widest_int mask,
 
   widest_int old_mask = m_mask;
   m_mask = (m_mask | mask) | (m_value ^ value);
+  m_value &= value;
 
   if (wi::sext (m_mask, precision) == -1)
 return set_to_bottom ();
diff --git a/gcc/testsuite/gcc.dg/ipa/pr96482.c b/gcc/testsuite/gcc.dg/ipa/pr96482.c
new file mode 100644
index 000..68ead798d28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr96482.c
@@ -0,0 +1,44 @@
+/* PR ipa/96482 */
+/* { dg-do run } */
+/* { dg-options "-O2 -flto"  } */
+/* { dg-require-effective-target lto } */
+
+int
+__attribute__((noinline))
+foo(int arg)
+{
+  if (arg == 3)
+return 1;
+  if (arg == 4)
+return 123;
+
+  __builtin_unreachable ();
+}
+
+int
+__attribute__((noinline))
+baz(int x)
+{
+  if (x != 0)
+return foo(3); /* called */
+
+  return 1;
+}
+
+int
+__attribute__((noinline))
+bar(int x)
+{
+  if (x == 0)
+return foo(5); /* not executed */
+
+  return 1;
+}
+
+int main(int argc, char **argv)
+{
+  if (bar(argc) != baz(argc))
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 7e3921869b8..65dffe06530 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -306,6 +306,9 @@ get_default_value (tree var)
 		{
 		  val.lattice_val = CONSTANT;
 		  val.value = value;
+		  widest_int ipa_value = wi::to_widest (value);
+		  /* Unknown bits from IPA CP must be equal to zero.  */
+		  gcc_assert (wi::bit_and (ipa_value, mask) == 0);
 		  val.mask = mask;
 		  if (nonzero_bits != -1)
 		val.mask &= extend_mask (nonzero_bits,
-- 
2.28.0



Re: [Patch, fortran] PR93671 - gfortran 8-10 ICE on intrinsic assignment to allocatable derived-type component of coarray

2020-08-12 Thread Thomas Koenig via Gcc-patches

Hi Andre,


Regtests ok on FC31.x86_64. Ok for trunk?


Good thing you're back!  Any help with bugfixing is
highly appreciated, and Coarrays certainly can use
some work.

The patch is OK for trunk.

Best regards

Thomas


Re: [PATCH] diagnostics: Add new option -fdiagnostics-plain-output

2020-08-12 Thread Richard Sandiford
Lewis Hyatt  writes:
> Hello-
>
> Attached is the patch I mentioned in another discussion here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551442.html
>
> This adds a new option -fdiagnostics-plain-output that currently means the
> same thing as:
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -fdiagnostics-urls=never
>
> The idea is that over time, if diagnostics output changes to get more bells
> and whistles by default (such as the UTF-8 output I suggested in the above
> discussion), -fdiagnostics-plain-output will adjust to turn that back off,
> so that the testsuite needs only the one option and doesn't need to get
> updated every time something new is added. It seems to me that when
> diagnostics change, it's otherwise a bit hard to update the testsuite
> correctly, especially for compat.exp that is often not run by default. I
> think this would also be useful for utilities that want to parse the
> diagnostics (but aren't using -fdiagnostics-output-format=json).
>
> BTW, I considered avoiding adding a new switch by having this option take
> the form -fdiagnostics-output-format=plain instead, but this seems to have
> problematic semantics when multiple related options are specified. Given that
> this option needs to be expanded early in the parsing process, so that it
> can be compatible with the special handling for -fdiagnostics-color, it
> seemed best to just make it a simple option with no negated form.
>
> I hope this may be useful, please let me know if you'd like me to push
> it. bootstrap and regtest were done for all languages on x86-64 Linux, all
> tests the same before and after, and same for the compat.exp with
> alternate compiler GCC 8.2.

Thanks for doing this.  LGTM except for a couple of very minor things:

> […]
> @@ -981,6 +982,42 @@ decode_cmdline_options_to_array (unsigned int argc, const char **argv,
> argv[++i] = replacement;
>   }
>  
> +  /* Expand -fdiagnostics-plain-output to its constituents.  This needs
> +  to happen here so that prune_options can handle -fdiagnostics-color
> +  specially.  */
> +  if (!strcmp (opt, "-fdiagnostics-plain-output"))
> + {
> +   /* If you have changed the default diagnostics output, and this new
> +   output is not appropriately "plain" (e.g., the change needs to be
> +   undone in order for the testsuite to work properly), then please do
> +   the following:

With GCC formatting, the paragraph should be indented under “If you…”.

> +  1.  Add the necessary option to undo the new behavior to
> +  the array below.
> +  2.  Update the documentation for -fdiagnostics-plain-output
> +  in invoke.texi.
> +   */

…and this should be on the previous line (“.  */”).

> +   const char *const expanded_args[] = {
> + "-fno-diagnostics-show-caret",
> + "-fno-diagnostics-show-line-numbers",
> + "-fdiagnostics-color=never",
> + "-fdiagnostics-urls=never",
> +   };

Hadn't expected it to be this complicated :-)  But I agree with your
reasoning: it looks like this is the correct way given the special
handling of -fdiagnostic-color (and potentially other -fdiagnostic
options in future).

> +   const int num_expanded
> + = sizeof expanded_args / sizeof (*expanded_args);

Simplifies to “ARRAY_SIZE (expanded_args)”.

> +   opt_array_len += num_expanded - 1;
> +   opt_array = XRESIZEVEC (struct cl_decoded_option,
> +   opt_array, opt_array_len);
> +   for (int j = 0; j < num_expanded;)
> + {
> +   j += decode_cmdline_option (expanded_args + j, lang_mask,
> +   &opt_array[num_decoded_options]);

Might be worth using the same "for" loop structure as the outer loop,
assigning the number of decoded options to “n”.  Neither's better than
the other, but it makes it clearer that there's nothing special going on.
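
Something like this, say (just a sketch, reusing the names from the
patch):

  for (unsigned int j = 0; j < num_expanded; j += n)
    {
      n = decode_cmdline_option (expanded_args + j, lang_mask,
				 &opt_array[num_decoded_options]);
      num_decoded_options++;
    }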

> +   num_decoded_options++;
> + }
> +
> +   n = 1;
> +   continue;
> + }
> +
>n = decode_cmdline_option (argv + i, lang_mask,
>  &opt_array[num_decoded_options]);
>num_decoded_options++;
> diff --git a/gcc/testsuite/lib/c-compat.exp b/gcc/testsuite/lib/c-compat.exp
> index 9493c214aea..5f43c5a6a57 100644
> --- a/gcc/testsuite/lib/c-compat.exp
> +++ b/gcc/testsuite/lib/c-compat.exp
> @@ -36,24 +36,34 @@ load_lib target-libpath.exp
>  proc compat-use-alt-compiler { } {
>  global GCC_UNDER_TEST ALT_CC_UNDER_TEST
>  global compat_same_alt compat_alt_caret compat_alt_color compat_no_line_no
> -global compat_alt_urls
> +global compat_alt_urls compat_alt_plain_output
>  global TEST_ALWAYS_FLAGS
>  
>  # We don't need to do this if the alternate compiler is actually
>  # the same as the compiler under test.
>  if { $compat_same_alt == 0 } then {
>   

Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-12 Thread Richard Sandiford
qiaopeixin  writes:
> Hi,
>
> The test case vector-subscript-2.c in the gcc testsuite will report an ICE in
> the expand pass since '-mgeneral-regs-only' is incompatible with the use of
> V4SI mode. I propose to report diagnostic information instead of the ICE, and
> the problem has been discussed in PR 96479.
>
> I attached the patch to solve the problem. Bootstrapped and tested on 
> aarch64-linux-gnu. Any suggestions?

Thanks, pushed.  I was initially sceptical because raising an error here
and in aarch64_layout_arg is a hack.  Both functions are just query
functions and shouldn't have any side effects.

The approach we took for FP modes seemed better: we define the FP move
patterns unconditionally, and raise an error if we try to emit an FP move
with !TARGET_FLOAT.  This defers any error reporting until we actually
try to generate code that depends on TARGET_FLOAT.

But I guess SIMD stuff is different.  There's no reason in principle why
you can't use:

  unsigned short __attribute__((vector_size(8)))

*within* a function with -mgeneral-regs-only.  It would just need to be
emulated, in the same way as for:

  unsigned short __attribute__((vector_size(4)))

So it would be wrong to define the SIMD move patterns unconditionally
and raise an error there.

So all in all, I agree this is the best we can do given the current
infrastructure.
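
For instance (illustrative only), something like this is valid C for
the target and could in principle be lowered without SIMD registers:

  typedef unsigned short v4hi __attribute__ ((vector_size (8)));

  v4hi
  vadd (v4hi x, v4hi y)
  {
    /* With -mgeneral-regs-only this would have to be emulated
       in GPRs rather than use a SIMD move and add.  */
    return x + y;
  }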

Thanks,
Richard


Re: [PATCH] options: Make --help= to emit values post-overrided

2020-08-12 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi Segher,
>
> on 2020/8/7 10:42, Segher Boessenkool wrote:
>> Hi!
>> 
>> On Fri, Aug 07, 2020 at 10:44:10AM +0800, Kewen.Lin wrote:
 I think this makes a lot of sense.

> btw, not sure whether it's a good idea to move the target_option_override_hook
> call into print_specific_help and use one function-local static
> variable to ensure it's called once for all kinds of help dumping
> (possible combinations), then we can remove the calls in function
> common_handle_option.

 I cannot easily imagine what that will look like...  it could easily be
 worse than what you have here (callbacks aren't so nice, but there are
 worse things).

>>>
>>> I attached opts_alt2.diff to be more specific about this; both alt1 and alt2
>>> follow the existing callback scheme. alt2 aims to avoid possible multiple
>>> target_option_override_hook calls when we have several --help= or
>>> similar, but I guess alt1 is also fine since the hook should be allowed to
>>> be called more than once.

Yeah.  I guess ideally (and independently of this patch) we'd have a
flag_checking assert that override_options is idempotent, but that
might be tricky to implement.

It looks like there's a subtle (pre-existing) difference in what --help
and --help= do.  --help already calls target_option_override_hook,
but does it at the point that the option occurs.  --help= instead
queues the help until we've finished processing other arguments,
and would therefore take later options into account.

I don't know that one is obviously better than the other though.

> […]
> *opts_alt1.diff*
>
> gcc/ChangeLog:
>
>   * opts-global.c (decode_options): Adjust call to print_help.
>   * opts.c (print_help): Add one function pointer argument
>   target_option_override_hook and call it before print_specific_help.
>   * opts.h (print_help): Add one more argument to the function declaration.

I think personally I'd prefer an option (3): call
target_option_override_hook directly in decode_options,
if help_option_arguments is nonempty.  Like you say,
decode_options appears to be the only caller of print_help.
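
That is, something along these lines (an untested sketch, assuming
decode_options can reach the hook, e.g. via the parameter added in the
alt1 patch):

  if (!help_option_arguments.is_empty ())
    target_option_override_hook ();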

Thanks,
Richard


Re: [PATCH] Fix up flag_cunroll_grow_size handling in presence of optimize attr [PR96535]

2020-08-12 Thread Segher Boessenkool
Hi Jakub,

On Wed, Aug 12, 2020 at 10:20:12AM +0200, Jakub Jelinek wrote:
> This patch moves the unrolling related handling from process_options into
> finish_options which is invoked whenever the options are being finalized,
> and the rs6000 specific parts into the override_options_after_change hook
> which is called for optimize attribute handling (and unfortunately also
> for cfun changes, but what the hook does is cheap) and I've added a call to
> that from rs6000_override_options_internal, so it is also called on cmdline
> processing and for target attribute.

The rs6000 parts look fine, thank you for working on this!

One tiny thing:

> +/* This target function is similar to the hook TARGET_OPTION_OVERRIDE
> +   but is called when the optimize level is changed via an attribute or
> +   pragma or when it is reset at the end of the code affected by the
> +   attribute or pragma.  It is not called at the beginning of compilation
> +   when TARGET_OPTION_OVERRIDE is called so if you want to perform these
> +   actions then, you should have TARGET_OPTION_OVERRIDE call
> +   TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE.  */

That is generic documentation.  The second half isn't relevant (you did
that already :-) )


Segher


Re: [PATCH] gimple-fold: Don't optimize wierdo floating point value reads [PR95450]

2020-08-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 12, 2020 at 04:30:35PM +0200, Richard Biener wrote:
> Not a final review but if we care for this kind of normalization at all
> then we should do so consistently, thus for both encode and interpret and
> for all modes.  For FP we could also check if we'd consider the values
> equal rather than whether we would en/decode them to the same bit pattern
> (which might or might not be what an actual ISA gpr<->fpr reg move would
> do)

I think the verification that what we encode can be interpreted back
would be only an internal consistency check (so perhaps for ENABLE_CHECKING
if flag_checking only, but if both directions perform it, then we need
to avoid mutual recursion).
While for the other direction (interpretation), at least for the broken-by-design
long doubles we just know we can't represent all valid values in GCC.
The other floating point formats are just a theoretical case; perhaps we would
canonicalize something to a value that wouldn't trigger an invalid exception
when without canonicalization it would trigger it at runtime, so let's just
ignore those.

Adjusted (so far untested) patch to do it in native_interpret_real instead
and limit it to the MODE_COMPOSITE_P cases, for which e.g.
fold-const.c/simplify-rtx.c punts in several other places too because we just
know we can't represent everything.

E.g.
  /* Don't constant fold this floating point operation if the
 result may dependent upon the run-time rounding mode and
 flag_rounding_math is set, or if GCC's software emulation
 is unable to accurately represent the result.  */
  if ((flag_rounding_math
   || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
  && (inexact || !real_identical (&result, &value)))
return NULL_TREE;
Or perhaps guard it with MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations
too, thus breaking what gnulib / m4 does with -ffast-math, but not normally?
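
I.e. (untested sketch), the new condition in the hunk below would become:

  if (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations)
    {
      /* ... the same encode-and-compare punt as below ...  */
    }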

2020-08-12  Jakub Jelinek  

PR target/95450
* fold-const.c (native_interpret_real): For MODE_COMPOSITE_P modes
punt if the to be returned REAL_CST does not encode to the bitwise
same representation.

* gcc.target/powerpc/pr95450.c: New test.

--- gcc/fold-const.c.jj	2020-08-04 10:02:43.434408528 +0200
+++ gcc/fold-const.c	2020-08-12 17:16:31.893226663 +0200
@@ -8327,7 +8327,19 @@ native_interpret_real (tree type, const
 }
 
   real_from_target (, tmp, mode);
-  return build_real (type, r);
+  tree ret = build_real (type, r);
+  if (MODE_COMPOSITE_P (mode))
+{
+  /* For floating point values in composite modes, punt if this folding
+doesn't preserve bit representation.  As the mode doesn't have fixed
+precision while GCC pretends it does, there could be valid values that
+GCC can't really represent accurately.  See PR95450.  */
+  unsigned char buf[24];
+  if (native_encode_expr (ret, buf, total_bytes, 0) != total_bytes
+ || memcmp (ptr, buf, total_bytes) != 0)
+   ret = NULL_TREE;
+}
+  return ret;
 }
 
 
--- gcc/testsuite/gcc.target/powerpc/pr95450.c.jj	2020-08-12 17:10:51.402872241 +0200
+++ gcc/testsuite/gcc.target/powerpc/pr95450.c	2020-08-12 17:10:51.402872241 +0200
@@ -0,0 +1,29 @@
+/* PR target/95450 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "return \[0-9.e+]\+;" "optimized" } } */
+
+/* Verify this is not optimized for double double into return floating_point_constant,
+   as while that constant is the maximum normalized floating point value, it needs
+   107 bit precision, which is more than GCC supports for this format.  */
+
+#if __LDBL_MANT_DIG__ == 106
+union U
+{
+  struct { double hi; double lo; } dd;
+  long double ld;
+};
+
+const union U g = { { __DBL_MAX__, __DBL_MAX__ / (double)134217728UL / (double)134217728UL } };
+#else
+struct S
+{
+  long double ld;
+} g;
+#endif
+
+long double
+foo (void)
+{
+  return g.ld;
+}


Jakub



Re: [Patch] Fortran: Add support for OpenMP's nontemporal clause

2020-08-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 12, 2020 at 04:55:03PM +0200, Tobias Burnus wrote:
> Low-hanging fruit: Add support for 'omp simd's nontemporal clause,
> which is part of the OpenMP 5.0 features suppored by the ME + C/C++.

To be precise, the ME just ignores it.
While GIMPLE has support for gimple_assign_nontemporal_move_p assignments,
neither omp-low nor omp-expand actually does attempt to set those on
accesses to variables with that clause, and it is e.g. unclear if it is a
safe thing to do.  Don't non-temporal stores e.g. bypass caches on some
architectures, so if e.g. there are stores to those variables shortly before
or after the simd region, won't that make it unclear what values will
appear?  Or do we e.g. need to add some memory barrier instruction at the
start and/or end of the region?  What about non-direct accesses to the
variables?  E.g. whether we should punt if they are address-taken etc.
Or would it need to be preserved in the middle-end somehow and resolved
after IPA so that e.g. C++ abstraction would be optimized away where
possible?
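
For reference, a minimal C counterpart of Tobias' new Fortran tests
(a sketch; the FE accepts the clause and the ME then just drops it):

  void
  scale (int n, float *restrict a, float *restrict b)
  {
  #pragma omp simd nontemporal(a)
    for (int i = 0; i < n; i++)
      a[i] = 2.0f * b[i];
  }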

Your patch LGTM though.

Jakub



[COMMITTED 1/4] bpf: add support for the -mxbpf option

2020-08-12 Thread Jose E. Marchesi via Gcc-patches
This patch adds support for a new option -mxbpf.  This tells GCC to
generate code for an expanded version of BPF that relaxes some of the
restrictions imposed by BPF.

(cherry pick of 51e10276d6792f67f1d88d90f299e7ac1b1f1f24)

2020-05-19  Jose E. Marchesi  

gcc/
* config/bpf/bpf.opt (mxbpf): New option.
* doc/invoke.texi (Option Summary): Add -mxbpf.
(eBPF Options): Document -mxbpf.
---
 gcc/config/bpf/bpf.opt | 6 ++
 gcc/doc/invoke.texi| 6 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index 78b93c55575..6aa858408f1 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -108,6 +108,12 @@ Enum(bpf_kernel) String(5.1) Value(LINUX_V5_1)
 EnumValue
 Enum(bpf_kernel) String(5.2) Value(LINUX_V5_2)
 
+; Use xBPF extensions.
+
+mxbpf
+Target Report Mask(XBPF)
+Generate xBPF.
+
 ; Selecting big endian or little endian targets.
 
 mbig-endian
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4dc64ffe9dd..3cbf7640bbf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -863,7 +863,7 @@ Objective-C and Objective-C++ Dialects}.
 
 @emph{eBPF Options}
 @gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
--mframe-limit=@var{bytes}}
+-mframe-limit=@var{bytes} -mxbpf}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -21045,6 +21045,10 @@ Generate code for a big-endian target.
 @item -mlittle-endian
 @opindex mlittle-endian
 Generate code for a little-endian target.  This is the default.
+
+@item -mxbpf
+Generate code for an expanded version of BPF, which relaxes some of
+the restrictions imposed by the BPF architecture.
 @end table
 
 @node FR30 Options
-- 
2.25.0.2.g232378479e



[COMMITTED 0/4] bpf: backports to releases/gcc-10

2020-08-12 Thread Jose E. Marchesi via Gcc-patches
Hi people!

Just a few BPF related backports from master to the gcc-10 branch.

I left the changelog entry dates of the original commits untouched,
and added `(cherry-pick from commit XXX)' lines to the commit
messages.  Hope that is ok... please let me know otherwise! :)

Salud!

Jose E. Marchesi (4):
  bpf: add support for the -mxbpf option
  bpf: do not save/restore callee-saved registers in function
prolog/epilog
  bpf: more flexible support for kernel helpers
  bpf: remove trailing whitespaces from source files

 gcc/config/bpf/bpf-helpers.def| 194 ---
 gcc/config/bpf/bpf-helpers.h  | 530 +++---
 gcc/config/bpf/bpf.c  | 305 +-
 gcc/config/bpf/bpf.md |   2 +-
 gcc/config/bpf/bpf.opt|   6 +
 gcc/config/bpf/constraints.md |   1 -
 gcc/config/bpf/predicates.md  |   1 -
 gcc/doc/extend.texi   | 172 +-
 gcc/doc/invoke.texi   |  10 +-
 gcc/testsuite/gcc.target/bpf/diag-funargs-2.c |   1 -
 gcc/testsuite/gcc.target/bpf/diag-funargs-3.c |   1 -
 gcc/testsuite/gcc.target/bpf/helper-bind.c|   4 +-
 .../gcc.target/bpf/helper-bpf-redirect.c  |   4 +-
 .../gcc.target/bpf/helper-clone-redirect.c|   4 +-
 .../gcc.target/bpf/helper-csum-diff.c |   4 +-
 .../gcc.target/bpf/helper-csum-update.c   |   4 +-
 .../bpf/helper-current-task-under-cgroup.c|   4 +-
 .../gcc.target/bpf/helper-fib-lookup.c|   4 +-
 .../bpf/helper-get-cgroup-classid.c   |   4 +-
 .../bpf/helper-get-current-cgroup-id.c|   6 +-
 .../gcc.target/bpf/helper-get-current-comm.c  |   4 +-
 .../bpf/helper-get-current-pid-tgid.c |   4 +-
 .../gcc.target/bpf/helper-get-current-task.c  |   4 +-
 .../bpf/helper-get-current-uid-gid.c  |   4 +-
 .../gcc.target/bpf/helper-get-hash-recalc.c   |   4 +-
 .../gcc.target/bpf/helper-get-listener-sock.c |   4 +-
 .../gcc.target/bpf/helper-get-local-storage.c |   6 +-
 .../gcc.target/bpf/helper-get-numa-node-id.c  |   4 +-
 .../gcc.target/bpf/helper-get-prandom-u32.c   |   4 +-
 .../gcc.target/bpf/helper-get-route-realm.c   |   4 +-
 .../bpf/helper-get-smp-processor-id.c |   4 +-
 .../gcc.target/bpf/helper-get-socket-cookie.c |   6 +-
 .../gcc.target/bpf/helper-get-socket-uid.c|   6 +-
 .../gcc.target/bpf/helper-get-stack.c |   4 +-
 .../gcc.target/bpf/helper-get-stackid.c   |   4 +-
 .../gcc.target/bpf/helper-getsockopt.c|   8 +-
 .../gcc.target/bpf/helper-ktime-get-ns.c  |   4 +-
 .../gcc.target/bpf/helper-l3-csum-replace.c   |   4 +-
 .../gcc.target/bpf/helper-l4-csum-replace.c   |   4 +-
 .../gcc.target/bpf/helper-lwt-push-encap.c|   6 +-
 .../gcc.target/bpf/helper-lwt-seg6-action.c   |   8 +-
 .../bpf/helper-lwt-seg6-adjust-srh.c  |   7 +-
 .../bpf/helper-lwt-seg6-store-bytes.c |   7 +-
 .../gcc.target/bpf/helper-map-delete-elem.c   |   5 +-
 .../gcc.target/bpf/helper-map-lookup-elem.c   |   5 +-
 .../gcc.target/bpf/helper-map-peek-elem.c |   5 +-
 .../gcc.target/bpf/helper-map-pop-elem.c  |   5 +-
 .../gcc.target/bpf/helper-map-push-elem.c |   4 +-
 .../gcc.target/bpf/helper-map-update-elem.c   |   4 +-
 .../gcc.target/bpf/helper-msg-apply-bytes.c   |   6 +-
 .../gcc.target/bpf/helper-msg-cork-bytes.c|   6 +-
 .../gcc.target/bpf/helper-msg-pop-data.c  |   4 +-
 .../gcc.target/bpf/helper-msg-pull-data.c |   9 +-
 .../gcc.target/bpf/helper-msg-push-data.c |   4 +-
 .../gcc.target/bpf/helper-msg-redirect-hash.c |   4 +-
 .../gcc.target/bpf/helper-msg-redirect-map.c  |   7 +-
 .../gcc.target/bpf/helper-override-return.c   |   6 +-
 .../gcc.target/bpf/helper-perf-event-output.c |   3 +-
 .../bpf/helper-perf-event-read-value.c|   6 +-
 .../gcc.target/bpf/helper-perf-event-read.c   |   4 +-
 .../bpf/helper-perf-prog-read-value.c |   6 +-
 .../gcc.target/bpf/helper-probe-read-str.c|   6 +-
 .../gcc.target/bpf/helper-probe-read.c|   4 +-
 .../gcc.target/bpf/helper-probe-write-user.c  |   4 +-
 .../gcc.target/bpf/helper-rc-keydown.c|   7 +-
 .../gcc.target/bpf/helper-rc-pointer-rel.c|   4 +-
 .../gcc.target/bpf/helper-rc-repeat.c |   6 +-
 .../gcc.target/bpf/helper-redirect-map.c  |   6 +-
 .../gcc.target/bpf/helper-set-hash-invalid.c  |   4 +-
 .../gcc.target/bpf/helper-set-hash.c  |   6 +-
 .../gcc.target/bpf/helper-setsockopt.c|   7 +-
 .../gcc.target/bpf/helper-sk-fullsock.c   |   4 +-
 .../gcc.target/bpf/helper-sk-lookup-tcp.c |  12 +-
 .../gcc.target/bpf/helper-sk-lookup-upd.c |  12 +-
 .../gcc.target/bpf/helper-sk-redirect-hash.c  |   5 +-
 .../gcc.target/bpf/helper-sk-redirect-map.c   |   6 +-
 .../gcc.target/bpf/helper-sk-release.c|   6 +-
 .../bpf/helper-sk-select-reuseport.c  |   8 +-
 .../gcc.target/bpf/helper-sk-storage-delete.c |   6 +-
 

[COMMITTED 2/4] bpf: do not save/restore callee-saved registers in function prolog/epilog

2020-08-12 Thread Jose E. Marchesi via Gcc-patches
BPF considers that every call to a function allocates a fresh set of
registers that are available to the callee, of which the first five
may have been initialized with the function arguments.  This is
implemented by both interpreter and JIT in the Linux kernel.

This is enforced by the kernel BPF verifier, which will reject any
code in which non-initialized registers are accessed before being
written.  Consequently, the spill instructions generated in function
prologue were causing the verifier to reject our compiled programs.

This patch makes GCC not save/restore callee-saved registers in the
function prologue/epilogue, unless xBPF mode is enabled.

(cherry pick of commit 98456a64b0b5c20eeb8f964c7718072ba9b0e568)

2020-05-19  Jose E. Marchesi  

gcc/
* config/bpf/bpf.c (bpf_compute_frame_layout): Include space for
callee saved registers only in xBPF.
(bpf_expand_prologue): Save callee saved registers only in xBPF.
(bpf_expand_epilogue): Likewise for restoring.
* doc/invoke.texi (eBPF Options): Document this is activated by
-mxbpf.

gcc/testsuite/
* gcc.target/bpf/xbpf-callee-saved-regs-1.c: New test.
* gcc.target/bpf/xbpf-callee-saved-regs-2.c: Likewise.
---
 gcc/config/bpf/bpf.c  | 133 ++
 gcc/doc/invoke.texi   |   6 +-
 .../gcc.target/bpf/xbpf-callee-saved-regs-1.c |  17 +++
 .../gcc.target/bpf/xbpf-callee-saved-regs-2.c |  17 +++
 4 files changed, 115 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/xbpf-callee-saved-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/xbpf-callee-saved-regs-2.c

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 368b99c199e..36e08338630 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -267,15 +267,18 @@ bpf_compute_frame_layout (void)
 
   cfun->machine->local_vars_size += padding_locals;
 
-  /* Set the space used in the stack by callee-saved used registers in
- the current function.  There is no need to round up, since the
- registers are all 8 bytes wide.  */
-  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if ((df_regs_ever_live_p (regno)
-&& !call_used_or_fixed_reg_p (regno))
-   || (cfun->calls_alloca
-   && regno == STACK_POINTER_REGNUM))
-  cfun->machine->callee_saved_reg_size += 8;
+  if (TARGET_XBPF)
+{
+  /* Set the space used in the stack by callee-saved used
+registers in the current function.  There is no need to round
+up, since the registers are all 8 bytes wide.  */
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+   if ((df_regs_ever_live_p (regno)
+&& !call_used_or_fixed_reg_p (regno))
+   || (cfun->calls_alloca
+   && regno == STACK_POINTER_REGNUM))
+ cfun->machine->callee_saved_reg_size += 8;
+}
 
   /* Check that the total size of the frame doesn't exceed the limit
  imposed by eBPF.  */
@@ -299,38 +302,50 @@ bpf_compute_frame_layout (void)
 void
 bpf_expand_prologue (void)
 {
-  int regno, fp_offset;
   rtx insn;
   HOST_WIDE_INT size;
 
   size = (cfun->machine->local_vars_size
  + cfun->machine->callee_saved_reg_size);
-  fp_offset = -cfun->machine->local_vars_size;
 
-  /* Save callee-saved hard registes.  The register-save-area starts
- right after the local variables.  */
-  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+  /* The BPF "hardware" provides a fresh new set of registers for each
+ called function, some of which are initialized to the values of
+ the arguments passed in the first five registers.  In doing so,
+ it saves the values of the registers of the caller, and restored
+ them upon returning.  Therefore, there is no need to save the
+ callee-saved registers here.  What is worse, the kernel
+ implementation refuses to run programs in which registers are
+ referred before being initialized.  */
+  if (TARGET_XBPF)
 {
-  if ((df_regs_ever_live_p (regno)
-  && !call_used_or_fixed_reg_p (regno))
- || (cfun->calls_alloca
- && regno == STACK_POINTER_REGNUM))
-   {
- rtx mem;
+  int regno;
+  int fp_offset = -cfun->machine->local_vars_size;
 
- if (!IN_RANGE (fp_offset, -1 - 0x7fff, 0x7fff))
-   /* This has been already reported as an error in
-  bpf_compute_frame_layout. */
-   break;
- else
+  /* Save callee-saved hard registes.  The register-save-area
+starts right after the local variables.  */
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+   {
+ if ((df_regs_ever_live_p (regno)
+  && !call_used_or_fixed_reg_p (regno))
+ || (cfun->calls_alloca
+ && regno == STACK_POINTER_REGNUM))
{
- mem = gen_frame_mem (DImode,
-  

[COMMITTED 3/4] bpf: more flexible support for kernel helpers

2020-08-12 Thread Jose E. Marchesi via Gcc-patches
This patch changes the existing support for BPF kernel helpers to be
more flexible, in two main ways.

First, there is no longer a hardcoded list of kernel helpers defined
in the compiler internals.  This is replaced by a new target-specific
attribute `kernel_helper' that the user can use to define her own
helpers, annotating function prototypes.

Second, following feedback from the kernel hackers, the pre-defined
helpers in the distributed bpf-helpers.h are no longer available
conditionally depending on the kernel version used in -mkernel.  The
command-line option stays for now, as it may be useful for other
things.

Target tests and documentation updated.
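
Usage looks like this (a sketch based on the new documentation; helper
number 4 is bpf_probe_read in the Linux UAPI):

  int bpf_probe_read (void *dst, int size, const void *unsafe_ptr)
    __attribute__ ((kernel_helper (4)));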

(cherry pick of commit af30b83b50953fbbe671d93d44ea6ac2f7a50ce9)

2020-08-06  Jose E. Marchesi  

gcc/
* config/bpf/bpf-helpers.h (KERNEL_HELPER): Define.
(KERNEL_VERSION): Remove.
* config/bpf/bpf-helpers.def: Delete.
* config/bpf/bpf.c (bpf_handle_fndecl_attribute): New function.
(bpf_attribute_table): Define.
(bpf_helper_names): Delete.
(bpf_helper_code): Likewise.
(enum bpf_builtins): Adjust to new helpers mechanism.
(bpf_output_call): Likewise.
(bpf_init_builtins): Likewise.
(bpf_init_builtins): Likewise.
* doc/extend.texi (BPF Function Attributes): New section.
(BPF Kernel Helpers): Delete section.

gcc/testsuite/
* gcc.target/bpf/helper-bind.c: Adjust to new kernel helpers
mechanism.
* gcc.target/bpf/helper-bpf-redirect.c: Likewise.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* 

[COMMITTED 4/4] bpf: remove trailing whitespaces from source files

2020-08-12 Thread Jose E. Marchesi via Gcc-patches
This patch is a little cleanup that removes trailing whitespaces from
the bpf backend source files.

(cherry pick of commit e87c540fe43e29663140ed67b98ee437c25696bb)

2020-08-07  Jose E. Marchesi  

gcc/
* config/bpf/bpf.md: Remove trailing whitespaces.
* config/bpf/constraints.md: Likewise.
* config/bpf/predicates.md: Likewise.

gcc/testsuite/
* gcc.target/bpf/diag-funargs-2.c: Remove trailing whitespaces.
* gcc.target/bpf/skb-ancestor-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-meta.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-head.c: Likewise.
* gcc.target/bpf/helper-tcp-check-syncookie.c: Likewise.
* gcc.target/bpf/helper-sock-ops-cb-flags-set.c: Likewise.
* gcc.target/bpf/helper-sysctl-set-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-name.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-current-value.c: Likewise.
* gcc.target/bpf/helper-strtoul.c: Likewise.
* gcc.target/bpf/helper-strtol.c: Likewise.
* gcc.target/bpf/helper-sock-map-update.c: Likewise.
* gcc.target/bpf/helper-sk-storage-get.c: Likewise.
* gcc.target/bpf/helper-sk-storage-delete.c: Likewise.
* gcc.target/bpf/helper-sk-select-reuseport.c: Likewise.
* gcc.target/bpf/helper-sk-release.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-map.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-upd.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/diag-funargs-3.c: Likewise.
---
 gcc/config/bpf/bpf.md | 2 +-
 gcc/config/bpf/constraints.md | 1 -
 gcc/config/bpf/predicates.md  | 1 -
 gcc/testsuite/gcc.target/bpf/diag-funargs-2.c | 1 -
 gcc/testsuite/gcc.target/bpf/diag-funargs-3.c | 1 -
 .../gcc.target/bpf/helper-get-current-cgroup-id.c | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-get-local-storage.c   | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-get-socket-cookie.c   | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-get-socket-uid.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-getsockopt.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-lwt-push-encap.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-lwt-seg6-action.c | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-lwt-seg6-adjust-srh.c | 5 ++---
 .../gcc.target/bpf/helper-lwt-seg6-store-bytes.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-msg-apply-bytes.c | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-msg-cork-bytes.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-msg-pull-data.c   | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-msg-redirect-map.c| 5 ++---
 gcc/testsuite/gcc.target/bpf/helper-override-return.c | 2 +-
 .../gcc.target/bpf/helper-perf-event-read-value.c | 2 +-
 .../gcc.target/bpf/helper-perf-prog-read-value.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-probe-read-str.c  | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-rc-keydown.c  | 5 ++---
 gcc/testsuite/gcc.target/bpf/helper-rc-repeat.c   | 2 +-
 gcc/testsuite/gcc.target/bpf/helper-redirect-map.c| 2 +-
 gcc/testsuite/gcc.target/bpf/helper-set-hash.c| 2 +-
 gcc/testsuite/gcc.target/bpf/helper-setsockopt.c  | 2 +-
 

[Patch] Fortran: Add support for OpenMP's nontemporal clause

2020-08-12 Thread Tobias Burnus

Low-hanging fruit: Add support for 'omp simd's nontemporal clause,
which is part of the OpenMP 5.0 features supported by the ME + C/C++.

OK?

Tobias

Fortran: Add support for OpenMP's nontemporal clause

gcc/fortran/ChangeLog:

	* gfortran.h: Add OMP_LIST_NONTEMPORAL.
	* dump-parse-tree.c (show_omp_clauses): Dump it.
	* openmp.c (enum omp_mask1): Add OMP_CLAUSE_NOTEMPORAL.
	(OMP_SIMD_CLAUSES): Add it.
	(gfc_match_omp_clauses): Match nontemporal clause.
	* trans-openmp.c (gfc_trans_omp_clauses): Process
	nontemporal clause.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/nontemporal-1.f90: New test.
	* gfortran.dg/gomp/nontemporal-2.f90: New test.

 gcc/fortran/dump-parse-tree.c|  1 +
 gcc/fortran/gfortran.h   |  1 +
 gcc/fortran/openmp.c |  8 +++-
 gcc/fortran/trans-openmp.c   |  3 +++
 gcc/testsuite/gfortran.dg/gomp/nontemporal-1.f90 | 25 +++
 gcc/testsuite/gfortran.dg/gomp/nontemporal-2.f90 | 26 
 6 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 71d0e7d00f5..6e265f4520d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1595,6 +1595,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	  case OMP_LIST_IS_DEVICE_PTR: type = "IS_DEVICE_PTR"; break;
 	  case OMP_LIST_USE_DEVICE_PTR: type = "USE_DEVICE_PTR"; break;
 	  case OMP_LIST_USE_DEVICE_ADDR: type = "USE_DEVICE_ADDR"; break;
+	  case OMP_LIST_NONTEMPORAL: type = "NONTEMPORAL"; break;
 	  default:
 	gcc_unreachable ();
 	  }
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 48b2ab14fdb..559d3c6b8b8 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1276,6 +1276,7 @@ enum
   OMP_LIST_IS_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_ADDR,
+  OMP_LIST_NONTEMPORAL,
   OMP_LIST_NUM
 };
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index f402febc211..c44a2530b88 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -794,6 +794,7 @@ enum omp_mask1
   OMP_CLAUSE_IS_DEVICE_PTR,
   OMP_CLAUSE_LINK,
   OMP_CLAUSE_NOGROUP,
+  OMP_CLAUSE_NOTEMPORAL,
   OMP_CLAUSE_NUM_TASKS,
   OMP_CLAUSE_PRIORITY,
   OMP_CLAUSE_SIMD,
@@ -1510,6 +1511,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  c->nogroup = needs_space = true;
 	  continue;
 	}
+	  if ((mask & OMP_CLAUSE_NOTEMPORAL)
+	  && gfc_match_omp_variable_list ("nontemporal (",
+	  >lists[OMP_LIST_NONTEMPORAL],
+	  true) == MATCH_YES)
+	continue;
 	  if ((mask & OMP_CLAUSE_NOTINBRANCH)
 	  && !c->notinbranch
 	  && !c->inbranch
@@ -2591,7 +2597,7 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_LASTPRIVATE		\
| OMP_CLAUSE_REDUCTION | OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_SAFELEN	\
| OMP_CLAUSE_LINEAR | OMP_CLAUSE_ALIGNED | OMP_CLAUSE_SIMDLEN	\
-   | OMP_CLAUSE_IF | OMP_CLAUSE_ORDER)
+   | OMP_CLAUSE_IF | OMP_CLAUSE_ORDER | OMP_CLAUSE_NOTEMPORAL)
 #define OMP_TASK_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
| OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7891a7e651b..063d4c145e2 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2290,6 +2290,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	case OMP_LIST_IS_DEVICE_PTR:
 	  clause_code = OMP_CLAUSE_IS_DEVICE_PTR;
 	  goto add_clause;
+	case OMP_LIST_NONTEMPORAL:
+	  clause_code = OMP_CLAUSE_NONTEMPORAL;
+	  goto add_clause;
 
 	add_clause:
 	  omp_clauses
diff --git a/gcc/testsuite/gfortran.dg/gomp/nontemporal-1.f90 b/gcc/testsuite/gfortran.dg/gomp/nontemporal-1.f90
new file mode 100644
index 000..21a94db0ba8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/nontemporal-1.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-additional-options "-O2 -fdump-tree-original" }
+
+module m
+  integer :: a(:), b(1024), c(1024), d(1024)
+  allocatable :: a
+end module m
+
+subroutine foo
+  use m
+  implicit none
+  integer :: i
+  !$omp simd nontemporal (a, b)
+  do i = 1, 1024
+a(i) = b(i) + c(i)
+  end do
+
+  !$omp simd nontemporal (d)
+  do i = 1, 1024
+d(i) = 2 * c(i)
+  end do
+end subroutine foo
+
+! { dg-final { scan-tree-dump-times "#pragma omp simd linear\\(i:1\\) nontemporal\\(a\\) nontemporal\\(b\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp simd linear\\(i:1\\) nontemporal\\(d\\)" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/nontemporal-2.f90 b/gcc/testsuite/gfortran.dg/gomp/nontemporal-2.f90
new file mode 100644
index 000..c880bedb1e2
--- 

Re: [PATCH] Fix up flag_cunroll_grow_size handling in presence of optimize attr [PR96535]

2020-08-12 Thread Richard Biener
On August 12, 2020 10:20:12 AM GMT+02:00, Jakub Jelinek  
wrote:
>Hi!
>
>As the testcase in the PR shows (not included in the patch, as
>it seems quite fragile to observe unrolling in the IL), the
>introduction of
>flag_cunroll_grow_size broke optimize attribute related to loop
>unrolling.
>The problem is that the new option flag is set (if not set explicitly)
>only
>in process_options and in rs6000_option_override_internal (and there
>only if
>global_init_p).  So, this means that while it is Optimization option,
>it
>will only be set based on the command line
>-funroll-loops/-O3/-fpeel-loops
>or -funroll-all-loops, which means that if command line does include
>any of
>those, it is enabled even for functions that will through optimize
>attribute
>have all of those disabled, and if command line does not include those,
>it will not be enabled for functions that will through optimize
>attribute
>have any of those enabled.
>
>process_options is called just once, so IMHO it should be handling only
>non-Optimization option adjustments (various other options suffer from
>that
>too, but as this is a regression from 10.1 on the 10 branch, changing
>those
>is not appropriate).  Similarly, rs6000_option_override_internal is
>called
>only once (with global_init_p) and then for target attribute handling,
>but
>not for optimize attribute handling.
>
>This patch moves the unrolling related handling from process_options
>into
>finish_options which is invoked whenever the options are being
>finalized,
>and the rs6000 specific parts into the override_options_after_change
>hook
>which is called for optimize attribute handling (and unfortunately also
>th cfun changes, but what the hook does is cheap) and I've added a call
>to
>that from rs6000_override_options_internal, so it is also called on
>cmdline
>processing and for target attribute.
>
>Furthermore, it stops using AUTODETECT_VALUE, which can work only once,
>and instead uses the global_options_set.x_... flags.
>
>Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux, ok for
>trunk
>and after a while 10.3?

OK. 

Richard. 

>2020-08-12  Jakub Jelinek  
>
>   PR tree-optimization/96535
>   * toplev.c (process_options): Move flag_unroll_loops and
>   flag_cunroll_grow_size handling from here to ...
>   * opts.c (finish_options): ... here.  For flag_cunroll_grow_size,
>   don't check for AUTODETECT_VALUE, but instead check
>   opts_set->x_flag_cunroll_grow_size.
>   * common.opt (funroll-completely-grow-size): Default to 0.
>   * config/rs6000/rs6000.c (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE):
>   Redefine.
>   (rs6000_override_options_after_change): New function.
>   (rs6000_option_override_internal): Call it.  Move there the
>   flag_cunroll_grow_size, unroll_only_small_loops and
>   flag_rename_registers handling.
>
>--- gcc/toplev.c.jj2020-08-11 14:20:35.179934850 +0200
>+++ gcc/toplev.c   2020-08-11 14:44:06.861586574 +0200
>@@ -1474,16 +1474,6 @@ process_options (void)
>   flag_abi_version = 2;
> }
> 
>-  /* Unrolling all loops implies that standard loop unrolling must
>also
>- be done.  */
>-  if (flag_unroll_all_loops)
>-flag_unroll_loops = 1;
>-
>-  /* Allow cunroll to grow size accordingly.  */
>-  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>-flag_cunroll_grow_size
>-  = flag_unroll_loops || flag_peel_loops || optimize >= 3;
>-
>   /* web and rename-registers help when run after loop unrolling.  */
>   if (flag_web == AUTODETECT_VALUE)
> flag_web = flag_unroll_loops;
>--- gcc/opts.c.jj  2020-08-11 14:20:35.169934987 +0200
>+++ gcc/opts.c 2020-08-11 14:43:47.578850847 +0200
>@@ -1142,11 +1142,21 @@ finish_options (struct gcc_options *opts
> 
>/* Control IPA optimizations based on different -flive-patching level. 
>*/
>   if (opts->x_flag_live_patching)
>-{
>-  control_options_for_live_patching (opts, opts_set,
>-   opts->x_flag_live_patching,
>-   loc);
>-}
>+control_options_for_live_patching (opts, opts_set,
>+ opts->x_flag_live_patching,
>+ loc);
>+
>+  /* Unrolling all loops implies that standard loop unrolling must
>also
>+ be done.  */
>+  if (opts->x_flag_unroll_all_loops)
>+opts->x_flag_unroll_loops = 1;
>+
>+  /* Allow cunroll to grow size accordingly.  */
>+  if (!opts_set->x_flag_cunroll_grow_size)
>+opts->x_flag_cunroll_grow_size
>+  = (opts->x_flag_unroll_loops
>+ || opts->x_flag_peel_loops
>+ || opts->x_optimize >= 3);
> }
> 
> #define LEFT_COLUMN   27
>--- gcc/common.opt.jj  2020-08-03 22:54:51.328532939 +0200
>+++ gcc/common.opt 2020-08-11 14:42:14.935120568 +0200
>@@ -2884,7 +2884,7 @@ Common Report Var(flag_unroll_all_loops)
> Perform loop unrolling for all loops.
> 
> funroll-completely-grow-size
>-Undocumented Var(flag_cunroll_grow_size) Init(2) 

Re: [PATCH][AVX512][PR96246] Merge two define_insn: _blendm, _load_mask.

2020-08-12 Thread Kirill Yukhin via Gcc-patches
Hello,

On 22 Jul 12:59, Hongtao Liu via Gcc-patches wrote:
>   Those two define_insns have the same pattern, and
> _load_mask would always be matched since it shows up
> earlier in the md file, and it may lose some opportunity in
> pass_reload since _load_mask only has constraint "0C"
> for operand2, and the "v" constraint in _vblendm would never
> be matched.
> 
> 2020-07-21  Hongtao Liu  
> 
> gcc/
>PR target/96246
> * config/i386/sse.md (_load_mask,
> _load_mask): Extend to generate blendm
> instructions.
> (_blendm, _blendm): Change
> define_insn to define_expand.
> 
> gcc/testsuite/
> * gcc.target/i386/avx512bw-pr96246-1.c: New test.
> * gcc.target/i386/avx512bw-pr96246-2.c: New test.
> * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> * gcc.target/i386/avx512bw-vmovdqu16-1.c: New test.
> * gcc.target/i386/avx512bw-vmovdqu8-1.c: New test.
> * gcc.target/i386/avx512f-vmovapd-1.c: New test.
> * gcc.target/i386/avx512f-vmovaps-1.c: New test.
> * gcc.target/i386/avx512f-vmovdqa32-1.c: New test.
> * gcc.target/i386/avx512f-vmovdqa64-1.c: New test.
> * gcc.target/i386/avx512vl-pr92686-movcc-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> * gcc.target/i386/avx512vl-vmovapd-1.c: New test.
> * gcc.target/i386/avx512vl-vmovaps-1.c: New test.
> * gcc.target/i386/avx512vl-vmovdqa32-1.c: New test.
> * gcc.target/i386/avx512vl-vmovdqa64-1.c: New test.

Your patch is OK for trunk.

--
K


Re: [PATCH] gimple-fold: Don't optimize weirdo floating point value reads [PR95450]

2020-08-12 Thread Richard Biener
On August 12, 2020 10:45:12 AM GMT+02:00, Jakub Jelinek  
wrote:
>Hi!
>
>My patch to introduce native_encode_initializer to fold_ctor_reference
>apparently broke gnulib/m4 on powerpc64.
>There it uses a const union with two doubles and corresponding IBM
>double
>double long double which actually is the largest normalizable long
>double
>value (1 ulp higher than __LDBL_MAX__).  The reason our __LDBL_MAX__ is
>smaller is that we internally treat the double double type as one
>having
>106-bit precision, but it actually has a variable 53-bit to 2000-ish
>bit precision
>and for the
>0x1.f7c000p+1023L
>value gnulib uses we need 107-bit precision, therefore for GCC
>__LDBL_MAX__
>is
>0x1.f78000p+1023L
>Before my changes, we wouldn't be able to fold_ctor_reference it and it
>worked fine at runtime, but with the change we are able to do that, but
>because it is larger than anything we can handle internally, we treat
>it
>weirdly.  Similar problem would be if somebody creates this way valid,
>but much more than 106 bit precision e.g. 1.0 + 1.0e-768.
>Now, I think similar problem could happen e.g. on i?86/x86_64 with long
>double there, it also has some weird values in the format, e.g. the
>unnormals, pseudo infinities and various other magic values.
>
>This patch for floating point types (including vector and complex types
>with such elements) will try to encode the returned value again and
>punt
>if it has different memory representation from the original.  Note,
>this
>is only done in the path where native_encode_initializer was used, in
>order
>not to affect e.g. just reading an unpunned long double value; the
>value
>should be compiler generated in that case and thus should be properly
>representable.  It will punt also if e.g. the padding bits are
>initialized
>to non-zero values.
>
>Or should I do this in native_interpret_real instead, so that we punt
>even
>on say VIEW_CONVERT_EXPR from an integral value containing such weird
>bits?
>
>And, do we want to do it for all floating point constants, or just
>COMPOSITE_MODE_P (element_mode (type)) ones (i.e. only for double
>double)?

Not a final review but if we care for this kind of normalization at all then we 
should do so consistently, thus for both encode and interpret and for all 
modes. For FP we could also check if we'd consider the values equal rather than 
whether we would en/decode them to the same bit pattern (which might or might 
not be what an actual ISA gpr<->fpr reg move would do) 

Richard. 

>Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux.
>
>2020-08-12  Jakub Jelinek  
>
>   PR target/95450
>   * gimple-fold.c (fold_ctor_reference): When interpreting bytes
>   from native_encode_initializer into a floating point type,
>   verify if it will be encoded back into the same memory representation
>   and punt otherwise.
>
>--- gcc/gimple-fold.c.jj   2020-08-04 11:31:26.580268603 +0200
>+++ gcc/gimple-fold.c  2020-08-11 19:00:59.147564022 +0200
>@@ -7090,7 +7090,19 @@ fold_ctor_reference (tree type, tree cto
> int len = native_encode_initializer (ctor, buf, size /
>BITS_PER_UNIT,
>  offset / BITS_PER_UNIT);
> if (len > 0)
>-  return native_interpret_expr (type, buf, len);
>+  {
>+ret = native_interpret_expr (type, buf, len);
>+if (ret && FLOAT_TYPE_P (type))
>+  {
>+/* For floating point values, punt if this folding
>+   doesn't preserve bit representation (canonicalizes some
>+   bits e.g. in NaN, etc.), see PR95450.  */
>+unsigned char ver[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
>+if (native_encode_initializer (ret, ver, len, 0) != len
>+|| memcmp (buf, ver, len) != 0)
>+  ret = NULL_TREE;
>+  }
>+  }
>   }
> 
>   return ret;
>--- gcc/testsuite/gcc.target/powerpc/pr95450.c.jj  2020-08-11
>19:21:35.654633211 +0200
>+++ gcc/testsuite/gcc.target/powerpc/pr95450.c 2020-08-11
>19:23:24.176147695 +0200
>@@ -0,0 +1,29 @@
>+/* PR target/95450 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-not "return \[0-9.e+]\+;" "optimized" }
>} */
>+
>+/* Verify this is not optimized for double double into return
>floating_point_constant,
>+   as while that constant is the maximum normalized floating point
>value, it needs
>+   107 bit precision, which is more than GCC supports for this format.
> */
>+
>+#if __LDBL_MANT_DIG__ == 106
>+union U
>+{
>+  struct { double hi; double lo; } dd;
>+  long double ld;
>+};
>+
>+const union U g = { { __DBL_MAX__, __DBL_MAX__ / (double)134217728UL /
>(double)134217728UL } };
>+#else
>+struct S
>+{
>+  long double ld;
>+} g;
>+#endif
>+
>+long double
>+foo (void)
>+{
>+  return g.ld;
>+}
>
>   Jakub



[committed][testsuite, nvptx] Borrow ia64-sync-*.c test-cases in gcc.target/nvptx

2020-08-12 Thread Tom de Vries
Hi,

In absence of nvptx-enabling for effective target sync_int_long (see PR96494),
copy a few test-cases to gcc.target/nvptx.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Borrow ia64-sync-*.c test-cases in gcc.target/nvptx

---
 gcc/testsuite/gcc.target/nvptx/ia64-sync-1.c | 2 ++
 gcc/testsuite/gcc.target/nvptx/ia64-sync-2.c | 2 ++
 gcc/testsuite/gcc.target/nvptx/ia64-sync-3.c | 2 ++
 gcc/testsuite/gcc.target/nvptx/ia64-sync-4.c | 3 +++
 4 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-1.c 
b/gcc/testsuite/gcc.target/nvptx/ia64-sync-1.c
new file mode 100644
index 000..7685a799642
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-1.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "../../gcc.dg/ia64-sync-1.c"
diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-2.c 
b/gcc/testsuite/gcc.target/nvptx/ia64-sync-2.c
new file mode 100644
index 000..d229b5f9181
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-2.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "../../gcc.dg/ia64-sync-2.c"
diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-3.c 
b/gcc/testsuite/gcc.target/nvptx/ia64-sync-3.c
new file mode 100644
index 000..353fd74da57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-3.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "../../gcc.dg/ia64-sync-3.c"
diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-4.c 
b/gcc/testsuite/gcc.target/nvptx/ia64-sync-4.c
new file mode 100644
index 000..3547429fe09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-4.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -finline-functions" } */
+#include "../../gcc.dg/ia64-sync-4.c"


[PATCH][testsuite, nvptx] Add effective target sync_int_long_stack

2020-08-12 Thread Tom de Vries
Hi,

The nvptx target currently doesn't support effective target sync_int_long,
although it has support for 32-bit and 64-bit atomics.

When enabling sync_int_long for nvptx, we run into a failure in
gcc.dg/pr86314.c:
...
 nvptx-run: error getting kernel result: operation not supported on \
   global/shared address space
...
due to a ptx restriction: atomic accesses to local memory are illegal, and the
test-case does an atomic operation on a stack address, which is mapped to
local memory.

Fix this by adding a target sync_int_long_stack, which returns false for
nvptx and can be used to mark test-cases that require sync_int_long support
for stack addresses.
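
In miniature, the failing pattern is an atomic on a stack address (a sketch,
not the actual pr86314.c source):

int
foo (void)
{
  int x = 0; /* on the stack, which nvptx maps to .local memory */
  return __sync_val_compare_and_swap (&x, 0, 1);
}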

Build on nvptx and tested with make check-gcc.

OK for trunk?

Thanks,
- Tom

[testsuite, nvptx] Add effective target sync_int_long_stack

gcc/testsuite/ChangeLog:

PR target/96494
* lib/target-supports.exp (check_effective_target_sync_int_long):
Return 1 for nvptx.
(check_effective_target_sync_int_long_stack): New proc.
* gcc.dg/pr86314.c: Require effective target sync_int_long_stack.

---
 gcc/testsuite/gcc.dg/pr86314.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 11 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr86314.c b/gcc/testsuite/gcc.dg/pr86314.c
index 8962a3cf2ff..565fb02eee2 100644
--- a/gcc/testsuite/gcc.dg/pr86314.c
+++ b/gcc/testsuite/gcc.dg/pr86314.c
@@ -1,5 +1,5 @@
 // PR target/86314
-// { dg-do run { target sync_int_long } }
+// { dg-do run { target sync_int_long_stack } }
 // { dg-options "-O2" }
 
 __attribute__((noinline, noclone)) unsigned long
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index e79015b4d54..a870b1de275 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7704,7 +7704,16 @@ proc check_effective_target_sync_int_long { } {
 || [istarget cris-*-*]
 || ([istarget sparc*-*-*] && [check_effective_target_sparc_v9])
 || ([istarget arc*-*-*] && [check_effective_target_arc_atomic])
-|| [check_effective_target_mips_llsc] }}]
+|| [check_effective_target_mips_llsc]
+|| [istarget nvptx*-*-*]
+}}]
+}
+
+proc check_effective_target_sync_int_long_stack { } {
+return [check_cached_effective_target sync_int_long_stack {
+  expr { ![istarget nvptx*-*-*]
+&& [check_effective_target_sync_int_long]   
+}}]
 }
 
 # Return 1 if the target supports atomic operations on "char" and "short".


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-12 Thread Martin Liška

On 8/11/20 4:58 PM, Martin Sebor wrote:

On 8/11/20 5:36 AM, Martin Liška wrote:

Hello.

All right, I did it in 3 steps:
1) - new exact argument is added (no default value) - I tested this on
x86_64-linux-gnu and built all cross targets.
2) set default value of exact = false
3) change places which calculate its own growth to use the default argument


The usual intent of a default argument is to supply a value the function
is the most commonly invoked with.   But in this case it looks like it's
the opposite: most of the callers (hundreds) provide the non-default
value (true) and only a handful make use of the default.  I feel I must
be  missing something.  What is it?


You are right, but Richi wanted to make this transformation in a more
defensive way.
I'm eventually planning to drop the explicit 'true' argument for most of the
places except selective scheduling and LTO streaming.

I guess Richi can defend his strategy for this ;) ?

Martin



Martin



I would like to install first 1) and then wait some time before the rest is 
installed.

Thoughts?
Martin






[PATCH] ipa: fix bit CCP when combined with IPA bit CP

2020-08-12 Thread Martin Liška

Hello.

First, I must confess I'm not familiar with bit constant propagation.
However, I learnt something during a debugging session related to this PR.
 
As mentioned in the PR, let's consider the following example:


int
__attribute__((noinline))
foo(int arg)
{
  if (arg == 3)
return 1;
  if (arg == 4)
return 123;

  __builtin_unreachable ();
}

during WPA we find all calls of the function
(yes, the call with value 5 is UB):

  Node: foo/0:
param [0]: 5 [loc_time: 4, loc_size: 2, prop_time: 0, prop_size: 0]
   3 [loc_time: 3, loc_size: 3, prop_time: 0, prop_size: 0]
 ctxs: VARIABLE
 Bits: value = 0x5, mask = 0x6

in LTRANS we have the following VRP info:

  # RANGE [3, 3] NONZERO 3

when we AND the masks in get_default_value we end up with 6 & 3 = 2 (0b010).
That means only the second (least significant) bit is unknown, and
value (5 = 0b101) & ~mask gives us either 7 (0b111) or 5 (0b101).

That's why if (arg_2(D) == 3) gets optimized to false.
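
As a worked example of the lattice arithmetic above (a sketch in plain C,
not GCC internals; a mask bit of 1 means the value bit is unknown):

#include <stdio.h>

int
main (void)
{
  unsigned ipa_value = 0x5; /* 0b101, from IPA bit CP */
  unsigned ipa_mask = 0x6;  /* 0b110: bits 1 and 2 unknown */
  unsigned nonzero = 0x3;   /* 0b011, from VRP: the value is 3 */

  unsigned mask = ipa_mask & nonzero;             /* 0b010 */
  unsigned known = ipa_value & ~mask;             /* 0b101: 5 or 7, never 3 */
  unsigned fixed = (ipa_value & nonzero) & ~mask; /* 0b001: 1 or 3 */
  printf ("%u %u\n", known, fixed);               /* prints "5 1" */
  return 0;
}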

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thoughts?
Martin

gcc/ChangeLog:

PR ipa/96482
* tree-ssa-ccp.c (get_default_value): Known bits from IPA bits
CP should be masked against nonzero bits get from VRP.

gcc/testsuite/ChangeLog:

PR ipa/96482
* gcc.dg/ipa/pr96482.c: New test.
---
 gcc/testsuite/gcc.dg/ipa/pr96482.c | 44 ++
 gcc/tree-ssa-ccp.c | 14 ++
 2 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr96482.c

diff --git a/gcc/testsuite/gcc.dg/ipa/pr96482.c 
b/gcc/testsuite/gcc.dg/ipa/pr96482.c
new file mode 100644
index 000..68ead798d28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr96482.c
@@ -0,0 +1,44 @@
+/* PR ipa/96482 */
+/* { dg-do run } */
+/* { dg-options "-O2 -flto"  } */
+/* { dg-require-effective-target lto } */
+
+int
+__attribute__((noinline))
+foo(int arg)
+{
+  if (arg == 3)
+return 1;
+  if (arg == 4)
+return 123;
+
+  __builtin_unreachable ();
+}
+
+int
+__attribute__((noinline))
+baz(int x)
+{
+  if (x != 0)
+return foo(3); /* called */
+
+  return 1;
+}
+
+int
+__attribute__((noinline))
+bar(int x)
+{
+  if (x == 0)
+return foo(5); /* not executed */
+
+  return 1;
+}
+
+int main(int argc, char **argv)
+{
+  if (bar(argc) != baz(argc))
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 7e3921869b8..d8ba5379c4a 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -299,24 +299,28 @@ get_default_value (tree var)
  wide_int nonzero_bits = get_nonzero_bits (var);
  tree value;
  widest_int mask;
+ tree type = TREE_TYPE (var);
 
 	  if (SSA_NAME_VAR (var)
 	      && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
 	      && ipcp_get_parm_bits (SSA_NAME_VAR (var), &value, &mask))
{
  val.lattice_val = CONSTANT;
- val.value = value;
+ /* Known bits from IPA CP should be masked
+with nonzero_bits.  */
+ wide_int ipa_value = wi::to_wide (value);
+ ipa_value &= nonzero_bits;
+ val.value = wide_int_to_tree (type, ipa_value);
+
  val.mask = mask;
  if (nonzero_bits != -1)
-   val.mask &= extend_mask (nonzero_bits,
-TYPE_SIGN (TREE_TYPE (var)));
+   val.mask &= extend_mask (nonzero_bits, TYPE_SIGN (type));
}
  else if (nonzero_bits != -1)
{
  val.lattice_val = CONSTANT;
  val.value = build_zero_cst (TREE_TYPE (var));
- val.mask = extend_mask (nonzero_bits,
- TYPE_SIGN (TREE_TYPE (var)));
+ val.mask = extend_mask (nonzero_bits, TYPE_SIGN (type));
}
}
}
--
2.28.0



Re: [PATCH][testsuite] Add effective target large_initializer

2020-08-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 12, 2020 at 01:05:26PM +0200, Tom de Vries wrote:
> [testsuite] Add effective target large_initializer
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/96566
>   * lib/target-supports.exp (check_effective_target_large_initializer):
>   New proc.
>   * gcc.dg/builtin-object-size-21.c: Require large_initializer.
>   * gcc.dg/strlenopt-55.c: Same.

Ok, thanks.

Jakub



[Committed] Update email address

2020-08-12 Thread Senthil Kumar Selvaraj via Gcc-patches

This patch updates my email address in the MAINTAINERS file

2020-08-12  Senthil Kumar Selvaraj  

	* MAINTAINERS: Update my email address.


diff --git MAINTAINERS MAINTAINERS
index 0b825c7ea6d..217ec9c9eca 100644
--- MAINTAINERS
+++ MAINTAINERS
@@ -588,7 +588,7 @@ Stefan Schulze Frielinghaus 

 Tilo Schwarz   
 Martin Sebor   
 Svein Seldal   
-Senthil Kumar Selvaraj 

+Senthil Kumar Selvaraj 
 Thiemo Seufer  
 Bill Seurer
 Tim Shen   


[PATCH][testsuite] Add effective target large_initializer

2020-08-12 Thread Tom de Vries
Hi,

When compiling builtin-object-size-21.c for nvptx, cc1 times out while
emitting the initializer for global variable xm3_3.

With x86_64, we are able to emit the initializer with a few lines of assembly:
...
xm3_3:
.byte   0
.zero   9223372036854775803
.byte   1
.byte   2
.byte   3
...
but with nvptx, we don't have something similar available, and thus
generate:
...
  .visible .global .align 1 .u32 xm3_3[2305843009213693952] =
  { 0, 0, 0, ...
...

Introduce an effective target large_initializer, returning false for nvptx,
and require it for test-cases with large initializers.

Tested on nvptx with make check-gcc.

OK for trunk?

Thanks,
- Tom

[testsuite] Add effective target large_initializer

gcc/testsuite/ChangeLog:

PR testsuite/96566
* lib/target-supports.exp (check_effective_target_large_initializer):
New proc.
* gcc.dg/builtin-object-size-21.c: Require large_initializer.
* gcc.dg/strlenopt-55.c: Same.

---
 gcc/testsuite/gcc.dg/builtin-object-size-21.c |  3 ++-
 gcc/testsuite/gcc.dg/strlenopt-55.c   |  3 ++-
 gcc/testsuite/lib/target-supports.exp | 11 +++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-21.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
index 1c42374ba89..7e0f85ffdf3 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
@@ -1,7 +1,8 @@
 /* PR middle-end/92815 - spurious -Wstringop-overflow writing into
a flexible array of an extern struct
{ dg-do compile }
-   { dg-options "-Wall -fdump-tree-optimized" } */
+   { dg-options "-Wall -fdump-tree-optimized" }
+   { dg-require-effective-target large_initializer } */
 
 #define PTRDIFF_MAX __PTRDIFF_MAX__
 
diff --git a/gcc/testsuite/gcc.dg/strlenopt-55.c 
b/gcc/testsuite/gcc.dg/strlenopt-55.c
index ea6fb22a2ed..ca89ecd3c53 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-55.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-55.c
@@ -3,7 +3,8 @@
 
Verify that strlen() of braced initialized array is folded
{ dg-do compile }
-   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" } */
+   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" }
+   { dg-require-effective-target large_initializer } */
 
 #include "strlenopt.h"
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index e79015b4d54..4e0d45aaae5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10424,3 +10424,14 @@ proc check_effective_target_msp430_large {} {
#endif
 } ""]
 }
+
+# Return 1 if the target has an efficient means to encode large initializers
+# in the assembly.
+
+proc check_effective_target_large_initializer { } {
+if { [istarget nvptx*-*-*] } {
+   return 0
+}
+
+return 1
+}


RE: Do not combine PRED_LOOP_GUARD and PRED_LOOP_GUARD_WITH_RECURSION

2020-08-12 Thread Tamar Christina
Hi Honza,

> -Original Message-
> From: Gcc-patches  On Behalf Of Jan
> Hubicka
> Sent: Tuesday, August 11, 2020 11:04 AM
> To: gcc-patches@gcc.gnu.org
> Subject: Do not combine PRED_LOOP_GUARD and
> PRED_LOOP_GUARD_WITH_RECURSION
> 
> Hi,
> this patch avoids both PRED_LOOP_GUARD and
> PRED_LOOP_GUARD_WITH_RECURSION to be attached to one edge.  We
> have logic that prevents same predictor to apply to one edge twice, but since
> we split LOOP_GUARD to two more specialized cases, this no longer fires.
> 
> Double prediction happens in exchange benchmark and leads to
> unrealistically low hitrates on some edges which in turn leads to bad IPA
> profile and misguides ipa-cp.
> 
> Unfortunately it seems that the bad profile also leads to a bit better
> performance by disabling some of the loop stuff, but that really ought to be
> done in some meaningful way, not by accident.

Hmm, the regression on exchange2 is 11%, or 1.22% on SPECint 2017 overall.
This is a rather big regression.  What would be the correct way to do this?

Kind Regards,
Tamar

> 
> Bootstrapped/regtested x86_64-linux, comitted.
> 
> Honza
> 
> gcc/ChangeLog:
> 
> 2020-08-11  Jan Hubicka  
> 
>   * predict.c (not_loop_guard_equal_edge_p): New function.
>   (maybe_predict_edge): New function.
>   (predict_paths_for_bb): Use it.
>   (predict_paths_leading_to_edge): Use it.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-11  Jan Hubicka  
> 
>   * gcc.dg/ipa/ipa-clone-2.c: Lower threshold from 500 to 400.
> 
> diff --git a/gcc/predict.c b/gcc/predict.c index 2164a06e083..4c4bba54939
> 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -3122,6 +3122,35 @@ tree_guess_outgoing_edge_probabilities
> (basic_block bb)
>bb_predictions = NULL;
>  }
> 
> 
> 
> +/* Filter function predicate that returns true for an edge predicate P
> +   if its edge is equal to DATA.  */
> +
> +static bool
> +not_loop_guard_equal_edge_p (edge_prediction *p, void *data)
> +{
> +  return p->ep_edge != (edge)data || p->ep_predictor != PRED_LOOP_GUARD;
> +}
> +
> +/* Predict edge E with PRED unless it is already predicted by some predictor
> +   considered equivalent.  */
> +
> +static void
> +maybe_predict_edge (edge e, enum br_predictor pred, enum prediction taken)
> +{
> +  if (edge_predicted_by_p (e, pred, taken))
> +return;
> +  if (pred == PRED_LOOP_GUARD
> +      && edge_predicted_by_p (e, PRED_LOOP_GUARD_WITH_RECURSION, taken))
> +    return;
> +  /* Consider PRED_LOOP_GUARD_WITH_RECURSION superior to LOOP_GUARD.  */
> +  if (pred == PRED_LOOP_GUARD_WITH_RECURSION)
> +{
> +  edge_prediction **preds = bb_predictions->get (e->src);
> +  if (preds)
> + filter_predictions (preds, not_loop_guard_equal_edge_p, e);
> +}
> +  predict_edge_def (e, pred, taken);
> +}
>  /* Predict edges to successors of CUR whose sources are not postdominated
> by
> BB by PRED and recurse to all postdominators.  */
> 
> @@ -3177,10 +3206,7 @@ predict_paths_for_bb (basic_block cur,
> basic_block bb,
>regions that are only reachable by abnormal edges.  We simply
>prevent visiting given BB twice.  */
>if (found)
> - {
> -   if (!edge_predicted_by_p (e, pred, taken))
> -predict_edge_def (e, pred, taken);
> - }
> + maybe_predict_edge (e, pred, taken);
>else if (bitmap_set_bit (visited, e->src->index))
>   predict_paths_for_bb (e->src, e->src, pred, taken, visited, in_loop);
>  }
> @@ -3223,7 +3249,7 @@ predict_paths_leading_to_edge (edge e, enum
> br_predictor pred,
>if (!has_nonloop_edge)
>  predict_paths_for_bb (bb, bb, pred, taken, auto_bitmap (), in_loop);
>else
> -predict_edge_def (e, pred, taken);
> +maybe_predict_edge (e, pred, taken);
>  }
> 
> 
> 
>  /* This is used to carry information about basic blocks.  It is diff --git
> a/gcc/testsuite/gcc.dg/ipa/ipa-clone-2.c b/gcc/testsuite/gcc.dg/ipa/ipa-
> clone-2.c
> index d513020ee8b..53ae25a1e24 100644
> --- a/gcc/testsuite/gcc.dg/ipa/ipa-clone-2.c
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-clone-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fdump-ipa-cp-details -fno-early-inlining --param ipa-
> cp-max-recursive-depth=8" } */
> +/* { dg-options "-O3 -fdump-ipa-cp-details -fno-early-inlining --param
> +ipa-cp-max-recursive-depth=8 --param=ipa-cp-eval-threshold=400" } */
> 
>  int fn();
> 


[committed][nvptx] Fix array dimension in nvptx_assemble_decl_begin

2020-08-12 Thread Tom de Vries
Hi,

When compiling test-case builtin-object-size-21.c, cc1 emits:
...
  .visible .global .align 1 .u32 xm3_3[-2305843009213693951] =
...
for:
...
struct Ax_m3 { char a[PTRDIFF_MAX - 3], ax[]; };

struct Ax_m3 xm3_3 = { { 0 }, { 1, 2, 3 } };
...

Fix this by:
- changing the printing format for unsigned HOST_WIDE_INT init_frag.remaining
  to HOST_WIDE_INT_PRINT_UNSIGNED
- changing the type of local variable elt_size in nvptx_assemble_decl_begin
  to unsigned HOST_WIDE_INT.
such that we have:
...
  .visible .global .align 1 .u32 xm3_3[2305843009213693952] =
...
where 2305843009213693952 == 0x2000000000000000, so the array is claiming
0x8000000000000000 bytes, which is one more than PTRDIFF_MAX.  This is due
to using .u32 instead of .u8, so strictly speaking we should downgrade to
using .u8 in this case, but that corner-case problem doesn't look urgent
enough to fix in this commit.
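
A quick sanity check of the figures above (a standalone sketch):

#include <stdio.h>

int
main (void)
{
  unsigned long long elts = 2305843009213693952ull; /* 0x2000000000000000 */
  unsigned long long bytes = elts * 4;              /* 4-byte .u32 elements */
  /* Prints 8000000000000000, i.e. PTRDIFF_MAX + 1 on a 64-bit target.  */
  printf ("%llx\n", bytes);
  return 0;
}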

Build on nvptx, tested with make check-gcc.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix array dimension in nvptx_assemble_decl_begin

gcc/ChangeLog:

* config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Make elt_size an
unsigned HOST_WIDE_INT.  Print init_frag.remaining using
HOST_WIDE_INT_PRINT_UNSIGNED.

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index cf53a921e5b..39d0275493a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2202,7 +2202,7 @@ nvptx_assemble_decl_begin (FILE *file, const char *name, 
const char *section,
 /* Neither vector nor complex types can contain the other.  */
 type = TREE_TYPE (type);
 
-  unsigned elt_size = int_size_in_bytes (type);
+  unsigned HOST_WIDE_INT elt_size = int_size_in_bytes (type);
 
   /* Largest mode we're prepared to accept.  For BLKmode types we
  don't know if it'll contain pointer constants, so have to choose
@@ -2232,7 +2232,7 @@ nvptx_assemble_decl_begin (FILE *file, const char *name, 
const char *section,
   if (size)
 /* We make everything an array, to simplify any initialization
emission.  */
-fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", init_frag.remaining);
+fprintf (file, "[" HOST_WIDE_INT_PRINT_UNSIGNED "]", init_frag.remaining);
   else if (atype)
 fprintf (file, "[]");
 }


[PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-12 Thread Xionghu Luo via Gcc-patches
From: Xiong Hu Luo 

For SPEC2017 exchange2, there is a large recursive function digits_2 (function
size 1300) which generates specialized nodes from digits_2.1 to digits_2.8 with
the added build options:

--param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80

The ipa-inline pass will consider inlining these nodes called only once, but
these large functions inlined too deeply will cause serious register spilling
and performance degradation, as follows.

inlineA: brute (inline digits_2.1, 2.2, 2.3, 2.4) -> digits_2.5 (inline 2.6, 
2.7, 2.8)
inlineB: digits_2.1 (inline digits_2.2, 2.3) -> call digits_2.4 (inline 
digits_2.5, 2.6) -> call digits_2.7 (inline 2.8)
inlineC: brute (inline digits_2) -> call 2.1 -> 2.2 (inline 2.3) -> 2.4 -> 2.5 
-> 2.6 (inline 2.7 ) -> 2.8
inlineD: brute -> call digits_2 -> call 2.1 -> call 2.2 -> 2.3 -> 2.4 -> 2.5 -> 
2.6 -> 2.7 -> 2.8

Performance diff:
inlineB is ~25% faster than inlineA;
inlineC is ~20% faster than inlineB;
inlineD is ~30% faster than inlineC.

The master GCC code now generates an inline sequence like inlineB; this patch
makes the ipa-inline pass behave like inlineD by:
 1) accumulating growth for recursive calls, adding the growth data to the
edge when the edge's caller is inlined into another function, to avoid
inlining too deeply;
 2) treating the edge as recursive if caller and callee are both specialized
from the same node.

SPEC2017 testing shows a GEOMEAN improvement of +2.75% in total (+0.56% without exchange2).
Any comments?  Thanks.

523.xalancbmk_r +1.32%
541.leela_r +1.51%
548.exchange2_r +31.87%
507.cactuBSSN_r +0.80%
526.blender_r   +1.25%
538.imagick_r   +1.82%

gcc/ChangeLog:

2020-08-12  Xionghu Luo  

* cgraph.h (cgraph_edge::recursive_p): Return true if caller and
callee are specialized from the same node.
* ipa-inline-analysis.c (do_estimate_growth_1): Add caller's
inlined_to growth to edge whose caller is inlined.
---
 gcc/cgraph.h  | 2 ++
 gcc/ipa-inline-analysis.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0211f08964f..11903ac1960 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -3314,6 +3314,8 @@ cgraph_edge::recursive_p (void)
   cgraph_node *c = callee->ultimate_alias_target ();
   if (caller->inlined_to)
 return caller->inlined_to->decl == c->decl;
+  else if (caller->clone_of && c->clone_of)
+return caller->clone_of->decl == c->clone_of->decl;
   else
 return caller->decl == c->decl;
 }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 148efbc09ef..ba0cf836364 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -434,6 +434,9 @@ do_estimate_growth_1 (struct cgraph_node *node, void *data)
  continue;
}
   d->growth += estimate_edge_growth (e);
+  if (e->caller->inlined_to)
+   d->growth += ipa_fn_summaries->get (e->caller->inlined_to)->growth;
+
   if (d->growth > d->cap)
return true;
 }
-- 
2.25.1



Re: [PATCH] middle-end: Recognize idioms for bswap32 and bswap64 in match.pd.

2020-08-12 Thread Marc Glisse

On Wed, 12 Aug 2020, Roger Sayle wrote:


This patch is inspired by a small code fragment in comment #3 of
bugzilla PR rtl-optimization/94804.  That snippet appears almost
unrelated to the topic of the PR, but recognizing __builtin_bswap64
from two __builtin_bswap32 calls, seems like a clever/useful trick.
GCC's optabs.c contains the inverse logic to expand bswap64 by
IORing two bswap32 calls, so this transformation/canonicalization
is safe, even on targets without suitable optab support.  But
on x86_64, the swap64 of the test case becomes a single instruction.


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and a "make -k check" with no new failures.
Ok for mainline?


Your tests seem to assume that int has 32 bits and long 64.

+  (if (operand_equal_p (@0, @2, 0)

Why not reuse @0 instead of introducing @2 in the pattern? Similarly, it 
may be a bit shorter to reuse @1 instead of a new @3 (I don't think the 
tricks with @@ will be needed here).


+   && types_match (TREE_TYPE (@0), uint64_type_node)

that seems very specific. What goes wrong with a signed type for instance?

+(simplify
+  (bit_ior:c
+    (lshift
+      (convert (BUILT_IN_BSWAP16 (convert (bit_and @0
+                                                   INTEGER_CST@1))))
+      INTEGER_CST@2)
+    (convert (BUILT_IN_BSWAP16 (convert (rshift @3
+                                                INTEGER_CST@4)))))

I didn't realize we kept this useless bit_and when casting to a smaller 
type. We probably get a different pattern on 16-bit targets, but a pattern 
they do not match won't hurt them.


--
Marc Glisse


Re: [PATCH] arm: Clear canary value after stack_protect_test [PR96191]

2020-08-12 Thread Christophe Lyon via Gcc-patches
On Tue, 11 Aug 2020 at 18:42, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Mon, 10 Aug 2020 at 17:27, Richard Sandiford
> >  wrote:
> >>
> >> Christophe Lyon  writes:
> >> > On Wed, 5 Aug 2020 at 16:33, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> The stack_protect_test patterns were leaving the canary value in the
> >> >> temporary register, meaning that it was often still in registers on
> >> >> return from the function.  An attacker might therefore have been
> >> >> able to use it to defeat stack-smash protection for a later function.
> >> >>
> >> >> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
> >> >> I tested the thumb1.md part using arm-linux-gnueabi with the
> >> >> test flags -march=armv5t -mthumb.  OK for trunk and branches?
> >> >>
> >> >> As I mentioned in the corresponding aarch64 patch, this is needed
> >> >> to make arm conform to GCC's current -fstack-protector implementation.
> >> >> However, I think we should reconsider whether the zeroing is actually
> >> >> necessary and what it's actually protecting against.  I'll send a
> >> >> separate message about that to gcc@.  But since the port isn't even
> >> >> self-consistent (the *set patterns do clear the registers), I think
> >> >> we should do this first rather than wait for any outcome of that
> >> >> discussion.
> >> >>
> >> >> Richard
> >> >>
> >> >>
> >> >> gcc/
> >> >> PR target/96191
> >> >> * config/arm/arm.md (arm_stack_protect_test_insn): Zero out
> >> >> operand 2 after use.
> >> >> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): 
> >> >> Likewise.
> >> >>
> >> >> gcc/testsuite/
> >> >> * gcc.target/arm/stack-protector-1.c: New test.
> >> >> * gcc.target/arm/stack-protector-2.c: Likewise.
> >> >
> >> > Hi Richard,
> >> >
> >> > The new tests fail when compiled with -mcpu=cortex-mXX because gas 
> >> > complains:
> >> > use of r13 is deprecated
> >> > It has a comment saying: "In the Thumb-2 ISA, use of R13 as Rm is
> >> > deprecated, but valid."
> >> >
> >> > It's a minor nuisance, I'm not sure what the best way of getting rid of 
> >> > it?
> >> > Add #ifndef __thumb2__ around CHECK(r13) ?
> >>
> >> Hmm, maybe we should just drop that line altogether.  It wasn't exactly
> >> likely that r13 would be the register to leak the value :-)
> >>
> >> Should I post a patch or do you already have one ready?
> >
> > I was about to push the patch that removes the line CHECK(r13).
> >
> > However, I've noticed that when using -mcpu=cortex-m[01], we have an
> > error from gas:
> > Error: Thumb does not support this addressing mode -- `str r0,[sp,#-8]!'
>
> Seems like writing a correct arm.exp test is almost as difficult
> (for me) as writing a correct vect.exp test :-)

:-) Yeah, there are way too many combinations


> > This patch replaces the str instruction with
> >  sub   sp, sp, #8
> >  str r0, [sp]
> > and removes the check for r13, which is unlikely to leak the canary
> > value.
> >
> > 2020-08-11  Christophe Lyon  
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/stack-protector-1.c: Adapt code to Cortex-M
> >   restrictions.
>
> OK, thanks.  I'm afraid this is already on GCC 10 and 9, so OK there too.
> I'll fold this in when backporting to GCC 8.
>
Thanks, pushed to master, gcc-9 and gcc-10.

> Richard


Re: [PATCH] testsuite: Fix gcc.target/arm/multilib.exp use of gcc_opts

2020-08-12 Thread Christophe Lyon via Gcc-patches
On Tue, 11 Aug 2020 at 18:40, Richard Sandiford
 wrote:
>
> Christophe Lyon via Gcc-patches  writes:
> > This patch fixes an incorrect parameter passing for $gcc_opts, which
> > produces a DejaGnu error: (DejaGnu) proc "gcc_opts" does not exist.
>
> Huh, wonder how that went unnoticed for so long…

Me too... it was introduced on 2020-02-06 by r10-6475

>
> > 2020-08-11  Christophe Lyon  
> >
> > gcc/testsuite/
> > * gcc.target/arm/multilib.exp: Fix parameter passing for gcc_opts.
>
> OK everywhere that needs it, thanks.

So pushed to gcc-10 too.

Thanks,

Christophe

>
> Richard
>
> > diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
> > b/gcc/testsuite/gcc.target/arm/multilib.exp
> > index f67a92a..c5f3c02 100644
> > --- a/gcc/testsuite/gcc.target/arm/multilib.exp
> > +++ b/gcc/testsuite/gcc.target/arm/multilib.exp
> > @@ -40,7 +40,7 @@ proc multilib_config {profile} {
> >  proc check_multi_dir { gcc_opts multi_dir } {
> >  global tool
> >
> > -set options [list "additional_flags=[concat "--print-multi-directory" 
> > [gcc_opts]]"]
> > +set options [list "additional_flags=[concat "--print-multi-directory" 
> > $gcc_opts]"]
> >  set gcc_output [${tool}_target_compile "" "" "none" $options]
> >  if { [string match "$multi_dir\n" $gcc_output] } {
> >   pass "multilibdir $gcc_opts $multi_dir"


Re: [PATCH] testsuite: Add -fno-common to pr82374.c [PR94077]

2020-08-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 12, 2020 at 05:03:45PM +0800, Kewen.Lin wrote:
> Hi,
> 
> As the PR comments show, the case gcc.dg/gomp/pr82374.c fails
> on Power7 since gcc8.  But it passes from gcc10.  By looking
> into the difference, it's due to that gcc10 sets -fno-common
> as default, which makes vectorizer force the alignment and
> be able to use aligned vector load/store on those targets which
> doesn't support unaligned vector load/store (here it's Power7).
> 
> As Jakub suggested in the PR, this patch is to append -fno-common
> into dg-options.
> 
> Verified with gcc8 release on ppc64-redhat-linux (Power7).
> 
> Is it ok for gcc8 and gcc9 release?
> 
> I guess for gcc10 and trunk, we can just let it alone?
> 
> BR,
> Kewen
> -
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/94077
>   * gcc.dg/gomp/pr82374.c: Add option -fno-common.

Ok for 8/9, thanks.

Jakub



[PATCH] testsuite: Add -fno-common to pr82374.c [PR94077]

2020-08-12 Thread Kewen.Lin via Gcc-patches
Hi,

As the PR comments show, the case gcc.dg/gomp/pr82374.c fails
on Power7 since gcc8, but it passes from gcc10 onwards.  Looking
into the difference, it's because gcc10 sets -fno-common by
default, which makes the vectorizer force the alignment and
be able to use aligned vector load/store on those targets which
don't support unaligned vector load/store (here it's Power7).
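
The effect can be seen on a tentative definition like this one (a sketch):

/* With -fcommon this is a common symbol whose final alignment is chosen
   at link time, so the vectorizer must not assume it can raise it; with
   -fno-common it is a normal definition the compiler may over-align and
   then access with aligned vector loads/stores.  */
int a[1024];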

As Jakub suggested in the PR, this patch is to append -fno-common
into dg-options.

Verified with gcc8 release on ppc64-redhat-linux (Power7).

Is it ok for gcc8 and gcc9 release?

I guess for gcc10 and trunk, we can just let it alone?

BR,
Kewen
-
gcc/testsuite/ChangeLog:

PR testsuite/94077
* gcc.dg/gomp/pr82374.c: Add option -fno-common.

diff --git a/gcc/testsuite/gcc.dg/gomp/pr82374.c 
b/gcc/testsuite/gcc.dg/gomp/pr82374.c
index 453266e..e63a2f5 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr82374.c
+++ b/gcc/testsuite/gcc.dg/gomp/pr82374.c
@@ -1,6 +1,9 @@
 /* PR tree-optimization/82374 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vectorize -fdump-tree-vect-details" } */
+/* Option -fno-common makes vectorizer able to force alignment and ensures
+   vectorization can succeed even on targets lacking of unaligned vector
+   load/store.  */
+/* { dg-options "-O2 -fno-tree-vectorize -fdump-tree-vect-details -fno-common" 
} */
 /* { dg-additional-options "-mavx -mno-avx2" { target i?86-*-* x86_64-*-* } } 
*/
 /* { dg-additional-options "-mvsx" { target powerpc_vsx_ok } } */


Re: [AArch64] Upgrade integer MLA intrinsics to GCC vector extensions

2020-08-12 Thread Richard Sandiford
James Greenhalgh  writes:
> Hi,
>
> As subject, this patch rewrites the mla intrinsics to use a + b * c rather
> than inline assembler, thereby opening them to CSE, scheduling, etc.

Looks good for the unsigned ones.  For the signed ones, there's a risk
that the functions might become subject to the usual UB for signed
overflow, rather than acting just like the instructions do.  (Realise
that isn't unique to these functions, but it'd be good not to introduce
more instances of it.)

So for the signed ones, it might be safer to cast to the unsigned type,
do the operation, and then cast back.
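
Something along these lines, for instance (an untested sketch of the
suggestion, written as it would appear in arm_neon.h):

__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
{
  /* Do the arithmetic in the unsigned type, where wrapping is well
     defined, then convert back, matching what the instruction does.  */
  return (int8x8_t) ((uint8x8_t) __a + (uint8x8_t) __b * (uint8x8_t) __c);
}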

Thanks,
Richard

> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> OK?
>
> Thanks,
> James
>
> ---
>
> gcc/Changelog:
>
> 2020-08-11  James Greenhalgh  
>
>   config/aarch64/arm_neon.h (vmla_s8): Upgrade to C rather than asm.
>   (vmla_s16): Likewise.
>   (vmla_s32): Likewise.
>   (vmla_u8): Likewise.
>   (vmla_u16): Likewise.
>   (vmla_u32): Likewise.
>   (vmlaq_s8): Likewise.
>   (vmlaq_s16): Likewise.
>   (vmlaq_s32): Likewise.
>   (vmlaq_u8): Likewise.
>   (vmlaq_u16): Likewise.
>   (vmlaq_u32): Likewise.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 50f8b23bc17..aa548e4e6c7 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7400,72 +7400,42 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
>  {
> -  int8x8_t __result;
> -  __asm__ ("mla %0.8b, %2.8b, %3.8b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
>  {
> -  int16x4_t __result;
> -  __asm__ ("mla %0.4h, %2.4h, %3.4h"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
>  {
> -  int32x2_t __result;
> -  __asm__ ("mla %0.2s, %2.2s, %3.2s"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
>  {
> -  uint8x8_t __result;
> -  __asm__ ("mla %0.8b, %2.8b, %3.8b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline uint16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
>  {
> -  uint16x4_t __result;
> -  __asm__ ("mla %0.4h, %2.4h, %3.4h"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline uint32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
>  {
> -  uint32x2_t __result;
> -  __asm__ ("mla %0.2s, %2.2s, %3.2s"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  #define vmlal_high_lane_s16(a, b, c, d) \
> @@ -7941,72 +7911,42 @@ __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmlaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c)
>  {
> -  int8x16_t __result;
> -  __asm__ ("mla %0.16b, %2.16b, %3.16b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmlaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
>  {
> -  int16x8_t __result;
> -  __asm__ ("mla %0.8h, %2.8h, %3.8h"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a + __b * __c;
>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmlaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
>  {
> -  int32x4_t 

Re: [AArch64] Upgrade integer MLA intrinsics to GCC vector extensions

2020-08-12 Thread Christophe Lyon via Gcc-patches
Hi James,

On Wed, 12 Aug 2020 at 10:40, James Greenhalgh  wrote:
>
>
> Hi,
>
> As subject, this patch rewrites the mla intrinsics to use a + b * c rather
> than inline assembler, thereby opening them to CSE, scheduling, etc.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>

Do we have tests that make sure we still generate the mla instructions?

> OK?
>
> Thanks,
> James
>
> ---
>
> gcc/Changelog:
>
> 2020-08-11  James Greenhalgh  
>
> config/aarch64/arm_neon.h (vmla_s8): Upgrade to C rather than asm.
> (vmla_s16): Likewise.
> (vmla_s32): Likewise.
> (vmla_u8): Likewise.
> (vmla_u16): Likewise.
> (vmla_u32): Likewise.
> (vmlaq_s8): Likewise.
> (vmlaq_s16): Likewise.
> (vmlaq_s32): Likewise.
> (vmlaq_u8): Likewise.
> (vmlaq_u16): Likewise.
> (vmlaq_u32): Likewise.
>


Re: [PR96519] Re: [PATCH][testsuite] Add gcc.dg/ia64-sync-5.c

2020-08-12 Thread Richard Sandiford
Kwok Cheung Yeung  writes:
> Hello
>
> On 06/08/2020 1:23 pm, Tom de Vries wrote:
>  > +static char AC[4];
>  > +static char init_qi[4] = { -30,-30,-50,-50 };
>  > +static char test_qi[4] = { -115,-115,25,25 };
>  > +
>  > +static void
>  > +do_qi (void)
>  > +{
>  > +  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -30)
>  > +abort ();
>
> If 'char' is unsigned by default, then init_qi will contain { 226, 226, 206, 
> 206} and test_qi { 141, 141, 25, 25 }, which will result in the comparison 
> against -30 failing when the previous value of AC[0] is implicitly promoted 
> to 
> signed int. This can be fixed by making the array element types explicitly 
> signed.
>
> This issue is tracked as issue 96519 on the tracker. I have checked that the 
> test now passes on PowerPC and Aarch64. Is the fix okay for trunk?
>
> Thanks
>
> Kwok
>
> commit fc6ac3af45a238da0bd65e020ae6f0f165b57b87
> Author: Kwok Cheung Yeung 
> Date:   Tue Aug 11 09:41:10 2020 -0700
>
> Fix gcc.dg/ia64-sync-5.c for architectures with unsigned char as default 
> (PR 96519)
> 
> If char is unsigned, then comparisons of the char array elements against
> negative integers in the test will fail as values in the array will always
> be positive, and will remain so when promoted to signed int.
> 
> 2020-08-11  Kwok Cheung Yeung  
> 
>   PR testsuite/96519
> 
>   gcc/testsuite/
>   * gcc.dg/ia64-sync-5.c (AC, init_qi, test_qi): Change element type to
>   signed char.

OK, thanks.

Richard


[PATCH] gimple-fold: Don't optimize weirdo floating point value reads [PR95450]

2020-08-12 Thread Jakub Jelinek via Gcc-patches
Hi!

My patch to introduce native_encode_initializer to fold_ctor_reference
apparently broke gnulib/m4 on powerpc64.
There it uses a const union with two doubles and corresponding IBM double
double long double which actually is the largest normalizable long double
value (1 ulp higher than __LDBL_MAX__).  The reason our __LDBL_MAX__ is
smaller is that we internally treat the double double type as one having
106-bit precision, but it actually has a variable 53-bit to 2000-ish bit 
precision
and for the
0x1.f7c000p+1023L
value gnulib uses we need 107-bit precision, therefore for GCC __LDBL_MAX__
is
0x1.f78000p+1023L
Before my changes, we wouldn't be able to fold_ctor_reference it and it
worked fine at runtime, but with the change we are able to do that, but
because it is larger than anything we can handle internally, we treat it
weirdly.  Similar problem would be if somebody creates this way valid,
but much more than 106 bit precision e.g. 1.0 + 1.0e-768.
Now, I think a similar problem could happen e.g. on i?86/x86_64 with the long
double there; it also has some weird values in the format, e.g. the
unnormals, pseudo infinities and various other magic values.
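
One such encoding, for illustration (a sketch; byte values taken from the
x87 extended-precision format, with the explicit integer bit clear):

/* 80-bit little-endian "pseudo-infinity": 64-bit significand first
   (integer bit 0, fraction 0), then exponent 0x7fff and sign 0.
   Treated as an invalid operand on anything newer than the 287.  */
static const unsigned char pseudo_inf[10] =
  { 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0x7f };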

This patch for floating point types (including vector and complex types
with such elements) will try to encode the returned value again and punt
if it has different memory representation from the original.  Note, this
is only done in the path where native_encode_initializer was used, in order
not to affect e.g. just reading an unpunned long double value; the value
should be compiler generated in that case and thus should be properly
representable.  It will punt also if e.g. the padding bits are initialized
to non-zero values.

Or should I do this in native_interpret_real instead, so that we punt even
on say VIEW_CONVERT_EXPR from an integral value containing such weird bits?

And, do we want to do it for all floating point constants, or just
COMPOSITE_MODE_P (element_mode (type)) ones (i.e. only for double double)?

Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux.

2020-08-12  Jakub Jelinek  

PR target/95450
* gimple-fold.c (fold_ctor_reference): When interpreting bytes
from native_encode_initializer into a floating point type,
verify if it will be encoded back into the same memory representation
and punt otherwise.

--- gcc/gimple-fold.c.jj2020-08-04 11:31:26.580268603 +0200
+++ gcc/gimple-fold.c   2020-08-11 19:00:59.147564022 +0200
@@ -7090,7 +7090,19 @@ fold_ctor_reference (tree type, tree cto
  int len = native_encode_initializer (ctor, buf, size / BITS_PER_UNIT,
   offset / BITS_PER_UNIT);
  if (len > 0)
-   return native_interpret_expr (type, buf, len);
+   {
+ ret = native_interpret_expr (type, buf, len);
+ if (ret && FLOAT_TYPE_P (type))
+   {
+ /* For floating point values, punt if this folding
+doesn't preserve bit representation (canonicalizes some
+bits e.g. in NaN, etc.), see PR95450.  */
+ unsigned char ver[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
+ if (native_encode_initializer (ret, ver, len, 0) != len
+ || memcmp (buf, ver, len) != 0)
+   ret = NULL_TREE;
+   }
+   }
}
 
   return ret;
--- gcc/testsuite/gcc.target/powerpc/pr95450.c.jj   2020-08-11 
19:21:35.654633211 +0200
+++ gcc/testsuite/gcc.target/powerpc/pr95450.c  2020-08-11 19:23:24.176147695 
+0200
@@ -0,0 +1,29 @@
+/* PR target/95450 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "return \[0-9.e+]\+;" "optimized" } } */
+
+/* Verify this is not optimized for double double into return 
floating_point_constant,
+   as while that constant is the maximum normalized floating point value, it 
needs
+   107 bit precision, which is more than GCC supports for this format.  */
+
+#if __LDBL_MANT_DIG__ == 106
+union U
+{
+  struct { double hi; double lo; } dd;
+  long double ld;
+};
+
+const union U g = { { __DBL_MAX__, __DBL_MAX__ / (double)134217728UL / 
(double)134217728UL } };
+#else
+struct S
+{
+  long double ld;
+} g;
+#endif
+
+long double
+foo (void)
+{
+  return g.ld;
+}

Jakub



[AArch64] Upgrade integer MLA intrinsics to GCC vector extensions

2020-08-12 Thread James Greenhalgh

Hi,

As subject, this patch rewrites the mla intrinsics to use a + b * c rather
than inline assembler, thereby opening them to CSE, scheduling, etc.
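
For example, with the intrinsic now visible to the optimizers, the two
identical multiply-accumulates below can be CSEd into one (a sketch, not
part of the patch's testsuite):

#include <arm_neon.h>

int32x4_t
f (int32x4_t a, int32x4_t b, int32x4_t c)
{
  int32x4_t x = vmlaq_s32 (a, b, c);
  int32x4_t y = vmlaq_s32 (a, b, c); /* formerly an opaque asm block */
  return vaddq_s32 (x, y);
}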

Bootstrapped and tested on aarch64-none-linux-gnu.

OK?

Thanks,
James

---

gcc/Changelog:

2020-08-11  James Greenhalgh  

config/aarch64/arm_neon.h (vmla_s8): Upgrade to C rather than asm.
(vmla_s16): Likewise.
(vmla_s32): Likewise.
(vmla_u8): Likewise.
(vmla_u16): Likewise.
(vmla_u32): Likewise.
(vmlaq_s8): Likewise.
(vmlaq_s16): Likewise.
(vmlaq_s32): Likewise.
(vmlaq_u8): Likewise.
(vmlaq_u16): Likewise.
(vmlaq_u32): Likewise.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 50f8b23bc17..aa548e4e6c7 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7400,72 +7400,42 @@ __extension__ extern __inline int8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  int8x8_t __result;
-  __asm__ ("mla %0.8b, %2.8b, %3.8b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  int16x4_t __result;
-  __asm__ ("mla %0.4h, %2.4h, %3.4h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  int32x2_t __result;
-  __asm__ ("mla %0.2s, %2.2s, %3.2s"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  uint8x8_t __result;
-  __asm__ ("mla %0.8b, %2.8b, %3.8b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  uint16x4_t __result;
-  __asm__ ("mla %0.4h, %2.4h, %3.4h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  uint32x2_t __result;
-  __asm__ ("mla %0.2s, %2.2s, %3.2s"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 #define vmlal_high_lane_s16(a, b, c, d) \
@@ -7941,72 +7911,42 @@ __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c)
 {
-  int8x16_t __result;
-  __asm__ ("mla %0.16b, %2.16b, %3.16b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
 {
-  int16x8_t __result;
-  __asm__ ("mla %0.8h, %2.8h, %3.8h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
-  int32x4_t __result;
-  __asm__ ("mla %0.4s, %2.4s, %3.4s"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlaq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
 {
-  uint8x16_t __result;
-  __asm__ ("mla %0.16b, %2.16b, %3.16b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 

[PATCH] middle-end: Recognize idioms for bswap32 and bswap64 in match.pd.

2020-08-12 Thread Roger Sayle

This patch is inspired by a small code fragment in comment #3 of
bugzilla PR rtl-optimization/94804.  That snippet appears almost
unrelated to the topic of the PR, but recognizing __builtin_bswap64
from two __builtin_bswap32 calls seems like a clever/useful trick.
GCC's optabs.c contains the inverse logic to expand bswap64 by
IORing two bswap32 calls, so this transformation/canonicalization
is safe, even on targets without suitable optab support.  But on
x86_64, the swap64 function of the test case becomes a single instruction.
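
To see why the idiom is equivalent (my sketch, not from the patch):
byte-reversing a 64-bit value is byte-reversing each 32-bit half and
swapping the halves, which is exactly the shape the new match.pd
patterns look for.

#include <stdint.h>

uint64_t
bswap64_by_halves (uint64_t x)
{
  /* The reversed low half becomes the high half of the result...  */
  uint64_t hi = (uint64_t) __builtin_bswap32 ((uint32_t) x) << 32;
  /* ...and the reversed high half becomes the low half.  */
  uint64_t lo = __builtin_bswap32 ((uint32_t) (x >> 32));
  return hi | lo;   /* same value as __builtin_bswap64 (x) */
}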


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and a "make -k check" with no new failures.
Ok for mainline?


2020-08-12  Roger Sayle  

gcc/ChangeLog
* match.pd (((T)bswapX(x)<<C)|bswapX(x>>C) -> bswapY(x)):
New simplifications to recognize __builtin_bswap{32,64}.

gcc/testsuite/ChangeLog
* gcc.dg/fold-bswap-1.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index 7e5c5a6..d4efbf3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3410,6 +3410,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bswap (bitop:c (bswap @0) @1))
 (bitop @0 (bswap @1))))
 
+/* Recognize ((T)bswap32(x)<<32)|bswap32(x>>32) as bswap64(x).  */
+(simplify
+  (bit_ior:c
+(lshift
+  (convert (BUILT_IN_BSWAP32 (convert@4 @0)))
+  INTEGER_CST@1)
+    (convert (BUILT_IN_BSWAP32 (convert@5 (rshift @2
+					   INTEGER_CST@3)))))
+  (if (operand_equal_p (@0, @2, 0)
+   && types_match (type, uint64_type_node)
+   && types_match (TREE_TYPE (@0), uint64_type_node)
+   && types_match (TREE_TYPE (@4), uint32_type_node)
+   && types_match (TREE_TYPE (@5), uint32_type_node)
+   && wi::to_widest (@1) == 32
+   && wi::to_widest (@3) == 32)
+(BUILT_IN_BSWAP64 @0)))
+
+/* Recognize ((T)bswap16(x)<<16)|bswap16(x>>16) as bswap32(x).  */
+(simplify
+  (bit_ior:c
+    (lshift
+      (convert (BUILT_IN_BSWAP16 (convert (bit_and @0
+						   INTEGER_CST@1))))
+      INTEGER_CST@2)
+    (convert (BUILT_IN_BSWAP16 (convert (rshift @3
+						 INTEGER_CST@4)))))
+  (if (operand_equal_p (@0, @3, 0)
+   && types_match (type, uint32_type_node)
+   && types_match (TREE_TYPE (@0), uint32_type_node)
+   && wi::to_widest (@1) == 65535
+   && wi::to_widest (@2) == 16
+   && wi::to_widest (@4) == 16)
+(BUILT_IN_BSWAP32 @0)))
 
 /* Combine COND_EXPRs and VEC_COND_EXPRs.  */
 
diff --git a/gcc/testsuite/gcc.dg/fold-bswap-1.c b/gcc/testsuite/gcc.dg/fold-bswap-1.c
new file mode 100644
index 000..f14f731
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-bswap-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int swap32(unsigned int x)
+{
+unsigned int a = __builtin_bswap16(x);
+x >>= 16;
+a <<= 16;
+return __builtin_bswap16(x) | a;
+}
+
+unsigned long swap64(unsigned long x)
+{
+unsigned long a = __builtin_bswap32(x);
+x >>= 32;
+a <<= 32;
+return __builtin_bswap32(x) | a;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_bswap32" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_bswap64" 1 "optimized" } } */
+


[PATCH] Fix up flag_cunroll_grow_size handling in presence of optimize attr [PR96535]

2020-08-12 Thread Jakub Jelinek via Gcc-patches
Hi!

As the testcase in the PR shows (not included in the patch, as
it seems quite fragile to observe unrolling in the IL), the introduction of
flag_cunroll_grow_size broke the optimize attribute handling related to
loop unrolling.
The problem is that the new option flag is set (if not set explicitly) only
in process_options and in rs6000_option_override_internal (and there only if
global_init_p).  So, although it is an Optimization option, it is only set
based on the command-line -funroll-loops/-O3/-fpeel-loops or
-funroll-all-loops.  This means that if the command line includes any of
those, the flag is enabled even for functions whose optimize attribute
disables all of them, and if the command line includes none of them, the
flag is not enabled for functions whose optimize attribute enables any of
them.
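
A hypothetical illustration of the mismatch (the PR's real testcase is
omitted above as too fragile): compiled without -funroll-loops on the
command line, the attribute below should enable unrolling, and with it
flag_cunroll_grow_size, for f alone; before this fix the flag was computed
once in process_options and never re-evaluated for the attribute.

__attribute__((optimize ("unroll-loops")))
void f (int *a)
{
  for (int i = 0; i < 100; i++)
    a[i] = i;
}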

process_options is called just once, so IMHO it should be handling only
non-Optimization option adjustments (various other options suffer from that
too, but as this is a regression from 10.1 on the 10 branch, changing those
is not appropriate).  Similarly, rs6000_option_override_internal is called
only once (with global_init_p) and then for target attribute handling, but
not for optimize attribute handling.

This patch moves the unrolling-related handling from process_options into
finish_options, which is invoked whenever the options are being finalized,
and the rs6000-specific parts into the override_options_after_change hook,
which is called for optimize attribute handling (and unfortunately also on
cfun changes, but what the hook does is cheap).  I've also added a call to
that hook from rs6000_option_override_internal, so it is called on cmdline
processing and for the target attribute as well.

Furthermore, it stops using AUTODETECT_VALUE, which can work only once,
and instead uses the global_options_set.x_... flags.

Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux, ok for trunk
and after a while 10.3?

2020-08-12  Jakub Jelinek  

PR tree-optimization/96535
* toplev.c (process_options): Move flag_unroll_loops and
flag_cunroll_grow_size handling from here to ...
* opts.c (finish_options): ... here.  For flag_cunroll_grow_size,
don't check for AUTODETECT_VALUE, but instead check
opts_set->x_flag_cunroll_grow_size.
* common.opt (funroll-completely-grow-size): Default to 0.
* config/rs6000/rs6000.c (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE):
Redefine.
(rs6000_override_options_after_change): New function.
(rs6000_option_override_internal): Call it.  Move there the
flag_cunroll_grow_size, unroll_only_small_loops and
flag_rename_registers handling.

--- gcc/toplev.c.jj 2020-08-11 14:20:35.179934850 +0200
+++ gcc/toplev.c	2020-08-11 14:44:06.861586574 +0200
@@ -1474,16 +1474,6 @@ process_options (void)
   flag_abi_version = 2;
 }
 
-  /* Unrolling all loops implies that standard loop unrolling must also
- be done.  */
-  if (flag_unroll_all_loops)
-flag_unroll_loops = 1;
-
-  /* Allow cunroll to grow size accordingly.  */
-  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
-flag_cunroll_grow_size
-  = flag_unroll_loops || flag_peel_loops || optimize >= 3;
-
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
 flag_web = flag_unroll_loops;
--- gcc/opts.c.jj   2020-08-11 14:20:35.169934987 +0200
+++ gcc/opts.c  2020-08-11 14:43:47.578850847 +0200
@@ -1142,11 +1142,21 @@ finish_options (struct gcc_options *opts
 
   /* Control IPA optimizations based on different -flive-patching level.  */
   if (opts->x_flag_live_patching)
-{
-  control_options_for_live_patching (opts, opts_set,
-opts->x_flag_live_patching,
-loc);
-}
+control_options_for_live_patching (opts, opts_set,
+  opts->x_flag_live_patching,
+  loc);
+
+  /* Unrolling all loops implies that standard loop unrolling must also
+ be done.  */
+  if (opts->x_flag_unroll_all_loops)
+opts->x_flag_unroll_loops = 1;
+
+  /* Allow cunroll to grow size accordingly.  */
+  if (!opts_set->x_flag_cunroll_grow_size)
+opts->x_flag_cunroll_grow_size
+  = (opts->x_flag_unroll_loops
+ || opts->x_flag_peel_loops
+ || opts->x_optimize >= 3);
 }
 
 #define LEFT_COLUMN	27
--- gcc/common.opt.jj   2020-08-03 22:54:51.328532939 +0200
+++ gcc/common.opt  2020-08-11 14:42:14.935120568 +0200
@@ -2884,7 +2884,7 @@ Common Report Var(flag_unroll_all_loops)
 Perform loop unrolling for all loops.
 
 funroll-completely-grow-size
-Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
+Undocumented Var(flag_cunroll_grow_size) Optimization
 ; Internal undocumented flag, allow size growth during complete unrolling
 
 ; Nonzero means that loop optimizer may 

Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-08-12 Thread Antony Polukhin via Gcc-patches
Fixed patch for type traits hardening
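
As a quick sketch of what the hardening catches (my example, mirroring the
new incomplete_neg.cc tests; X is a deliberately incomplete type):

#include <type_traits>

class X;

void misuse()
{
  /* Undefined per the standard's completeness preconditions; with this
     patch it is diagnosed by a static_assert ("_Functor must be a
     complete class or an unbounded array") instead of silently
     compiling.  */
  std::invoke_result<X>{};
}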

ChangeLog

2020-08-12  Antony Polukhin  

PR libstdc++/71579
* include/std/type_traits (invoke_result, is_nothrow_invocable_r):
Add static_asserts to make sure that the argument of the type
trait is not misused with incomplete types.
(is_swappable_with, is_nothrow_swappable_with): Add static_asserts
to make sure that the first and second arguments of the type trait
are not misused with incomplete types.
* testsuite/20_util/invoke_result/incomplete_neg.cc: New test.
* testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc: New test.
* testsuite/20_util/is_nothrow_swappable/incomplete_neg.cc: New test.
* testsuite/20_util/is_nothrow_swappable_with/incomplete_neg.cc: New
test.
* testsuite/20_util/is_swappable_with/incomplete_neg.cc: New test.


-- 
Best regards,
Antony Polukhin
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 426febc..62f1190 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2811,13 +2811,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp, typename _Up>
 struct is_swappable_with
 : public __is_swappable_with_impl<_Tp, _Up>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
+	"first template argument must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Up>{}),
+	"second template argument must be a complete class or an unbounded array");
+};
 
   /// is_nothrow_swappable_with
   template<typename _Tp, typename _Up>
 struct is_nothrow_swappable_with
 : public __is_nothrow_swappable_with_impl<_Tp, _Up>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
+	"first template argument must be a complete class or an unbounded array");
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Up>{}),
+	"second template argument must be a complete class or an unbounded array");
+};
 
 #if __cplusplus >= 201402L
   /// is_swappable_with_v
@@ -2952,7 +2962,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Functor, typename... _ArgTypes>
 struct invoke_result
 : public __invoke_result<_Functor, _ArgTypes...>
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Functor>{}),
+	"_Functor must be a complete class or an unbounded array");
+};
 
   /// std::invoke_result_t
   template
@@ -3001,7 +3014,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_nothrow_invocable_r
 : __and_<__is_nt_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, _Ret>,
  __call_is_nothrow_<_Fn, _ArgTypes...>>::type
-{ };
+{
+  static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
+   "_Fn must be a complete class or an unbounded array");
+};
 
   /// std::is_invocable_v
   template
diff --git a/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_neg.cc b/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_neg.cc
new file mode 100644
index 000..da58a8b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/invoke_result/incomplete_neg.cc
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-error "must be a complete class" "" { target *-*-* } 0 }
+
+#include <type_traits>
+
+class X;
+
+void test01()
+{
+  std::invoke_result();	// { dg-error "required from here" }
+  std::invoke_result();	// { dg-error "required from here" }
+}
diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
new file mode 100644
index 000..ad16809
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc
@@ -0,0 +1,33 @@
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any 

RE: [PATCH] x86_64: Use peephole2 to eliminate redundant moves.

2020-08-12 Thread Roger Sayle
Hi Uros,

Many thanks for the review, and your help/explanation around constant
materialization on x86_64/i386.

Your suggestion to try x86_64_general_operand as a predicate was awesome!
Not only does this improvement survive "make bootstrap" and "make -k check"
on x86_64-pc-linux-gnu, and fix the previous test case, but this now triggers
15070 times during stage2 and stage3, and an additional 8926 times
during make check.  Many thanks for the impressive improvement.

Here's the version as committed.

2020-08-12  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
* config/i386/i386.md (peephole2): Reduce unnecessary
register shuffling produced by register allocation.

Cheers,
Roger
--

-Original Message-
From: Uros Bizjak  
Sent: 11 August 2020 11:14
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] x86_64: Use peephole2 to eliminate redundant moves.

On Tue, Aug 11, 2020 at 9:34 AM Roger Sayle  wrote:
>
>
> The recent fix for mul_widen_cost revealed an interesting quirk of 
> ira/reload register allocation on x86_64.  As shown in 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
> for gcc.target/i386/pr71321.c we generate the following code that 
> performs unnecessary register shuffling.
>
> movl$-51, %edx
> movl%edx, %eax
> mulb%dil
>
> which is caused by reload generating the following instructions 
> (notice the set of the first register is dead in the 2nd insn):
>
> (insn 7 4 36 2 (set (reg:QI 1 dx [94])
> (const_int -51 [0xffcd])) {*movqi_internal}
>  (expr_list:REG_EQUIV (const_int -51 [0xffcd])
> (nil)))
> (insn 36 7 8 2 (set (reg:QI 0 ax [93])
> (reg:QI 1 dx [94])) {*movqi_internal}
>  (expr_list:REG_DEAD (reg:QI 1 dx [94])
> (nil)))
>
> Various discussions in bugzilla seem to point to reload preferring not 
> to load constants directly into CLASS_LIKELY_SPILLED_P registers.

This can extend the lifetime of a register over the instruction that needs one 
of the CLASS_LIKELY_SPILLED_P registers. Various MUL, DIV and even shift insns 
were able to choke the allocator for x86 targets, so this is a small price to 
pay to avoid regalloc failure.

> Whatever the cause, one solution (workaround), that doesn't involve 
> rewriting a register allocator, is to use peephole2 to spot this 
> weirdness and eliminate it.  In fact, this use case is (probably) the 
> reason peephole optimizers were originally developed, but it's a 
> little disappointing this application of them is still required today.  
> On a positive note, this clean-up is cheap, as we're already 
> traversing the instruction stream with liveness (REG_DEAD notes) 
> already calculated.
>
> With this peephole2 the above three instructions (from pr71321.c) are 
> replaced with:
>
> movl$-51, %eax
> mulb%dil
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  This peephole triggers
> 1435 during stage2 and stage3 of a bootstrap, and a further 1274 times 
> during "make check".  The most common case is DX_REG->AX_REG (as 
> above) which occurs 421 times.  I've restricted this pattern to 
> immediate constant loads into general operand registers, which fixes 
> this particular problem, but broader predicates may help similar cases.
> Ok for mainline?
>
> 2020-08-11  Roger Sayle  
>
> * config/i386/i386.md (peephole2): Reduce unnecessary
> register shuffling produced by register allocation.

LGTM, but I wonder if the allocator is also too conservative with memory 
operands. Perhaps x86_64_general_operand can be used here.

Uros.
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4e916bf..f3799ac 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18946,6 +18946,16 @@
   operands[2] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
   ix86_expand_clear (operands[1]);
 })
+
+;; Reload dislikes loading constants directly into class_likely_spilled
+;; hard registers.  Try to tidy things up here.
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+   (match_operand:SWI 1 "x86_64_general_operand"))
+   (set (match_operand:SWI 2 "general_reg_operand")
+   (match_dup 0))]
+  "peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2) (match_dup 1))])
 
 ;; Misc patterns (?)
 



[Committed] IBM Z: Fix PR96308

2020-08-12 Thread Andreas Krebbel via Gcc-patches
For the testcase, a symbol with a TLS reloc and a unary minus is being
generated.  The backend didn't handle this correctly.

In s390_cannot_force_const_mem a unary minus on a symbolic constant
is now rejected, since gas would not allow this.

legitimize_tls_address now makes the NEG rtx the outermost operation
by pulling it out of the CONST rtx.

Bootstrapped and regression tested on s390x.

Committed to mainline.

gcc/ChangeLog:

PR target/96308
* config/s390/s390.c (s390_cannot_force_const_mem): Reject an
unary minus for everything not being a numeric constant.
(legitimize_tls_address): Move a NEG out of the CONST rtx.

gcc/testsuite/ChangeLog:

PR target/96308
* g++.dg/pr96308.C: New test.
---
 gcc/config/s390/s390.c | 25 +
 gcc/testsuite/g++.dg/pr96308.C |  7 +++
 2 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr96308.C

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 22ac5e43121..5488a5dc5e8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -4106,6 +4106,18 @@ s390_cannot_force_const_mem (machine_mode mode, rtx x)
   /* Accept all non-symbolic constants.  */
   return false;
 
+case NEG:
+      /* Accept a unary '-' only on scalar numeric constants.  */
+  switch (GET_CODE (XEXP (x, 0)))
+   {
+   case CONST_INT:
+   case CONST_DOUBLE:
+   case CONST_WIDE_INT:
+ return false;
+   default:
+ return true;
+   }
+
 case LABEL_REF:
   /* Labels are OK iff we are non-PIC.  */
   return flag_pic != 0;
@@ -5268,6 +5280,7 @@ legitimize_tls_address (rtx addr, rtx reg)
 {
   switch (XINT (XEXP (addr, 0), 1))
{
+   case UNSPEC_NTPOFF:
case UNSPEC_INDNTPOFF:
  new_rtx = addr;
  break;
@@ -5290,6 +5303,18 @@ legitimize_tls_address (rtx addr, rtx reg)
   new_rtx = force_operand (new_rtx, 0);
 }
 
+  /* (const (neg (unspec (symbol_ref)))) -> (neg (const (unspec (symbol_ref)))).  */
+  else if (GET_CODE (addr) == CONST && GET_CODE (XEXP (addr, 0)) == NEG)
+{
+  new_rtx = XEXP (XEXP (addr, 0), 0);
+  if (GET_CODE (new_rtx) != SYMBOL_REF)
+   new_rtx = gen_rtx_CONST (Pmode, new_rtx);
+
+  new_rtx = legitimize_tls_address (new_rtx, reg);
+  new_rtx = gen_rtx_NEG (Pmode, new_rtx);
+  new_rtx = force_operand (new_rtx, 0);
+}
+
   else
 gcc_unreachable ();  /* for now ... */
 
diff --git a/gcc/testsuite/g++.dg/pr96308.C b/gcc/testsuite/g++.dg/pr96308.C
new file mode 100644
index 000..9009bba5e82
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96308.C
@@ -0,0 +1,7 @@
+// { dg-do compile }
+// { dg-options "-Os -fno-move-loop-invariants -std=c++11" }
+
+struct NonTrivial3 {
+  ~NonTrivial3();
+};
+void i() { thread_local NonTrivial3 tlarr[10]; }
-- 
2.25.1



[Committed] IBM Z: Fix PR96456

2020-08-12 Thread Andreas Krebbel via Gcc-patches
The testcase failed because our backend refused to generate vector
compare instructions for signaling operators with -fno-trapping-math
-fno-finite-math-only.

Bootstrapped and regression tested on s390x.

Committed to mainline.

gcc/ChangeLog:

PR target/96456
* config/s390/s390.h (TARGET_NONSIGNALING_VECTOR_COMPARE_OK): New
macro.
* config/s390/vector.md (vcond_comparison_operator): Use new macro
for the check.

gcc/testsuite/ChangeLog:

PR target/96456
* gcc.target/s390/pr96456.c: New test.
---
 gcc/config/s390/s390.h  |  5 +
 gcc/config/s390/vector.md   |  6 +++---
 gcc/testsuite/gcc.target/s390/pr96456.c | 13 +
 3 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr96456.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index e4ef63e4080..ec5128c0af2 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -175,6 +175,11 @@ enum processor_flags
 #define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS 0
 #endif
 
+/* Evaluate to true if it is ok to emit a non-signaling vector
+   comparison.  */
+#define TARGET_NONSIGNALING_VECTOR_COMPARE_OK \
+  (TARGET_VX && !TARGET_VXE && (flag_finite_math_only || !flag_trapping_math))
+
 #ifdef HAVE_AS_MACHINE_MACHINEMODE
 #define S390_USE_TARGET_ATTRIBUTE 1
 #else
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 08f2d4cbda6..131bbda09bc 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -622,7 +622,7 @@
 case GT:
 case LTGT:
   /* Signaling vector comparisons are supported only on z14+.  */
-  return TARGET_Z14;
+  return TARGET_VXE || TARGET_NONSIGNALING_VECTOR_COMPARE_OK;
 default:
   return true;
 }
@@ -1534,7 +1534,7 @@
   [(set (match_operand: 0 "register_operand" "=v")
(gt: (match_operand:VFT 1 "register_operand" "v")
   (match_operand:VFT 2 "register_operand" "v")))]
-  "TARGET_VX && !TARGET_VXE && flag_finite_math_only"
+  "TARGET_NONSIGNALING_VECTOR_COMPARE_OK"
   "fchb\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
@@ -1551,7 +1551,7 @@
   [(set (match_operand: 0 "register_operand" "=v")
(ge: (match_operand:VFT 1 "register_operand" "v")
   (match_operand:VFT 2 "register_operand" "v")))]
-  "TARGET_VX && !TARGET_VXE && flag_finite_math_only"
+  "TARGET_NONSIGNALING_VECTOR_COMPARE_OK"
   "fcheb\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
diff --git a/gcc/testsuite/gcc.target/s390/pr96456.c b/gcc/testsuite/gcc.target/s390/pr96456.c
new file mode 100644
index 000..ea9e9cd7a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr96456.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
/* { dg-options "-O3 -std=gnu99 -ffast-math -fno-finite-math-only -march=z13" } */
+
+int b, c, d;
+double *e;
+int f() {
+  double *a = a;
+  int g = d, f = c, h = b;
+  if (__builtin_expect(f, 0))
+for (; g < h; g++)
+  e[g] = (int)(a[g] >= 0.0 ? g + 0. : a[g]);
+  return 0;
+}
-- 
2.25.1