Re: [PATCH] Improve adc discovery during combine on x86 (PR target/85095)

2018-03-27 Thread Uros Bizjak
On Tue, Mar 27, 2018 at 11:11 PM, Jakub Jelinek  wrote:
> Hi!
>
> In 6.x we've changed unsigned if (a < b) a++; into ADD_OVERFLOW ifn,
> which results in different expanded code, which on the following testcase
> unfortunately doesn't combine anymore into the optimal 3 instructions.
>
> The problem is that we want adc[lq] $0, %reg instruction, but simplify-rtx.c
> leaves the apparently useless (plus something const0_rtx) out, just uses
> something, and there is no pattern that matches that.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2018-03-27  Jakub Jelinek  
>
> PR target/85095
> * config/i386/i386.md (*add3_carry0): New pattern.
>
> * gcc.target/i386/pr85095.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2018-03-27 12:54:54.685244368 +0200
> +++ gcc/config/i386/i386.md 2018-03-27 19:38:43.891451026 +0200
> @@ -6854,6 +6854,23 @@ (define_insn "add3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "")])
>
> +(define_insn "*add3_carry0"

Please name this "*add3_carry_0". You will also need to
introduce "*addsi3_carry_zext_0". The minus patterns probably have the
same problem: simplify-rtx probably removes (minus ... const0_rtx),
too.

> +  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
> +   (plus:SWI
> + (match_operator:SWI 3 "ix86_carry_flag_operator"
> +   [(match_operand 2 "flags_reg_operand") (const_int 0)])
> + (match_operand:SWI 1 "nonimmediate_operand" "0")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_unary_operator_ok (PLUS, mode, operands)"
> +{
> +  operands[4] = const0_rtx;
> +  return "adc{}\t{%4, %0|%0, %4}";

Just use "$0" ("0" in intel syntax) in the insn template.

Uros.


Re: [PATCH] Fix compile-time hog in MPX boundary checking (PR target/84988).

2018-03-27 Thread Martin Liška

On 03/21/2018 01:44 PM, Jakub Jelinek wrote:

On Wed, Mar 21, 2018 at 01:40:08PM +0100, Martin Liška wrote:

2018-03-21  Martin Liska  

PR target/84988
* config/i386/i386.c (ix86_function_arg_advance): Do not call
chkp_type_bounds_count if MPX is not enabled.
---
  gcc/config/i386/i386.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5b1e962dedb..0693f8fc451 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8618,7 +8618,8 @@ ix86_function_arg_advance (cumulative_args_t cum_v, 
machine_mode mode,
if (cum->caller)
cfun->machine->outgoing_args_on_stack = true;
  
-  cum->bnds_in_bt = chkp_type_bounds_count (type);

+  if (type && POINTER_BOUNDS_TYPE_P (type))
+   cum->bnds_in_bt = chkp_type_bounds_count (type);


This is weird.  POINTER_BOUNDS_TYPE_P (type)
is TREE_CODE (type) == POINTER_BOUNDS_TYPE,
and for POINTER_BOUNDS_TYPE chkp_type_bounds_count will just unconditionally
return 0.

Jakub



Ok, so should we make the setting of cum->bnds_in_bt depend on the
flag_check_pointer_bounds flag?

If so, I've got a patch that I've tested on my x86_64-linux-gnu machine.

Martin
From 7b5978e61305c5098a084c2352fcbacb4c347158 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 21 Mar 2018 10:51:32 +0100
Subject: [PATCH] Do not call chkp_type_bounds_count if MPX is not enabled (PR
 target/84988).

gcc/ChangeLog:

2018-03-21  Martin Liska  

	PR target/84988
	* config/i386/i386.c (ix86_function_arg_advance): Do not call
	chkp_type_bounds_count if MPX is not enabled.
---
 gcc/config/i386/i386.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2b2896f7ac6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8618,7 +8618,8 @@ ix86_function_arg_advance (cumulative_args_t cum_v, machine_mode mode,
   if (cum->caller)
 	cfun->machine->outgoing_args_on_stack = true;
 
-  cum->bnds_in_bt = chkp_type_bounds_count (type);
+  if (flag_check_pointer_bounds)
+	cum->bnds_in_bt = chkp_type_bounds_count (type);
 }
 }
 
-- 
2.16.2



Re: [PATCH] [PR c++/84943] allow folding of array indexing indirect_ref

2018-03-27 Thread Alexandre Oliva
On Mar 23, 2018, Jason Merrill  wrote:

> On Fri, Mar 23, 2018 at 4:55 PM, Jason Merrill  wrote:
>> On Fri, Mar 23, 2018 at 12:44 PM, Jason Merrill  wrote:
>>> Seems like cp_fold should update CALL_EXPR_FN with "callee" if non-null.
>> 
>> Did you try this?  That should avoid it being ADDR_EXPR of a decl.

> Oh, I was assuming the ICE was in the middle-end, but it's in
> build_call_a.  And it looks like the problem isn't that it's an
> ADDR_EXPR of a decl, but that the function isn't marked TREE_USED.

Well, yeah.  cp_build_function_call_vec marks the function as used when
function is a FUNCTION_DECL.  In this testcase, it's INDIRECT_REF of
ADDR_EXPR of FUNCTION_DECL.  The idea of bypassing cancelling-out pairs
of INDIRECT_REF and ADDR_EXPR, which would have allowed
cp_build_function_call_vec to get to the FUNCTION_DECL and mark it as
used, was not accepted.  The alternative was to stop build_call_a from
getting to the FUNCTION_DECL, which was very much in line with what you'd
said about preserving source constructs and allowing the significant
differences for some language rules to remain in place.

Now, to me, it is clear that if we are to preserve source level
constructs because they could make some significant difference WRT
certain language rules, and to that end we don't want to simplify the
INDIRECT_REF arising from the array indexing with the ADDR_EXPR of the
function-to-pointer decay, then it should follow that we also don't want
to simplify the ADDR_EXPR that build_addr_func would introduce with that
INDIRECT_REF.  That's what the latest patch I proposed does, and it also
solves the potential inconsistency between cp_build_function_call_vec
and build_call_a, in which one of them does not find the FUNCTION_DECL
because it's too deeply hidden within INDIRECT_REFs/ADDR_EXPRs pairs and
so it fails to mark the decl as used, but then the other finds it
because build_addr_func peeled an INDIRECT_REF, and then complains that
the decl is not marked as used.

Now, I don't know what the rules are that could make a difference in
this case, but I must confess that I'm a bit surprised that the
following constructs could possibly be interpreted differently under C++
rules:

  f();
  (&f)();
  (*f)();
  f[0]();
  (*&f)();
  (*&*&*&f)();

Maybe they aren't when we get to cp_build_function_call_vec (any
differences WRT overload resolution would have been taken care of), and
we should use get_callee_fndecl in cp_build_function_call_vec, and
arrange for get_callee_fndecl to peel as many layers of INDIRECT_REF and
ADDR_EXPR as it finds when searching for a FUNCTION_DECL.
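
For illustration, here is a small C sketch (not from the thread; the names
f, calls and call_all_forms are made up) showing that, at run time, most of
the call spellings listed above reach the same function.  The f[0]() form
is left out because it relies on pointer arithmetic on a function pointer,
which ISO C does not allow (GCC accepts it as an extension).

```c
#include <assert.h>

/* Hypothetical counter and callee, used only for this illustration.  */
static int calls;

static int
f (void)
{
  return ++calls;
}

/* Call f through several spellings and return how many calls landed.
   Each (*&...) pair cancels out, so every form designates f itself.  */
static int
call_all_forms (void)
{
  calls = 0;
  f ();
  (&f) ();
  (*f) ();
  (*&f) ();
  (*&*&*&f) ();
  return calls;
}
```

This says nothing about how the C++ front end represents these forms
internally; it only shows that they are indistinguishable in effect.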

Anyway, given the accumulated constraints I've been given WRT this
bug, I'm afraid I've run out of ideas.  I welcome suggestions as to how
to proceed.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [patch,fortran] Bug 69497 - ICE in gfc_free_namespace

2018-03-27 Thread Jerry DeLisle

On 03/27/2018 01:53 PM, Mikael Morin wrote:

Le 26/03/2018 à 03:53, Jerry DeLisle a écrit :

On 03/25/2018 02:11 PM, Mikael Morin wrote:

Le 25/03/2018 à 21:27, Jerry DeLisle a écrit :

On 03/25/2018 10:49 AM, Mikael Morin wrote:

Le 25/03/2018 à 00:25, Jerry DeLisle a écrit :

On 03/24/2018 02:56 PM, Steve Kargl wrote:

On Sat, Mar 24, 2018 at 02:25:36PM -0700, Jerry DeLisle wrote:


diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index ce6b1e93644..997d90b00fd 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -4037,10 +4037,9 @@ gfc_free_namespace (gfc_namespace *ns)
   return;

 ns->refs--;
-  if (ns->refs > 0)
-    return;

-  gcc_assert (ns->refs == 0);
+  if (ns->refs != 0)
+    return;

 gfc_free_statements (ns->code);


The ChangeLog doesn't seem to match the patch.

If ns->refs==0, you free the namespace.
If ns->refs!=0, you return.
So, if ns->refs<0, the namespace is not freed.



That is what I get when I am in hurry. Try this:

 PR fortran/84506
 * symbol.c (gfc_free_namespace): Delete the assert and only if
 refs count equals zero, free the namespace. Otherwise,
 something is halfway and other errors will resound.


Hello,

The assert was put in place to exhibit memory management issues, 
and that’s what it does.
If ns->refs < 0, then it was 0 on the previous call, and ns should 
have been freed at that time.
So if you read ns->refs you are reading garbage, and if you 
decrease it you are writing to memory that you don’t own any more.
I think ICEing at this point is good enough, instead of going 
further down the road.


The problem with ICEing is that it tells the users to report it as a 
bug in the compiler. 


It is a bug in the compiler, albeit one of little concern to us (at 
least when dealing with invalid code): the memory is incorrectly 
managed.


No argument there, just saying in the cases of the PR, it is not 
useful to the user.






This is a lot more useful than a fatal error.  All of the 30 cases I 
tested gave similar reasonable errors.




A fatal error doesn’t actually remove previously emitted (reasonable) 
errors, it just doesn’t let the compiler continue.  I can propose the 
attached patch to convince you.


No need to convince. If you prefer your patch, it's OK with me.


I have tried to restore the assert instead.
With the attached patch, freshly regression tested.
I have also checked the 29 cases from the PR.
OK?

Mikael



Good to go Mikael, thanks.

Jerry


Re: [PATCH, rs6000] xmmintrin.h needs to use __vector __bool everywhere

2018-03-27 Thread Segher Boessenkool
Hi Bill,

On Tue, Mar 27, 2018 at 04:10:00PM -0500, Bill Schmidt wrote:
> The xmmintrin.h compatibility header embeds altivec.h to use the Power
> vector intrinsics.  However, it needs to be careful not to use the "bool"
> keyword with vectors so the headers don't cause potential problems in C++ 
> and C11 code when using strict-ANSI.  I noticed a few cases where this was
> happening.  They haven't caused trouble yet, but it's just a matter of time.
> This patch cleans those up.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this okay
> for trunk?

Yes please.  Thanks!


Segher


> 2018-03-27  Bill Schmidt  
> 
>   * config/rs6000/xmmintrin.h (_mm_max_pi16): Use __vector __bool
>   instead of __vector bool.
>   (_mm_max_pu8): Likewise.
>   (_mm_min_pi16): Likewise.


Re: [PATCH] rs6000: -mreadonly-in-sdata (PR82411)

2018-03-27 Thread Segher Boessenkool
On Wed, Mar 07, 2018 at 08:22:54PM +, Segher Boessenkool wrote:
> This adds a new option -mreadonly-in-sdata (on by default) that
> controls whether readonly data can be put in sdata.  (For EABI this
> does nothing, readonly data is put in sdata2 as usual).

I now have backported this to GCC 7 and GCC 6 branches.


Segher


Re: [PATCH], PR target/84914, Fix complex long double multiply/divide on PowerPC -mabi=ieeelongdouble

2018-03-27 Thread Segher Boessenkool
On Fri, Mar 23, 2018 at 03:19:03PM -0400, Michael Meissner wrote:
> 2018-03-23  Michael Meissner  
> 
>   PR target/84914
>   * config/rs6000/rs6000.c (create_complex_muldiv): New helper
>   function to create the function decl for complex long double
>   multiply and divide for -mabi=ieeelongdouble.
>   (init_float128_ieee): Call it.
> 
> [gcc/testsuite]
> 2018-03-23  Michael Meissner  
> 
>   PR target/84914
>   * gcc.target/powerpc/mulkc-2.c: New tests to make sure complex
>   long double multiply/divide uses the correct function.
>   * gcc.target/powerpc/mulkc-3.c: Likewise.
>   * gcc.target/powerpc/divkc-2.c: Likewise.
>   * gcc.target/powerpc/divkc-3.c: Likewise.

Okay for trunk.  Thanks!


Segher


Re: [PATCH. rs6000] Fix PR84912: ICE using -m32 on __builtin_divde*, patch #2

2018-03-27 Thread Segher Boessenkool
Hi!

On Fri, Mar 23, 2018 at 12:41:38PM -0500, Peter Bergner wrote:
> This is the second patch to fix PR84912, which is an ICE when calling some
> extended divide builtin functions.  This patch is relative to the first
> patch.  This fixes the ICE by adding a new mask to the builtin functions
> that are ICEing and then enforcing it is set.  I have also added a helpful
> error message in the case it is not set.

> @@ -15952,6 +15953,10 @@ rs6000_invalid_builtin (enum rs6000_buil
>  name);
>else if ((fnmask & RS6000_BTM_FLOAT128) != 0)
>  error ("builtin function %qs requires the %qs option", name, 
> "-mfloat128");
> +  else if ((fnmask & (RS6000_BTM_POPCNTD | RS6000_BTM_POWERPC64))
> +== (RS6000_BTM_POPCNTD | RS6000_BTM_POWERPC64))
> +error ("builtin function %qs requires the %qs and %qs options",
> +name, "-mcpu=power7 (or newer)", "-m64 or -mpowerpc64");

This does not work for translation, and it quotes the wrong things.
Each %qs should be for exactly one option string.

Looks good otherwise.


Segher


Re: [RFC Patch], PowerPC memory support pre-gcc9, Version 2, Patch #12

2018-03-27 Thread Michael Meissner
When I last worked on fusion, I put in a bunch of support to save the insn code of
various functions for creating fusion.  I never actually used these functions,
except for printing them out with -mdebug=reg.  This patch deletes the generator
functions for the insns, but it does not delete the actual insns themselves.
The current peephole2 patterns for power8 GPR load fusion and power9 SF/DF load/store
fusion still generate these insns, but they don't use the gen_ functions to
create the insns.

I have built both little endian and big endian bootstrap compilers and there
were no regressions with these patches.

2018-03-27  Michael Meissner  

* config/rs6000/rs6000.c (struct rs6000_reg_addr): Eliminate
unused insn code fields that were originally meant for adding
fusion operations.
(rs6000_debug_print_mode): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
* config/rs6000/rs6000.md (fusion_gpr_load_): Turn off insn
generator for the fusion functions that are not referenced by name.
(fusion_gpr___store): Likewise.
(fusion_gpr___load): Likewise.
(fusion_vsx___load): Likewise.
(fusion_vsx___store): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 258818)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -522,15 +522,6 @@ struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;   /* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;   /* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;   /* INSN to move from VSX to GPR.  */
-  enum insn_code fusion_gpr_ld;/* INSN for fusing gpr ADDIS/loads.  */
-   /* INSNs for fusing addi with loads
-  or stores for each reg. class.  */
-  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
-  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
-   /* INSNs for fusing addis with loads
-  or stores for each reg. class.  */
-  enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
-  enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;/* Scalar value can go in VMX.  */
   bool fused_toc;  /* Mode supports TOC fusion.  */
@@ -2393,7 +2384,6 @@ rs6000_debug_print_mode (ssize_t m)
 {
   ssize_t rc;
   int spaces = 0;
-  bool fuse_extra_p;
 
   fprintf (stderr, "Mode: %-5s", GET_MODE_NAME (m));
   for (rc = 0; rc < N_RELOAD_REG; rc++)
@@ -2416,82 +2406,6 @@ rs6000_debug_print_mode (ssize_t m)
   else
 spaces += sizeof ("  Upper=y") - 1;
 
-  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
- || reg_addr[m].fused_toc);
-  if (!fuse_extra_p)
-{
-  for (rc = 0; rc < N_RELOAD_REG; rc++)
-   {
- if (rc != RELOAD_REG_ANY)
-   {
- if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing
- || reg_addr[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
- || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
- || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
- || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
-   {
- fuse_extra_p = true;
- break;
-   }
-   }
-   }
-}
-
-  if (fuse_extra_p)
-{
-  fprintf (stderr, "%*s  Fuse:", spaces, "");
-  spaces = 0;
-
-  for (rc = 0; rc < N_RELOAD_REG; rc++)
-   {
- if (rc != RELOAD_REG_ANY)
-   {
- char load, store;
-
- if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
-   load = 'l';
- else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
-   load = 'L';
- else
-   load = '-';
-
- if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
-   store = 's';
- else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
-   store = 'S';
- else
-   store = '-';
-
- if (load == '-' && store == '-')
-   spaces += 5;
- else
-   {
- fprintf (stderr, "%*s%c=%c%c", (spaces + 1), "",
-  reload_reg_map[rc].name[0], load, store);
- spaces = 0;
-   }
-   }
-   }
-
-  if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
-   {
- fprintf (stderr, "%*sP8gpr", (spaces + 1), "

Re: [PATCH. rs6000] Fix PR84912: ICE using -m32 on __builtin_divde*, patch #1

2018-03-27 Thread Segher Boessenkool
On Fri, Mar 23, 2018 at 12:40:09PM -0500, Peter Bergner wrote:
> This is the first patch to fix PR84912, which is an ICE when calling some
> extended divide builtin functions.  In discussing this offline, we decided
> that all div*o builtin functions make no sense because we don't model the
> OV bit in GCC.  This patch simply removes all div*o builtins and their
> associated documentation.  The next patch will cure the remaining ICEs.
> 
> This passed bootstrap and regtesting on powerpc64-linux with no regressions.
> Ok for mainline?

Okay.  Thanks!

> Do we want this backported to the open release branches too?

It's fine to leave things as-is there, I suppose.  If it makes backports
easier it is fine for 7 as well as for 6 though.

>   * doc/extend.texi (__builtin_divweo): Remove documention for deleted
>   builtin function.

Typo ("documentation").


Segher


Re: [documentation patch] add detail to const and pure attributes

2018-03-27 Thread Pedro Alves
On 03/27/2018 09:19 PM, Martin Sebor wrote:
> On 03/27/2018 01:38 PM, Pedro Alves wrote:
>> On 03/27/2018 07:18 PM, Martin Sebor wrote:
>>> +Because a @code{pure} function can have no side-effects it does not
>>
>> FWIW, I'd suggest rephrasing as:
>>
>>  Because a @code{pure} function cannot have side effects
>>
>> because "can have no side-effects" can be read as
>> "is allowed to have no side effects", which gave me pause
>> when I read it the first time, and is the opposite of
>> what you mean.
> 
> That is what I meant: that const and pure functions are not allowed
> to have any side-effects.  If they did, they could be unexpectedly
> eliminated (i.e., the behavior is undefined when such a function
> does have a side-effect).

I know, but that's not what I read the first time (and found it
odd so I had to re-read).  You can either assume that I'm the
only one that will misunderstand it on first read, or you can
swap a couple words and be sure no one will misunderstand it.

Up to you.

> 
> I don't have a strong preference for one phrasing over the other
> but they both say the same thing.  One is just ever so slightly
> more emphatic.
> 

Thanks,
Pedro Alves


[PATCH] Improve adc discovery during combine on x86 (PR target/85095)

2018-03-27 Thread Jakub Jelinek
Hi!

In 6.x we've changed unsigned if (a < b) a++; into ADD_OVERFLOW ifn,
which results in different expanded code, which on the following testcase
unfortunately doesn't combine anymore into the optimal 3 instructions.

The problem is that we want adc[lq] $0, %reg instruction, but simplify-rtx.c
leaves the apparently useless (plus something const0_rtx) out, just uses
something, and there is no pattern that matches that.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-03-27  Jakub Jelinek  

PR target/85095
* config/i386/i386.md (*add3_carry0): New pattern.

* gcc.target/i386/pr85095.c: New test.

--- gcc/config/i386/i386.md.jj  2018-03-27 12:54:54.685244368 +0200
+++ gcc/config/i386/i386.md 2018-03-27 19:38:43.891451026 +0200
@@ -6854,6 +6854,23 @@ (define_insn "add3_carry"
(set_attr "pent_pair" "pu")
(set_attr "mode" "")])
 
+(define_insn "*add3_carry0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (plus:SWI
+ (match_operator:SWI 3 "ix86_carry_flag_operator"
+   [(match_operand 2 "flags_reg_operand") (const_int 0)])
+ (match_operand:SWI 1 "nonimmediate_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (PLUS, mode, operands)"
+{
+  operands[4] = const0_rtx;
+  return "adc{}\t{%4, %0|%0, %4}";
+}
+  [(set_attr "type" "alu")
+   (set_attr "use_carry" "1")
+   (set_attr "pent_pair" "pu")
+   (set_attr "mode" "")])
+
 (define_insn "*addsi3_carry_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
--- gcc/testsuite/gcc.target/i386/pr85095.c.jj  2018-03-27 19:49:02.985677415 +0200
+++ gcc/testsuite/gcc.target/i386/pr85095.c 2018-03-27 19:49:28.076686590 +0200
@@ -0,0 +1,13 @@
+/* PR target/85095 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+
+unsigned long
+foo (unsigned long a, unsigned long b)
+{
+  a += b;
+  if (a < b) a++;
+  return a;
+}
+
+/* { dg-final { scan-assembler-times "adc\[lq]\t\\\$0," 1 } } */

Jakub
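
For reference, a minimal sketch of the source-level idiom this pattern
targets.  It assumes only the GCC/Clang __builtin_add_overflow builtin; the
function names are made up for illustration and are not part of the patch.
Both spellings add b to a and then add the carry-out back in, which is the
job the adc $0 form does in a single instruction.

```c
#include <assert.h>
#include <limits.h>

/* The idiom from the testcase: add, then add the carry back in.  */
static unsigned long
add_with_carry_idiom (unsigned long a, unsigned long b)
{
  a += b;
  if (a < b)	/* True exactly when the addition wrapped around.  */
    a++;
  return a;
}

/* The same computation spelled with the overflow builtin, which reports
   the carry-out of the addition directly.  */
static unsigned long
add_with_carry_builtin (unsigned long a, unsigned long b)
{
  unsigned long sum;
  unsigned long carry = __builtin_add_overflow (a, b, &sum);
  return sum + carry;
}
```

Whether a given compiler version turns either form into add/adc is exactly
what the scan-assembler test above checks; the sketch only shows that the
two forms agree on their results.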


[PATCH, rs6000] xmmintrin.h needs to use __vector __bool everywhere

2018-03-27 Thread Bill Schmidt
Hi,

The xmmintrin.h compatibility header embeds altivec.h to use the Power
vector intrinsics.  However, it needs to be careful not to use the "bool"
keyword with vectors so the headers don't cause potential problems in C++ 
and C11 code when using strict-ANSI.  I noticed a few cases where this was
happening.  They haven't caused trouble yet, but it's just a matter of time.
This patch cleans those up.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this okay
for trunk?

Thanks,
Bill


2018-03-27  Bill Schmidt  

* config/rs6000/xmmintrin.h (_mm_max_pi16): Use __vector __bool
instead of __vector bool.
(_mm_max_pu8): Likewise.
(_mm_min_pi16): Likewise.


Index: gcc/config/rs6000/xmmintrin.h
===
--- gcc/config/rs6000/xmmintrin.h   (revision 25)
+++ gcc/config/rs6000/xmmintrin.h   (working copy)
@@ -1398,11 +1398,11 @@ _mm_max_pi16 (__m64 __A, __m64 __B)
 {
 #if _ARCH_PWR8
   __vector signed short a, b, r;
-  __vector bool short c;
+  __vector __bool short c;
 
   a = (__vector signed short)vec_splats (__A);
   b = (__vector signed short)vec_splats (__B);
-  c = (__vector bool short)vec_cmpgt (a, b);
+  c = (__vector __bool short)vec_cmpgt (a, b);
   r = vec_sel (b, a, c);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)r, 0));
 #else
@@ -1436,11 +1436,11 @@ _mm_max_pu8 (__m64 __A, __m64 __B)
 {
 #if _ARCH_PWR8
   __vector unsigned char a, b, r;
-  __vector bool char c;
+  __vector __bool char c;
 
   a = (__vector unsigned char)vec_splats (__A);
   b = (__vector unsigned char)vec_splats (__B);
-  c = (__vector bool char)vec_cmpgt (a, b);
+  c = (__vector __bool char)vec_cmpgt (a, b);
   r = vec_sel (b, a, c);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)r, 0));
 #else
@@ -1472,11 +1472,11 @@ _mm_min_pi16 (__m64 __A, __m64 __B)
 {
 #if _ARCH_PWR8
   __vector signed short a, b, r;
-  __vector bool short c;
+  __vector __bool short c;
 
   a = (__vector signed short)vec_splats (__A);
   b = (__vector signed short)vec_splats (__B);
-  c = (__vector bool short)vec_cmplt (a, b);
+  c = (__vector __bool short)vec_cmplt (a, b);
   r = vec_sel (b, a, c);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)r, 0));
 #else
@@ -1510,11 +1510,11 @@ _mm_min_pu8 (__m64 __A, __m64 __B)
 {
 #if _ARCH_PWR8
   __vector unsigned char a, b, r;
-  __vector bool char c;
+  __vector __bool char c;
 
   a = (__vector unsigned char)vec_splats (__A);
   b = (__vector unsigned char)vec_splats (__B);
-  c = (__vector bool char)vec_cmplt (a, b);
+  c = (__vector __bool char)vec_cmplt (a, b);
   r = vec_sel (b, a, c);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)r, 0));
 #else



[PATCH] Add pow -> exp hack for SPEC2k17 628.pop2_s (PR tree-optimization/82004)

2018-03-27 Thread Jakub Jelinek
Hi!

As mentioned in the PR, sw_absorption.fppized.f90 relies on pow in
x = log10 (something) - y;
for (...)
  {
x = x + y;
z = pow (10.0, x);
  }
returning a value >= 0.001 in the first iteration, where x + y is exactly -3.
Unfortunately, with the pow(cst, x) -> exp (log (cst) * x) optimization
with -Ofast and -flto this returns something a few ulps smaller than that
and the benchmark fails.

In the PR I've attached a quite large patch that attempts to optimize the
case using x = x * cst;, unfortunately even by -Ofast standards that
generates quite big relative errors when the loop has 400 iterations.

So, instead this simple patch just tries to detect the case where we
have on some edge pow (10.0, integer) and just doesn't attempt to optimize
it in that case to exp.  If glibc folks add an optimized exp10 eventually,
we can switch it later on to emitting exp10 instead.
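
The decision described above can be modelled in plain C roughly as follows.
This is a toy sketch with invented names (toy_is_exact_integer,
toy_optimize_pow_to_exp) that works on doubles rather than on GIMPLE: it
takes the pow base, the constant flowing into the loop and the constant
added in each iteration, and says whether folding to exp would still be
attempted.  It does not model the PHI walk in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if V is an integer value that a double represents exactly.
   The range check keeps the cast to long long well defined and also
   rejects NaN, for which both comparisons are false.  */
static bool
toy_is_exact_integer (double v)
{
  if (!(v >= -9007199254740992.0 && v <= 9007199254740992.0))
    return false;
  return v == (double) (long long) v;
}

/* Toy model of the decision: return true if pow (base, x) may be folded
   into exp (log (base) * x), false if it should be left alone because
   base is 10.0 and the first exponent, init_cst + step_cst, is an exact
   integer.  */
static bool
toy_optimize_pow_to_exp (double base, double init_cst, double step_cst)
{
  if (base != 10.0)
    return true;
  return !toy_is_exact_integer (init_cst + step_cst);
}
```

With base 10.0, an initial value of -3.25 and a step of 0.25, the first
exponent is exactly -3.0, so the model declines the fold; any other base,
or a non-integer sum, leaves the existing optimization in place.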

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-27  Jakub Jelinek  

PR tree-optimization/82004
* generic-match-head.c (optimize_pow_to_exp): New function.
* gimple-match-head.c (optimize_pow_to_exp): New function.
* match.pd (pow(C,x) -> exp(log(C)*x)): Don't fold if
optimize_pow_to_exp is false.

* gcc.dg/pr82004.c: New test.

--- gcc/generic-match-head.c.jj 2018-02-13 09:33:31.089560180 +0100
+++ gcc/generic-match-head.c    2018-03-27 18:28:36.663913272 +0200
@@ -77,3 +77,11 @@ canonicalize_math_after_vectorization_p
 {
   return false;
 }
+
+/* Return true if pow(cst, x) should be optimized into exp(log(cst) * x).  */
+
+static bool
+optimize_pow_to_exp (tree arg0, tree arg1)
+{
+  return false;
+}
--- gcc/gimple-match-head.c.jj  2018-02-13 09:33:31.107560174 +0100
+++ gcc/gimple-match-head.c 2018-03-27 18:48:21.205369113 +0200
@@ -840,3 +840,55 @@ canonicalize_math_after_vectorization_p
 {
   return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0;
 }
+
+/* Return true if pow(cst, x) should be optimized into exp(log(cst) * x).
+   As a workaround for SPEC CPU2017 628.pop2_s, don't do it if arg0
+   is 10.0, arg1 = phi_res + cst1 and phi_res = PHI <cst2, ...>
+   where cst1 + cst2 is an exact integer, because then pow (10.0, arg1)
+   will likely be exact, while exp (log (10.0) * arg1) might be not. */
+
+static bool
+optimize_pow_to_exp (tree arg0, tree arg1)
+{
+  gcc_assert (TREE_CODE (arg0) == REAL_CST);
+  REAL_VALUE_TYPE ten;
+  real_from_integer (&ten, TYPE_MODE (TREE_TYPE (arg0)), 10, SIGNED);
+  if (!real_identical (TREE_REAL_CST_PTR (arg0), &ten))
+return true;
+
+  if (TREE_CODE (arg1) != SSA_NAME)
+return true;
+
+  gimple *def = SSA_NAME_DEF_STMT (arg1);
+  if (!is_gimple_assign (def)
+  || gimple_assign_rhs_code (def) != PLUS_EXPR
+  || TREE_CODE (gimple_assign_rhs1 (def)) != SSA_NAME
+  || TREE_CODE (gimple_assign_rhs2 (def)) != REAL_CST)
+return true;
+
+  gphi *phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def)));
+  if (!phi)
+return true;
+
+  tree cst = NULL_TREE;
+  int n = gimple_phi_num_args (phi);
+  for (int i = 0; i < n; i++)
+{
+  tree arg = PHI_ARG_DEF (phi, i);
+  if (TREE_CODE (arg) != REAL_CST)
+   continue;
+  else if (cst == NULL_TREE)
+   cst = arg;
+  else if (!operand_equal_p (cst, arg, 0))
+   return true;
+}
+
+  tree cst2 = const_binop (PLUS_EXPR, TREE_TYPE (cst), cst,
+  gimple_assign_rhs2 (def));
+  if (cst2
+  && TREE_CODE (cst2) == REAL_CST
+  && real_isinteger (TREE_REAL_CST_PTR (cst2),
+TYPE_MODE (TREE_TYPE (cst2))))
+return false;
+  return true;
+}
--- gcc/match.pd.jj 2018-03-13 09:12:29.579110925 +0100
+++ gcc/match.pd        2018-03-27 18:29:38.292936995 +0200
@@ -4016,7 +4016,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   because exp(log(C)*x), while faster, will have worse precision
   and if x folds into a constant too, that is unnecessary
   pessimization.  */
-   && canonicalize_math_after_vectorization_p ())
+   && canonicalize_math_after_vectorization_p ()
+   && optimize_pow_to_exp (@0, @1))
 (with {
const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
bool use_exp2 = false;
--- gcc/testsuite/gcc.dg/pr82004.c.jj   2018-03-27 18:02:27.135309786 +0200
+++ gcc/testsuite/gcc.dg/pr82004.c  2018-03-27 17:10:49.070052010 +0200
@@ -0,0 +1,32 @@
+/* PR tree-optimization/82004 */
+/* { dg-do run } */
+/* { dg-options "-Ofast" } */
+
+extern double log10 (double);
+extern double pow (double, double);
+
+__attribute__((noipa)) void
+bar (double x)
+{
+  if (x < 0.001)
+__builtin_abort ();
+  asm volatile ("" : : : "memory");
+}
+
+int
+main ()
+{
+  double d = 0.001;
+  double e = 10.0;
+  double f = (log10 (e) - log10 (d)) / 400.0;
+  double g = log10 (d) - f;
+  volatile int q = 0;
+  int i;
+  if (__builtin_expect (q == 0, 0))
+for (i = 0; i < 400; ++i)
+  {
+g = g + f;
+bar (pow (10.0, g));

Re: [patch,fortran] Bug 69497 - ICE in gfc_free_namespace

2018-03-27 Thread Mikael Morin

Le 26/03/2018 à 03:53, Jerry DeLisle a écrit :

On 03/25/2018 02:11 PM, Mikael Morin wrote:

Le 25/03/2018 à 21:27, Jerry DeLisle a écrit :

On 03/25/2018 10:49 AM, Mikael Morin wrote:

Le 25/03/2018 à 00:25, Jerry DeLisle a écrit :

On 03/24/2018 02:56 PM, Steve Kargl wrote:

On Sat, Mar 24, 2018 at 02:25:36PM -0700, Jerry DeLisle wrote:


diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index ce6b1e93644..997d90b00fd 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -4037,10 +4037,9 @@ gfc_free_namespace (gfc_namespace *ns)
   return;

 ns->refs--;
-  if (ns->refs > 0)
-    return;

-  gcc_assert (ns->refs == 0);
+  if (ns->refs != 0)
+    return;

 gfc_free_statements (ns->code);


The ChangeLog doesn't seem to match the patch.

If ns->refs==0, you free the namespace.
If ns->refs!=0, you return.
So, if ns->refs<0, the namespace is not freed.



That is what I get when I am in hurry. Try this:

 PR fortran/84506
 * symbol.c (gfc_free_namespace): Delete the assert and only if
 refs count equals zero, free the namespace. Otherwise,
 something is halfway and other errors will resound.


Hello,

The assert was put in place to exhibit memory management issues, and 
that’s what it does.
If ns->refs < 0, then it was 0 on the previous call, and ns should 
have been freed at that time.
So if you read ns->refs you are reading garbage, and if you decrease 
it you are writing to memory that you don’t own any more.
I think ICEing at this point is good enough, instead of going 
further down the road.


The problem with ICEing is that it tells the users to report it as a 
bug in the compiler. 


It is a bug in the compiler, albeit one of little concern to us (at 
least when dealing with invalid code): the memory is incorrectly managed.


No argument there, just saying in the cases of the PR, it is not useful 
to the user.






This is a lot more useful than a fatal error.  All of the 30 cases I 
tested gave similar reasonable errors.




A fatal error doesn’t actually remove previously emitted (reasonable) 
errors, it just doesn’t let the compiler continue.  I can propose the 
attached patch to convince you.


No need to convince. If you prefer your patch, it's OK with me.


I have tried to restore the assert instead.
With the attached patch, freshly regression tested.
I have also checked the 29 cases from the PR.
OK?

Mikael

2018-03-27  Mikael Morin  

	PR fortran/69497
	* symbol.c (gfc_symbol_done_2): Start freeing namespaces
	from the root.
	(gfc_free_namespace): Restore assert (revert r258839). 

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 997d90b00fd..546a4fae0a8 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -4037,10 +4037,11 @@ gfc_free_namespace (gfc_namespace *ns)
 return;
 
   ns->refs--;
-
-  if (ns->refs != 0)
+  if (ns->refs > 0)
 return;
 
+  gcc_assert (ns->refs == 0);
+
   gfc_free_statements (ns->code);
 
   free_sym_tree (ns->sym_root);
@@ -4087,8 +4088,14 @@ gfc_symbol_init_2 (void)
 void
 gfc_symbol_done_2 (void)
 {
-  gfc_free_namespace (gfc_current_ns);
-  gfc_current_ns = NULL;
+  if (gfc_current_ns != NULL)
+{
+  /* free everything from the root.  */
+  while (gfc_current_ns->parent != NULL)
+	gfc_current_ns = gfc_current_ns->parent;
+  gfc_free_namespace (gfc_current_ns);
+  gfc_current_ns = NULL;
+}
   gfc_free_dt_list ();
 
   enforce_single_undo_checkpoint ();


New Spanish PO file for 'cpplib' (version 8.1-b20180128)

2018-03-27 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-8.1-b20180128.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-8.1-b20180128.es.po'

2018-03-27 Thread Translation Project Robot


cpplib-8.1-b20180128.es.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [documentation patch] add detail to const and pure attributes

2018-03-27 Thread Martin Sebor

On 03/27/2018 01:38 PM, Pedro Alves wrote:

On 03/27/2018 07:18 PM, Martin Sebor wrote:

+Because a @code{pure} function can have no side-effects it does not


FWIW, I'd suggest rephrasing as:

 Because a @code{pure} function cannot have side effects

because "can have no side-effects" can be read as
"is allowed to have no side effects", which gave me pause
when I read it the first time, and is the opposite of
what you mean.


That is what I meant: that const and pure functions are not allowed
to have any side-effects.  If they did, they could be unexpectedly
eliminated (i.e., the behavior is undefined when such a function
does have a side-effect).

I don't have a strong preference for one phrasing over the other
but they both say the same thing.  One is just ever so slightly
more emphatic.

Martin


Re: [C++ Patch] PR 85067 ("[8 Regression] ICE with volatile parameter in defaulted copy-constructor")

2018-03-27 Thread Jason Merrill
OK.

On Tue, Mar 27, 2018 at 4:33 AM, Paolo Carlini  wrote:
> Hi,
>
> Volker noticed that a tweak I committed back in September, which tidied the
> diagnostic we produce in C++11 mode for the testcase in c++/68754 causes
> this error recovery regression. We could try restoring the consistency, for
> example along the lines of the patchlet I posted on the audit trail (passes
> testing) but, for 8.1.0 at least, I propose to simply revert that change.
> Tested x86_64-linux.
>
> Thanks, Paolo.
>
> //
>


Re: [C++ PATCH] Fix ICE in cp_build_reference_type (PR c++/85076)

2018-03-27 Thread Jason Merrill
OK.

On Tue, Mar 27, 2018 at 4:52 AM, Jakub Jelinek  wrote:
> Hi!
>
> Both build_{reference,pointer}_type start with if (to_type ==
> error_mark_node) return error_mark_node;
>
> cp_build_reference_type uses build_reference_type, so in many cases it will
> just return error_mark_node if it is passed, but if rval is true, it will
> assume build_reference_type returned some REFERENCE_TYPE instead.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2018-03-27  Jakub Jelinek  
>
> PR c++/85076
> * tree.c (cp_build_reference_type): If to_type is error_mark_node,
> return it right away.
>
> * g++.dg/cpp1y/pr85076.C: New test.
>
> --- gcc/cp/tree.c.jj 2018-03-21 21:18:31.738351376 +0100
> +++ gcc/cp/tree.c   2018-03-26 11:22:47.067967708 +0200
> @@ -1078,6 +1078,9 @@ cp_build_reference_type (tree to_type, b
>  {
>tree lvalue_ref, t;
>
> +  if (to_type == error_mark_node)
> +return error_mark_node;
> +
>if (TREE_CODE (to_type) == REFERENCE_TYPE)
>  {
>rval = rval && TYPE_REF_IS_RVALUE (to_type);
> --- gcc/testsuite/g++.dg/cpp1y/pr85076.C.jj 2018-03-26 11:26:55.725047985 +0200
> +++ gcc/testsuite/g++.dg/cpp1y/pr85076.C 2018-03-26 11:26:41.807043494 +0200
> @@ -0,0 +1,6 @@
> +// PR c++/85076
> +// { dg-do compile { target c++14 } }
> +
> +template struct A*;  // { dg-error "expected unqualified-id before" }
> +
> +auto a = [](A) {};   // { dg-error "is not a template|has incomplete type" }
>
> Jakub
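
As an illustration of the pattern in this fix, here is a minimal sketch of
checking for an error sentinel before touching any field of the argument.
The type layout and the names (struct type, error_mark_node as a plain
pointer, build_reference) are simplified assumptions, not the real GCC
tree API:

```c
#include <stddef.h>

/* Hypothetical, much simplified type node.  */
struct type
{
  int is_reference;
  int is_rvalue;
  const struct type *target;
};

/* A unique object whose address serves as the error sentinel.  */
static const struct type error_mark = { 0, 0, NULL };
const struct type *const error_mark_node = &error_mark;

/* Fill *OUT with a reference to TO and return it, or return the
   sentinel unchanged if TO is already erroneous.  */
const struct type *
build_reference (const struct type *to, int rval, struct type *out)
{
  /* Propagate the sentinel first; the code below assumes a real type.  */
  if (to == error_mark_node)
    return error_mark_node;

  out->is_reference = 1;
  out->is_rvalue = rval;
  out->target = to;
  return out;
}
```

The point of the early return is that every later step may then assume it
is working on a real type node.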


Re: [C++ PATCH] Fix ICE on offsetof with volatile struct and static data member array ref (PR c++/85061)

2018-03-27 Thread Jason Merrill
OK.

On Tue, Mar 27, 2018 at 4:54 AM, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase ICEs, because we assert that we see a COMPOUND_EXPR
> only for static data member in a volatile struct, but as the testcase shows,
> we can see it also if using some component of the static data member.
>
> Fixed by using get_base_address, plus, as the check isn't as cheap as
> before, turn the assert into a checking assert only.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-03-27  Jakub Jelinek  
>
> PR c++/85061
> * c-common.c (fold_offsetof_1) : Assert that
> get_base_address of the second operand is a VAR_P, rather than the
> operand itself, and use gcc_checking_assert instead of gcc_assert.
>
> * g++.dg/ext/builtin-offsetof3.C: New test.
>
> --- gcc/c-family/c-common.c.jj  2018-03-13 00:38:23.809662252 +0100
> +++ gcc/c-family/c-common.c 2018-03-24 15:21:36.171485128 +0100
> @@ -6272,7 +6272,7 @@ fold_offsetof_1 (tree expr, enum tree_co
>  case COMPOUND_EXPR:
>/* Handle static members of volatile structs.  */
>t = TREE_OPERAND (expr, 1);
> -  gcc_assert (VAR_P (t));
> +  gcc_checking_assert (VAR_P (get_base_address (t)));
>return fold_offsetof_1 (t);
>
>  default:
> --- gcc/testsuite/g++.dg/ext/builtin-offsetof3.C.jj 2018-03-26 11:54:54.338627270 +0200
> +++ gcc/testsuite/g++.dg/ext/builtin-offsetof3.C 2018-03-26 11:54:07.992610454 +0200
> @@ -0,0 +1,14 @@
> +// PR c++/85061
> +// { dg-do compile }
> +
> +struct B { int a, b; };
> +struct A
> +{
> +  static int x[2];
> +  static int y;
> +  static B z;
> +};
> +
> +int i = __builtin_offsetof (volatile A, x[0]); // { dg-error "cannot apply 'offsetof' to static data member 'A::x'" }
> +int j = __builtin_offsetof (volatile A, y);    // { dg-error "cannot apply 'offsetof' to static data member 'A::y'" }
> +int k = __builtin_offsetof (volatile A, z.a);  // { dg-error "cannot apply 'offsetof' to a non constant address" }
>
> Jakub


Re: [C++ PATCH] Improve cp_fold on vector CONSTRUCTORs (PR c++/85077)

2018-03-27 Thread Jason Merrill
OK.

On Tue, Mar 27, 2018 at 5:18 AM, Jakub Jelinek  wrote:
> Hi!
>
> The following testcase regressed for 8+, because we delayed folding in
> SAVE_EXPRs and end up passing a CONSTRUCTOR with V4SFmode and 4x 0.0
> constants in it rather than a VECTOR_CST to the middle-end folder, which
> uses real_zerop and thus doesn't recognize the CONSTRUCTOR in
> VEC_COND_EXPR  as zero and doesn't fold
> it into ABS_EXPR.
>
> We really should move that folding into match.pd, but that is a GCC 9 task.
>
> Fixed by using fold on vector CONSTRUCTORs, the only thing fold does on
> those is exactly the CONSTRUCTOR -> VECTOR_CST folding when possible.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-03-27  Jakub Jelinek  
>
> PR c++/85077
> * cp-gimplify.c (cp_fold) : For ctors with vector
> type call fold to generate VECTOR_CSTs when possible.
>
> * g++.dg/ext/vector35.C: New test.
>
> --- gcc/cp/cp-gimplify.c.jj 2018-03-20 22:05:57.023431462 +0100
> +++ gcc/cp/cp-gimplify.c 2018-03-26 16:08:47.728347579 +0200
> @@ -2504,6 +2504,8 @@ cp_fold (tree x)
> CONSTRUCTOR_PLACEHOLDER_BOUNDARY (x)
>   = CONSTRUCTOR_PLACEHOLDER_BOUNDARY (org_x);
>   }
> +   if (VECTOR_TYPE_P (TREE_TYPE (x)))
> + x = fold (x);
> break;
>}
>  case TREE_VEC:
> --- gcc/testsuite/g++.dg/ext/vector35.C.jj  2018-03-26 16:19:39.330809031 +0200
> +++ gcc/testsuite/g++.dg/ext/vector35.C 2018-03-26 16:33:43.997330748 +0200
> @@ -0,0 +1,22 @@
> +// PR c++/85077
> +// { dg-do compile }
> +// { dg-options "-Ofast -fdump-tree-forwprop1" }
> +
> +typedef float V __attribute__((vector_size (4 * sizeof (float))));
> +typedef double W __attribute__((vector_size (2 * sizeof (double))));
> +
> +void
> +foo (V *y)
> +{
> +  V x = *y;
> +  *y = x < 0 ? -x : x;
> +}
> +
> +void
> +bar (W *y)
> +{
> +  W x = *y;
> +  *y = x < 0 ? -x : x;
> +}
> +
> +// { dg-final { scan-tree-dump-times "ABS_EXPR <" 2 "forwprop1" } }
>
> Jakub


Re: [documentation patch] add detail to const and pure attributes

2018-03-27 Thread Pedro Alves
On 03/27/2018 07:18 PM, Martin Sebor wrote:
> +Because a @code{pure} function can have no side-effects it does not

FWIW, I'd suggest rephrasing as:

 Because a @code{pure} function cannot have side effects

because "can have no side-effects" can be read as
"is allowed to have no side effects", which gave me pause
when I read it the first time, and is the opposite of
what you mean.

Thanks,
Pedro Alves


Re: [RFC PATCH for 9] rs6000: Ordered comparisons (PR56864)

2018-03-27 Thread Segher Boessenkool
Hi again,

On Tue, Mar 27, 2018 at 07:59:30PM +0200, Uros Bizjak wrote:
> > (the two compares were combined, by fwprop1) but without the flag we get
> >
> > fcmpo 5,1,2
> > li 3,-1
> > bltlr 5
> > mfcr 3,4
> > rlwinm 3,3,22,1
> > fcmpo 7,1,2
> > blr
> >
> > (it's still combined, but the redundant compare isn't deleted).
> 
> Yes, I think this case will be fixed by wrapping the compare inside UNSPEC.

I don't think that will work, but I haven't tried it yet.  Will do.

> > Are ordered compares faster than unordered on x86?  Strange stuff.
> 
> Not faster, but on x87 unordered compares operate only with registers,
> while some (legacy) ordered can also use memory operands.

Ah ok.  Nasty.  Maybe you should only do ordered by default for the legacy
compares then?


Segher


[patch, fortran, committed] Fix error caused by running front-end optimizations after error

2018-03-27 Thread Thomas König

Hello world,

I have just committed r258900 as obvious on trunk to fix an
out-of-memory error in front-end optimization, which was
caused by gfortran's AST being in an inconsistent state.

I suspect that this will also fix other, as yet unknown
errors.

I will backport to the other affected branches, gcc-7 and gcc-6,
over the next few days.

Regards

Thomas

2018-03-27  Thomas Koenig  

PR fortran/85084
* frontend-passes.c (gfc_run_passes): Do not run front-end
optimizations if a previous error occurred.

2018-03-27  Thomas Koenig  

PR fortran/85084
* gfortran.dg/matmul_rank_1.f90: New test.
! { dg-do compile }
! { dg-additional-options "-ffrontend-optimize" }
! PR 85044 - used to die on allocating a negative amount of memory.
! Test case by Gerhard Steinmetz.
program p
   real :: a(3,3) = 1.0
   real :: b(33)
   b = matmul(a, a) ! { dg-error "Incompatible ranks" }
end
Index: frontend-passes.c
===
--- frontend-passes.c	(revision 258845)
+++ frontend-passes.c	(working copy)
@@ -156,6 +156,10 @@ gfc_run_passes (gfc_namespace *ns)
   check_locus (ns);
 #endif
 
+  gfc_get_errors (&w, &e);
+  if (e > 0)
+return;
+
   if (flag_frontend_optimize || flag_frontend_loop_interchange)
 optimize_namespace (ns);
 
@@ -168,10 +172,6 @@ gfc_run_passes (gfc_namespace *ns)
   expr_array.release ();
 }
 
-  gfc_get_errors (&w, &e);
-  if (e > 0)
-   return;
-
   if (flag_realloc_lhs)
 realloc_strings (ns);
 }
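
The idea of the change can be sketched as follows: check the error count
before running any transformation that assumes a consistent tree.  This is
a simplified illustration; struct unit and run_passes are hypothetical
names and do not correspond to the gfortran internals:

```c
/* Hypothetical compilation state for the sketch.  */
struct unit
{
  int errors;     /* errors reported so far */
  int optimized;  /* set once the optimization pass has run */
  int lowered;    /* set once the later lowering step has run */
};

/* Run the passes only if no error was reported earlier.  After an
   error the tree may be inconsistent, so nothing is transformed.  */
void
run_passes (struct unit *u)
{
  if (u->errors > 0)
    return;

  u->optimized = 1;
  u->lowered = 1;
}
```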


[documentation patch] add detail to const and pure attributes

2018-03-27 Thread Martin Sebor

From some feedback I received on some of the attribute warnings
new in GCC 8 it seems that the manual could stand to be clarified
to explain why it makes no sense for a function declared with
attribute const (and pure) to return void.  The attached patch
adds a bit more text to make it clear.

In addition, I took the opportunity to clarify how attributes
on multiple declarations of the same function are treated.

Martin

gcc/ChangeLog:

	* doc/extend.texi (Common Function Attributes): Clarify.
	(const attribute): Likewise.
	(pure attribute): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 258899)
+++ gcc/doc/extend.texi	(working copy)
@@ -2275,8 +2275,11 @@ on a declaration, followed by an attribute specifi
 parentheses.  You can specify multiple attributes in a declaration by
 separating them by commas within the double parentheses or by
 immediately following an attribute declaration with another attribute
-declaration.  @xref{Attribute Syntax}, for the exact rules on
-attribute syntax and placement.
+declaration.  @xref{Attribute Syntax}, for the exact rules on attribute
+syntax and placement.  Compatible attribute specifications on distinct
+declarations of the same function are merged.  An attribute specification
+that is not compatible with attributes already applied to a declaration
+of the same function is ignored with a warning.
 
 GCC also supports attributes on
 variable declarations (@pxref{Variable Attributes}),
@@ -2499,7 +2502,7 @@ themselves to optimization such as common subexpre
 The @code{const} attribute imposes greater restrictions on a function's
 definition than the similar @code{pure} attribute below because it prohibits
 the function from reading global variables.  Consequently, the presence of
-the attribute on a function declarations allows GCC to emit more efficient
+the attribute on a function declaration allows GCC to emit more efficient
 code for some calls to the function.  Decorating the same function with
 both the @code{const} and the @code{pure} attribute is diagnosed.
 
@@ -2507,8 +2510,9 @@ both the @code{const} and the @code{pure} attribut
 Note that a function that has pointer arguments and examines the data
 pointed to must @emph{not} be declared @code{const}.  Likewise, a
 function that calls a non-@code{const} function usually must not be
-@code{const}.  It does not make sense for a @code{const} function to
-return @code{void}.
+@code{const}.  Because a @code{const} function can have no side-effects
+it does not make sense for such a function to return @code{void}.
+Declaring such functions is diagnosed.
 
 @item constructor
 @itemx destructor
@@ -3218,6 +3222,9 @@ The @code{pure} attribute imposes similar but loos
 a function's defintion than the @code{const} attribute: it allows the
 function to read global variables.  Decorating the same function with
 both the @code{pure} and the @code{const} attribute is diagnosed.
+Because a @code{pure} function can have no side-effects it does not
+make sense for such a function to return @code{void}.  Declaring such
+functions is diagnosed.
 
 @item returns_nonnull
 @cindex @code{returns_nonnull} function attribute
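
To make the distinction in the documentation text concrete, here is a small
sketch using the two attributes.  The functions square and count_char are
made up for illustration.  Both return a value, since the result is the
only observable effect such a function may have:

```c
#include <stddef.h>

/* const: the result depends only on the argument values, and no
   memory is read.  */
__attribute__ ((const)) int square (int x);

/* pure: no side effects, but the function reads the data that S
   points to, so it must not be declared const.  */
__attribute__ ((pure)) size_t count_char (const char *s, char c);

int
square (int x)
{
  return x * x;
}

size_t
count_char (const char *s, char c)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (*s == c)
      n++;
  return n;
}
```

A void variant of either function would have nothing left to do: it may not
write anywhere, and it returns nothing, so calls to it could be dropped.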


Re: [RFC PATCH for 9] rs6000: Ordered comparisons (PR56864)

2018-03-27 Thread Uros Bizjak
On Tue, Mar 27, 2018 at 7:20 PM, Segher Boessenkool
 wrote:
> Hi!
>
> On Tue, Mar 27, 2018 at 09:30:35AM +0200, Uros Bizjak wrote:
>> +(define_insn "*cmpdd_cmpo"
>> +  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
>> + (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
>> +  (match_operand:DD 2 "gpc_reg_operand" "d")))
>> +   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
>> +  "TARGET_DFP"
>> +  "dcmpo %0,%1,%2"
>> +  [(set_attr "type" "dfp")])
>>
>> I have had some problems when adding UNSPEC tags as a parallel to a
>> compare for x86. For the testcase:
>>
>> int testo (double a, double b)
>> {
>>   return a == b;
>> }
>>
>> middle end code emits sequence like:
>
> [ snip ]
>
>> and postreload pass removes (insn 10). This was not the case when the
>> compare was implemented with a parallel.
>
> For us this works fine:
>
> fcmpu 7,1,2
> mfcr 3,1
> rlwinm 3,3,31,1
> blr
>
> (eq is not expanded as an ordered compare, only lt gt le ge are, not the
> other twelve).
>
> But say
>
> int testo (double a, double b)
> {
>   if (a < b) return -1;
>   if (a > b) return 1;
>   return 0;
> }
>
> gives with -ffast-math
>
> fcmpu 7,1,2
> li 3,-1
> bltlr 7
> mfcr 3,1
> rlwinm 3,3,30,1
> blr
>
> (the two compares were combined, by fwprop1) but without the flag we get
>
> fcmpo 5,1,2
> li 3,-1
> bltlr 5
> mfcr 3,4
> rlwinm 3,3,22,1
> fcmpo 7,1,2
> blr
>
> (it's still combined, but the redundant compare isn't deleted).

Yes, I think this case will be fixed by wrapping the compare inside UNSPEC.

>> Also, -ffast-math on x86 emits trapping compares for all cases. For
>> that reason, unordered (non-trapping) compares were wrapped in an
>> unspec, with the expectation that -ffast-math can perform some more
>> optimizations with patterns using naked compare RTX without unspec.
>
> My patch expands with:
>
> + if (SCALAR_FLOAT_MODE_P (mode) && HONOR_NANS (mode)
> + && (code == LT || code == GT || code == LE || code == GE))
> +   {
> + rtx unspec = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (2, op0, op1),
> +  UNSPEC_CMPO);
> + compare = gen_rtx_PARALLEL (VOIDmode,
> + gen_rtvec (2, compare, unspec));
> +   }
>
> so we use only unordered compares with -ffast-math (exactly as before
> the patch, in all cases).
>
> It would be ideal if there were two separate compare codes in RTL, or
> some other way to flag it.  Or something that deletes unused ordered
> compares (if they are expressed as a parallel with an unspec).
>
> Are ordered compares faster than unordered on x86?  Strange stuff.

Not faster, but on x87 unordered compares operate only with registers,
while some (legacy) ordered can also use memory operands.

Uros.


Re: [PATCH, fortran] PR85083 - [8 Regression] ICE in gfc_convert_to_structure_constructor, at fortran/primary.c:2915

2018-03-27 Thread Thomas Koenig

Hello Harald,


The attached obvious one-liner adds a missing check for type
compatibility in a structure constructor.

Testcase from report.  Changelogs below.

Regtested on i686-pc-linux-gnu.

Whoever reviews this, please feel free to commit.



Reviewed and committed as r258899.  I have changed the test case name to
structure_constructor_15.f90 because this sort of name allows
running tests on a meaningful subset of interest, for example with
make check-fortran RUNTESTFLAGS="dg.exp=gfortran.dg/*constructor*".

Thanks a lot for the patch!

Regards

Thomas


Re: [RFC PATCH for 9] rs6000: Ordered comparisons (PR56864)

2018-03-27 Thread Segher Boessenkool
Hi!

On Tue, Mar 27, 2018 at 09:30:35AM +0200, Uros Bizjak wrote:
> +(define_insn "*cmpdd_cmpo"
> +  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
> + (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
> +  (match_operand:DD 2 "gpc_reg_operand" "d")))
> +   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
> +  "TARGET_DFP"
> +  "dcmpo %0,%1,%2"
> +  [(set_attr "type" "dfp")])
> 
> I have had some problems when adding UNSPEC tags as a parallel to a
> compare for x86. For the testcase:
> 
> int testo (double a, double b)
> {
>   return a == b;
> }
> 
> middle end code emits sequence like:

[ snip ]

> and postreload pass removes (insn 10). This was not the case when the
> compare was implemented with a parallel.

For us this works fine:

fcmpu 7,1,2
mfcr 3,1
rlwinm 3,3,31,1
blr

(eq is not expanded as an ordered compare, only lt gt le ge are, not the
other twelve).

But say

int testo (double a, double b)
{
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

gives with -ffast-math

fcmpu 7,1,2
li 3,-1
bltlr 7
mfcr 3,1
rlwinm 3,3,30,1
blr

(the two compares were combined, by fwprop1) but without the flag we get

fcmpo 5,1,2
li 3,-1
bltlr 5
mfcr 3,4
rlwinm 3,3,22,1
fcmpo 7,1,2
blr

(it's still combined, but the redundant compare isn't deleted).

> Also, -ffast-math on x86 emits trapping compares for all cases. For
> that reason, unordered (non-trapping) compares were wrapped in an
> unspec, with the expectation that -ffast-math can perform some more
> optimizations with patterns using naked compare RTX without unspec.

My patch expands with:

+ if (SCALAR_FLOAT_MODE_P (mode) && HONOR_NANS (mode)
+ && (code == LT || code == GT || code == LE || code == GE))
+   {
+ rtx unspec = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (2, op0, op1),
+  UNSPEC_CMPO);
+ compare = gen_rtx_PARALLEL (VOIDmode,
+ gen_rtvec (2, compare, unspec));
+   }

so we use only unordered compares with -ffast-math (exactly as before
the patch, in all cases).

It would be ideal if there were two separate compare codes in RTL, or
some other way to flag it.  Or something that deletes unused ordered
compares (if they are expressed as a parallel with an unspec).

Are ordered compares faster than unordered on x86?  Strange stuff.


Segher
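
For reference, the source-level distinction behind this thread can be
sketched in C.  With IEEE 754 arithmetic the relational operators <, <=,
>, >= are the ones that may raise the "invalid" exception on a NaN operand,
while == and the <math.h> comparison macros are quiet.  The two helper
functions below are hypothetical, and which instruction a given compiler
emits for each is target-specific:

```c
#include <math.h>

/* Three-way compare using only quiet comparisons.
   Returns -1, 0 or 1, or 2 if either operand is a NaN.  */
int
compare_quiet (double a, double b)
{
  if (isunordered (a, b))
    return 2;
  if (isless (a, b))
    return -1;
  if (isgreater (a, b))
    return 1;
  return 0;
}

/* Three-way compare using the relational operators, which are the
   ordered (possibly trapping) comparisons.  A NaN operand makes both
   tests false, so the result is 0.  */
int
compare_signaling (double a, double b)
{
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;
}
```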


Re: [PATCH] i386: Insert ENDBR to trampoline for -fcf-protection=branch -mibt

2018-03-27 Thread H.J. Lu
On Tue, Mar 27, 2018 at 10:08 AM, Uros Bizjak  wrote:
> On Mon, Mar 26, 2018 at 10:42 PM, Tsimbalist, Igor V
>  wrote:
>>> -Original Message-
>>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>>> Sent: Monday, March 26, 2018 5:59 PM
>>> To: Tsimbalist, Igor V 
>>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
>>> Subject: Re: [PATCH] i386: Insert ENDBR to trampoline for -fcf-
>>> protection=branch -mibt
>>>
>>> On Mon, Mar 26, 2018 at 8:23 AM, Tsimbalist, Igor V
>>>  wrote:
>>> >> -Original Message-
>>> >> From: Lu, Hongjiu
>>> >> Sent: Sunday, March 25, 2018 12:50 AM
>>> >> To: gcc-patches@gcc.gnu.org; Uros Bizjak ;
>>> Tsimbalist,
>>> >> Igor V 
>>> >> Subject: [PATCH] i386: Insert ENDBR to trampoline for -fcf-
>>> >> protection=branch -mibt
>>> >>
>>> >> When -fcf-protection=branch -mibt are used, we need to insert ENDBR
>>> >> to trampoline.  TRAMPOLINE_SIZE is increased by 4 bytes to accommodate
>>> >> 4-byte ENDBR instruction.
>>> >>
>>> >> OK for trunk?
>>> >
>>> > Regarding the test. Is it possible to check what is generated in a
>>> trampoline? In particular, that endbr is generated.
>>> >
>>>
>>> I think run-time test is sufficient.
>>
>> Ok then.
>
> Rubber-stamp OK.
>

Done.

Thanks.

-- 
H.J.


[committed] xfail assertion in c-c++-common/Warray-bounds-3.c (PR 83462)

2018-03-27 Thread Martin Sebor

An assertion in the test fails on a number of targets due to
a missing strlen optimization.  I plan to add the optimization
for GCC 9 but it's too late to add it now.  To prevent the
failure r258896 disables and xfails the assertion for targets
other than x86.  Attached is the change for reference.

I tested the change with a native x86_64-linux compiler and
with a powerpc64le-linux cross-compiler.

Martin

Index: gcc/testsuite/c-c++-common/Warray-bounds-4.c
===
--- gcc/testsuite/c-c++-common/Warray-bounds-4.c	(revision 258895)
+++ gcc/testsuite/c-c++-common/Warray-bounds-4.c	(working copy)
@@ -64,5 +64,10 @@ void test_strcpy_bounds_memarray_range (void)
   TM ("01", "",ma.a5 + i, ma.a5);
   TM ("012", "",   ma.a5 + i, ma.a5);
   TM ("0123", "",  ma.a5 + i, ma.a5); /* { dg-warning "offset 6 from the object at .ma. is out of the bounds of referenced subobject .a5. with type .char\\\[5]. at offset 0" "strcpy" { xfail *-*-* } } */
-  TM ("", "012345", ma.a7 + i, ma.a7);/* { dg-warning "offset 13 from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a7. with type .char ?\\\[7]. at offset 5" "strcpy" { xfail sparc*-*-* visium-*-* } } */
+
+#if __i386__ || __x86_64__
+  /* Disabled for non-x86 targets due to bug 83462.  */
+  TM ("", "012345", ma.a7 + i, ma.a7);/* { dg-warning "offset 13 from the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a7. with type .char ?\\\[7]. at offset 5" "strcpy" { xfail { ! { i?86-*-* x86_64-*-* } } } } */
+#endif
+
 }


Re: [PATCH] i386: Insert ENDBR to trampoline for -fcf-protection=branch -mibt

2018-03-27 Thread Uros Bizjak
On Mon, Mar 26, 2018 at 10:42 PM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> Sent: Monday, March 26, 2018 5:59 PM
>> To: Tsimbalist, Igor V 
>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
>> Subject: Re: [PATCH] i386: Insert ENDBR to trampoline for -fcf-
>> protection=branch -mibt
>>
>> On Mon, Mar 26, 2018 at 8:23 AM, Tsimbalist, Igor V
>>  wrote:
>> >> -Original Message-
>> >> From: Lu, Hongjiu
>> >> Sent: Sunday, March 25, 2018 12:50 AM
>> >> To: gcc-patches@gcc.gnu.org; Uros Bizjak ;
>> Tsimbalist,
>> >> Igor V 
>> >> Subject: [PATCH] i386: Insert ENDBR to trampoline for -fcf-
>> >> protection=branch -mibt
>> >>
>> >> When -fcf-protection=branch -mibt are used, we need to insert ENDBR
>> >> to trampoline.  TRAMPOLINE_SIZE is increased by 4 bytes to accommodate
>> >> 4-byte ENDBR instruction.
>> >>
>> >> OK for trunk?
>> >
>> > Regarding the test. Is it possible to check what is generated in a
>> trampoline? In particular, that endbr is generated.
>> >
>>
>> I think run-time test is sufficient.
>
> Ok then.

Rubber-stamp OK.

Uros.


Re: [PATCH][AArch64] XFAIL gcc.target/aarch64/store_v2vec_lanes.c for ILP32

2018-03-27 Thread James Greenhalgh
On Tue, Mar 27, 2018 at 02:39:12PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The test in question fails for ilp32. The initial analysis I did in the PR
> for it is that for ILP32 we generate somewhat different address forms that
> we'd need to adjust aarch64_classify_address to catch.
> Given the optimisation this test checks for was added for GCC 8 it is not a
> regression, and improving the codegen on ILP32 would be an enhancement
> rather than a fix. So Richi has asked for it to be marked as XFAIL on ILP32,
> which is what this patch does.
> Checked that the test still passes on LP64 and appears as XFAIL on
> -mabi=ilp32.
> 
> Ok for trunk?

This would count under the obvious rule.

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> 2018-03-27  Kyrylo Tkachov  
> 
>  PR target/83009
>  * gcc.target/aarch64/store_v2vec_lanes.c: XFAIL for ilp32.

> commit 39e1ef03918b1911cedae37552cbaf1185420aa2
> Author: Kyrylo Tkachov 
> Date:   Tue Mar 27 10:45:04 2018 +0100
> 
> [AArch64] XFAIL gcc.target/aarch64/store_v2vec_lanes.c for ILP32
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
> index 6810db3..990aea3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
> +++ b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
> @@ -26,6 +26,6 @@ construct_lane_2 (long long *y, v2di *z)
> values from consecutive memory into a 2-element vector by using
> a Q-reg LDR.  */
>  
> -/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-not "ins\t" } } */
> +/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 1 { xfail ilp32 } } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 1 { xfail ilp32 } } } */
> +/* { dg-final { scan-assembler-not "ins\t" { xfail ilp32 } } } */



Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts

2018-03-27 Thread Sudakshina Das

On 21/03/18 11:40, Sudakshina Das wrote:

Hi

On 21/03/18 08:51, Christophe Lyon wrote:

On 20 March 2018 at 11:58, Sudakshina Das  wrote:

Hi

On 20/03/18 10:03, Richard Earnshaw (lists) wrote:


On 14/03/18 10:11, Sudakshina Das wrote:


Hi

This patch fixes PR82989 so that we avoid NEON instructions when
-mneon-for-64bits is not enabled. This is more of a short term fix for
the real deeper problem of making an early decision of choosing or
rejecting NEON instructions. There is now a new ticket PR84467 to deal
with the longer term solution.
(Please refer to the discussion in the bug report for more details).

Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and
added a new test case based on the test given on the bug report.

Ok for trunk and backports for gcc-7 and gcc-6 branches?



OK for trunk.  Please leave it a couple of days before backporting to
ensure that the testcase doesn't tickle any multilib issues.

R.



Thanks. Committed to trunk as r258677. Will wait a week for backporting.


Backported both the commits of trunks to gcc-7 as r258883 and to gcc-6 
as r258884 (Reg-tested for both)


Thanks
Sudi



Sudi



Hi Sudi,

I've noticed that:
FAIL:    gcc.target/arm/pr82989.c scan-assembler-times lsl\\tr[0-9]+, r[0-9]+, r[0-9] 2
FAIL:    gcc.target/arm/pr82989.c scan-assembler-times lsr\\tr[0-9]+, r[0-9]+, r[0-9] 2
on target armeb-none-linux-gnueabihf
--with-mode thumb
--with-cpu cortex-a9
--with-fpu neon-fp16

The tests pass when using --with-mode arm

Can you check?


Yes I see this as well. Sorry about this. I am testing a quick fix for 
this at the moment.


Thanks
Sudi



Thanks

Christophe






Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

  * config/arm/neon.md (ashldi3_neon): Update ?s for constraints
  to favor GPR over NEON registers.
  (di3_neon): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

  * gcc.target/arm/pr82989.c: New test.

pr82989.diff


diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6f5d7..1646b21 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1180,12 +1180,12 @@
   )
 (define_insn_and_split "ashldi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"   "= w, w,?&r,?r,?&r, ?w,w")
-   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w,w")
-  (match_operand:SI 2 "general_operand"    "rUm, i,  r, i,  i,rUm,i")))
-   (clobber (match_scratch:SI 3 "= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:SI 4 "= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:DI 5 "=&w, X,  X, X,  X, &w,X"))
+  [(set (match_operand:DI 0 "s_register_operand"   "= w, w, &r, r, &r, ?w,?w")
+   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w, w")
+  (match_operand:SI 2 "general_operand"    "rUm, i,  r, i,  i,rUm, i")))
+   (clobber (match_scratch:SI 3 "= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:SI 4 "= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:DI 5 "=&w, X,  X, X,  X, &w, X"))
  (clobber (reg:CC_C CC_REGNUM))]
 "TARGET_NEON"
 "#"
@@ -1276,7 +1276,7 @@
   ;; ashrdi3_neon
   ;; lshrdi3_neon
   (define_insn_and_split "di3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"    "= w, w,?&r,?r,?&r,?w,?w")
+  [(set (match_operand:DI 0 "s_register_operand"    "= w, w, &r, r, &r,?w,?w")
 (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r,0w, w")
 (match_operand:SI 2 "reg_or_int_operand" "  r, i,  r, i,  i, r, i")))
  (clobber (match_scratch:SI 3 "=2r, X, &r, X,  X,2r, X"))
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c
new file mode 100644
index 000..1295ee6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -0,0 +1,38 @@
+/* PR target/82989 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */
+/* { dg-add-options arm_neon } */
+
+typedef unsigned long long uint64_t;
+
+void f_shr_imm (uint64_t *a )
+{
+  *a += *a >> 32;
+}
+/* { dg-final { scan-assembler-not "vshr*" } } */
+
+void f_shr_reg (uint64_t *a, uint64_t b)
+{
+  *a += *a >> b;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+/* Only 2 times for f_shr_reg. f_shr_imm should not have any.  */
+/

RE: [PATCH][i386,AVX] Fix PR84783 - backport missing permutexvar to GCC7

2018-03-27 Thread Peryt, Sebastian
Hi Jakub,

Gentle ping.

Thanks,
Sebastian

> -----Original Message-----
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> Sent: Friday, March 23, 2018 6:49 AM
> To: ja...@redhat.com; Peryt, Sebastian 
> Cc: 'gcc-patches@gcc.gnu.org' 
> Subject: Re: [PATCH][i386,AVX] Fix PR84783 - backport missing permutexvar to
> GCC7
> 
> Hello Sebastian!
> 
> On 22 Mar 13:01, Peryt, Sebastian wrote:
> > Hi,
> >
> > This patch adds missing permutexvar intrinsics for backporting to GCC 7 to
> > resolve PR84783.
> >
> > 2018-03-22  Sebastian Peryt  
> >
> > gcc:
> > PR84783
> > * config/i386/avx512vlintrin.h (_mm256_permutexvar_epi64)
> > (_mm256_permutexvar_epi32, _mm256_permutex_epi64): New intrinsics.
> >
> > gcc/testsuite:
> > PR84783
> >
> > * gcc.target/i386/avx512vl-vpermd-1.c (_mm256_permutexvar_epi32):
> > Test new intrinsic.
> > * gcc.target/i386/avx512vl-vpermq-imm-1.c (_mm256_permutex_epi64):
> > Ditto.
> > * gcc.target/i386/avx512vl-vpermq-var-1.c (_mm256_permutexvar_epi64):
> > Ditto.
> > * gcc.target/i386/avx512f-vpermd-2.c: Do not check for AVX512F_LEN.
> > * gcc.target/i386/avx512f-vpermq-imm-2.c: Ditto.
> > * gcc.target/i386/avx512f-vpermq-var-2.c: Ditto.
> >
> > Is it ok for merge?
> Your patch is pretty simple and is OK with me.
> 
> However, since you're aiming at GCC 7, I'd like to hear GM's OK here as well.
> 
> --
> Thanks, K
> 
> >
> > Thanks,
> > Sebastian
> 



Release-manager approval for gcc-8? (was: Re: [PATCH 0/4] ASAN for MIPS (o32))

2018-03-27 Thread Hans-Peter Nilsson
I'm bringing this to the direct attention of the
release-maintainers, asking for approval for gcc-8.
(If this is in your queue already, then sorry for nagging, but
IIUC you both filter gcc-patches@ traffic heavily.)
All patches are to MIPS-specific code.

libsanitizer:
Add __sanitizer.lock.pad initializer, shutting up a warning:
 
Correct struct_kernel_stat_sz for MIPS and don't use .:
 
Enable libsanitizer for 32-bit mips*-*-linux*:
 
Add gcc port bits for MIPS to support -fsanitize=address:
 

> From: Matthew Fortune 
> Date: Fri, 23 Mar 2018 16:19:17 +

> Hans-Peter Nilsson  writes:
> > All patches are together run through the gcc and g++ test-suites
> > to check ASAN results for mipsisa32r2el-linux-gnu.  As of
> > r258635 those results are on par with those for
> > arm-linux-gnueabihf, so without delay I present the current
> > state.  Maybe it's non-intrusive enough to be ok for gcc-8?
> > MIPS maintainers (and interested party) CC:ed.
> 
> From a MIPS backend perspective all 4 patches are OK. I've done very
> little to support upstream MIPS over this release so these
> improvements are fantastic. I don't know the detail of asan support
> so am going with the idea that your investigation has got to the
> bottom of the problems; thanks for the detailed explanations.
> 
> I'm not sure I should really approve this for GCC-8 but rather ask
> a global maintainer or Jakub/Richard as release managers given I
> can't commit to do much to support the release and I won't want to
> risk burdening others with a late change.
> 
> > For use with -fsanitize=address, you'll need a non-ancient glibc
> > or equivalent (2002-ish), one that iterates on ELF headers for
> > the EH info at exception time (rather, doesn't call
> > __register_frame_info or __register_frame_info_bases at startup,
> > ending up calling malloc/free) or else Asan will try to
> > instrument the call to free and hang on a lock for eternity (or
> > dies on a signal).  But that's no different than for other
> > ports, AFAIU.
> > 
> > So, ok to commit?
> 
> As above, if a global maintainer is happy then yes.
> 
> Matthew
> 
> > 
> > brgds, H-P
> 


Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector

2018-03-27 Thread H.J. Lu
On Tue, Jun 6, 2017 at 1:25 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64 and that
> involves dealing with a lot of vec_merge operations. One simplification that
> I noticed missing
> from simplify-rtx are combinations of vec_merge with vec_duplicate.
> In this particular case:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
>
> which can be replaced with
>
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
>
> For the aarch64 testcase in this patch this simplification allows us to try
> to combine:
> (set (reg:V2DI 77 [ x ])
> (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
> (const_int 0 [0])))
>
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
> (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1
> *y_3(D)+0 S8 A64]))
> (const_vector:V2DI [
> (const_int 0 [0])
> (const_int 0 [0])
> ])
> (const_int 1 [0x1])))
>
>
> For the simplified form above we already have an aarch64 pattern:
> *aarch64_combinez which
> is missing a DI/DFmode version due to an oversight, so this patch extends
> that pattern as well to
> use the VDC mode iterator that includes DI and DFmode (as well as V2HF which
> VD_BHSI was missing).
> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so
> I didn't split them
> into separate patches.
>
> Before this for the testcase we'd generate:
> construct_lanedi:
> movi v0.4s, 0
> ldr x0, [x0]
> ins v0.d[0], x0
> ret
>
> construct_lanedf:
> movi v0.2d, 0
> ldr d1, [x0]
> ins v0.d[0], v1.d[0]
> ret
>
> but now we can generate:
> construct_lanedi:
> ldr d0, [x0]
> ret
>
> construct_lanedf:
> ldr d0, [x0]
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-06-06  Kyrylo Tkachov  
>
> * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
> Simplify vec_merge of vec_duplicate and const_vector.
> * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
> New predicate.
> * config/aarch64/aarch64-simd.md (*aarch64_combinez): Use VDC
> mode iterator.  Update predicate on operand 1 to
> handle non-const_vec constants.  Delete constraints.
> (*aarch64_combinez_be): Likewise for operand 2.
>
> 2017-06-06  Kyrylo Tkachov  
>
> * gcc.target/aarch64/construct_lane_zero_1.c: New test.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85090

-- 
H.J.
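
For readers unfamiliar with the RTL involved, the simplification quoted
above can be modelled in plain C.  This is only an illustrative sketch, not
code from simplify-rtx.c: the v2di struct and the vec_duplicate, vec_concat,
vec_merge and v2di_equal helpers are hypothetical stand-ins for the RTL codes
of the same names, restricted to two lanes, and the model assumes that bit i
of the mask selects lane i from the first operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane vector model; not a GCC type.  */
typedef struct { int64_t lane[2]; } v2di;

/* Model of (vec_duplicate X): every lane holds X.  */
v2di vec_duplicate (int64_t x)
{
  v2di r = { { x, x } };
  return r;
}

/* Model of (vec_concat A B): lane 0 is A, lane 1 is B.  */
v2di vec_concat (int64_t a, int64_t b)
{
  v2di r = { { a, b } };
  return r;
}

/* Model of (vec_merge V1 V2 MASK): lane i is taken from V1 when bit i
   of MASK is set, otherwise from V2.  */
v2di vec_merge (v2di v1, v2di v2, unsigned mask)
{
  v2di r;
  assert (mask < 4);  /* Only two lanes exist in this model.  */
  for (int i = 0; i < 2; i++)
    r.lane[i] = (mask & (1u << i)) ? v1.lane[i] : v2.lane[i];
  return r;
}

/* Lane-wise equality, used to compare the two forms.  */
int v2di_equal (v2di a, v2di b)
{
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

Under this model, vec_merge (vec_duplicate (X), vec_concat (A, B), 1) has the
same lanes as vec_concat (X, B), and the same call with mask 2 has the same
lanes as vec_concat (A, X), matching the two cases listed in the quoted mail.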


[Patch ARM] Fix PR81863.

2018-03-27 Thread Ramana Radhakrishnan
This has been in my patch stack for quite some time. The problem here
was that we weren't handling target_word_relocations in
arm_valid_symbolic_address_p; this is the surest fix for
GCC 8 and GCC 7.

Regression tested on arm-none-linux-gnueabihf. Applying to
trunk and backporting to GCC-7 in a day or so.

regards
Ramana

* config/arm/arm.c (arm_valid_symbolic_address): Handle
arm_word_relocations

gcc/testsuite

* gcc.target/arm/pr81863.c: New test.
commit 22e3c20b7e6b5027f07b71ca31c9f65e66537b0b
Author: Ramana Radhakrishnan 
Date:   Tue Mar 13 10:54:04 2018 +

[Patch ARM] Fix PR81863.

This has been in my patch stack for quite some time. The problem here
was that we weren't handling arm_word_relocations in
arm_valid_symbolic_address and is the surest fix for this
for GCC8 and GCC7.

Regression tested on arm-none-linux-gnueabihf . Applying to
trunk and GCC-7

* config/arm/arm.c (arm_valid_symbolic_address): Handle arm_word_relocations

gcc/testsuite

* gcc.target/arm/pr81863.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 90d62e6..09795d3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29773,6 +29773,9 @@ arm_valid_symbolic_address_p (rtx addr)
   rtx xop0, xop1 = NULL_RTX;
   rtx tmp = addr;
 
+  if (target_word_relocations)
+return false;
+
   if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
 return true;
 
diff --git a/gcc/testsuite/gcc.target/arm/pr81863.c b/gcc/testsuite/gcc.target/arm/pr81863.c
new file mode 100644
index 000..63b1ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr81863.c
@@ -0,0 +1,44 @@
+/* testsuite/gcc.target/arm/pr48183.c */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mword-relocations -march=armv7-a -marm" } */
+/* { dg-final { scan-assembler-not "\[\\t \]+movw" } } */
+
+int a, d, f;
+long b;
+struct ww_class {
+  int stamp;
+} c;
+struct stress {
+  int locks;
+  int nlocks;
+};
+void *e;
+int atomic_add_return_relaxed(int *p1) {
+  __builtin_prefetch(p1);
+  return a;
+}
+void atomic_long_inc_return_relaxed(int *p1) {
+  int *v = p1;
+  atomic_add_return_relaxed(v);
+}
+void ww_acquire_init(struct ww_class *p1) {
+  atomic_long_inc_return_relaxed(&p1->stamp);
+}
+void ww_mutex_lock();
+int *get_random_order();
+void stress_inorder_work() {
+  struct stress *g = e;
+  int h = g->nlocks;
+  int *i = &g->locks, *j = get_random_order();
+  do {
+int n;
+ww_acquire_init(&c);
+  retry:
+for (n = 0; n < h; n++)
+  ww_mutex_lock(i[j[n]]);
+f = n;
+if (d)
+  goto retry;
+  } while (b);
+}
+


Re: GCC 8.0.1 Status Report (2018-03-27)

2018-03-27 Thread H.J. Lu
On Tue, Mar 27, 2018 at 6:32 AM, Richard Biener  wrote:
>
> Status
> ======
>
> The GCC 8 trunk is open for regression and documentation fixes.  Following
> past releases we are aiming at a first release candidate mid April though
> if you look at the quality data below that looks ambitious.
>
> So please help tackling (and confirming, bisecting, reducing, etc.)
> regressions to make a timely release of GCC 8 possible.
>
>
> Quality Data
> ============
>
> Priority          #   Change from last report
> --------        ---   -----------------------
> P1               22   -  14
> P2              104   -  29
> P3               15   -  42
> P4              183   +  25
> P5               26   -   1
> --------        ---   -----------------------
> Total P1-P3     141   -  85
> Total           350   -  61
>
>
> Previous Report
> ===============
>
> https://gcc.gnu.org/ml/gcc/2018-01/msg00083.html

I have 3 Intel CET patches pending review:

https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01356.html
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00485.html
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01741.html

I also opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85086

I have a patch at

https://github.com/hjl-tools/gcc/commit/e9ff815941406e38fa629947af4d809b9129e860

which requires unwind ABI extension.

I'd like to see them fixed to get working Intel CET support in
GCC 8.

Thanks.

-- 
H.J.


[PATCH][AArch64] XFAIL gcc.target/aarch64/store_v2vec_lanes.c for ILP32

2018-03-27 Thread Kyrill Tkachov

Hi all,

The test in question fails for ILP32. My initial analysis in the PR is that
for ILP32 we generate somewhat different address forms, which
aarch64_classify_address would need to be adjusted to catch.
Given that the optimisation this test checks for was added for GCC 8, this is
not a regression, and improving the codegen on ILP32 would be an enhancement
rather than a fix. So Richi has asked for it to be marked as XFAIL on ILP32,
which is what this patch does.
Checked that the test still passes on LP64 and appears as XFAIL on -mabi=ilp32.

Ok for trunk?
Thanks,
Kyrill

2018-03-27  Kyrylo Tkachov  

PR target/83009
* gcc.target/aarch64/store_v2vec_lanes.c: XFAIL for ilp32.
commit 39e1ef03918b1911cedae37552cbaf1185420aa2
Author: Kyrylo Tkachov 
Date:   Tue Mar 27 10:45:04 2018 +0100

[AArch64] XFAIL gcc.target/aarch64/store_v2vec_lanes.c for ILP32

diff --git a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
index 6810db3..990aea3 100644
--- a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
+++ b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
@@ -26,6 +26,6 @@ construct_lane_2 (long long *y, v2di *z)
values from consecutive memory into a 2-element vector by using
a Q-reg LDR.  */
 
-/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-not "ins\t" } } */
+/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 1 { xfail ilp32 } } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 1 { xfail ilp32 } } } */
+/* { dg-final { scan-assembler-not "ins\t" { xfail ilp32 } } } */


GCC 8.0.1 Status Report (2018-03-27)

2018-03-27 Thread Richard Biener

Status
======

The GCC 8 trunk is open for regression and documentation fixes.  Following
past releases we are aiming at a first release candidate mid April though
if you look at the quality data below that looks ambitious.

So please help tackling (and confirming, bisecting, reducing, etc.)
regressions to make a timely release of GCC 8 possible.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1               22   -  14
P2              104   -  29
P3               15   -  42
P4              183   +  25
P5               26   -   1
--------        ---   -----------------------
Total P1-P3     141   -  85
Total           350   -  61


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2018-01/msg00083.html


Re: [PATCH, rtl] Fix PR84878: Segmentation fault in add_cross_iteration_register_deps

2018-03-27 Thread Richard Biener
On Tue, 27 Mar 2018, Peter Bergner wrote:

> On 3/27/18 3:18 AM, Richard Biener wrote:
> > On Mon, 26 Mar 2018, Peter Bergner wrote:
> >>/* Create inter-loop true dependences and anti dependences.  */
> >>for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = 
> >> r_use->next)
> >>  {
> >> +  /* PR84878: Some definitions of global hard registers may not have
> >> +  any following uses or they may be artificial, so skip them.  */
> >> +  if (DF_REF_INSN_INFO (r_use->ref) == NULL)
> >> +  continue;
> >> +
> > 
> > To me a better check would be DF_REF_IS_ARTIFICIAL (r_use->ref).  But
> > I'm not sure simply ignoring those will be correct?
> 
> I see now I made a massive mistake in nomenclature in calling these
> "artificial" uses.  :-(  What I meant was the forcing of liveness
> for global registers at the exit block similar to what you mentioned
> in your reply.  Sorry about that.
> 
> 
> 
> > In fact artificial refs do have a basic-block, so
> >
> >>rtx_insn *use_insn = DF_REF_INSN (r_use->ref);
> >>  
> >>if (BLOCK_FOR_INSN (use_insn) != g->bb)
> > 
> > should use DF_REF_BB (r_use->ref) instead of indirection through
> > DF_REF_INSN.  Still use_insn is used later but then if the
> > artificial ref is inside g->bb we should better give up here?
> > We don't seem to have use_nodes for these "non-insns".
> 
> Maybe the problem is that we have a r_use->ref at all for these
> non-insns?
> 
> 
> > Somebody with more insight on DF should chime in here and tell
> > me what those "artificial" refs are about ...  there's
> > 
> > /* If this flag is set for an artificial use or def, that ref
> >logically happens at the top of the block.  If it is not set
> >for an artificial use or def, that ref logically happens at the
> >bottom of the block.  This is never set for regular refs.  */
> > DF_REF_AT_TOP = 1 << 1,
> > 
> > so this is kind-of global regs being live across all BBs?  This sounds
> > a bit stupid to me, but well ... IMHO those refs should be at
> > specific insns like calls.
> > 
> > So maybe, with a big fat comment, it is OK to ignore artificial
> > refs in this loop...
> 
> Yeah, I'd like someone else's opinion too, as I know even less about
> real artificial uses (as opposed to my incorrect mention in my first
> post). :-)

If they only appear in the exit/entry block ignoring them should be safe.

But who knows...

Richard.


Re: [PATCH, rtl] Fix PR84878: Segmentation fault in add_cross_iteration_register_deps

2018-03-27 Thread Peter Bergner
On 3/27/18 3:18 AM, Richard Biener wrote:
> On Mon, 26 Mar 2018, Peter Bergner wrote:
>>/* Create inter-loop true dependences and anti dependences.  */
>>for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
>>  {
>> +  /* PR84878: Some definitions of global hard registers may not have
>> +  any following uses or they may be artificial, so skip them.  */
>> +  if (DF_REF_INSN_INFO (r_use->ref) == NULL)
>> +continue;
>> +
> 
> To me a better check would be DF_REF_IS_ARTIFICIAL (r_use->ref).  But
> I'm not sure simply ignoring those will be correct?

I see now I made a massive mistake in nomenclature in calling these
"artificial" uses.  :-(  What I meant was the forcing of liveness
for global registers at the exit block similar to what you mentioned
in your reply.  Sorry about that.



> In fact artificial refs do have a basic-block, so
>
>>rtx_insn *use_insn = DF_REF_INSN (r_use->ref);
>>  
>>if (BLOCK_FOR_INSN (use_insn) != g->bb)
> 
> should use DF_REF_BB (r_use->ref) instead of indirection through
> DF_REF_INSN.  Still use_insn is used later but then if the
> artificial ref is inside g->bb we should better give up here?
> We don't seem to have use_nodes for these "non-insns".

Maybe the problem is that we have a r_use->ref at all for these
non-insns?


> Somebody with more insight on DF should chime in here and tell
> me what those "artificial" refs are about ...  there's
> 
> /* If this flag is set for an artificial use or def, that ref
>logically happens at the top of the block.  If it is not set
>for an artificial use or def, that ref logically happens at the
>bottom of the block.  This is never set for regular refs.  */
> DF_REF_AT_TOP = 1 << 1,
> 
> so this is kind-of global regs being live across all BBs?  This sounds
> a bit stupid to me, but well ... IMHO those refs should be at
> specific insns like calls.
> 
> So maybe, with a big fat comment, it is OK to ignore artificial
> refs in this loop...

Yeah, I'd like someone else's opinion too, as I know even less about
real artificial uses (as opposed to my incorrect mention in my first
post). :-)

Peter
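
The guard under discussion amounts to skipping chain entries that are not
attached to any insn before dereferencing them.  A minimal sketch of that
pattern follows; the demo_insn_info, demo_ref and demo_link structures and the
demo_count_insn_uses function are hypothetical stand-ins invented for
illustration, not the real DF data structures or macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DF structures; illustrative only.  */
struct demo_insn_info { int uid; };

struct demo_ref
{
  /* NULL when the ref is not tied to a real insn.  */
  struct demo_insn_info *insn_info;
};

struct demo_link
{
  struct demo_ref *ref;
  struct demo_link *next;
};

/* Walk CHAIN and count the uses that belong to a real insn.  Entries
   without insn info are skipped, mirroring the `continue' guard in the
   proposed patch.  */
int demo_count_insn_uses (const struct demo_link *chain)
{
  int n = 0;
  for (const struct demo_link *u = chain; u != NULL; u = u->next)
    {
      assert (u->ref != NULL);
      if (u->ref->insn_info == NULL)
        continue;  /* Nothing to create a dependence against.  */
      n++;
    }
  return n;
}
```

Whether silently skipping such entries is always correct is exactly the open
question in the thread; the sketch only shows the control flow, not an answer
to that question.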




[PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-03-27 Thread Sudakshina Das

Hi

This patch adds the no variant to -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?

Sudi
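
As a rough illustration of the two changes described above, the following
sketch shows a flag bit being set or cleared depending on the option value,
and a lookup that falls back to default options when a function carries no
attribute of its own.  All names here (DEMO_MASK_STRICT_ALIGN, demo_opts,
demo_handle_strict_align, demo_effective_opts) are hypothetical; the real
mask value and option structures are generated by GCC's option machinery and
are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for MASK_STRICT_ALIGN.  */
#define DEMO_MASK_STRICT_ALIGN (1u << 0)

/* Hypothetical, much reduced option set.  */
struct demo_opts { unsigned target_flags; };

/* VAL is nonzero for the positive form of the option and zero for the
   negated form, so the bit is set or cleared accordingly.  */
void demo_handle_strict_align (struct demo_opts *opts, int val)
{
  assert (opts != NULL);
  if (val)
    opts->target_flags |= DEMO_MASK_STRICT_ALIGN;
  else
    opts->target_flags &= ~DEMO_MASK_STRICT_ALIGN;
}

/* Options to use when comparing a function against another: its own
   options if it has any, otherwise the defaults.  */
const struct demo_opts *
demo_effective_opts (const struct demo_opts *fn_opts,
                     const struct demo_opts *defaults)
{
  assert (defaults != NULL);
  return fn_opts ? fn_opts : defaults;
}
```

The fallback matters once the option can be negated: a callee without any
attribute then still has a well-defined strict-align setting to compare with
the caller's, instead of being accepted for inlining unconditionally.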


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  

* common/config/aarch64/aarch64-common.c (aarch64_handle_option):
Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
* config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
* config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
as true for strict-align.
(aarch64_can_inline_p): Perform checks even when callee has no
attributes to check for strict alignment.
* doc/extend.texi (AArch64 Function Attributes): Document
no-strict-align.
* doc/invoke.texi: (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  

* gcc.target/aarch64/pr84882.c: New test.
* gcc.target/aarch64/target_attr_18.c: Likewise.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 7fd9305..d5655a0 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -97,7 +97,10 @@ aarch64_handle_option (struct gcc_options *opts,
   return true;
 
 case OPT_mstrict_align:
-  opts->x_target_flags |= MASK_STRICT_ALIGN;
+  if (val)
+	opts->x_target_flags |= MASK_STRICT_ALIGN;
+  else
+	opts->x_target_flags &= ~MASK_STRICT_ALIGN;
   return true;
 
 case OPT_momit_leaf_frame_pointer:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b..4f35a6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11277,7 +11277,7 @@ static const struct aarch64_attribute_info aarch64_attributes[] =
   { "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL,
  OPT_mfix_cortex_a53_843419 },
   { "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ },
-  { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align },
+  { "strict-align", aarch64_attr_mask, true, NULL, OPT_mstrict_align },
   { "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL,
  OPT_momit_leaf_frame_pointer },
   { "tls-dialect", aarch64_attr_enum, false, NULL, OPT_mtls_dialect_ },
@@ -11640,16 +11640,13 @@ aarch64_can_inline_p (tree caller, tree callee)
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If callee has no option attributes, then it is ok to inline.  */
-  if (!callee_tree)
-return true;
-
   struct cl_target_option *caller_opts
 	= TREE_TARGET_OPTION (caller_tree ? caller_tree
 	   : target_option_default_node);
 
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-
+  struct cl_target_option *callee_opts
+	= TREE_TARGET_OPTION (callee_tree ? callee_tree
+	   : target_option_default_node);
 
   /* Callee's ISA flags should be a subset of the caller's.  */
   if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags)
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 52eaf8c..1426b45 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -85,7 +85,7 @@ Target RejectNegative Joined Enum(cmodel) Var(aarch64_cmodel_var) Init(AARCH64_C
 Specify the code model.
 
 mstrict-align
-Target Report RejectNegative Mask(STRICT_ALIGN) Save
+Target Report Mask(STRICT_ALIGN) Save
 Don't assume that unaligned accesses are handled by the system.
 
 momit-leaf-frame-pointer
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 93a0ebc..dcda216 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3605,8 +3605,10 @@ for the command line option @option{-mcmodel=}.
 @item strict-align
 @cindex @code{strict-align} function attribute, AArch64
 Indicates that the compiler should not assume that unaligned memory references
-are handled by the system.  The behavior is the same as for the command-line
-option @option{-mstrict-align}.
+are handled by the system.  To allow the compiler to assume that unaligned memory
+references are handled by the system, the inverse attribute
+@code{no-strict-align} can be specified.  The behavior is the same as for the
+command-line options @option{-mstrict-align} and @option{-mno-strict-align}.
 
 @item omit-leaf-frame-pointer
 @cindex @code{omit-leaf-frame-pointer} function attribute, AArch64
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index feacd56..0574d21 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -596,7 +596,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccop

Re: [og7] vector_length extension part 5: libgomp and tests

2018-03-27 Thread Tom de Vries

On 03/02/2018 09:47 PM, Cesar Philippidis wrote:

two test cases.


Committed as separate patch, while ignoring the warnings
"using vector_length \\(32\\), ignoring 128".


Thanks,
- Tom
[openacc] Add vector_length 128 testcases

2018-03-27  Cesar Philippidis  
	Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: New test.
	* testsuite/libgomp.oacc-fortran/gemm.f90: New test.

---
 .../libgomp.oacc-c-c++-common/vred2d-128.c |  57 +++
 libgomp/testsuite/libgomp.oacc-fortran/gemm.f90| 109 +
 3 files changed, 172 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
new file mode 100644
index 000..1dc5fe0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
@@ -0,0 +1,57 @@
+/* Test large vector lengths.  */
+
+#include <assert.h>
+
+#define n 1
+int a1[n], a2[n];
+
+#define gentest(name, outer, inner)		\
+  void name ()	\
+  {		\
+  long i, j, t1, t2, t3;			\
+  _Pragma(outer)\
+  for (i = 0; i < n; i++)			\
+{		\
+  t1 = 0;	\
+  t2 = 0;	\
+  _Pragma(inner)\
+  for (j = i; j < n; j++)			\
+	{	\
+	  t1++;	\
+	  t2--;	\
+	}	\
+  a1[i] = t1;\
+  a2[i] = t2;\
+}		\
+  for (i = 0; i < n; i++)			\
+{		\
+  assert (a1[i] == n-i);			\
+  assert (a2[i] == -(n-i));			\
+}		\
+  }		\
+
+gentest (test1, "acc parallel loop gang vector_length (128)",
+	 "acc loop vector reduction(+:t1) reduction(-:t2)")
+
+gentest (test2, "acc parallel loop gang vector_length (128)",
+	 "acc loop worker vector reduction(+:t1) reduction(-:t2)")
+
+gentest (test3, "acc parallel loop gang worker vector_length (128)",
+	 "acc loop vector reduction(+:t1) reduction(-:t2)")
+
+gentest (test4, "acc parallel loop",
+	 "acc loop reduction(+:t1) reduction(-:t2)")
+
+/* { dg-prune-output "using vector_length \\(32\\), ignoring 128" } */
+
+
+int
+main ()
+{
+  test1 ();
+  test2 ();
+  test3 ();
+  test4 ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90
new file mode 100644
index 000..62b8a45
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gemm.f90
@@ -0,0 +1,109 @@
+! Exercise three levels of parallelism using SGEMM from BLAS.
+
+! { dg-additional-options "-fopenacc-dim=-:-:128" }
+
+! Implicitly set vector_length to 128 using -fopenacc-dim.
+subroutine openacc_sgemm (m, n, k, alpha, a, b, beta, c)
+  integer :: m, n, k
+  real :: alpha, beta
+  real :: a(k,*), b(k,*), c(m,*)
+
+  integer :: i, j, l
+  real :: temp
+
+  !$acc parallel loop copy(c(1:m,1:n)) copyin(a(1:k,1:m),b(1:k,1:n))
+  do j = 1, n
+ !$acc loop
+ do i = 1, m
+temp = 0.0
+!$acc loop reduction(+:temp)
+do l = 1, k
+   temp = temp + a(l,i)*b(l,j)
+end do
+if(beta == 0.0) then
+   c(i,j) = alpha*temp
+else
+   c(i,j) = alpha*temp + beta*c(i,j)
+end if
+ end do
+  end do
+end subroutine openacc_sgemm
+
+! Explicitly set vector_length to 128 using a vector_length clause.
+subroutine openacc_sgemm_128 (m, n, k, alpha, a, b, beta, c)
+  integer :: m, n, k
+  real :: alpha, beta
+  real :: a(k,*), b(k,*), c(m,*)
+
+  integer :: i, j, l
+  real :: temp
+
+  !$acc parallel loop copy(c(1:m,1:n)) copyin(a(1:k,1:m),b(1:k,1:n)) vector_length (128)
+  ! { dg-prune-output "using vector_length \\(32\\), ignoring 128" }
+  do j = 1, n
+ !$acc loop
+ do i = 1, m
+temp = 0.0
+!$acc loop reduction(+:temp)
+do l = 1, k
+   temp = temp + a(l,i)*b(l,j)
+end do
+if(beta == 0.0) then
+   c(i,j) = alpha*temp
+else
+   c(i,j) = alpha*temp + beta*c(i,j)
+end if
+ end do
+  end do
+end subroutine openacc_sgemm_128
+
+subroutine host_sgemm (m, n, k, alpha, a, b, beta, c)
+  integer :: m, n, k
+  real :: alpha, beta
+  real :: a(k,*), b(k,*), c(m,*)
+
+  integer :: i, j, l
+  real :: temp
+
+  do j = 1, n
+ do i = 1, m
+temp = 0.0
+do l = 1, k
+   temp = temp + a(l,i)*b(l,j)
+end do
+if(beta == 0.0) then
+   c(i,j) = alpha*temp
+else
+   c(i,j) = alpha*temp + beta*c(i,j)
+end if
+ end do
+  end do
+end subroutine host_sgemm
+
+program main
+  integer, parameter :: M = 100, N = 50, K = 2000
+  real :: a(K, M), b(K, N), c(M, N), d (M, N), e (M, N)
+  real alpha, beta
+  integer i, j
+
+  a(:,:) = 1.0
+  b(:,:) = 0.25
+
+  c(:,:) = 0.0
+  d(:,:) = 0.0
+  e(:,:) = 0.0
+
+  alpha = 1.05
+  beta = 1.25
+
+  call openacc_sgemm (M, N, K, alpha, a, b, beta, c)
+  call openacc_sgemm_128 (M, N, K, alpha, a, b, beta, d)
+  call host_sgemm (M, N, K, alpha, a, b, beta, e)
+
+  do i = 1, m
+ do j = 1, n
+if (c(i,j) /= e(i,j)) call abort
+if (d(i,j) /= e(i,j)) call abo

Re: [og7] vector_length extension part 4: target hooks and automatic parallelism

2018-03-27 Thread Tom de Vries

On 03/26/2018 06:33 PM, Tom de Vries wrote:

+  loop->mask = targetm.goacc.adjust_parallelism (loop->mask, outer_mask);
loop->mask |= this_mask;


I committed the above, but the original:
...

@@ -1397,6 +1407,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned 
outer_mask,
}
 
   loop->mask |= this_mask;

+  loop->mask = targetm.goacc.adjust_parallelism (loop->mask, outer_mask);
+
   if (!loop->mask && noisy)
warning_at (loop->loc, 0,
tiling

...
has the two loop->mask lines in the reverse order.

Fixed in attached patch.

Committed.

Thanks,
- Tom
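
To see why the order of the two loop->mask statements matters, here is a
small sketch with a made-up adjustment hook.  The DEMO_* bits and all demo_*
functions are hypothetical; in particular, the rule that this imaginary
target drops the worker bit whenever worker and vector partitioning are both
requested is an assumption chosen purely for illustration and is not taken
from any real target's goacc.adjust_parallelism implementation.

```c
#include <assert.h>

/* Hypothetical partitioning bits.  */
#define DEMO_GANG   (1u << 0)
#define DEMO_WORKER (1u << 1)
#define DEMO_VECTOR (1u << 2)

/* Made-up hook: this imaginary target cannot use worker and vector
   partitioning on the same loop, so it drops the worker bit.  */
unsigned demo_adjust_parallelism (unsigned mask, unsigned outer_mask)
{
  assert ((mask & ~(DEMO_GANG | DEMO_WORKER | DEMO_VECTOR)) == 0);
  (void) outer_mask;  /* Unused in this simplified model.  */
  if ((mask & DEMO_WORKER) && (mask & DEMO_VECTOR))
    mask &= ~DEMO_WORKER;
  return mask;
}

/* Adjust first, then merge: the hook never sees the bits in THIS_MASK.  */
unsigned demo_adjust_then_merge (unsigned mask, unsigned this_mask,
                                 unsigned outer_mask)
{
  mask = demo_adjust_parallelism (mask, outer_mask);
  mask |= this_mask;
  return mask;
}

/* Merge first, then adjust: the hook sees the final mask.  */
unsigned demo_merge_then_adjust (unsigned mask, unsigned this_mask,
                                 unsigned outer_mask)
{
  mask |= this_mask;
  mask = demo_adjust_parallelism (mask, outer_mask);
  return mask;
}
```

With mask = DEMO_WORKER and this_mask = DEMO_VECTOR, adjusting first leaves
both bits set, while merging first lets the hook reduce the result to
DEMO_VECTOR only.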
[openacc] Fix adjust_parallelism usage in oacc_loop_auto_partitions

2018-03-27  Tom de Vries  

	* omp-offload.c (oacc_loop_auto_partitions): Fix adjust_parallelism usage.

---
 gcc/omp-offload.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index aa4de24..ed17160 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1404,8 +1404,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 			" to parallelize element loop");
 	}
 
-  loop->mask = targetm.goacc.adjust_parallelism (loop->mask, outer_mask);
   loop->mask |= this_mask;
+  loop->mask = targetm.goacc.adjust_parallelism (loop->mask, outer_mask);
 
   if (!loop->mask && noisy)
 	warning_at (loop->loc, 0,


[PATCH] Fix PR82847

2018-03-27 Thread Richard Biener

The following attempts to fix PR82847 by introducing a check for whether
SSSE3 is available and enabling vect_perm_short if so.  It is somewhat of
a kludge, but I can't think of anything better right now.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-03-27  Richard Biener  

PR testsuite/82847
* lib/target-supports.exp (check_ssse3_available): New function.
(check_effective_target_vect_perm_short): Enable for x86 if
check_ssse3_available.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 258871)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -5828,6 +5828,8 @@ proc check_effective_target_vect_perm_sh
 && ![check_effective_target_vect_variable_length])
 || [istarget powerpc*-*-*]
 || [istarget spu-*-*]
+|| (([istarget i?86-*-*] || [istarget x86_64-*-*])
+&& [check_ssse3_available])
 || ([istarget mips*-*-*]
  && [et-is-effective-target mips_msa])
 || ([istarget s390*-*-*]
@@ -8012,6 +8014,19 @@ proc check_avx_available { } {
 #error unsupported
 #endif
   } ""] } {
+return 1;
+  }
+  return 0;
+}
+
+# Return true if we are compiling for SSSE3 target.
+
+proc check_ssse3_available { } {
+  if { [check_no_compiler_messages sse3a_available assembly {
+#ifndef __SSSE3__
+#error unsupported
+#endif
+  } ""] } {
 return 1;
   }
   return 0;


[PATCH] Fix PR85082

2018-03-27 Thread Richard Biener

From the alias-oracle walk we don't get here with valueized VUSE
so valueize it before looking up an existing value, otherwise
we'll ICE upon insertion which does valueize.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-03-27  Richard Biener  

PR tree-optimization/85082
* tree-ssa-sccvn.c (vn_reference_lookup_or_insert_for_pieces):
Valueize the VUSE.

* gfortran.dg/pr85082.f90: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 258851)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1631,7 +1631,7 @@ vn_reference_lookup_or_insert_for_pieces
   vn_reference_s vr1;
   vn_reference_t result;
   unsigned value_id;
-  vr1.vuse = vuse;
+  vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = operands;
   vr1.type = type;
   vr1.set = set;
Index: gcc/testsuite/gfortran.dg/pr85082.f90
===
--- gcc/testsuite/gfortran.dg/pr85082.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr85082.f90   (working copy)
@@ -0,0 +1,14 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+program p
+   real(4) :: a, b
+   integer(4) :: n, m
+   equivalence (a, n)
+   a = 1024.0
+   m = 8
+   a = 1024.0
+   b = set_exponent(a, m)
+   n = 8
+   a = f(a, n)
+   b = set_exponent(a, m)
+end


[PATCH] Fix PR84067

2018-03-27 Thread Richard Biener

The following guards the fold_plusminus_mult patterns with explicit
single_use checks to avoid regressing gcc.dg/wmul-1.c, that is,
introduction of additional multiplications.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I've checked with a cross to aarch64 that the FAIL is gone.

Richard.
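
For context, the rewrite being guarded is the factoring of a common
multiplicand out of a sum or difference.  The sketch below shows the identity
on wrapping unsigned arithmetic and the situation the single-use checks are
meant to avoid.  The demo_* functions are hypothetical examples written for
this note, not code from the testsuite, and the multiplication counts in the
comments describe the source-level expressions rather than any particular
generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Unfactored form: two multiplications.  */
uint32_t demo_unfactored (uint32_t a, uint32_t b, uint32_t c)
{
  return a * c + b * c;
}

/* Factored form: one multiplication.  For unsigned (wrapping) types this
   always gives the same value as the unfactored form.  */
uint32_t demo_factored (uint32_t a, uint32_t b, uint32_t c)
{
  return (a + b) * c;
}

/* Both products are also needed on their own.  Factoring the sum here
   would keep a * c and b * c and add (a + b) * c on top: three
   multiplications instead of two.  */
uint32_t demo_both_shared (uint32_t a, uint32_t b, uint32_t c,
                           uint32_t *ac_out, uint32_t *bc_out)
{
  uint32_t ac = a * c;
  uint32_t bc = b * c;
  assert (ac_out != 0 && bc_out != 0);
  *ac_out = ac;
  *bc_out = bc;
  return ac + bc;
}
```

When only one of the two products has another use, factoring keeps the count
at two multiplications, which is consistent with the patch requiring a single
use on either operand rather than on both.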

2018-03-27  Richard Biener  

PR middle-end/84067
* match.pd ((A * C) +- (B * C) -> (A+-B) * C): Guard with
explicit single_use checks.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 258871)
+++ gcc/match.pd(working copy)
@@ -1948,30 +1948,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  && (!FLOAT_TYPE_P (type) || flag_associative_math))
  (for plusminus (plus minus)
   (simplify
-   (plusminus (mult:cs @0 @1) (mult:cs @0 @2))
-   (if (!ANY_INTEGRAL_TYPE_P (type)
-|| TYPE_OVERFLOW_WRAPS (type)
-|| (INTEGRAL_TYPE_P (type)
-   && tree_expr_nonzero_p (@0)
-   && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
-(mult (plusminus @1 @2) @0)))
-  /* We cannot generate constant 1 for fract.  */
-  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
-   (simplify
-(plusminus @0 (mult:cs @0 @2))
-(if (!ANY_INTEGRAL_TYPE_P (type)
+   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
+   (if ((!ANY_INTEGRAL_TYPE_P (type)
 || TYPE_OVERFLOW_WRAPS (type)
 || (INTEGRAL_TYPE_P (type)
 && tree_expr_nonzero_p (@0)
 && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+   /* If @1 +- @2 is constant require a hard single-use on either
+  original operand (but not on both).  */
+   && (single_use (@3) || single_use (@4)))
+(mult (plusminus @1 @2) @0)))
+  /* We cannot generate constant 1 for fract.  */
+  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
+   (simplify
+(plusminus @0 (mult:c@3 @0 @2))
+(if ((!ANY_INTEGRAL_TYPE_P (type)
+ || TYPE_OVERFLOW_WRAPS (type)
+ || (INTEGRAL_TYPE_P (type)
+ && tree_expr_nonzero_p (@0)
+ && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+&& single_use (@3))
  (mult (plusminus { build_one_cst (type); } @2) @0)))
(simplify
-(plusminus (mult:cs @0 @2) @0)
-(if (!ANY_INTEGRAL_TYPE_P (type)
-|| TYPE_OVERFLOW_WRAPS (type)
-|| (INTEGRAL_TYPE_P (type)
-&& tree_expr_nonzero_p (@0)
-&& expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+(plusminus (mult:c@3 @0 @2) @0)
+(if ((!ANY_INTEGRAL_TYPE_P (type)
+ || TYPE_OVERFLOW_WRAPS (type)
+ || (INTEGRAL_TYPE_P (type)
+ && tree_expr_nonzero_p (@0)
+ && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+&& single_use (@3))
  (mult (plusminus @2 { build_one_cst (type); }) @0))
 
 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */


RE: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-optimization/78200)

2018-03-27 Thread Kumar, Venkataramanan
Hi Jakub,

> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, March 27, 2018 4:43 PM
> To: Kumar, Venkataramanan 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-
> optimization/78200)
> 
> On Tue, Mar 27, 2018 at 11:04:35AM +, Kumar, Venkataramanan wrote:
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> > > if it helps for SPEC?  Venkat, do you think you could benchmark it
> > > in the setup where you've measured the slowdown to see if it helps?
> > > I see the patch changes the loop:
> 
> Thanks for benchmarking it.
> 
> > The patch causes a regression (>4%) on the benchmark when I measured
> > on my Ryzen box.
> >
> > GCC trunk:   -O3 -mavx2 -mprefer-avx128     -O3 -march=znver1
> > 429.mcf      9120   238   38.3 *            9120   227   40.2 *
> >
> > GCC patch:
> > 429.mcf      9120   251   36.3 *            9120   236   38.6 *
> 
> So, has 429.mcf improved then compared to 7.x sufficiently that we can turn
> PR78200 into just [7 Regression], without adding any patch?
Yes, we can mark this PR as a GCC 7 regression only.

There is another PR (84481) for a 429.mcf regression on Zen against 7.x, which
seems to be independent of this issue.

> 
>   Jakub


Re: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-optimization/78200)

2018-03-27 Thread Jakub Jelinek
On Tue, Mar 27, 2018 at 11:04:35AM +, Kumar, Venkataramanan wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it
> > helps for SPEC?  Venkat, do you think you could benchmark it in the setup
> > where you've measured the slowdown to see if it helps?  I see the patch
> > changes the loop:

Thanks for benchmarking it.

> The patch causes a regression (>4%) on the benchmark when I measured on my
> Ryzen box.
>
> GCC trunk:   -O3 -mavx2 -mprefer-avx128     -O3 -march=znver1
> 429.mcf      9120   238   38.3 *            9120   227   40.2 *
>
> GCC patch:
> 429.mcf      9120   251   36.3 *            9120   236   38.6 *

So, has 429.mcf improved then compared to 7.x sufficiently that we can turn
PR78200 into just [7 Regression], without adding any patch?

Jakub


RE: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-optimization/78200)

2018-03-27 Thread Kumar, Venkataramanan
Hi Jakub,


> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, March 27, 2018 2:40 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Kumar, Venkataramanan
> 
> Subject: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-
> optimization/78200)
> 
> Hi!
> 
> The following patch attempts to improve expansion, if we have code like:
>[local count: 102513059]:
>   if_conversion_var_52 = MEM[base: st1_22, offset: 0B];
>   if (if_conversion_var_52 < 0)
> goto ; [41.00%]
>   else
> goto ; [59.00%]
> 
> ...
> 
>[local count: 60482706]:
>   _81 = _11 == 2;
>   _82 = if_conversion_var_52 > 0;
>   _79 = _81 & _82;
>   if (_79 != 0)
> goto ; [29.26%]
>   else
> goto ; [70.74%]
> 
> Here, the single pred of the bb performed a similar comparison to what one
> of the & (or |) operands does, and as & (or |) is not ordered, we can choose
> which operand we'll expand first.  If we expand if_conversion_var_52 > 0
> first, there is a chance that we can reuse flags from the previous comparison.
> The patch does it only if there are no (non-virtual) phis on the current bb,
> all stmts before the current condition are TERable, so there is nothing that
> would clobber the flags in between.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it
> helps for SPEC?  Venkat, do you think you could benchmark it in the setup
> where you've measured the slowdown to see if it helps?  I see the patch
> changes the loop:

The patch causes a regression (>4%) on the benchmark when I measured on my
Ryzen box.

GCC trunk:   -O3 -mavx2 -mprefer-avx128     -O3 -march=znver1
429.mcf      9120   238   38.3 *            9120   227   40.2 *

GCC patch:
429.mcf      9120   251   36.3 *            9120   236   38.6 *

>   .p2align 4,,10
>   .p2align 3
>  .L11:
> + jle .L10
>   cmpl$2, %eax
> - jne .L10
> - testq   %r8, %r8
> - jg  .L12
> + je  .L12
>   .p2align 4,,10
>   .p2align 3
>  .L10:
>   addq%r9, %rsi
>   cmpq%rsi, %rdx
>   jbe .L35
>  .L13:
>   movl24(%rsi), %eax
>   testl   %eax, %eax
>   jle .L10
>   movq(%rsi), %r8
>   testq   %r8, %r8
>   jns .L11
>   cmpl$1, %eax
>   jne .L10
>  .L12:
>   addq$1, %r10
>   movl$1, %r11d
>   movqst5(,%r10,8), %rax
>   movq%rsi, (%rax)
>   addq%r9, %rsi
>   movq%r8, 8(%rax)
>   movqst5(,%rdi,8), %rax
>   movq%r8, 16(%rax)
>   cmpq%rsi, %rdx
>   ja  .L13
> which I assume shall be an improvement, since we can save one extra
> comparison.

This issue was reported against GCC 7.x. Looking at the generated assembly,
it seems GCC trunk already emits the compares in the order needed for MCF.

GCC trunk                  |  GCC patch
.L41:                      |  .L41:
                           |    jle     .L40
  cmpl    $2, %edi         |    cmpl    $2, %edi
  jne     .L40             |    je      .L42
  testq   %rdx, %rdx       |
  jg      .L42             |

The patch does avoid the extra compare, but it again emits the compares in an
order which is bad for "mcf".

GCC trunk:
  cmpl    $2, %edi
  jne     .L40        <== is almost always true.
  testq   %rdx, %rdx

GCC patch:
  jle     .L40        <== is almost always false.
  cmpl    $2, %edi
  je      .L42
 
> 
> 2018-03-27  Jakub Jelinek  
>   Richard Biener  
> 
>   PR rtl-optimization/78200
>   * cfgexpand.c (gimple_bb_info_for_bb): New variable.
>   (expand_bb_seq, expand_phi_nodes): New functions.
>   (expand_gimple_cond): Use heuristics if it is desirable to
>   swap TRUTH_{AND,OR}IF_EXPR operands.
>   (expand_gimple_basic_block): Remember GIMPLE il for bbs
>   being expanded or already expanded.
>   (pass_expand::execute): Initialize and then free the
>   gimple_bb_info_for_bb vector.
> 
> --- gcc/cfgexpand.c.jj2018-02-09 06:44:36.413805556 +0100
> +++ gcc/cfgexpand.c   2018-03-26 13:35:57.536509844 +0200
> @@ -2357,6 +2357,34 @@ label_rtx_for_bb (basic_block bb ATTRIBU
>return l;
>  }
> 
> +/* Maps blocks to their GIMPLE IL.  */
> +static vec *gimple_bb_info_for_bb;
> +
> +/* Like bb_seq, except that during expansion returns the GIMPLE seq
> +   even for the blocks that have been already expanded or are being
> +   currently expanded.  */
> +
> +static gimple_seq
> +expand_bb_seq (basic_block bb)
> +{
> +  if ((bb->flags & BB_RTL)
> +  && (unsigned) bb->index < vec_safe_length (gimple_bb_info_for_bb))
> +return (*gimple_bb_info_for_bb)[bb->index].seq;
> +  return bb_seq (bb);
> +}
> +
> +/* Like phi_nodes, except that during expansion returns the GIMPLE PHIs
> +   even for the blocks that have

Re: [C++ PATCH] Fix invalid covariant return error-recovery (PR c++/85068)

2018-03-27 Thread Nathan Sidwell

On 03/27/2018 04:49 AM, Jakub Jelinek wrote:

Hi!

As the comment says, in a valid program we always find thunk_binfo, but if the
covariancy is invalid, we've already diagnosed an error and we might not find
it.  We have code to handle thunk_binfo NULL or not finding it in the chain,
but on the following testcase base_binfo is NULL and we ICE when we try to
access BINFO_TYPE on it.

Fixed thusly, furthermore to match the comment I've added an assertion that
if we don't find thunk_binfo we've indeed already diagnosed an error.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-27  Jakub Jelinek  

PR c++/85068
* class.c (update_vtable_entry_for_fn): Don't ICE if base_binfo
is NULL.  Assert if thunk_binfo is NULL then errorcount is non-zero.



ok, thanks.

nathan

--
Nathan Sidwell


Re: [C++ Patch] Fix confusing diagnostics for invalid overrides

2018-03-27 Thread Nathan Sidwell

On 03/27/2018 03:08 AM, Volker Reichelt wrote:

On 03/26/2018 01:19 PM, Nathan Sidwell wrote:

On 03/25/2018 10:18 AM, Volker Reichelt wrote:




How about the following then?
I rephrased the last three errors a little to really make them 
stand-alone errors.


Again, bootstrapped and regtested.
OK for trunk?


looks great, thanks!

nathan

--
Nathan Sidwell


RE: [PATCH,Testsuite,MIPS] Fixing fix-r4000-n.c failure started with r255348

2018-03-27 Thread Matthew Fortune
Hi Paul,

> ChangeLog entries:
> 
> gcc/testsuite/ChangeLog
> 
> 2018-03-24  Chenghua Xu 
> 
> * gcc.target/mips/fix-r4000-1.c: Delete "[^\n]" in dg-final.
> * gcc.target/mips/fix-r4000-2.c: Likewise.
> * gcc.target/mips/fix-r4000-3.c: Likewise.
> * gcc.target/mips/fix-r4000-4.c: Likewise.
> * gcc.target/mips/fix-r4000-5.c: Likewise.
> * gcc.target/mips/fix-r4000-6.c: Likewise.
> * gcc.target/mips/fix-r4000-7.c: Likewise.
> * gcc.target/mips/fix-r4000-8.c: Likewise.
> * gcc.target/mips/fix-r4000-9.c: Likewise.
> * gcc.target/mips/fix-r4000-10.c: Likewise.
> * gcc.target/mips/fix-r4000-7.c: Change dg-final
>   "mulditi3_r4000" instead of "mulditi3".
> * gcc.target/mips/fix-r4000-8.c: Change dg-final
>   "umulditi3_r4000" instead of "umulditi3".

This looks good too. Another good cleanup, OK to commit.

Thanks,
Matthew


RE: [PATCH,Testsuite,MIPS] Fixing umips-stroe16-2.c failure started with r255348

2018-03-27 Thread Matthew Fortune
Hi Paul

> ChangeLog entries:
> 
> gcc/testsuite/ChangeLog
> 
> 2018-03-24  Chenghua Xu 
> 
> * gcc.target/mips/umips-stroe16-2.c: Change "length = 2"
>   to "l=2" in dg-final.

Looks good. Thanks for the cleanup. OK to commit.

Matthew


[PATCH] Fix PR84004

2018-03-27 Thread Richard Biener

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-03-27  Richard Biener  

PR testsuite/84004
* gcc.dg/vect/vect-95.c: Never expect the loop to be peeled for
alignment.

Index: gcc/testsuite/gcc.dg/vect/vect-95.c
===
--- gcc/testsuite/gcc.dg/vect/vect-95.c (revision 258851)
+++ gcc/testsuite/gcc.dg/vect/vect-95.c (working copy)
@@ -56,7 +56,7 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail {vect_element_align} } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" } } */
 
 /* For targets that support unaligned loads we version for the two unaligned 
stores and generate misaligned accesses for the loads. For targets that 


Re: [Patch AArch64] Turn on -fasynchronous-unwind-tables and -funwind-tables by default.

2018-03-27 Thread Ramana Radhakrishnan
On Mon, Mar 19, 2018 at 12:12 PM, James Greenhalgh
 wrote:
> On Tue, Mar 13, 2018 at 01:35:56PM +, Ramana Radhakrishnan wrote:
>> Jakub commented here that
>> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01325.html we don't turn
>> on fasynchronous-unwind-tables for AArch64. I note that x86_64 turns on
>> funwind-tables as well. Thus this patch turns on both.
>>
>> Note that this doesn't increase code size but will likely increase
>> binary size because we now have a lot more eh_frame / debug_frame
>> information being spat out. We probably need to get the clang/LLVM
>> guys to do the same but I'll prod them separately.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>>
>> Ok ?
>
> OK.
>

Now applied. Maybe the FreeBSD guys want to do the same but that's
their choice ?

CC'ing Andreas just in case he has an opinion.

Ramana

> Thanks,
> James
>


[C++ PATCH] Improve cp_fold on vector CONSTRUCTORs (PR c++/85077)

2018-03-27 Thread Jakub Jelinek
Hi!

The following testcase regressed for 8+, because we delayed folding in
SAVE_EXPRs and end up passing a CONSTRUCTOR with V4SFmode and 4x 0.0
constants in it rather than a VECTOR_CST to the middle-end folder, which
uses real_zerop and thus doesn't recognize the CONSTRUCTOR in
VEC_COND_EXPR  as zero and doesn't fold
it into ABS_EXPR.

We really should move that folding into match.pd, but that is a GCC 9 task.

Fixed by using fold on vector CONSTRUCTORs, the only thing fold does on
those is exactly the CONSTRUCTOR -> VECTOR_CST folding when possible.
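
For readers who want to try the source pattern outside the testsuite, here is
a small sketch (not part of the patch). It assumes a compiler with the GNU
vector extensions in C++ mode; vec_abs and check_vec_abs are made-up names,
and the zero vector is spelled out explicitly rather than relying on the
scalar 0 being broadcast as in the testcase below.

```cpp
#include <cassert>

// Four-float vector type, as in the new testcase.
typedef float V __attribute__((vector_size (4 * sizeof (float))));

// Element-wise select: lanes that are negative get negated.  This is the
// source-level shape that the middle-end can turn into an absolute value
// once the zero operand is recognized as a constant zero vector.
V
vec_abs (V x)
{
  const V zero = { 0.0f, 0.0f, 0.0f, 0.0f };
  return x < zero ? -x : x;
}

// Compare every lane against the scalar version of the same expression.
void
check_vec_abs ()
{
  V v = { -1.5f, 2.0f, 0.0f, -8.0f };
  V r = vec_abs (v);
  for (int i = 0; i < 4; ++i)
    {
      float expected = v[i] < 0.0f ? -v[i] : v[i];
      assert (r[i] == expected);
    }
}
```

Whether this compiles to a single abs operation depends on the options used;
the testcase checks it with -Ofast via the forwprop1 dump.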

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-27  Jakub Jelinek  

PR c++/85077
* cp-gimplify.c (cp_fold) : For ctors with vector
type call fold to generate VECTOR_CSTs when possible.

* g++.dg/ext/vector35.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2018-03-20 22:05:57.023431462 +0100
+++ gcc/cp/cp-gimplify.c2018-03-26 16:08:47.728347579 +0200
@@ -2504,6 +2504,8 @@ cp_fold (tree x)
CONSTRUCTOR_PLACEHOLDER_BOUNDARY (x)
  = CONSTRUCTOR_PLACEHOLDER_BOUNDARY (org_x);
  }
+   if (VECTOR_TYPE_P (TREE_TYPE (x)))
+ x = fold (x);
break;
   }
 case TREE_VEC:
--- gcc/testsuite/g++.dg/ext/vector35.C.jj  2018-03-26 16:19:39.330809031 +0200
+++ gcc/testsuite/g++.dg/ext/vector35.C 2018-03-26 16:33:43.997330748 +0200
@@ -0,0 +1,22 @@
+// PR c++/85077
+// { dg-do compile }
+// { dg-options "-Ofast -fdump-tree-forwprop1" }
+
+typedef float V __attribute__((vector_size (4 * sizeof (float))));
+typedef double W __attribute__((vector_size (2 * sizeof (double))));
+
+void
+foo (V *y)
+{
+  V x = *y;
+  *y = x < 0 ? -x : x;
+}
+
+void
+bar (W *y)
+{
+  W x = *y;
+  *y = x < 0 ? -x : x;
+}
+
+// { dg-final { scan-tree-dump-times "ABS_EXPR <" 2 "forwprop1" } }

Jakub


RE: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-optimization/78200)

2018-03-27 Thread Kumar, Venkataramanan
Hi Jakub, 

> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, March 27, 2018 2:40 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Kumar, Venkataramanan
> 
> Subject: [PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-
> optimization/78200)
> 
> Hi!
> 
> The following patch attempts to improve expansion, if we have code like:
>[local count: 102513059]:
>   if_conversion_var_52 = MEM[base: st1_22, offset: 0B];
>   if (if_conversion_var_52 < 0)
> goto ; [41.00%]
>   else
> goto ; [59.00%]
> 
> ...
> 
>[local count: 60482706]:
>   _81 = _11 == 2;
>   _82 = if_conversion_var_52 > 0;
>   _79 = _81 & _82;
>   if (_79 != 0)
> goto ; [29.26%]
>   else
> goto ; [70.74%]
> 
> Here, the single pred of the bb performed a similar comparison to what one
> of the & (or |) operands does, and as & (or |) is not ordered, we can choose
> which operand we'll expand first.  If we expand if_conversion_var_52 > 0
> first, there is a chance that we can reuse flags from the previous comparison.
> The patch does it only if there are no (non-virtual) phis on the current bb,
> all stmts before the current condition are TERable, so there is nothing that
> would clobber the flags in between.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it
> helps for SPEC?  Venkat, do you think you could benchmark it in the setup
> where you've measured the slowdown to see if it helps?  I see the patch
> changes the loop:

Sure will benchmark and get back to you. 
 

>   .p2align 4,,10
>   .p2align 3
>  .L11:
> + jle .L10
>   cmpl$2, %eax
> - jne .L10
> - testq   %r8, %r8
> - jg  .L12
> + je  .L12
>   .p2align 4,,10
>   .p2align 3
>  .L10:
>   addq%r9, %rsi
>   cmpq%rsi, %rdx
>   jbe .L35
>  .L13:
>   movl24(%rsi), %eax
>   testl   %eax, %eax
>   jle .L10
>   movq(%rsi), %r8
>   testq   %r8, %r8
>   jns .L11
>   cmpl$1, %eax
>   jne .L10
>  .L12:
>   addq$1, %r10
>   movl$1, %r11d
>   movqst5(,%r10,8), %rax
>   movq%rsi, (%rax)
>   addq%r9, %rsi
>   movq%r8, 8(%rax)
>   movqst5(,%rdi,8), %rax
>   movq%r8, 16(%rax)
>   cmpq%rsi, %rdx
>   ja  .L13
> which I assume shall be an improvement, since we can save one extra
> comparison.
> 
> 2018-03-27  Jakub Jelinek  
>   Richard Biener  
> 
>   PR rtl-optimization/78200
>   * cfgexpand.c (gimple_bb_info_for_bb): New variable.
>   (expand_bb_seq, expand_phi_nodes): New functions.
>   (expand_gimple_cond): Use heuristics if it is desirable to
>   swap TRUTH_{AND,OR}IF_EXPR operands.
>   (expand_gimple_basic_block): Remember GIMPLE il for bbs
>   being expanded or already expanded.
>   (pass_expand::execute): Initialize and then free the
>   gimple_bb_info_for_bb vector.
> 
> --- gcc/cfgexpand.c.jj2018-02-09 06:44:36.413805556 +0100
> +++ gcc/cfgexpand.c   2018-03-26 13:35:57.536509844 +0200
> @@ -2357,6 +2357,34 @@ label_rtx_for_bb (basic_block bb ATTRIBU
>return l;
>  }
> 
> +/* Maps blocks to their GIMPLE IL.  */
> +static vec *gimple_bb_info_for_bb;
> +
> +/* Like bb_seq, except that during expansion returns the GIMPLE seq
> +   even for the blocks that have been already expanded or are being
> +   currently expanded.  */
> +
> +static gimple_seq
> +expand_bb_seq (basic_block bb)
> +{
> +  if ((bb->flags & BB_RTL)
> +  && (unsigned) bb->index < vec_safe_length (gimple_bb_info_for_bb))
> +return (*gimple_bb_info_for_bb)[bb->index].seq;
> +  return bb_seq (bb);
> +}
> +
> +/* Like phi_nodes, except that during expansion returns the GIMPLE PHIs
> +   even for the blocks that have been already expanded or are being
> +   currently expanded.  */
> +
> +static gimple_seq
> +expand_phi_nodes (basic_block bb)
> +{
> +  if ((bb->flags & BB_RTL)
> +  && (unsigned) bb->index < vec_safe_length (gimple_bb_info_for_bb))
> +return (*gimple_bb_info_for_bb)[bb->index].phi_nodes;
> +  return phi_nodes (bb);
> +}
> 
>  /* A subroutine of expand_gimple_cond.  Given E, a fallthrough edge
> of a basic block where we just expanded the conditional at the end,
> @@ -2475,6 +2503,65 @@ expand_gimple_cond (basic_block bb, gcon
> op0 = gimple_assign_rhs1 (second);
> op1 = gimple_assign_rhs2 (second);
>   }
> +
> +   /* We'll expand RTL for op0 first, see if we'd better expand RTL
> +  for op1 first.  Do that if the previous bb ends with
> +  if (x op cst), op1's def_stmt rhs is x op2 cst and there are
> +  no non-virtual PHIs nor non-TERed stmts in BB before STMT.  */
> +   while (TREE_CODE (op1) == SSA_NAME
> +  && (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
> +  && single_pred_p (bb))
> + {
> +   

[PATCH] Improve TRUTH_{AND,OR}IF_EXPR expansion (PR rtl-optimization/78200)

2018-03-27 Thread Jakub Jelinek
Hi!

The following patch attempts to improve expansion, if we have code like:
   [local count: 102513059]:
  if_conversion_var_52 = MEM[base: st1_22, offset: 0B];
  if (if_conversion_var_52 < 0)
goto ; [41.00%]
  else
goto ; [59.00%]

...

   [local count: 60482706]:
  _81 = _11 == 2;
  _82 = if_conversion_var_52 > 0;
  _79 = _81 & _82;
  if (_79 != 0)
goto ; [29.26%]
  else
goto ; [70.74%]

Here, the single pred of the bb performed a similar comparison to what
one of the & (or |) operands does, and as & (or |) is not ordered, we can
choose which operand we'll expand first.  If we expand if_conversion_var_52 > 0
first, there is a chance that we can reuse flags from the previous
comparison.  The patch does it only if there are no (non-virtual) phis on the
current bb, all stmts before the current condition are TERable, so there is
nothing that would clobber the flags in between.
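
As a rough source-level picture of the reordering (a sketch only, loosely
modelled on the loop below; the function and parameter names are invented and
do not come from the benchmark or from cfgexpand.c):

```cpp
#include <cassert>

// Shape before the change: after the predecessor has tested val < 0, the
// block evaluates kind == 2 first and then tests val against zero again.
int
reference (int kind, long val)
{
  if (val < 0)
    return kind == 1;
  return (kind == 2) & (val > 0);
}

// Shape after the change: the val > 0 half of the & is decided first, right
// after the predecessor's comparison of val with zero, so at the machine
// level the flags from that comparison can be reused.
int
reordered (int kind, long val)
{
  if (val < 0)
    return kind == 1;
  if (val <= 0)
    return 0;
  return kind == 2;
}

// Since & does not order its operands, both shapes must agree everywhere.
void
check_same_result ()
{
  const long vals[] = { -5, -1, 0, 1, 7 };
  for (int kind = 0; kind < 4; ++kind)
    for (long val : vals)
      assert (reference (kind, val) == reordered (kind, val));
}
```

This only shows that the two orders compute the same result; which order is
faster depends on how often each branch is taken, which is why benchmarking is
needed.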

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it
helps for SPEC?  Venkat, do you think you could benchmark it in the setup
where you've measured the slowdown to see if it helps?  I see the patch
changes the loop:
.p2align 4,,10
.p2align 3
 .L11:
+   jle .L10
cmpl$2, %eax
-   jne .L10
-   testq   %r8, %r8
-   jg  .L12
+   je  .L12
.p2align 4,,10
.p2align 3
 .L10:
addq%r9, %rsi
cmpq%rsi, %rdx
jbe .L35
 .L13:
movl24(%rsi), %eax
testl   %eax, %eax
jle .L10
movq(%rsi), %r8
testq   %r8, %r8
jns .L11
cmpl$1, %eax
jne .L10
 .L12:
addq$1, %r10
movl$1, %r11d
movqst5(,%r10,8), %rax
movq%rsi, (%rax)
addq%r9, %rsi
movq%r8, 8(%rax)
movqst5(,%rdi,8), %rax
movq%r8, 16(%rax)
cmpq%rsi, %rdx
ja  .L13
which I assume shall be an improvement, since we can save one extra
comparison.

2018-03-27  Jakub Jelinek  
Richard Biener  

PR rtl-optimization/78200
* cfgexpand.c (gimple_bb_info_for_bb): New variable.
(expand_bb_seq, expand_phi_nodes): New functions.
(expand_gimple_cond): Use heuristics if it is desirable to
swap TRUTH_{AND,OR}IF_EXPR operands.
(expand_gimple_basic_block): Remember GIMPLE il for bbs
being expanded or already expanded.
(pass_expand::execute): Initialize and then free the
gimple_bb_info_for_bb vector.

--- gcc/cfgexpand.c.jj  2018-02-09 06:44:36.413805556 +0100
+++ gcc/cfgexpand.c 2018-03-26 13:35:57.536509844 +0200
@@ -2357,6 +2357,34 @@ label_rtx_for_bb (basic_block bb ATTRIBU
   return l;
 }
 
+/* Maps blocks to their GIMPLE IL.  */
+static vec *gimple_bb_info_for_bb;
+
+/* Like bb_seq, except that during expansion returns the GIMPLE seq
+   even for the blocks that have been already expanded or are being
+   currently expanded.  */
+
+static gimple_seq
+expand_bb_seq (basic_block bb)
+{
+  if ((bb->flags & BB_RTL)
+  && (unsigned) bb->index < vec_safe_length (gimple_bb_info_for_bb))
+return (*gimple_bb_info_for_bb)[bb->index].seq;
+  return bb_seq (bb);
+}
+
+/* Like phi_nodes, except that during expansion returns the GIMPLE PHIs
+   even for the blocks that have been already expanded or are being
+   currently expanded.  */
+
+static gimple_seq
+expand_phi_nodes (basic_block bb)
+{
+  if ((bb->flags & BB_RTL)
+  && (unsigned) bb->index < vec_safe_length (gimple_bb_info_for_bb))
+return (*gimple_bb_info_for_bb)[bb->index].phi_nodes;
+  return phi_nodes (bb);
+}
 
 /* A subroutine of expand_gimple_cond.  Given E, a fallthrough edge
of a basic block where we just expanded the conditional at the end,
@@ -2475,6 +2503,65 @@ expand_gimple_cond (basic_block bb, gcon
  op0 = gimple_assign_rhs1 (second);
  op1 = gimple_assign_rhs2 (second);
}
+
+ /* We'll expand RTL for op0 first, see if we'd better expand RTL
+for op1 first.  Do that if the previous bb ends with
+if (x op cst), op1's def_stmt rhs is x op2 cst and there are
+no non-virtual PHIs nor non-TERed stmts in BB before STMT.  */
+ while (TREE_CODE (op1) == SSA_NAME
+&& (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+&& single_pred_p (bb))
+   {
+ gimple *def1 = SSA_NAME_DEF_STMT (op1);
+ if (!is_gimple_assign (def1)
+ || (TREE_CODE_CLASS (gimple_assign_rhs_code (def1))
+ != tcc_comparison))
+   break;
+
+ basic_block pred = single_pred (bb);
+ gimple_seq pred_seq = expand_bb_seq (pred);
+ gimple_stmt_iterator i = gsi_last (pred_seq);
+ if (!gsi_end_p (i) && is_gimple_debug (gsi_stmt (i)))

[C++ PATCH] Fix ICE on offsetof with volatile struct and static data member array ref (PR c++/85061)

2018-03-27 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because we assert that we see a COMPOUND_EXPR
only for static data member in a volatile struct, but as the testcase shows,
we can see it also if using some component of the static data member.

Fixed by using get_base_address, plus, as the check isn't as cheap as
before, turn the assert into a checking assert only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-27  Jakub Jelinek  

PR c++/85061
* c-common.c (fold_offsetof_1) : Assert that
get_base_address of the second operand is a VAR_P, rather than the
operand itself, and use gcc_checking_assert instead of gcc_assert.

* g++.dg/ext/builtin-offsetof3.C: New test.

--- gcc/c-family/c-common.c.jj  2018-03-13 00:38:23.809662252 +0100
+++ gcc/c-family/c-common.c 2018-03-24 15:21:36.171485128 +0100
@@ -6272,7 +6272,7 @@ fold_offsetof_1 (tree expr, enum tree_co
 case COMPOUND_EXPR:
   /* Handle static members of volatile structs.  */
   t = TREE_OPERAND (expr, 1);
-  gcc_assert (VAR_P (t));
+  gcc_checking_assert (VAR_P (get_base_address (t)));
   return fold_offsetof_1 (t);
 
 default:
--- gcc/testsuite/g++.dg/ext/builtin-offsetof3.C.jj 2018-03-26 11:54:54.338627270 +0200
+++ gcc/testsuite/g++.dg/ext/builtin-offsetof3.C 2018-03-26 11:54:07.992610454 +0200
@@ -0,0 +1,14 @@
+// PR c++/85061
+// { dg-do compile }
+
+struct B { int a, b; };
+struct A
+{
+  static int x[2];
+  static int y;
+  static B z;
+};
+
+int i = __builtin_offsetof (volatile A, x[0]); // { dg-error "cannot apply 'offsetof' to static data member 'A::x'" }
+int j = __builtin_offsetof (volatile A, y); // { dg-error "cannot apply 'offsetof' to static data member 'A::y'" }
+int k = __builtin_offsetof (volatile A, z.a);  // { dg-error "cannot apply 'offsetof' to a non constant address" }

Jakub


[C++ PATCH] Fix ICE in cp_build_reference_type (PR c++/85076)

2018-03-27 Thread Jakub Jelinek
Hi!

Both build_{reference,pointer}_type start with if (to_type ==
error_mark_node) return error_mark_node;

cp_build_reference_type uses build_reference_type, so in many cases it will
just return error_mark_node if it is passed, but if rval is true, it will
assume build_reference_type returned some REFERENCE_TYPE instead.
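
The failure mode can be pictured with a toy model (a sketch under my own
naming; type_node, error_mark and the toy_* functions are stand-ins, not the
real tree API): the wrapper post-processes whatever the inner builder returns,
so it has to check for the error sentinel itself.

```cpp
#include <cassert>

// Toy stand-ins; none of these names are GCC's.
struct type_node
{
  bool is_reference;
  bool is_rvalue_ref;
};

static type_node error_node = { false, false };
static type_node *const error_mark = &error_node;

// Inner builder: hands the error sentinel straight back.
type_node *
toy_build_reference (type_node *to_type, type_node *storage)
{
  if (to_type == error_mark)
    return error_mark;
  storage->is_reference = true;
  storage->is_rvalue_ref = false;
  return storage;
}

// Wrapper: the first check mirrors the early return added by the patch.
// Without it, the rval path below would treat the sentinel as a reference
// node and modify it.
type_node *
toy_cp_build_reference (type_node *to_type, bool rval, type_node *storage)
{
  if (to_type == error_mark)
    return error_mark;
  type_node *ref = toy_build_reference (to_type, storage);
  if (rval)
    ref->is_rvalue_ref = true;
  return ref;
}

void
check_error_propagation ()
{
  type_node int_type = { false, false };
  type_node slot = { false, false };
  assert (toy_cp_build_reference (&int_type, true, &slot) == &slot);
  assert (slot.is_reference && slot.is_rvalue_ref);
  assert (toy_cp_build_reference (error_mark, true, &slot) == error_mark);
  assert (!error_node.is_rvalue_ref);
}
```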

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-03-27  Jakub Jelinek  

PR c++/85076
* tree.c (cp_build_reference_type): If to_type is error_mark_node,
return it right away.

* g++.dg/cpp1y/pr85076.C: New test.

--- gcc/cp/tree.c.jj2018-03-21 21:18:31.738351376 +0100
+++ gcc/cp/tree.c   2018-03-26 11:22:47.067967708 +0200
@@ -1078,6 +1078,9 @@ cp_build_reference_type (tree to_type, b
 {
   tree lvalue_ref, t;
 
+  if (to_type == error_mark_node)
+return error_mark_node;
+
   if (TREE_CODE (to_type) == REFERENCE_TYPE)
 {
   rval = rval && TYPE_REF_IS_RVALUE (to_type);
--- gcc/testsuite/g++.dg/cpp1y/pr85076.C.jj 2018-03-26 11:26:55.725047985 +0200
+++ gcc/testsuite/g++.dg/cpp1y/pr85076.C 2018-03-26 11:26:41.807043494 +0200
@@ -0,0 +1,6 @@
+// PR c++/85076
+// { dg-do compile { target c++14 } }
+
+template struct A*;  // { dg-error "expected unqualified-id before" }
+
+auto a = [](A) {};   // { dg-error "is not a template|has incomplete type" }

Jakub


[C++ PATCH] Fix invalid covariant return error-recovery (PR c++/85068)

2018-03-27 Thread Jakub Jelinek
Hi!

As the comment says, in a valid program we always find thunk_binfo, but if the
covariancy is invalid, we've already diagnosed an error and we might not find
it.  We have code to handle thunk_binfo NULL or not finding it in the chain,
but on the following testcase base_binfo is NULL and we ICE when we try to
access BINFO_TYPE on it.

Fixed thusly, furthermore to match the comment I've added an assertion that
if we don't find thunk_binfo we've indeed already diagnosed an error.
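
In simplified form the control flow after the change looks roughly like the
sketch below (toy types and names of my own, assuming a singly linked chain;
this is not the code in class.c): the search is skipped entirely when the base
entry is missing, and a null result is then only acceptable if an error was
already reported.

```cpp
#include <cassert>

// Toy stand-in for a binfo chain entry; not GCC's representation.
struct toy_binfo
{
  int type_id;
  toy_binfo *chain;
};

// Returns the entry in OVER's chain whose type matches BASE, or nullptr.
// BASE may itself be null after an earlier error, in which case nothing is
// dereferenced and the search is skipped.
toy_binfo *
find_thunk_binfo (toy_binfo *over, const toy_binfo *base)
{
  if (!base)
    return nullptr;
  for (toy_binfo *b = over; b; b = b->chain)
    if (b->type_id == base->type_id)
      return b;
  return nullptr;
}

void
check_find_thunk_binfo ()
{
  toy_binfo last = { 3, nullptr };
  toy_binfo mid = { 2, &last };
  toy_binfo first = { 1, &mid };
  toy_binfo base = { 2, nullptr };
  toy_binfo missing = { 9, nullptr };

  assert (find_thunk_binfo (&first, &base) == &mid);
  assert (find_thunk_binfo (&first, &missing) == nullptr);
  // The case from the PR: no base entry at all.
  assert (find_thunk_binfo (&first, nullptr) == nullptr);
}
```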

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-27  Jakub Jelinek  

PR c++/85068
* class.c (update_vtable_entry_for_fn): Don't ICE if base_binfo
is NULL.  Assert if thunk_binfo is NULL then errorcount is non-zero.

* g++.dg/inherit/covariant22.C: New test.

--- gcc/cp/class.c.jj   2018-03-21 21:18:31.691351383 +0100
+++ gcc/cp/class.c  2018-03-26 10:48:02.648053297 +0200
@@ -2479,19 +2479,20 @@ update_vtable_entry_for_fn (tree t, tree
 order.  Of course it is lame that we have to repeat the
 search here anyway -- we should really be caching pieces
 of the vtable and avoiding this repeated work.  */
- tree thunk_binfo, base_binfo;
+ tree thunk_binfo = NULL_TREE;
+ tree base_binfo = TYPE_BINFO (base_return);
 
  /* Find the base binfo within the overriding function's
 return type.  We will always find a thunk_binfo, except
 when the covariancy is invalid (which we will have
 already diagnosed).  */
- for (base_binfo = TYPE_BINFO (base_return),
-  thunk_binfo = TYPE_BINFO (over_return);
-  thunk_binfo;
-  thunk_binfo = TREE_CHAIN (thunk_binfo))
-   if (SAME_BINFO_TYPE_P (BINFO_TYPE (thunk_binfo),
-  BINFO_TYPE (base_binfo)))
- break;
+ if (base_binfo)
+   for (thunk_binfo = TYPE_BINFO (over_return); thunk_binfo;
+thunk_binfo = TREE_CHAIN (thunk_binfo))
+ if (SAME_BINFO_TYPE_P (BINFO_TYPE (thunk_binfo),
+BINFO_TYPE (base_binfo)))
+   break;
+ gcc_assert (thunk_binfo || errorcount);
 
  /* See if virtual inheritance is involved.  */
  for (virtual_offset = thunk_binfo;
--- gcc/testsuite/g++.dg/inherit/covariant22.C.jj   2018-03-26 10:51:59.580172775 +0200
+++ gcc/testsuite/g++.dg/inherit/covariant22.C  2018-03-26 10:49:21.038092826 +0200
@@ -0,0 +1,19 @@
+// PR c++/85068
+// { dg-do compile }
+
+struct A;
+
+struct B
+{
+  virtual A *foo ();   // { dg-error "overriding" }
+};
+
+struct C : virtual B
+{
+  virtual C *foo ();   // { dg-error "invalid covariant return type for" }
+};
+
+struct D : C
+{
+  virtual C *foo ();
+};

Jakub


[C++ Patch] PR 85067 ("[8 Regression] ICE with volatile parameter in defaulted copy-constructor")

2018-03-27 Thread Paolo Carlini

Hi,

Volker noticed that a tweak I committed back in September, which tidied 
the diagnostic we produce in C++11 mode for the testcase in c++/68754 
causes this error recovery regression. We could try restoring the 
consistency, for example along the lines of the patchlet I posted on the 
audit trail (passes testing) but, for 8.1.0 at least, I propose to 
simply revert that change. Tested x86_64-linux.


Thanks, Paolo.

//

/cp
2018-03-27  Paolo Carlini  

PR c++/85067
* method.c (defaulted_late_check): Partially revert r253321 changes,
do not early return upon error.

/testsuite
2018-03-27  Paolo Carlini  

PR c++/85067
* g++.dg/cpp0x/defaulted51.C: New.
* g++.dg/cpp0x/constexpr-68754.C: Adjust.
Index: cp/method.c
===
--- cp/method.c (revision 258870)
+++ cp/method.c (working copy)
@@ -2189,7 +2189,6 @@ defaulted_late_check (tree fn)
 "expected signature", fn);
   inform (DECL_SOURCE_LOCATION (fn),
  "expected signature: %qD", implicit_fn);
-  return;
 }
 
   if (DECL_DELETED_FN (implicit_fn))
Index: testsuite/g++.dg/cpp0x/constexpr-68754.C
===
--- testsuite/g++.dg/cpp0x/constexpr-68754.C(revision 258870)
+++ testsuite/g++.dg/cpp0x/constexpr-68754.C(working copy)
@@ -3,5 +3,5 @@
 
 struct base { };
 struct derived : base {
-  constexpr derived& operator=(derived const&) = default; // { dg-error "defaulted declaration" "" { target { ! c++14 } } }
+  constexpr derived& operator=(derived const&) = default; // { dg-error "defaulted" "" { target { ! c++14 } } }
 };
Index: testsuite/g++.dg/cpp0x/defaulted51.C
===
--- testsuite/g++.dg/cpp0x/defaulted51.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/defaulted51.C(working copy)
@@ -0,0 +1,15 @@
+// PR c++/85067
+// { dg-do compile { target c++11 } }
+
+template<int> struct A
+{
+  A();
+  A(volatile A&) = default;  // { dg-error "defaulted" }
+};
+
+struct B
+{
+  A<0> a;
+};
+
+B b;


Re: [PATCH, rtl] Fix PR84878: Segmentation fault in add_cross_iteration_register_deps

2018-03-27 Thread Richard Biener
On Mon, 26 Mar 2018, Peter Bergner wrote:

> PR84878 shows an example where we segv while creating data dependence edges
> for SMS.
> 
> ddg.c:add_cross_iteration_register_deps():
> 
>   /* Create inter-loop true dependences and anti dependences.  */
>   for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
> {
>   rtx_insn *use_insn = DF_REF_INSN (r_use->ref);
> segv's
> 
> We currently have:
> (gdb) pr def_insn
> (insn 331 321 332 12 (parallel [
> (set (reg:V4SI 239 [ vect__4.11 ])
> (unspec:V4SI [
> (reg:V4SF 134 [ vect_cst__39 ])
> (const_int 0 [0])
> ] UNSPEC_VCTSXS))
> (set (reg:SI 110 vscr)
> (unspec:SI [
> (const_int 0 [0])
> ] UNSPEC_SET_VSCR))
> ]) "bug.i":9 1812 {altivec_vctsxs}
>  (expr_list:REG_UNUSED (reg:V4SI 239 [ vect__4.11 ])
> (nil)))
> (gdb) p DF_REF_REGNO (last_def)
> $4 = 110
> 
> So we're looking at the definition of the VSCR hard register, which is a
> global register (ie, global_regs[110] == 1), but there are no following
> explicit uses of the VSCR reg, so:
> 
> (gdb) p DF_REF_INSN_INFO(r_use->ref)
> $5 = (df_insn_info *) 0x0
> 
> DF_REF_INSN(r_use->ref) dereferences DF_REF_INSN_INFO(r_use->ref), so we segv.
> 
> The following patch fixes the problem by simply skipping over the "uses"
> that do not have insn info (ie, no explicit uses or artificial ones).
> 
> This passed bootstrap and regtesting with no regressions on powerpc64-linux.
> Ok for trunk?
> 
> Peter
> 
> 
> gcc/
>   PR rtl-optimization/84878
>   * ddg.c (add_cross_iteration_register_deps): Skip over uses that do
>   not correspond to explicit register references.
> 
> gcc/testsuite/
>   PR rtl-optimization/84878
>   * gcc.dg/pr84878.c: New test.
> 
> Index: gcc/ddg.c
> ===
> --- gcc/ddg.c (revision 258802)
> +++ gcc/ddg.c (working copy)
> @@ -295,6 +295,11 @@ add_cross_iteration_register_deps (ddg_p
>/* Create inter-loop true dependences and anti dependences.  */
>for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next)
>  {
> +  /* PR84878: Some definitions of global hard registers may not have
> +  any following uses or they may be artificial, so skip them.  */
> +  if (DF_REF_INSN_INFO (r_use->ref) == NULL)
> + continue;
> +

To me a better check would be DF_REF_IS_ARTIFICIAL (r_use->ref).  But
I'm not sure simply ignoring those will be correct?  In fact
artificial refs do have a basic-block, so
>rtx_insn *use_insn = DF_REF_INSN (r_use->ref);
>  
>if (BLOCK_FOR_INSN (use_insn) != g->bb)

should use DF_REF_BB (r_use->ref) instead of indirection through
DF_REF_INSN.  Still, use_insn is used later, but then if the
artificial ref is inside g->bb we had better give up here?
We don't seem to have use_nodes for these "non-insns".

Somebody with more insight on DF should chime in here and tell
me what those "artificial" refs are about ...  there's

/* If this flag is set for an artificial use or def, that ref
   logically happens at the top of the block.  If it is not set
   for an artificial use or def, that ref logically happens at the
   bottom of the block.  This is never set for regular refs.  */
DF_REF_AT_TOP = 1 << 1,

so this is kind-of global regs being live across all BBs?  This sounds
a bit stupid to me, but well ... IMHO those refs should be at
specific insns like calls.

So maybe, with a big fat comment, it is OK to ignore artificial
refs in this loop...
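
To make the "skip refs without insn info" idea concrete, here is a minimal
standalone sketch in C. It does not use GCC's real df structures: the
mock_insn_info, mock_ref and mock_link types and both functions are
hypothetical stand-ins, and a NULL insn_info pointer is assumed to model a
ref that has no insn attached (for example an artificial one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's df structures; illustrative only.  */
struct mock_insn_info
{
  int uid;
};

struct mock_ref
{
  /* NULL models a ref with no insn info (e.g. an artificial ref).  */
  struct mock_insn_info *insn_info;
};

struct mock_link
{
  struct mock_ref *ref;
  struct mock_link *next;
};

/* Walk a use chain and count only the uses attached to a real insn,
   skipping the others instead of dereferencing a NULL insn info.  */
static int
count_explicit_uses (const struct mock_link *chain)
{
  int count = 0;

  for (const struct mock_link *use = chain; use != NULL; use = use->next)
    {
      assert (use->ref != NULL);
      if (use->ref->insn_info == NULL)
        continue;
      count++;
    }
  return count;
}

/* Build a three-element chain whose middle use has no insn info and
   return how many uses survive the filter.  */
static int
demo_chain_count (void)
{
  struct mock_insn_info i1 = { 1 }, i2 = { 2 };
  struct mock_ref r1 = { &i1 }, r2 = { NULL }, r3 = { &i2 };
  struct mock_link l3 = { &r3, NULL };
  struct mock_link l2 = { &r2, &l3 };
  struct mock_link l1 = { &r1, &l2 };

  return count_explicit_uses (&l1);
}
```

Whether silently skipping such refs is correct for SMS is exactly the open
question above; the sketch only shows the mechanics of the filter.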

Richard.

> Index: gcc/testsuite/gcc.dg/pr84878.c
> ===
> --- gcc/testsuite/gcc.dg/pr84878.c (revision 0)
> +++ gcc/testsuite/gcc.dg/pr84878.c (working copy)
> @@ -0,0 +1,20 @@
> +/* PR rtl-optimization/84878 */
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=G5" } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-O2 -mcpu=G5 -fmodulo-sched -ftree-vectorize -funroll-loops -fassociative-math -fno-signed-zeros -fno-trapping-math" } */
> +
> +int ek;
> +float zu;
> +
> +int
> +k5 (int ks)
> +{
> +  while (ek < 1)
> +{
> +  ks += (int)(0x100 + zu + !ek);
> +  ++ek;
> +}
> +
> +  return ks;
> +}
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH,nvptx] Fix PR85056

2018-03-27 Thread Tom de Vries

On 03/26/2018 11:57 PM, Cesar Philippidis wrote:

As noted in PR85056, the nvptx BE isn't declaring external arrays using
PTX array notation. Specifically, it's emitting code that's missing the
empty angle brackets '[]'. 


[ FYI, see https://en.wikipedia.org/wiki/Bracket

For '[]' I find "square brackets, closed brackets, hard brackets, third 
brackets, crotchets, or brackets (US)".


Angle brackets are different symbols. ]


This patch corrects that problem.

Tom, in contrast to my earlier patch in the PR, this patch only
considers external arrays. The patch I posted incorrectly handled
zero-length arrays and empty structs.
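
As a rough illustration of the intended behaviour, the following standalone
C sketch picks the array suffix for a declaration. It is a simplification,
not the nvptx back end: ptx_array_suffix and suffix_matches are hypothetical
helpers, and a negative elem_count is assumed to stand for an array type
with no known bound (no TYPE_DOMAIN).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the array suffix of a variable declaration
   into BUF.  ELEM_COUNT < 0 models an array of unknown bound, such as
   "extern int a[];".  */
static const char *
ptx_array_suffix (char *buf, size_t len, int is_array, long elem_count)
{
  assert (buf != NULL && len > 0);

  if (!is_array)
    snprintf (buf, len, "%s", "");            /* scalar: no suffix */
  else if (elem_count < 0)
    snprintf (buf, len, "[]");                /* unknown bound */
  else
    snprintf (buf, len, "[%ld]", elem_count); /* known bound */
  return buf;
}

/* Return nonzero when the generated suffix equals EXPECTED.  */
static int
suffix_matches (int is_array, long elem_count, const char *expected)
{
  char buf[32];

  return strcmp (ptx_array_suffix (buf, sizeof buf, is_array, elem_count),
                 expected) == 0;
}
```

With these assumptions, suffix_matches (1, -1, "[]") holds for the external
array case, while a ten-element array yields "[10]".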

I tested this patch with a standalone nvptx toolchain using newlib 3.0,
and I found no new regressions. However I'm still waiting for the
results that are using the older version of newlib. Is this patch OK for
trunk if the results come back clean?



OK for stage4 trunk.

[ A minor style nit: in submission emails, rather than having the very 
specific but rather non-descriptive subject "Fix PR85056", move the PR 
number to "[PATCH,nvptx,PR85056]" and add a subject line that describes 
the nature of the patch, f.i.: "Fix declaration of external array with 
unknown size".


So, something like:
...
[PATCH,nvptx,PR85056] Fix declaration of external array with unknown size
...

Then, use the subject line as commit log header line (dropping "PATCH", 
and the PR number):

...
[nvptx] Fix declaration of external array with unknown size
...
]

Thanks,
- Tom


Thanks,
Cesar


nvptx-extern-arrays.diff


2018-03-26  Cesar Philippidis  

gcc/

PR target/85056
* config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Add '[]' to
extern array declarations.

gcc/testsuite/
* testsuite/gcc.target/nvptx/pr85056.c: New test.
* testsuite/gcc.target/nvptx/pr85056a.c: New test.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3cb33ae8c2d..38f25add6ab 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2038,6 +2038,9 @@ static void
  nvptx_assemble_decl_begin (FILE *file, const char *name, const char *section,
   const_tree type, HOST_WIDE_INT size, unsigned align)
  {
+  bool atype = (TREE_CODE (type) == ARRAY_TYPE)
+&& (TYPE_DOMAIN (type) == NULL_TREE);
+
while (TREE_CODE (type) == ARRAY_TYPE)
  type = TREE_TYPE (type);
  
@@ -2077,6 +2080,8 @@ nvptx_assemble_decl_begin (FILE *file, const char *name, const char *section,

  /* We make everything an array, to simplify any initialization
 emission.  */
  fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", init_frag.remaining);
+  else if (atype)
+fprintf (file, "[]");
  }
  
  /* Called when the initializer for a decl has been completely output through

diff --git a/gcc/testsuite/gcc.target/nvptx/pr85056.c b/gcc/testsuite/gcc.target/nvptx/pr85056.c
new file mode 100644
index 000..fe7f8af856e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/pr85056.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-sources "pr85056a.c" } */
+
+extern void abort ();
+
+extern int a[];
+
+int
+main ()
+{
+  int i, sum = 0;
+
+  for (i = 0; i < 10; i++)
+sum += a[i];
+
+  if (sum != 55)
+abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/nvptx/pr85056a.c b/gcc/testsuite/gcc.target/nvptx/pr85056a.c
new file mode 100644
index 000..a45a5f2b07f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/pr85056a.c
@@ -0,0 +1,3 @@
+/* { dg-skip-if "" { *-*-* } } */
+
+int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };





Re: [RFC PATCH for 9] rs6000: Ordered comparisons (PR56864)

2018-03-27 Thread Uros Bizjak
Hello!

+(define_insn "*cmpdd_cmpo"
+  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
+ (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
+  (match_operand:DD 2 "gpc_reg_operand" "d")))
+   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
+  "TARGET_DFP"
+  "dcmpo %0,%1,%2"
+  [(set_attr "type" "dfp")])

I have had some problems when adding UNSPEC tags as a parallel to a
compare for x86. For the testcase:

int testo (double a, double b)
{
  return a == b;
}

the middle end emits a sequence like:

(insn 7 4 8 2 (set (reg:CCFP 17 flags)
(unspec:CCFP [
(compare:CCFP (reg/v:DF 89 [ a ])
(reg/v:DF 90 [ b ]))
] UNSPEC_NOTRAP)) "cmpdf.c":3 -1
 (nil))
(insn 8 7 9 2 (set (reg:QI 96)
(ordered:QI (reg:CCFP 17 flags)
(const_int 0 [0]))) "cmpdf.c":3 -1
 (nil))
(insn 9 8 11 2 (set (reg:SI 95)
(zero_extend:SI (reg:QI 96))) "cmpdf.c":3 -1
 (nil))
(insn 11 9 10 2 (set (reg:SI 97)
(const_int 0 [0])) "cmpdf.c":3 -1
 (nil))
(insn 10 11 12 2 (set (reg:CCFP 17 flags)
(unspec:CCFP [
(compare:CCFP (reg/v:DF 89 [ a ])
(reg/v:DF 90 [ b ]))
] UNSPEC_NOTRAP)) "cmpdf.c":3 -1
 (nil))
(insn 12 10 13 2 (set (reg:SI 92)
(if_then_else:SI (uneq (reg:CCFP 17 flags)
(const_int 0 [0]))
(reg:SI 95)
(reg:SI 97))) "cmpdf.c":3 -1
 (nil))

and the postreload pass removes (insn 10). This was not the case when the
compare was implemented with a parallel.
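
For readers less used to the RTL codes, one way to read the dump is that
a == b is computed as "uneq ? ordered : 0". The C sketch below mirrors that
reading with the standard quiet comparison macros from <math.h>. The helper
names rtl_ordered, rtl_uneq, eq_via_flags and check_against_operator are
made up for illustration, and the mapping to the insns is my interpretation
of the dump rather than anything the compiler guarantees.

```c
#include <assert.h>
#include <math.h>

/* "ordered": neither operand is a NaN.  */
static int
rtl_ordered (double a, double b)
{
  return !isunordered (a, b);
}

/* "uneq": the operands are unordered or equal, i.e. neither a < b
   nor a > b holds.  */
static int
rtl_uneq (double a, double b)
{
  return !islessgreater (a, b);
}

/* Mirror of insns 8 to 12: select the "ordered" result when "uneq"
   holds, and zero otherwise.  */
static int
eq_via_flags (double a, double b)
{
  return rtl_uneq (a, b) ? rtl_ordered (a, b) : 0;
}

/* Check that the two-step form agrees with the == operator.  */
static void
check_against_operator (double a, double b)
{
  assert (eq_via_flags (a, b) == (a == b));
}
```

Calling check_against_operator with equal values, unequal values and NaN
operands exercises all three paths; both steps read the same flags, which
is why the second compare looks redundant to postreload.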

Also, -ffast-math on x86 emits trapping compares for all cases. For
that reason, unordered (non-trapping) compares were wrapped in an
unspec, with the expectation that -ffast-math can perform some more
optimizations with patterns using naked compare RTX without unspec.

Uros.


Re: [C++ Patch] Fix confusing diagnostics for invalid overrides

2018-03-27 Thread Volker Reichelt

On 03/26/2018 01:19 PM, Nathan Sidwell wrote:

On 03/25/2018 10:18 AM, Volker Reichelt wrote:


Index: gcc/cp/search.c
===
--- gcc/cp/search.c    (revision 258835)
+++ gcc/cp/search.c    (working copy)
@@ -1918,12 +1918,14 @@
    if (fail == 1)
  {
    error ("invalid covariant return type for %q+#D", overrider);
-      error ("  overriding %q+#D", basefn);
+      inform (DECL_SOURCE_LOCATION (basefn),
+          "  overriding %q+#D", basefn);


In addition to Paolo's comments, the new inform doesn't need the 
indentation.  Perhaps reword it to something like


"overridden function is %qD"

I.e. a more stand-alone message than a continuation of the error.

nathan


How about the following then?
I rephrased the last three errors a little to really make them 
stand-alone errors.


Again, bootstrapped and regtested.
OK for trunk?


2018-03-27  Volker Reichelt 

    * search.c (check_final_overrider): Use inform instead of error
    for the diagnostics of the overridden functions.  Tweak wording.

Index: gcc/cp/search.c
===
--- gcc/cp/search.c    (revision 258860)
+++ gcc/cp/search.c    (working copy)
@@ -1904,7 +1904,7 @@
   if (pedwarn (DECL_SOURCE_LOCATION (overrider), 0,
        "invalid covariant return type for %q#D", overrider))
     inform (DECL_SOURCE_LOCATION (basefn),
-            "  overriding %q#D", basefn);
+            "overridden function is %q#D", basefn);
 }
   else
 fail = 2;
@@ -1918,12 +1918,14 @@
   if (fail == 1)
 {
   error ("invalid covariant return type for %q+#D", overrider);
-      error ("  overriding %q+#D", basefn);
+      inform (DECL_SOURCE_LOCATION (basefn),
+          "overridden function is %q#D", basefn);
 }
   else
 {
   error ("conflicting return type specified for %q+#D", overrider);
-      error ("  overriding %q+#D", basefn);
+      inform (DECL_SOURCE_LOCATION (basefn),
+          "overridden function is %q#D", basefn);
 }
   DECL_INVALID_OVERRIDER_P (overrider) = 1;
   return 0;
@@ -1938,7 +1940,8 @@
   if (!comp_except_specs (base_throw, over_throw, ce_derived))
 {
   error ("looser throw specifier for %q+#F", overrider);
-  error ("  overriding %q+#F", basefn);
+  inform (DECL_SOURCE_LOCATION (basefn),
+      "overridden function is %q#F", basefn);
   DECL_INVALID_OVERRIDER_P (overrider) = 1;
   return 0;
 }
@@ -1950,7 +1953,8 @@
   && !tx_safe_fn_type_p (over_type))
 {
   error ("conflicting type attributes specified for %q+#D", overrider);
-  error ("  overriding %q+#D", basefn);
+  inform (DECL_SOURCE_LOCATION (basefn),
+      "overridden function is %q#D", basefn);
   DECL_INVALID_OVERRIDER_P (overrider) = 1;
   return 0;
 }
@@ -1974,21 +1978,26 @@
 {
   if (DECL_DELETED_FN (overrider))
 {
-      error ("deleted function %q+D", overrider);
-      error ("overriding non-deleted function %q+D", basefn);
+      error ("deleted function %q+D overriding non-deleted function",
+         overrider);
+      inform (DECL_SOURCE_LOCATION (basefn),
+          "overridden function is %qD", basefn);
   maybe_explain_implicit_delete (overrider);
 }
   else
 {
-      error ("non-deleted function %q+D", overrider);
-      error ("overriding deleted function %q+D", basefn);
+      error ("non-deleted function %q+D overriding deleted function",
+         overrider);
+      inform (DECL_SOURCE_LOCATION (basefn),
+          "overridden function is %qD", basefn);
 }
   return 0;
 }
   if (DECL_FINAL_P (basefn))
 {
-  error ("virtual function %q+D", overrider);
-  error ("overriding final function %q+D", basefn);
+  error ("virtual function %q+D overriding final function", overrider);
+  inform (DECL_SOURCE_LOCATION (basefn),
+      "overridden function is %qD", basefn);
   return 0;
 }
   return 1;
===

2018-03-27  Volker Reichelt 

    * g++.dg/cpp0x/defaulted2.C: Use dg-message instead of dg-error
    for the diagnostics of overridden functions.  Adjust for new wording.
    * g++.dg/cpp0x/implicit1.C: Likewise.
    * g++.dg/cpp0x/override1.C: Likewise.
    * g++.dg/cpp1y/auto-fn18.C: Likewise.
    * g++.dg/eh/shadow1.C: Likewise.
    * g++.dg/inherit/covariant12.C: Likewise.
    * g++.dg/inherit/covariant14.C: Likewise.
    * g++.dg/inherit/covariant15.C: Likewise.
    * g++.dg/inherit/covariant16.C: Likewise.
    * g++.dg/inherit/crash3.C: Likewise.
    * g++.dg/inherit/error2.C: Likewise.
    * g++.dg/template/crash100.C: Likewise.
    * g++.old-deja/g++.eh/spec6.C: Likewise.
    * g++.old-deja/g++.mike/p811.C: Likewise.
    * g++.old-deja/g++.other/virtual11.C: Likewise.
    * g++.old-deja/g++.other/virtual4.C: Likewise.

Index: gcc/testsuite/g++.dg/cpp0x/defaulted2.C
==