Re: [PATCH] Workaround glibc <= 2.23 nextafterl/nexttowardl bug (PR tree-optimization/85699)

2018-05-09 Thread Richard Biener
On May 9, 2018 10:52:05 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>glibc <= 2.23 has buggy nextafterl/nexttowardl as can be seen on the
>nextafter-2.c testcase.
>
>Do we want to workaround this bug, e.g. with the following patch?

Works for me. Was the reason to test the target libc to test the compare 
against arithmetic? 

Thanks, 
Richard. 

>Regtested on x86_64-linux (with glibc 2.26).  Ok for trunk?
>
>2018-05-09  Jakub Jelinek  
>
>   PR tree-optimization/85699
>   * gcc.dg/nextafter-1.c (NO_LONG_DOUBLE): Define if not defined.  Use
>   !NO_LONG_DOUBLE instead of __LDBL_MANT_DIG__ != 106.
>   * gcc.dg/nextafter-2.c: Include stdlib.h.  For glibc < 2.24 define
>   NO_LONG_DOUBLE to 1 before including nextafter-1.c.
>
>--- gcc/testsuite/gcc.dg/nextafter-1.c.jj  2018-05-06 23:12:48.952619545
>+0200
>+++ gcc/testsuite/gcc.dg/nextafter-1.c 2018-05-09 14:58:53.694198614
>+0200
>@@ -20,6 +20,9 @@ long double nexttowardl (long double, lo
> #ifndef NEED_EXC
> #define NEED_EXC 0
> #endif
>+#ifndef NO_LONG_DOUBLE
>+#define NO_LONG_DOUBLE (__LDBL_MANT_DIG__ == 106)
>+#endif
> 
> #define TEST(name, fn, type, L1, L2, l1, l2, MIN1, \
>MAX1, DENORM_MIN1, EPSILON1, MIN2, MAX2, DENORM_MIN2)   \
>@@ -129,7 +132,7 @@ TEST (test1, nextafterf, float, F, F, f,
> TEST (test2, nextafter, double, , , , , __DBL_MIN__, __DBL_MAX__,
>   __DBL_DENORM_MIN__, __DBL_EPSILON__, __DBL_MIN__, __DBL_MAX__,
>   __DBL_DENORM_MIN__)
>-#if __LDBL_MANT_DIG__ != 106
>+#if !NO_LONG_DOUBLE
>TEST (test3, nextafterl, long double, L, L, l, l, __LDBL_MIN__,
>__LDBL_MAX__,
> __LDBL_DENORM_MIN__, __LDBL_EPSILON__, __LDBL_MIN__, __LDBL_MAX__,
>   __LDBL_DENORM_MIN__)
>@@ -149,7 +152,7 @@ main ()
> {
>   test1 ();
>   test2 ();
>-#if __LDBL_MANT_DIG__ != 106
>+#if !NO_LONG_DOUBLE
>   test3 ();
>   test4 ();
>   test5 ();
>--- gcc/testsuite/gcc.dg/nextafter-2.c.jj  2018-05-08 13:56:38.265930160
>+0200
>+++ gcc/testsuite/gcc.dg/nextafter-2.c 2018-05-09 14:59:45.527245803
>+0200
>@@ -5,4 +5,13 @@
> /* { dg-add-options ieee } */
> /* { dg-add-options c99_runtime } */
> 
>+#include <stdlib.h>
>+
>+#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
>+# if !__GLIBC_PREREQ (2, 24)
>+/* Workaround buggy nextafterl in glibc 2.23 and earlier,
>+   see https://sourceware.org/bugzilla/show_bug.cgi?id=20205  */
>+#  define NO_LONG_DOUBLE 1
>+# endif
>+#endif
> #include "nextafter-1.c"
>
>   Jakub



Fix PR85726 (div-div suboptimization) and a rant on match.pd :s-flag

2018-05-09 Thread Hans-Peter Nilsson
Replacing a division feeding a division helps only when the
second division is the only user, and "fusing" the divisions is
downright bad if another user of the result of first division is
a modulus of the same value as the second division, forming a
divmod pair.  See the test-case, where for the tested
architectures (which all fail the test-case before the patch)
the div and mod are implemented using the high-part of a widened
multiplication and shift, emitted separately but combined as
late as in rtl, with the multiplication and shift re-used.  That
of course does not happen if later passes see (y / 48; y % 3).
While in match.pd, I noticed the corresponding mul-mul match,
which I believe should be guarded the same way.

I noticed this spot investigating performance regressions for
mipsisa32r2el-linux-gnu comparing to gcc-4.7-era code generated
for a compute-intensive library.  I'd say the pattern in the
test-case is common in cases like implementing a 3x3 filter
using SIMD with a vector size 16.  The suboptimization was more
of an eye-catcher (the first or second one) in a hot degraded
function than an actual cause, though; fixing it seems to amount
to no more than 2% (where there's a 13% regression) in that
function.  As a contrast, I see e.g. four times as many local
call-saved registers used for some loop-intensive functions, but
I can't dive into the larger problem, at least not right now.
(And no, it's not LRA, AFAICT.)

Tested using the gcc-8.1.0 release on aarch64-unknown-linux-gnu,
powerpc64-unknown-linux-gnu, x86_64-pc-linux-gnu and partly a
cross to mipsisa32r2el-linux-gnu.

The patch is generated using "-w" because otherwise
re-indentation for the single_use-wrapping makes the diff
practically unreadable.  The test-case uses rtl-scanning so the
result can be verified closer to the actual code but still not
restricting it to assembly code.  Scanning just after the (last)
match.pd pass would still be far from the divmod-combining
effects done in rtl.  Unfortunately, the pattern for the
truncated multiplication is observable in various numbers
depending on both the target and the bug, so to avoid littering
I restrict it to my target of interest at the moment.  At least
the presence and absence of the "1/48"-constant is stable across
the line with/without the bug with no false positives.

This is an optimization regression counting from at least 4.9.2,
most likely since r218211, so: ok for all open branches?

Please also see the ":s"-rant after the patch.

gcc/testsuite:
PR tree-optimization/85726
* pr85726.c: New test.

gcc:
* match.pd (div (div C1) C2): Guard with single_use of inner div.
(mult (mult C1) C2): Similar.

--- /dev/null   Tue Oct 29 15:57:07 2002
+++ gcc.dg/pr85726.c	Tue May  8 07:18:19 2018
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-fwprop1" } */
+
+/* There should just be one mult-as-div sequence, re-used for the mod,
+   not one for each of the y / 3 and y % 3 (as would happen due to
+   suboptimal simplification as y / 48 and y % 3), and there should be
+   no trace of the constant used for the 1 / 48 mult-as-div.  */
+int ww, vv;
+int x(int y)
+{
+  int z = y / 16;
+  int w = z / 3;
+  int v = z % 3;
+  ww = w;
+  return v;
+}
+/* { dg-final { scan-rtl-dump-times "truncate:SI .lshiftrt:DI .mult:DI 
.sign_extend:DI" 1 "fwprop1" { target mips*-*-* } } }  */
+/* { dg-final { scan-rtl-dump-times " .0x2aab" 0 "fwprop1" } }  */

diff --git a/gcc/match.pd b/gcc/match.pd
index 0de4432..e8ebeac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -278,11 +278,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && TYPE_UNSIGNED (type))
   (trunc_div @0 @1)))
 
-/* Combine two successive divisions.  Note that combining ceil_div
-   and floor_div is trickier and combining round_div even more so.  */
+/* Combine two successive divisions, if the second is the only
+   user of the result of the first, or else we'll just end up
+   with two divisions anyway.  Note that combining ceil_div and
+   floor_div is trickier and combining round_div even more so.  */
 (for div (trunc_div exact_div)
  (simplify
-  (div (div @0 INTEGER_CST@1) INTEGER_CST@2)
+  (div (div@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (single_use (@3))
(with {
  bool overflow_p;
  wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),
@@ -292,12 +295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (div @0 { wide_int_to_tree (type, mul); })
  (if (TYPE_UNSIGNED (type)
  || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
- { build_zero_cst (type); })
+  { build_zero_cst (type); }))
 
 /* Combine successive multiplications.  Similar to above, but handling
overflow is different.  */
 (simplify
- (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+ (mult (mult@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+ (if (single_use (@3))
   (with {
 bool overflow_p;
 wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),
@@ -306,7 +310,7 

Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-09 Thread Segher Boessenkool
Hi,

On Tue, May 08, 2018 at 05:04:33PM -0700, Carl Love wrote:
> On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote:
> > What ISA version is required for the TH field to do anything?  Will
> > it work on older machines too (just ignored)?  What assembler version
> > is required?
> 
> I went back and checked.  The mnemonics for 
> 
>   dcbtt RA,RB  dcbt for TH value of 0b1
>   dcbtstt RA,RB dcbtst for TH value of 0b1.
> 
> were introduced in ISA 2.06.

So we need a check for that.  TH=0b1 is new in ISA 2.06 as well.
For older CPUs the hint is undefined.

I think we can assume that all assemblers for which we can compile with
-mcpu=power7 will support the 3-argument form of dcbt, and have dcbtt etc.
as well.  So you just need to test for TARGET_POPCNTD I think?

> There is another pair of mnemonics 
> 
>   dcbtds RA,RB,TH   dcbt for TH values of 0b0 or
>     0b01000 - 0b0;
>     other TH values are invalid.
> 
>   dcbtstds RA,RB,TH  dcbtst for TH values of 0b0
>          or 0b01000 - 0b01010;
>      other TH values are invalid.
> 
> that could be used instead.  These are both supported starting with 
> ISA 2.05.  The dcbtds is actually supported back to ISA 2.03 but the
> dcbtstds is not.

Those do not cover 0b1.

> #ifndef HAVE_AS_POPCNTB   
>   
> #undef  TARGET_POPCNTB
>   
> #define TARGET_POPCNTB 0  
>   
> #endif  
> 
> I haven't found anything that I could use specifically for Power 7 and
> newer.  Not sure if it is worth defining a HAVE_AS_DCBTT to do
> something similar?  Seems a bit overkill.  Thoughts on how to limit
> the generation of dcbtt and dcbtstt to Power 7 or newer?

There is TARGET_POPCNTD.  (TARGET_POPCNTB is for ISA 2.02).


Segher


Re: [PATCH] PowerPC address support cleanup, patch 3 of 4

2018-05-09 Thread Segher Boessenkool
On Thu, May 03, 2018 at 01:22:10PM -0400, Michael Meissner wrote:
> 2018-05-03  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (mode_supports_d_form): Rename
>   mode_supports_vmx_dform to mode_supports_d_form.  Add an optional
>   argument to say which reload register class to use.  Change all
>   callers to pass in the RELOAD_REG_VMX class explicitly.
>   (rs6000_secondary_reload): Likewise.
>   (rs6000_preferred_reload_class): Likewise.
>   (rs6000_secondary_reload_class): Likewise.

Please don't say "likewise" unless the change is actually similar.

> -/* Return true if we have D-form addressing in altivec registers.  */
> +/* Return true if we have D-form addressing (register+offset) in either a
> +   specific reload register class or whether some reload register class
> +   supports d-form addressing.  */
>  static inline bool
> -mode_supports_vmx_dform (machine_mode mode)
> +mode_supports_d_form (machine_mode mode,
> +   enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
>  {
> -  return ((reg_addr[mode].addr_mask[RELOAD_REG_VMX] & RELOAD_REG_OFFSET) != 
> 0);
> +  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_OFFSET) != 0);
>  }

Will this overload help anything?  It does not look that way, all current
callers use a different argument (and all the same).

Overloads are nice if they make things *easier* for the reader, not harder.
Same as with all other syntactic sugar.


Segher


Re: [PATCH] PowerPC address support cleanup, patch 2 of 4

2018-05-09 Thread Segher Boessenkool
On Thu, May 03, 2018 at 01:20:32PM -0400, Michael Meissner wrote:
> 2018-05-03  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (mode_supports_vmx_dform): Move these
>   functions to be next to the other mode_supports functions.
>   (mode_supports_dq_form): Likewise.

Okay for trunk, thanks.


Segher


Re: [PATCH] PowerPC address support cleanup, patch 1 of 4

2018-05-09 Thread Segher Boessenkool
Hi Mike,

On Thu, May 03, 2018 at 01:17:03PM -0400, Michael Meissner wrote:
> 2018-05-03  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (mode_supports_dq_form): Rename
>   mode_supports_vsx_dform_quad to mode_supports_dq_form.
>   (mode_supports_vsx_dform_quad): Likewise.
>   (quad_address_p): Likewise.
>   (reg_offset_addressing_ok_p): Likewise.
>   (offsettable_ok_by_alignment): Likewise.
>   (rs6000_legitimate_offset_address_p): Likewise.
>   (legitimate_lo_sum_address_p): Likewise.
>   (rs6000_legitimize_address): Likewise.
>   (rs6000_legitimize_reload_address): Likewise.
>   (rs6000_secondary_reload_inner): Likewise.
>   (rs6000_preferred_reload_class): Likewise.
>   (rs6000_output_move_128bit): Likewise.

* config/rs6000/rs6000.c (mode_supports_vsx_dform_quad): Rename to ...
(mode_supports_dq_form): ... this.  Update all callers.


> --- gcc/config/rs6000/rs6000.c(revision 259864)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -649,7 +649,7 @@ mode_supports_vmx_dform (machine_mode mo
> is more limited than normal d-form addressing in that the offset must be
> aligned on a 16-byte boundary.  */
>  static inline bool
> -mode_supports_vsx_dform_quad (machine_mode mode)
> +mode_supports_dq_form (machine_mode mode)
>  {
>return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
> != 0);

Will this eventually handle all DQ-form, not just vector?  Is it supposed
to?

Okay for trunk with the changelog fixed.  Thanks!


Segher


Re: [PATCH, aarch64] Patch to update pipeline descriptions in thunderx2t99.md

2018-05-09 Thread Steve Ellcey
On Fri, 2018-05-04 at 14:05 -0700, Andrew Pinski wrote:
> 
> >    (thunderx2t99_loadpair): Fix cpu unit ordering.
> I think the original ordering was correct.  The address calculation
> happens before the actual load.
> thunderx2t99_asimd_load1_ldp would have a similar issue.
> 
> Thanks,
> Andrew

OK, I checked into that and undid the change to thunderx2t99_loadpair
and fixed thunderx2t99_asimd_load1_ldp to match it.  Everything else is
the same.

Steve Ellcey
sell...@cavium.com


2018-05-09  Steve Ellcey  

* config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete.
(thunderx2t99_multiple): Delete pseudo-units from used cpus.
Add untyped.
(thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg.
Change logics_shift_reg to logics_shift_imm.
(thunderx2t99_fp_loadpair_basic): Delete.
(thunderx2t99_fp_storepair_basic): Delete.
(thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types.
(thunderx2t99_asimd_polynomial): Delete.
(thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q
and neon_fp_mul_d_scalar_q.
(thunderx2t99_asimd_fp_conv): Add *int_to_fp* types.
(thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q.
(thunderx2t99_asimd_recip_step): Add missing *sqrt* types.
(thunderx2t99_asimd_lut): Add missing tbl types.
(thunderx2t99_asimd_ext): Delete.
(thunderx2t99_asimd_load1_1_mult): Delete.
(thunderx2t99_asimd_load1_2_mult): Delete.
(thunderx2t99_asimd_load1_ldp): New.
(thunderx2t99_asimd_load1): New.
(thunderx2t99_asimd_load2): Add missing *load2* types.
(thunderx2t99_asimd_load3): New.
(thunderx2t99_asimd_load4): New.
(thunderx2t99_asimd_store1_1_mult): Delete.
(thunderx2t99_asimd_store1_2_mult): Delete.
(thunderx2t99_asimd_store2_mult): Delete.
(thunderx2t99_asimd_store2_onelane): Delete.
(thunderx2t99_asimd_store_stp): New.
(thunderx2t99_asimd_store1): New.
(thunderx2t99_asimd_store2): New.
(thunderx2t99_asimd_store3): New.
(thunderx2t99_asimd_store4): New.diff --git a/gcc/config/aarch64/thunderx2t99.md b/gcc/config/aarch64/thunderx2t99.md
index 589e564..fb71de5 100644
--- a/gcc/config/aarch64/thunderx2t99.md
+++ b/gcc/config/aarch64/thunderx2t99.md
@@ -54,8 +54,6 @@
 (define_reservation "thunderx2t99_ls01" "thunderx2t99_ls0|thunderx2t99_ls1")
 (define_reservation "thunderx2t99_f01" "thunderx2t99_f0|thunderx2t99_f1")
 
-(define_reservation "thunderx2t99_ls_both" "thunderx2t99_ls0+thunderx2t99_ls1")
-
 ; A load with delay in the ls0/ls1 pipes.
 (define_reservation "thunderx2t99_l0delay" "thunderx2t99_ls0,\
   thunderx2t99_ls0d1,thunderx2t99_ls0d2,\
@@ -86,12 +84,10 @@
 
 (define_insn_reservation "thunderx2t99_multiple" 1
   (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "multiple"))
+   (eq_attr "type" "multiple,untyped"))
   "thunderx2t99_i0+thunderx2t99_i1+thunderx2t99_i2+thunderx2t99_ls0+\
thunderx2t99_ls1+thunderx2t99_sd+thunderx2t99_i1m1+thunderx2t99_i1m2+\
-   thunderx2t99_i1m3+thunderx2t99_ls0d1+thunderx2t99_ls0d2+thunderx2t99_ls0d3+\
-   thunderx2t99_ls1d1+thunderx2t99_ls1d2+thunderx2t99_ls1d3+thunderx2t99_f0+\
-   thunderx2t99_f1")
+   thunderx2t99_i1m3+thunderx2t99_f0+thunderx2t99_f1")
 
 ;; Integer arithmetic/logic instructions.
 
@@ -113,9 +109,9 @@
 
 (define_insn_reservation "thunderx2t99_alu_shift" 2
   (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "alu_shift_imm,alu_ext,alu_shift_reg,\
-			alus_shift_imm,alus_ext,alus_shift_reg,\
-			logic_shift_imm,logics_shift_reg"))
+   (eq_attr "type" "alu_shift_imm,alu_ext,\
+			alus_shift_imm,alus_ext,\
+			logic_shift_imm,logics_shift_imm"))
   "thunderx2t99_i012,thunderx2t99_i012")
 
 (define_insn_reservation "thunderx2t99_div" 13
@@ -228,21 +224,11 @@
(eq_attr "type" "f_loads,f_loadd"))
   "thunderx2t99_ls01")
 
-(define_insn_reservation "thunderx2t99_fp_loadpair_basic" 4
-  (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "neon_load1_2reg"))
-  "thunderx2t99_ls01*2")
-
 (define_insn_reservation "thunderx2t99_fp_store_basic" 1
   (and (eq_attr "tune" "thunderx2t99")
(eq_attr "type" "f_stores,f_stored"))
   "thunderx2t99_ls01,thunderx2t99_sd")
 
-(define_insn_reservation "thunderx2t99_fp_storepair_basic" 1
-  (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "neon_store1_2reg"))
-  "thunderx2t99_ls01,(thunderx2t99_ls01+thunderx2t99_sd),thunderx2t99_sd")
-
 ;; ASIMD integer instructions.
 
 (define_insn_reservation "thunderx2t99_asimd_int" 7
@@ -251,6 +237,7 @@
 			neon_arith_acc,neon_arith_acc_q,\
 			neon_abs,neon_abs_q,\
 			neon_add,neon_add_q,\
+			neon_sub,neon_sub_q,\
 			neon_neg,neon_neg_q,\
 			neon_add_long,neon_add_widen,\
 			neon_add_halve,neon_add_halve_q,\
@@ -301,11 +288,6 @@
(eq_attr "type" "neon_logic,neon_logic_q"))
   

[PATCH] PR fortran/70870 -- Reject data object with default initialization

2018-05-09 Thread Steve Kargl
I plan to commit the attach patch on Saturday unless someone objects.

2018-05-09  Steven G. Kargl  

PR fortran/70870
* data.c (gfc_assign_data_value): Check that a data object does
not also have default initialization.

2018-05-09  Steven G. Kargl  

PR fortran/70870
* gfortran.dg/pr70870_1.f90: New test.

-- 
Steve
Index: gcc/fortran/data.c
===
--- gcc/fortran/data.c	(revision 26)
+++ gcc/fortran/data.c	(working copy)
@@ -491,6 +491,15 @@ gfc_assign_data_value (gfc_expr *lvalue, gfc_expr *rva
 }
   else
 {
+  if (lvalue->ts.type == BT_DERIVED
+	  && gfc_has_default_initializer (lvalue->ts.u.derived))
+	{
+	  gfc_error ("Nonpointer object %qs with default initialization "
+		 "shall not appear in a DATA statement at %L", 
+		 symbol->name, &lvalue->where);
+	  return false;
+	}
+
   /* Overwriting an existing initializer is non-standard but usually only
 	 provokes a warning from other compilers.  */
   if (init != NULL)
Index: gcc/testsuite/gfortran.dg/pr70870_1.f90
===
--- gcc/testsuite/gfortran.dg/pr70870_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr70870_1.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/70870
+! Contributed by Vittorio Zecca 
+  type t
+   integer :: g=0   ! default initialization
+  end type
+  type(t) :: v2
+  data v2/t(2)/ ! { dg-error "default initialization shall not" }
+  end


[PATCH] PR fortran/85521 -- Zero length substrings in array constructors

2018-05-09 Thread Steve Kargl
I plan to commit the attached patch on Saturday unless
someone objects.

2018-05-09  Steven G. Kargl  

PR fortran/85521
* array.c (gfc_resolve_character_array_constructor): Substrings
with upper bound smaller than lower bound are zero length strings.

2018-05-09  Steven G. Kargl  

PR fortran/85521
* gfortran.dg/pr85521_1.f90: New test.
* gfortran.dg/pr85521_2.f90: New test.

-- 
Steve


[PATCH] PR fortran/85687 -- Check argument of RANK.

2018-05-09 Thread Steve Kargl
I plan to commit the attached patch on Saturday unless 
someone voices an objection.

2018-05-09  Steven G. Kargl  

PR fortran/85687
* check.c (gfc_check_rank): Check that the argument is a data object.

2018-05-09  Steven G. Kargl  

PR fortran/85687
* gfortran.dg/pr85687.f90: New test.

-- 
Steve
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c	(revision 260098)
+++ gcc/fortran/check.c	(working copy)
@@ -3886,8 +3886,11 @@ gfc_check_rank (gfc_expr *a)
 		  ? a->value.function.esym->result->attr.pointer
 		  : a->symtree->n.sym->result->attr.pointer;
 
-  if (a->expr_type == EXPR_OP || a->expr_type == EXPR_NULL
-  || a->expr_type == EXPR_COMPCALL|| a->expr_type == EXPR_PPC
+  if (a->expr_type == EXPR_OP
+  || a->expr_type == EXPR_NULL
+  || a->expr_type == EXPR_COMPCALL
+  || a->expr_type == EXPR_PPC
+  || a->ts.type == BT_PROCEDURE
   || !is_variable)
 {
   gfc_error ("The argument of the RANK intrinsic at %L must be a data "
Index: gcc/testsuite/gfortran.dg/pr85687.f90
===
--- gcc/testsuite/gfortran.dg/pr85687.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr85687.f90	(working copy)
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR fortran/85687
+! Code originally contributed by Gerhard Steinmetz gscfq at t-oline dot de
+program p
+   type t
+   end type
+   print *, rank(t)  ! { dg-error "must be a data object" }
+end


libgo patch committed: Update go tool to match recent upstream changes

2018-05-09 Thread Ian Lance Taylor
Several recent changes to the master version of cmd/go improve the
gofrontend support. These changes are partially copies of existing
gofrontend differences, and partially new code. This libgo patch makes
the gofrontend match the upstream code.

The changes included here come from:
https://golang.org/cl/111575
https://golang.org/cl/111595
https://golang.org/cl/111635
https://golang.org/cl/111636

For the record, the following recent master changes are based on code
already present in the gofrontend repo:
https://golang.org/cl/110915
https://golang.org/cl/111615

For the record, a master change, partially based on earlier gofrontend
work, also with new gc code, was already copied to gofrontend repo in
CL 111099:
https://golang.org/cl/111097

This moves the generated list of standard library packages from
cmd/go/internal/load to go/build.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

2018-05-09  Ian Lance Taylor  

* Makefile.am (check-go-tool): Don't copy zstdpkglist.go.
* Makefile.in: Rebuild.
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 260048)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6b0355769edd9543e6c5f2270b26b140bb96e9aa
+290c93f08f4456f0552b0764e28573164e47f259
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 260048)
+++ libgo/Makefile.am   (working copy)
@@ -602,13 +602,13 @@ s-runtime-inc: runtime.lo Makefile
$(SHELL) $(srcdir)/mvifdiff.sh tmp-runtime.inc runtime.inc
$(STAMP) $@
 
-noinst_DATA += zstdpkglist.go zdefaultcc.go
+noinst_DATA += zdefaultcc.go
 
 # Generate the list of go std packages that were included in libgo
 zstdpkglist.go: s-zstdpkglist; @true
 s-zstdpkglist: Makefile
rm -f zstdpkglist.go.tmp
-   echo 'package load' > zstdpkglist.go.tmp
+   echo 'package build' > zstdpkglist.go.tmp
echo "" >> zstdpkglist.go.tmp
echo 'var stdpkg = map[string]bool{' >> zstdpkglist.go.tmp
echo $(libgo_go_objs) 'unsafe.lo' 'runtime/cgo.lo' | sed 
's|[a-z0-9_/]*_c\.lo||g' | sed 's|\([a-z0-9_/]*\)\.lo|"\1": true,|g' >> 
zstdpkglist.go.tmp
@@ -960,6 +960,9 @@ runtime_pprof_check_GOCFLAGS = -static-l
 extra_go_files_runtime_internal_sys = version.go
 runtime/internal/sys.lo.dep: $(extra_go_files_runtime_internal_sys)
 
+extra_go_files_go_build = zstdpkglist.go
+go/build.lo.dep: $(extra_go_files_go_build)
+
 extra_go_files_go_types = gccgosizes.go
 go/types.lo.dep: $(extra_go_files_go_types)
 
@@ -969,9 +972,6 @@ cmd/internal/objabi.lo.dep: $(extra_go_f
 extra_go_files_cmd_go_internal_cfg = zdefaultcc.go
 cmd/go/internal/cfg.lo.dep: $(extra_go_files_cmd_go_internal_cfg)
 
-extra_go_files_cmd_go_internal_load = zstdpkglist.go
-cmd/go/internal/load.lo.dep: $(extra_go_files_cmd_go_internal_load)
-
 extra_check_libs_cmd_go_internal_cache = $(abs_builddir)/libgotool.a
 extra_check_libs_cmd_go_internal_generate = $(abs_builddir)/libgotool.a
 extra_check_libs_cmd_go_internal_get = $(abs_builddir)/libgotool.a
Index: libgo/go/cmd/go/internal/load/pkg.go
===
--- libgo/go/cmd/go/internal/load/pkg.go(revision 260048)
+++ libgo/go/cmd/go/internal/load/pkg.go(working copy)
@@ -223,9 +223,6 @@ func (p *Package) copyBuild(pp *build.Pa
// TODO? Target
p.Goroot = pp.Goroot
p.Standard = p.Goroot && p.ImportPath != "" && 
isStandardImportPath(p.ImportPath)
-   if cfg.BuildToolchainName == "gccgo" {
-   p.Standard = stdpkg[p.ImportPath]
-   }
p.GoFiles = pp.GoFiles
p.CgoFiles = pp.CgoFiles
p.IgnoredGoFiles = pp.IgnoredGoFiles
@@ -894,13 +891,6 @@ var foldPath = make(map[string]string)
 func (p *Package) load(stk *ImportStack, bp *build.Package, err error) {
p.copyBuild(bp)
 
-   // When using gccgo the go/build package will not be able to
-   // find a standard package. It would be nicer to not get that
-   // error, but go/build doesn't know stdpkg.
-   if cfg.BuildToolchainName == "gccgo" && err != nil && p.Standard {
-   err = nil
-   }
-
// Decide whether p was listed on the command line.
// Given that load is called while processing the command line,
// you might think we could simply pass a flag down into load
@@ -1096,9 +1086,6 @@ func (p *Package) load(stk *ImportStack,
continue
}
p1 := LoadImport(path, p.Dir, p, stk, 
p.Internal.Build.ImportPos[path], UseVendor)
-   if cfg.BuildToolchainName == "gccgo" && p1.Standard {
-   continue
-   }
if 

Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-09 Thread Jim Wilson
On Wed, May 2, 2018 at 3:05 PM, Jim Wilson  wrote:
> This improves the code for a switch statement on targets that sign-extend
> function arguments, such as RISC-V.  Given a simple testcase
> ...
> gcc/
> * expr.c (do_tablejump): When converting index to Pmode, if we have a
> sign extended promoted subreg, and the range does not have the sign 
> bit
> set, then do a sign extend.

Ping.

Jim


Re: [PATCH] RISC-V: Add with-multilib-list support.

2018-05-09 Thread Jim Wilson
On Tue, May 1, 2018 at 11:52 AM, Jim Wilson  wrote:
> gcc/
> PR target/84797
> * config.gcc (riscv*-*-*): Handle --with-multilib-list.
> * config/riscv/t-withmultilib: New.
> * config/riscv/withmultilib.h: New.
> * doc/install.texi: Document RISC-V --with-multilib-list support.

Committed, after some additional testing.

Jim


[PATCH] Add constant folding for x86 shift builtins by vector

2018-05-09 Thread Jakub Jelinek
Hi!

The following patch on top of the earlier ix86_*fold_builtin patch adds
folding also for the *s{ll,rl,ra}v* builtins.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-05-09  Jakub Jelinek  

PR target/85323
* config/i386/i386.c (ix86_fold_builtin): Fold shift builtins by
vector.
(ix86_gimple_fold_builtin): Likewise.

* gcc.target/i386/pr85323-4.c: New test.
* gcc.target/i386/pr85323-5.c: New test.
* gcc.target/i386/pr85323-6.c: New test.

--- gcc/config/i386/i386.c.jj   2018-05-09 15:52:35.510092271 +0200
+++ gcc/config/i386/i386.c  2018-05-09 20:01:53.282727951 +0200
@@ -33364,6 +33364,7 @@ ix86_fold_builtin (tree fndecl, int n_ar
   enum ix86_builtins fn_code = (enum ix86_builtins)
   DECL_FUNCTION_CODE (fndecl);
   enum rtx_code rcode;
+  bool is_vshift;
 
   switch (fn_code)
{
@@ -33582,6 +33583,7 @@ ix86_fold_builtin (tree fndecl, int n_ar
case IX86_BUILTIN_PSLLWI256_MASK:
case IX86_BUILTIN_PSLLWI512_MASK:
  rcode = ASHIFT;
+ is_vshift = false;
  goto do_shift;
case IX86_BUILTIN_PSRAD:
case IX86_BUILTIN_PSRAD128:
@@ -33614,6 +33616,7 @@ ix86_fold_builtin (tree fndecl, int n_ar
case IX86_BUILTIN_PSRAWI256_MASK:
case IX86_BUILTIN_PSRAWI512:
  rcode = ASHIFTRT;
+ is_vshift = false;
  goto do_shift;
case IX86_BUILTIN_PSRLD:
case IX86_BUILTIN_PSRLD128:
@@ -33652,6 +33655,53 @@ ix86_fold_builtin (tree fndecl, int n_ar
case IX86_BUILTIN_PSRLWI256_MASK:
case IX86_BUILTIN_PSRLWI512:
  rcode = LSHIFTRT;
+ is_vshift = false;
+ goto do_shift;
+   case IX86_BUILTIN_PSLLVV16HI:
+   case IX86_BUILTIN_PSLLVV16SI:
+   case IX86_BUILTIN_PSLLVV2DI:
+   case IX86_BUILTIN_PSLLVV2DI_MASK:
+   case IX86_BUILTIN_PSLLVV32HI:
+   case IX86_BUILTIN_PSLLVV4DI:
+   case IX86_BUILTIN_PSLLVV4DI_MASK:
+   case IX86_BUILTIN_PSLLVV4SI:
+   case IX86_BUILTIN_PSLLVV4SI_MASK:
+   case IX86_BUILTIN_PSLLVV8DI:
+   case IX86_BUILTIN_PSLLVV8HI:
+   case IX86_BUILTIN_PSLLVV8SI:
+   case IX86_BUILTIN_PSLLVV8SI_MASK:
+ rcode = ASHIFT;
+ is_vshift = true;
+ goto do_shift;
+   case IX86_BUILTIN_PSRAVQ128:
+   case IX86_BUILTIN_PSRAVQ256:
+   case IX86_BUILTIN_PSRAVV16HI:
+   case IX86_BUILTIN_PSRAVV16SI:
+   case IX86_BUILTIN_PSRAVV32HI:
+   case IX86_BUILTIN_PSRAVV4SI:
+   case IX86_BUILTIN_PSRAVV4SI_MASK:
+   case IX86_BUILTIN_PSRAVV8DI:
+   case IX86_BUILTIN_PSRAVV8HI:
+   case IX86_BUILTIN_PSRAVV8SI:
+   case IX86_BUILTIN_PSRAVV8SI_MASK:
+ rcode = ASHIFTRT;
+ is_vshift = true;
+ goto do_shift;
+   case IX86_BUILTIN_PSRLVV16HI:
+   case IX86_BUILTIN_PSRLVV16SI:
+   case IX86_BUILTIN_PSRLVV2DI:
+   case IX86_BUILTIN_PSRLVV2DI_MASK:
+   case IX86_BUILTIN_PSRLVV32HI:
+   case IX86_BUILTIN_PSRLVV4DI:
+   case IX86_BUILTIN_PSRLVV4DI_MASK:
+   case IX86_BUILTIN_PSRLVV4SI:
+   case IX86_BUILTIN_PSRLVV4SI_MASK:
+   case IX86_BUILTIN_PSRLVV8DI:
+   case IX86_BUILTIN_PSRLVV8HI:
+   case IX86_BUILTIN_PSRLVV8SI:
+   case IX86_BUILTIN_PSRLVV8SI_MASK:
+ rcode = LSHIFTRT;
+ is_vshift = true;
  goto do_shift;
 
do_shift:
@@ -33670,7 +33720,10 @@ ix86_fold_builtin (tree fndecl, int n_ar
  if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
break;
}
- if (tree tem = ix86_vector_shift_count (args[1]))
+ if (is_vshift && TREE_CODE (args[1]) != VECTOR_CST)
+   break;
+ if (tree tem = (is_vshift ? integer_one_node
+ : ix86_vector_shift_count (args[1])))
{
  unsigned HOST_WIDE_INT count = tree_to_uhwi (tem);
  if (count == 0)
@@ -33681,7 +33734,9 @@ ix86_fold_builtin (tree fndecl, int n_ar
return build_zero_cst (TREE_TYPE (args[0]));
  count = TYPE_PRECISION (TREE_TYPE (TREE_TYPE (args[0]))) - 1;
}
- tree countt = build_int_cst (integer_type_node, count);
+ tree countt = NULL_TREE;
+ if (!is_vshift)
+   countt = build_int_cst (integer_type_node, count);
  tree_vector_builder builder;
  builder.new_unary_operation (TREE_TYPE (args[0]), args[0],
   false);
@@ -33694,9 +33749,30 @@ ix86_fold_builtin (tree fndecl, int n_ar
  tree type = TREE_TYPE (elt);
  if (rcode == LSHIFTRT)
elt = fold_convert (unsigned_type_for (type), elt);
+ if (is_vshift)
+   {
+ countt = VECTOR_CST_ELT (args[1], i);
+ if (TREE_CODE 

[PATCH] Workaround glibc <= 2.23 nextafterl/nexttowardl bug (PR tree-optimization/85699)

2018-05-09 Thread Jakub Jelinek
Hi!

glibc <= 2.23 has buggy nextafterl/nexttowardl as can be seen on the
nextafter-2.c testcase.

Do we want to workaround this bug, e.g. with the following patch?

Regtested on x86_64-linux (with glibc 2.26).  Ok for trunk?

2018-05-09  Jakub Jelinek  

PR tree-optimization/85699
* gcc.dg/nextafter-1.c (NO_LONG_DOUBLE): Define if not defined.  Use
!NO_LONG_DOUBLE instead of __LDBL_MANT_DIG__ != 106.
* gcc.dg/nextafter-2.c: Include stdlib.h.  For glibc < 2.24 define
NO_LONG_DOUBLE to 1 before including nextafter-1.c.

--- gcc/testsuite/gcc.dg/nextafter-1.c.jj   2018-05-06 23:12:48.952619545 
+0200
+++ gcc/testsuite/gcc.dg/nextafter-1.c  2018-05-09 14:58:53.694198614 +0200
@@ -20,6 +20,9 @@ long double nexttowardl (long double, lo
 #ifndef NEED_EXC
 #define NEED_EXC 0
 #endif
+#ifndef NO_LONG_DOUBLE
+#define NO_LONG_DOUBLE (__LDBL_MANT_DIG__ == 106)
+#endif
 
 #define TEST(name, fn, type, L1, L2, l1, l2, MIN1,  \
 MAX1, DENORM_MIN1, EPSILON1, MIN2, MAX2, DENORM_MIN2)   \
@@ -129,7 +132,7 @@ TEST (test1, nextafterf, float, F, F, f,
 TEST (test2, nextafter, double, , , , , __DBL_MIN__, __DBL_MAX__,
   __DBL_DENORM_MIN__, __DBL_EPSILON__, __DBL_MIN__, __DBL_MAX__,
   __DBL_DENORM_MIN__)
-#if __LDBL_MANT_DIG__ != 106
+#if !NO_LONG_DOUBLE
 TEST (test3, nextafterl, long double, L, L, l, l, __LDBL_MIN__, __LDBL_MAX__,
   __LDBL_DENORM_MIN__, __LDBL_EPSILON__, __LDBL_MIN__, __LDBL_MAX__,
   __LDBL_DENORM_MIN__)
@@ -149,7 +152,7 @@ main ()
 {
   test1 ();
   test2 ();
-#if __LDBL_MANT_DIG__ != 106
+#if !NO_LONG_DOUBLE
   test3 ();
   test4 ();
   test5 ();
--- gcc/testsuite/gcc.dg/nextafter-2.c.jj   2018-05-08 13:56:38.265930160 
+0200
+++ gcc/testsuite/gcc.dg/nextafter-2.c  2018-05-09 14:59:45.527245803 +0200
@@ -5,4 +5,13 @@
 /* { dg-add-options ieee } */
 /* { dg-add-options c99_runtime } */
 
+#include <stdlib.h>
+
+#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
+# if !__GLIBC_PREREQ (2, 24)
+/* Workaround buggy nextafterl in glibc 2.23 and earlier,
+   see https://sourceware.org/bugzilla/show_bug.cgi?id=20205  */
+#  define NO_LONG_DOUBLE 1
+# endif
+#endif
 #include "nextafter-1.c"

Jakub


[C++ PATCH] Fix offsetof constexpr handling (PR c++/85662, take 4)

2018-05-09 Thread Jakub Jelinek
On Wed, May 09, 2018 at 11:01:18AM -0400, Jason Merrill wrote:
> On Wed, May 9, 2018 at 10:47 AM, Jakub Jelinek  wrote:
> > On Wed, May 09, 2018 at 10:40:26AM -0400, Jason Merrill wrote:
> >> On Wed, May 9, 2018 at 4:55 AM, Jakub Jelinek  wrote:
> >> > On Tue, May 08, 2018 at 11:28:18PM -0400, Jason Merrill wrote:
> >> >> Maybe add a type parameter that defaults to size_type_node...
> >> >>
> >> >> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> >> >> > +   fold_offsetof_1 (TREE_TYPE (expr), 
> >> >> > op0));
> >> >>
> >> >> ...and then this can be
> >> >>
> >> >>   fold_offsetof (op0, TREE_TYPE (exp0))
> >> >
> >> > Like this then?
> >> >
> >> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> >> > +   fold_offsetof (op0, TREE_TYPE (expr)));
> >>
> >> I was thinking that we then wouldn't need the fold_convert at the call
> >> sites anymore, either.
> >
> > The patch only converts to non-pointer types, I'm not sure if it is
> > desirable to do the same with pointer types (and most of the other callers
> > don't use convert, but fold_convert which is significantly different, the
> > former is emitting diagnostics, the latter is just a conversion +
> > optimization).
> 
> Is there a reason we can't use fold_convert for the non-pointer case,
> too?  I don't think we're interested in diagnostics from this
> particular call.

This patch instead uses convert everywhere.  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2018-05-09  Jakub Jelinek  

PR c++/85662
* c-common.h (fold_offsetof_1): Removed.
(fold_offsetof): Add TYPE argument defaulted to size_type_node and
CTX argument defaulted to ERROR_MARK.
* c-common.c (fold_offsetof_1): Renamed to ...
(fold_offsetof): ... this.  Remove wrapper function.  Add TYPE
argument, convert the pointer constant to TYPE and use size_binop
with PLUS_EXPR instead of fold_build_pointer_plus if type is not
a pointer type.  Adjust recursive calls.

* c-fold.c (c_fully_fold_internal): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (expr) as TYPE to it and drop the
fold_convert_loc.
* c-typeck.c (build_unary_op): Use fold_offsetof rather than
fold_offsetof_1, pass argtype as TYPE to it and drop the
fold_convert_loc.

* cp-gimplify.c (cp_fold): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (x) as TYPE to it and drop the
fold_convert.

* g++.dg/ext/offsetof2.C: New test.

--- gcc/c-family/c-common.h.jj  2018-05-09 20:12:25.845258371 +0200
+++ gcc/c-family/c-common.h 2018-05-09 20:20:02.265649121 +0200
@@ -1033,8 +1033,8 @@ extern bool c_dump_tree (void *, tree);
 
 extern void verify_sequence_points (tree);
 
-extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
-extern tree fold_offsetof (tree);
+extern tree fold_offsetof (tree, tree = size_type_node,
+  tree_code ctx = ERROR_MARK);
 
 extern int complete_array_type (tree *, tree, bool);
 
--- gcc/c-family/c-common.c.jj  2018-05-09 20:12:25.763258297 +0200
+++ gcc/c-family/c-common.c 2018-05-09 20:21:23.770718896 +0200
@@ -6168,10 +6168,11 @@ c_common_to_target_charset (HOST_WIDE_IN
 
 /* Fold an offsetof-like expression.  EXPR is a nested sequence of component
references with an INDIRECT_REF of a constant at the bottom; much like the
-   traditional rendering of offsetof as a macro.  Return the folded result.  */
+   traditional rendering of offsetof as a macro.  TYPE is the desired type of
+   the whole expression.  Return the folded result.  */
 
 tree
-fold_offsetof_1 (tree expr, enum tree_code ctx)
+fold_offsetof (tree expr, tree type, enum tree_code ctx)
 {
   tree base, off, t;
   tree_code code = TREE_CODE (expr);
@@ -6196,10 +6197,10 @@ fold_offsetof_1 (tree expr, enum tree_co
  error ("cannot apply %<offsetof%> to a non constant address");
  return error_mark_node;
}
-  return TREE_OPERAND (expr, 0);
+  return convert (type, TREE_OPERAND (expr, 0));
 
 case COMPONENT_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
   if (base == error_mark_node)
return base;
 
@@ -6216,7 +6217,7 @@ fold_offsetof_1 (tree expr, enum tree_co
   break;
 
 case ARRAY_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
   if (base == error_mark_node)
return base;
 
@@ -6273,23 +6274,16 @@ fold_offsetof_1 (tree expr, enum tree_co
   /* Handle static members of volatile structs.  */
   t = TREE_OPERAND (expr, 1);
   gcc_checking_assert (VAR_P (get_base_address (t)));
-  return fold_offsetof_1 (t);
+  return fold_offsetof 

Re: [PATCH] Define DW_FORM_strx* and DW_FORM_addrx*.

2018-05-09 Thread Jason Merrill
OK, thanks.

On Wed, May 9, 2018 at 1:19 PM, Thomas Rix  wrote:
> This patch defines the dwarf 5 forms.
> DW_FORM_strx1, DW_FORM_strx2, DW_FORM_strx3, DW_FORM_strx4
> And similar for addrx.
>
> Tom
>
>
>


Re: [PATCH, rs6000] Add missing vec_expte, vec_loge, vec_re

2018-05-09 Thread Segher Boessenkool
Hi!

On Wed, May 09, 2018 at 09:07:49AM -0700, Carl Love wrote:
> 2018-05-09 Carl Love  
>   * gcc.target/powerpc/builtins-8-runnable.c: New builtin test file.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-8-runnable.c
> @@ -0,0 +1,98 @@
> +/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> +/* { dg-options "-mcpu=power8 -O2" } */

Does this need lp64?  It's not obvious to me why.

Okay for trunk.  Thanks!


Segher


[PATCH v2, rs6000] Improve Documentation of Built-In Functions Part 1

2018-05-09 Thread Kelvin Nilsen

This is the first of several planned patches to address shortcomings in
existing documentation of PowerPC built-in functions.  The focus of this
particular patch is to improve documentation of basic built-in functions
that do not require inclusion of special header files.

A summary of this patch follows:

1. Change the name of the first PowerPC built-in section from 
   "PowerPC Built-in Functions" to "Basic PowerPC Built-in Functions".
   This section has never described all PowerPC built-in functions.

2. Introduce subsubsections within this section to independently describe
   built-in functions that target particular ISA levels.  Sort function
   descriptions into appropriate subsubsections.

3. Add descriptions of three new features that can be tested with the
   __builtin_cpu_supports function: darn, htm-no-suspend, and scv.

4. Corrected the spellings of several built-in functions:
   __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbctdt,
   __builtin_cdtbcd.

This patch is limited in scope in order to manage complexity of the
diffs.  Subsequent patches will address different sections of the
documentation.  Subsequent patches will also add new function descriptions
into these sections.

This differs from the previous draft patch in the following regards:

1. This patch adds back in documentation of the __builtin_fabsq,
   __builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq,
   __builtin_nansq, __builtin_sqrtf128, and __builtin_fmaf128 functions.

2. Consistently, changed subsubsection names from
   "Low-Level PowerPC Built-in ... " to "Basic PowerPC Built-in ... "

3. Changed subsubsection name from "... Available on All Targets" to
   "... Available on All Configurations".

4. Used @code{} font for darn and tsuspend instruction names.

5. Removed unnecessary parentheses around many option descriptions.

6. Clarified that the result returned from the __builtin_darn_32 function is
   conditioned.

7. Enhanced the ChangeLog to call out each of the subsection names
   (within extend.texi) that is affected by this patch.

8. Changed the menu reference to the newly named "Basic PowerPC Built-in
   Functions"

9. Added a new sub-menu to identify the subsubsections of the "Basic PowerPC
   Built-in Functions" section.

I have bootstrapped and regression tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P8 big-endian,
with both -m32 and -m64 target options).  I have built and reviewed the
gcc.pdf on the little-endian test platform.  I did not build the gcc.pdf 
file on my big-endian test platform because it is missing relevant fonts.

Is this ok for the trunk?

2018-05-09  Kelvin Nilsen  

* doc/extend.texi (PowerPC Built-in Functions): Rename this
subsection.
(Basic PowerPC Built-in Functions): The new name of the
subsection previously known as "PowerPC Built-in Functions".
(Basic PowerPC Built-in Functions Available on all Configurations):
New subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.05): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 2.06): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 2.07): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 3.0): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 260073)
+++ gcc/doc/extend.texi (working copy)
@@ -12475,7 +12475,7 @@
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
 * picoChip Built-in Functions::
-* PowerPC Built-in Functions::
+* Basic PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
@@ -15534,12 +15534,25 @@
 
 @end table
 
-@node PowerPC Built-in Functions
-@subsection PowerPC Built-in Functions
+@node Basic PowerPC Built-in Functions
+@subsection Basic PowerPC Built-in Functions
 
-The following built-in functions are always available and can be used to
-check the PowerPC target platform type:
+@menu
+* Basic PowerPC Built-in Functions Available on all Configurations::
+* Basic PowerPC Built-in Functions Available on ISA 2.05::
+* Basic PowerPC Built-in Functions Available on ISA 2.06::
+* Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Built-in Functions Available on ISA 3.0::
+@end menu
 
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
+
+@node Basic PowerPC Built-in Functions Available on all Configurations
+@subsubsection Basic PowerPC Built-in Functions Available on all Configurations
+
 @deftypefn {Built-in Function} void __builtin_cpu_init (void)
 This 

[PATCH] Define DW_FORM_strx* and DW_FORM_addrx*.

2018-05-09 Thread Thomas Rix
This patch defines the dwarf 5 forms.
DW_FORM_strx1, DW_FORM_strx2, DW_FORM_strx3, DW_FORM_strx4
And similar for addrx.

Tom





0001-Define-DW_FORM_strx-and-DW_FORM_addrx.patch
Description: 0001-Define-DW_FORM_strx-and-DW_FORM_addrx.patch


[PATCH, rs6000] Add missing vec_expte, vec_loge, vec_re

2018-05-09 Thread Carl Love
GCC Maintainers:

The following patch adds tests for the vec_expte, vec_loge,  and vec_re
builtins.

The patch for the test case was tested on

powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64-unknown-linux-gnu (Power 8 BE)
powerpc64le-unknown-linux-gnu (Power 9 LE).

 Please let me know if the patch looks OK for GCC mainline.

 Carl Love
-

gcc/testsuite/ChangeLog:

2018-05-09 Carl Love  
* gcc.target/powerpc/builtins-8-runnable.c: New builtin test file.
---
 .../gcc.target/powerpc/builtins-8-runnable.c   | 98 ++
 1 file changed, 98 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-8-runnable.c

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-8-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-8-runnable.c
new file mode 100644
index 000..82b886f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-8-runnable.c
@@ -0,0 +1,98 @@
+/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include 
+#include 
+#include 
+#include 
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int main()
+  {
+int i;
+vector float f_arg1;
+vector double d_arg1;
+   
+vector float vec_f_expected1, vec_f_result1, vec_f_error;
+vector double vec_d_expected1, vec_d_result1, vec_d_error;
+  
+/* vec_expte: float args, result */
+f_arg1 = (vector float){1.0, 2.0, 3.0, 4.0};
+vec_f_expected1 = (vector float){2.0, 4.0, 8.0, 16.0};
+
+vec_f_result1 = vec_expte (f_arg1);
+
+for (i = 0; i < 4; i++)
+  {
+if (vec_f_expected1[i] != vec_f_result1[i])
+#ifdef DEBUG
+   printf("ERROR vec_expte (f) result[%d]=%f != expected[%d]=%f\n",
+ i, vec_f_result1[i],  i, vec_f_expected1[i]);
+#else
+abort();
+#endif
+  }
+
+/* vec_loge: float args, result */
+f_arg1 = (vector float){4.0, 8.0, 16.0, 64};
+vec_f_expected1 = (vector float){2.0, 3.0, 4.0, 6.0};
+
+vec_f_result1 = vec_loge (f_arg1);
+
+for (i = 0; i < 4; i++)
+  {
+if (vec_f_expected1[i] != vec_f_result1[i])
+#ifdef DEBUG
+ printf("ERROR vec_loge (f) result[%d]=%f != expected[%d]=%f\n",
+i, vec_f_result1[i],  i, vec_f_expected1[i]);
+#else
+  abort();
+#endif
+}
+
+/* vec_re: float args, result  (calculate approximate reciprocal)  */
+f_arg1 = (vector float){1.0, 5.0, 4.0, 8.0};
+vec_f_expected1 = (vector float){1.0, 0.2, 0.25, 0.125};
+vec_f_error = (vector float){1.0, 0.2, 0.25, 0.125};
+
+vec_f_result1 = vec_re (f_arg1);
+  
+for (i = 0; i < 4; i++)
+  {
+vec_f_error[i] = fabs(vec_f_expected1[i] - vec_f_result1[i]);
+  
+if (vec_f_error[i] >=  0.0001)
+#ifdef DEBUG
+   printf("ERROR vec_re (f) result[%d]=%f != expected[%d]=%f\n",
+ i, vec_f_result1[i],  i, vec_f_expected1[i]);
+#else
+   abort();
+#endif
+  }
+
+/* vec_re: double args, result  (calculate approximate reciprocal)  */
+d_arg1 = (vector double){1.0, 8.0};
+vec_d_expected1 = (vector double){1.0, 0.125};
+vec_d_error = (vector double){1.0, 0.125};
+
+vec_d_result1 = vec_re (d_arg1);
+  
+for (i = 0; i < 2; i++)
+  {
+ vec_d_error[i] = fabs(vec_d_expected1[i] - vec_d_result1[i]);
+  
+ if (vec_d_error[i] >=  0.0001)
+#ifdef DEBUG
+   printf("ERROR vec_re (d) result[%d]=%f != expected[%d]=%f\n",
+ i, vec_d_result1[i],  i, vec_d_expected1[i]);
+#else
+  abort();
+#endif
+  }
+  }
-- 
2.7.4



Re: [PATCH, libgomp, openacc] Use GOMP_ASYNC_SYNC in GOACC_declare

2018-05-09 Thread Tom de Vries

On 11/17/2017 09:45 AM, Tom de Vries wrote:

Hi,

GOACC_enter_exit_data has this prototype:
...
void
GOACC_enter_exit_data (int device, size_t mapnum,
    void **hostaddrs, size_t *sizes,
    unsigned short *kinds,
    int async, int num_waits, ...)
...

And GOACC_declare calls GOACC_enter_exit_data with async arg zero:
...
   case GOMP_MAP_DELETE:
     GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
			    &kinds[i], 0, 0);
...

Async arg zero means some async queue (see openacc 2.0a, 2.14.1 "async 
clause" for more details).


The declare directive has no async clause, so the arg should be 
GOMP_ASYNC_SYNC.


Tested libgomp testsuite on x86_64 with nvptx accelerator.

OK for trunk?


Assuming no objections, committed to trunk as attached.

Thanks,
- Tom
[openacc, libgomp] Use GOMP_ASYNC_SYNC in GOACC_declare

2018-05-09  Tom de Vries  

	PR libgomp/82901
	* oacc-parallel.c (GOACC_declare): Use GOMP_ASYNC_SYNC as async argument
	to GOACC_enter_exit_data.

---
 libgomp/oacc-parallel.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index a71b399..f270491 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -524,7 +524,7 @@ GOACC_declare (int device, size_t mapnum,
 	  case GOMP_MAP_POINTER:
 	  case GOMP_MAP_DELETE:
  	GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
-   &kinds[i], 0, 0);
+   &kinds[i], GOMP_ASYNC_SYNC, 0);
 	break;
 
 	  case GOMP_MAP_FORCE_DEVICEPTR:
@@ -533,19 +533,19 @@ GOACC_declare (int device, size_t mapnum,
 	  case GOMP_MAP_ALLOC:
 	if (!acc_is_present (hostaddrs[i], sizes[i]))
  	  GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
- &kinds[i], 0, 0);
+ &kinds[i], GOMP_ASYNC_SYNC, 0);
 	break;
 
 	  case GOMP_MAP_TO:
  	GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
-   &kinds[i], 0, 0);
+   &kinds[i], GOMP_ASYNC_SYNC, 0);
 
 	break;
 
 	  case GOMP_MAP_FROM:
 	kinds[i] = GOMP_MAP_FORCE_FROM;
  	GOACC_enter_exit_data (device, 1, &hostaddrs[i], &sizes[i],
-   &kinds[i], 0, 0);
+   &kinds[i], GOMP_ASYNC_SYNC, 0);
 	break;
 
 	  case GOMP_MAP_FORCE_PRESENT:


Re: [og7] Update deviceptr handling in Fortran

2018-05-09 Thread Cesar Philippidis
On 05/09/2018 03:50 AM, Thomas Schwinge wrote:

>> In addition to XPASS'ing deviceptr-1.f90, this patch [...]
> 
> Apart from one remaining XFAIL for "-Os" (see PR80995), I now too see the
> following XPASSes on my main development machine:
> 
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
> [-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
> (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> [-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> [-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  
> (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  execution test
> XFAIL: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  (test for excess errors)
> PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
> -DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test
> 
>> I've applied this patch to og7 [...]. It was tempting to remove the
>> XFAIL from deviceptr-1.f90, but the test case still fails on at least
>> one legacy driver.
> 
> That's surprising.  These XFAILs were because "OpenACC kernels construct
> will be executed sequentially", so shouldn't have any relationship to
> Nvidia driver versions.  If you identified such a problem (which versions
> and hardware exactly?), that's a separate problam and needs to be filed
> as a new issue, and the reference in the test case file updated.  So
> please verify that, and/or alternatively remove the non-"-Os" XFAILs.

You're correct. On further inspection, only -Os fails. The attached
patch removes the xfails for -O2 and -O3.

> Also please verify and resolve the following regression introduced by
> your patch:
> 
> PASS: c-c++-common/goacc/deviceptr-4.c (test for excess errors)
> [-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c scan-tree-dump-times 
> gimple "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1
> 
> [-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++11  
> scan-tree-dump-times gimple "#pragma omp target 
> oacc_parallel.*map\\(tofrom:a" 1
> PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++11 (test for excess 
> errors)
> [-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++14  
> scan-tree-dump-times gimple "#pragma omp target 
> oacc_parallel.*map\\(tofrom:a" 1
> PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++14 (test for excess 
> errors)
> [-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++98  
> scan-tree-dump-times gimple "#pragma omp target 
> oacc_parallel.*map\\(tofrom:a" 1
> PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++98 (test for excess 
> errors)

I forgot to update the expected data mapping in deviceptr-4.c. Now,
instead of implicitly adding a 'copy' clause for known deviceptr
variables, the gimplifier will assign a force_deviceptr clause.

I've applied the attached patch to og7 to fix both of the issues you've
identified.

Cesar
2018-05-09  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/deviceptr-4.c: Update expected data mapping.

	libgomp/
	* libgomp.oacc-fortran/deviceptr-1.f90: Remove xfail for -O2 and -O3.

diff --git a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
index db1b91633a6..79a51620db9 100644
--- a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
@@ -8,4 +8,4 @@ subr (int *a)
   a[0] += 1.0;
 }
 
-/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(force_deviceptr:a" 1 "gimple" } } */

Re: [PATCH] Add ax_pthread.m4 for use in binutils-gdb

2018-05-09 Thread Jason Merrill
Applied.

On Tue, May 8, 2018 at 7:47 PM, Joshua Watt  wrote:
> On Wed, Apr 18, 2018, 05:20 Pedro Alves  wrote:
>
>> On 04/17/2018 11:10 PM, Joshua Watt wrote:
>> > On Tue, 2018-04-17 at 22:50 +0100, Pedro Alves wrote:
>> >> On 04/17/2018 06:24 PM, Joshua Watt wrote:
>> >>> Ping? I'd really like to get this in binutils, which apparently
>> >>> requires getting it here first.
>> >>
>> >> I think it would help if you mentioned what this is and
>> >> what is the intended use case.
>> >
>> > Ah, that would probably be helpful! Yes, this was discussed on the
>> > binutils mailing list, see:
>> > https://sourceware.org/ml/binutils/2018-02/msg00260.html
>> >
>> > In short summary: the gold linker doesn't currently build for mingw,
>> > but only because it is attempting to link against libpthread
>> > incorrectly on that platform. Instead of bringing in more specialized
>> > logic to account for that, I opted to include the autotools
>> > ax_pthread.m4 macro (this patch) that automatically handles discovering
>> > pthreads on a wide variety of platforms and compilers, including mingw.
>> >
>> > binutils slaves its config/ directory to GCC, so the patch is required
>> > to be committed here first, and then it will be ported over there.
>>
>> Thanks, that helps indeed.
>>
>> I agree that the ax_pthread.m4 approach makes sense.  Better to use
>> a field-tested macro than reinvent the wheel.  We're using other
>> files from the autoconf-archive archive already, for similar reasons
>> (e.g., config/ax_check_define.m4, and gdb/ax_cxx_compile_stdcxx.m4).
>>
>> Since GCC won't be using it (yet at least, but it's conceivable it
>> could make use of it in future), there should be no harm in
>> installing it even if GCC is in stage 4, IMO.
>>
>> I don't have the authority to approve it, though.
>>
>> Thanks,
>> Pedro Alves
>>
>
> Ping (again)
>
>>


Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-09 Thread Jason Merrill
On Wed, May 9, 2018 at 10:47 AM, Jakub Jelinek  wrote:
> On Wed, May 09, 2018 at 10:40:26AM -0400, Jason Merrill wrote:
>> On Wed, May 9, 2018 at 4:55 AM, Jakub Jelinek  wrote:
>> > On Tue, May 08, 2018 at 11:28:18PM -0400, Jason Merrill wrote:
>> >> Maybe add a type parameter that defaults to size_type_node...
>> >>
>> >> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
>> >> > +   fold_offsetof_1 (TREE_TYPE (expr), 
>> >> > op0));
>> >>
>> >> ...and then this can be
>> >>
>> >>   fold_offsetof (op0, TREE_TYPE (exp0))
>> >
>> > Like this then?
>> >
>> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
>> > +   fold_offsetof (op0, TREE_TYPE (expr)));
>>
>> I was thinking that we then wouldn't need the fold_convert at the call
>> sites anymore, either.
>
> The patch only converts to non-pointer types, I'm not sure if it is
> desirable to do the same with pointer types (and most of the other callers
> don't use convert, but fold_convert which is significantly different, the
> former is emitting diagnostics, the latter is just a conversion +
> optimization).

Is there a reason we can't use fold_convert for the non-pointer case,
too?  I don't think we're interested in diagnostics from this
particular call.

Jason


Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-09 Thread Allan Sandfeld Jensen
On Mittwoch, 9. Mai 2018 11:08:02 CEST Jakub Jelinek wrote:
> On Tue, May 08, 2018 at 01:25:35PM +0200, Allan Sandfeld Jensen wrote:
> > 2018-05-08 Allan Sandfeld Jensen 
> 
> 2 spaces between date and name and two spaces between name and email
> address.
> 
> > gcc/
> > 
> > PR tree-optimization/85692
> > * tree-ssa-forwprop.c (simplify_vector_constructor): Try two
> > source permute as well.
> > 
> > gcc/testsuite
> > 
> > * gcc.target/i386/pr85692.c: Test two simply constructions are
> > detected as permute instructions.
> 
> Just
>   * gcc.target/i386/pr85692.c: New test.
> 
> > diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c
> > b/gcc/testsuite/gcc.target/i386/pr85692.c new file mode 100644
> > index 000..322c1050161
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr85692.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -msse4.1" } */
> > +/* { dg-final { scan-assembler "unpcklps" } } */
> > +/* { dg-final { scan-assembler "blendps" } } */
> > +/* { dg-final { scan-assembler-not "shufps" } } */
> > +/* { dg-final { scan-assembler-not "unpckhps" } } */
> > +
> > +typedef float v4sf __attribute__ ((vector_size (16)));
> > +
> > +v4sf unpcklps(v4sf a, v4sf b)
> > +{
> > +return v4sf{a[0],b[0],a[1],b[1]};
> 
> Though, not really sure if this has been tested at all.
> The above is valid only in C++ (and only C++11 and above), while the
> test is compiled as C and thus has to fail.
>
Yes, I thought it had been tested, but it wasn't. The first line also needs
to change to a compile test rather than a run test.

> > @@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator
> > *gsi)> 
> >elem_type = TREE_TYPE (type);
> >elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> > 
> > -  vec_perm_builder sel (nelts, nelts, 1);
> > -  orig = NULL;
> > +  vec_perm_builder sel (nelts, 2, nelts);
> 
> Why this change?  I admit the vec_parm_builder arguments are confusing, but
> I think the second times third is the number of how many indices are being
> pushed into the vector, so I think (nelts, nelts, 1) is right.
> 
I had the impression it was what was being selected from. In any case, I
changed it because without it I get a crash when vec_perm_indices is created
later with a possible nparms of 2.

> > @@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator
> > *gsi)> 
> > return false;
> > 
> >op1 = gimple_assign_rhs1 (def_stmt);
> >ref = TREE_OPERAND (op1, 0);
> > 
> > -  if (orig)
> > +  if (orig1)
> > 
> > {
> > 
> > - if (ref != orig)
> > -   return false;
> > + if (ref == orig1 || orig2)
> > +   {
> > + if (ref != orig1 && ref != orig2)
> > +   return false;
> > +   }
> > + else
> > +   {
> > + if (TREE_CODE (ref) != SSA_NAME)
> > +   return false;
> > + if (! VECTOR_TYPE_P (TREE_TYPE (ref))
> > + || ! useless_type_conversion_p (TREE_TYPE (op1),
> > + TREE_TYPE (TREE_TYPE (ref
> > +   return false;
> > + if (TREE_TYPE (orig1) != TREE_TYPE (ref))
> > +   return false;
> 
> I think even different type is acceptable here, as long as its conversion to
> orig1's type is useless.
> 
> Furthermore, I think the way you wrote the patch with 2 variables rather
> than an array of 2 elements means too much duplication, this else block
> is a duplication of the else block below.  See the patch I've added to the
> PR
It seemed to me it was clearer like this, but I can see your point.

> (and sorry for missing your patch first, the PR wasn't ASSIGNED and there
> was no link to gcc-patches for it).
> 
It is okay. You are welcome to take it over. I am not a regular gcc 
contributor and thus not well-versed in the details, only the basic logic of 
how things work.

'Allan





Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-09 Thread Jakub Jelinek
On Wed, May 09, 2018 at 10:40:26AM -0400, Jason Merrill wrote:
> On Wed, May 9, 2018 at 4:55 AM, Jakub Jelinek  wrote:
> > On Tue, May 08, 2018 at 11:28:18PM -0400, Jason Merrill wrote:
> >> Maybe add a type parameter that defaults to size_type_node...
> >>
> >> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> >> > +   fold_offsetof_1 (TREE_TYPE (expr), op0));
> >>
> >> ...and then this can be
> >>
> >>   fold_offsetof (op0, TREE_TYPE (exp0))
> >
> > Like this then?
> >
> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> > +   fold_offsetof (op0, TREE_TYPE (expr)));
> 
> I was thinking that we then wouldn't need the fold_convert at the call
> sites anymore, either.

The patch only converts to non-pointer types, I'm not sure if it is
desirable to do the same with pointer types (and most of the other callers
don't use convert, but fold_convert which is significantly different, the
former is emitting diagnostics, the latter is just a conversion +
optimization).
If it is ok to use the final pointer type rather than the initial one, but
convert is not ok, then it would be something like:
  if (!POINTER_TYPE_P (type))
return convert (type, TREE_OPERAND (expr, 0));
  else
return fold_convert (type, TREE_OPERAND (expr, 0));
on the innermost constant and then indeed no conversion would be needed.

Jakub


Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-09 Thread Jason Merrill
On Wed, May 9, 2018 at 4:55 AM, Jakub Jelinek  wrote:
> On Tue, May 08, 2018 at 11:28:18PM -0400, Jason Merrill wrote:
>> Maybe add a type parameter that defaults to size_type_node...
>>
>> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
>> > +   fold_offsetof_1 (TREE_TYPE (expr), op0));
>>
>> ...and then this can be
>>
>>   fold_offsetof (op0, TREE_TYPE (exp0))
>
> Like this then?
>
> +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> +   fold_offsetof (op0, TREE_TYPE (expr)));

I was thinking that we then wouldn't need the fold_convert at the call
sites anymore, either.

Jason


Re: [PATCH] Fix PR c++/85400

2018-05-09 Thread Jason Merrill
OK.

On Wed, May 9, 2018 at 6:05 AM, Eric Botcazou  wrote:
>> So it isn't clear to me if a cxx_make_decl_one_only is the way to go.  Maybe
>> doing the recalculation in comdat_linkage and maybe_make_one_only only
>> would be sufficient.
>
> Patch to that effect attached, tested on x86-64/Linux, OK for mainline?
>
>
> 2018-05-09  Eric Botcazou  
>
> cp/
> PR c++/85400
> * decl2.c (adjust_var_decl_tls_model): New static function.
> (comdat_linkage): Call it on a variable.
> (maybe_make_one_only): Likewise.
>
> c-family/
> * c-attribs.c (handle_visibility_attribute): Do not set no_add_attrs.
>
> --
> Eric Botcazou


Re: [committed][AArch64] Predicated SVE comparison folds

2018-05-09 Thread Andreas Schwab
* gcc.target/aarch64/sve/vcond_6.c: Add missing brace.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c
index a59f08d553..f41c94c400 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c
@@ -54,5 +54,5 @@ TEST_ALL (LOOP)
and then use NOT, but the original BIC sequence is better.  It's a fairly
niche failure though.  We'd handle most other types of comparison by
using the inverse operation instead of a separate NOT.  */
-/* { dg-final { scan-assembler-times {\tbic\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 { xfail *-*-* } } */
+/* { dg-final { scan-assembler-times {\tbic\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\torn\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
-- 
2.17.0


Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PING] [PATCH, libgomp, openacc] Factor out async argument utility functions

2018-05-09 Thread Tom de Vries

On 05/01/2018 10:50 PM, Tom de Vries wrote:

On 11/17/2017 02:18 PM, Tom de Vries wrote:

Hi,

I've factored out 3 new functions to test properties of enum acc_async_t:
...
typedef enum acc_async_t {
   /* Keep in sync with include/gomp-constants.h.  */
   acc_async_noval = -1,
   acc_async_sync  = -2
} acc_async_t;
...


In order to understand what this means:
...
   if (async < acc_async_noval)
...
you need to know the names and values of the enum.

Using the factored out functions, we get something that is easier to 
understand:

...
   if (async_synchronous_p (async))
...

Also I've changed the bodies of the functions to be robust against 
changes in values of acc_async_noval and acc_async_sync. No functional 
changes.


Build and tested on x86_64 with nvptx accelerator.

OK for trunk if bootstrap and reg-test on x86_64 succeeds?



Stage1 ping.



Assuming no objections, committed as attached.

Thanks,
- Tom
[openacc] Factor out async argument utility functions

2017-11-17  Tom de Vries  

	PR libgomp/83792
	* oacc-int.h (async_valid_stream_id_p, async_valid_p)
	(async_synchronous_p): New function.
	* oacc-async.c (acc_async_test, acc_wait, acc_wait_all_async): Use
	async_valid_p.
	* oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
	async_valid_stream_id_p.
	* oacc-mem.c (gomp_acc_remove_pointer): Use async_synchronous_p.
	* oacc-parallel.c (GOACC_parallel_keyed): Same.

---
 libgomp/oacc-async.c|  6 +++---
 libgomp/oacc-cuda.c |  4 ++--
 libgomp/oacc-int.h  | 22 ++++++++++++++++++++++
 libgomp/oacc-mem.c  |  2 +-
 libgomp/oacc-parallel.c |  2 +-
 5 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 7cdb627..a4e1863 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -34,7 +34,7 @@
 int
 acc_async_test (int async)
 {
-  if (async < acc_async_sync)
+  if (!async_valid_p (async))
 gomp_fatal ("invalid async argument: %d", async);
 
   struct goacc_thread *thr = goacc_thread ();
@@ -59,7 +59,7 @@ acc_async_test_all (void)
 void
 acc_wait (int async)
 {
-  if (async < acc_async_sync)
+  if (!async_valid_p (async))
 gomp_fatal ("invalid async argument: %d", async);
 
   struct goacc_thread *thr = goacc_thread ();
@@ -117,7 +117,7 @@ acc_async_wait_all (void)
 void
 acc_wait_all_async (int async)
 {
-  if (async < acc_async_sync)
+  if (!async_valid_p (async))
 gomp_fatal ("invalid async argument: %d", async);
 
   struct goacc_thread *thr = goacc_thread ();
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index c388170..20774c1 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -58,7 +58,7 @@ acc_get_cuda_stream (int async)
 {
   struct goacc_thread *thr = goacc_thread ();
 
-  if (async < 0)
+  if (!async_valid_stream_id_p (async))
 return NULL;
 
   if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
@@ -72,7 +72,7 @@ acc_set_cuda_stream (int async, void *stream)
 {
   struct goacc_thread *thr;
 
-  if (async < 0 || stream == NULL)
+  if (!async_valid_stream_id_p (async) || stream == NULL)
 return 0;
 
   goacc_lazy_initialize ();
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index 912433a..cdd0f7f 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -99,6 +99,28 @@ void goacc_restore_bind (void);
 void goacc_lazy_initialize (void);
 void goacc_host_init (void);
 
+static inline bool
+async_valid_stream_id_p (int async)
+{
+  return async >= 0;
+}
+
+static inline bool
+async_valid_p (int async)
+{
+  return (async == acc_async_noval || async == acc_async_sync
+	  || async_valid_stream_id_p (async));
+}
+
+static inline bool
+async_synchronous_p (int async)
+{
+  if (!async_valid_p (async))
+return true;
+
+  return async == acc_async_sync;
+}
+
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility pop
 #endif
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 5cc8fcf..158f086 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -723,7 +723,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
   gomp_mutex_unlock (&acc_dev->lock);
 
   /* If running synchronously, unmap immediately.  */
-  if (async < acc_async_noval)
+  if (async_synchronous_p (async))
 gomp_unmap_vars (t, true);
   else
 t->device_descr->openacc.register_async_cleanup_func (t, async);
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index a71b399..cfba581 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -183,7 +183,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
 			  async, dims, tgt);
 
   /* If running synchronously, unmap immediately.  */
-  if (async < acc_async_noval)
+  if (async_synchronous_p (async))
 gomp_unmap_vars (tgt, true);
   else
 tgt->device_descr->openacc.register_async_cleanup_func (tgt, async);


Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-09 Thread Kyrill Tkachov


On 09/05/18 13:30, Luis Machado wrote:

Hi Kyrill,

On 05/08/2018 11:09 AM, Kyrill Tkachov wrote:

Hi Luis,

On 07/05/18 15:28, Luis Machado wrote:

Hi,

On 02/08/2018 10:45 AM, Luis Machado wrote:

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:

Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.


Since this is ARM-specific and fairly specific, i wonder if it would be
reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the gcc-bugs 
list)
for examples of the ways that things can go wrong with any of the myriad of GCC 
components
and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a one-liner fix 
for
PR 84164 but it has expanded into a 3-patch series with a midend component and
target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, and can 
sometimes take months of deep testing by
fuzzing and massive codebase rebuilds to expose, so the closer the commit is to 
a release
the higher the risk is that an obscure edge case will be unnoticed and unfixed 
in the release.

So the priority at this stage is to minimise the risk of destabilising the 
codebase,
as opposed to taking in new features and desirable performance improvements 
(like your patch!)

That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it sounds very 
reasonable. I'll put the patch on hold until development is open again.

Regards,
Luis


With GCC 9 development open, i take it this patch is worth considering again?



Yes, I believe the latest version is at:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?

+(define_insn "*ashift<mode>_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+				      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
+				      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
+		     (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
+  ""
+  "sbfiz\\t%<w>0, %<w>1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+


Indeed.



Can you give a bit more information about what are the values for operands 2,3 
and 4 in your example testcases?


For sbfiz32 we have 3, 0 and 19 respectively. For sbfiz64 we have 6, 0 and 38.


I'm trying to understand why the value of operand 3 (the bit position the 
sign-extract starts from) doesn't get validated
in any way and doesn't play any role in the output...


This may be an oversight. It seems operand 3 will always be 0 in this particular 
case I'm covering (it starts from 0, gets shifted x bits to the left and then y 
< x bits to the right). The operation is essentially an ashift of the bitfield 
followed by a sign-extension of the msb of the bitfield being extracted.

Having a non-zero operand 3 from RTL means the shift amount won't translate 
directly to operand 3 of sbfiz (the position). Then we'd need to do a 
calculation where we take into account operand 3 from RTL.

I'm wondering when such a RTL pattern, with a non-zero operand 3, would be 
generated though.


I think it's best to enforce that operand 3 is a zero. Maybe just match 
const_int 0 here directly.
Better safe than sorry with these things.

Thanks,
Kyrill


[wwwdocs] about.html - simplify, update, add a bit

2018-05-09 Thread Gerald Pfeifer
I promised Martin to look into adding more information to
gcc.gnu.org/about.html for new contributors.

This isn't the actual meat, but a number of changes I found
while preparing for that.

Applied (last weekend actually).

Gerald

Split a long sentence.  Adjust the intro.  Add a note to add 
[wwwdocs] to mail subjects.  Streamline notes how to check-in.

Index: about.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/about.html,v
retrieving revision 1.27
diff -u -r1.27 about.html
--- about.html  2 Feb 2017 21:37:32 -   1.27
+++ about.html  6 May 2018 14:27:42 -
@@ -11,19 +11,20 @@
 These pages are maintained by the GCC team and it's easy to
 contribute.
 
-The web effort was originally led by Jeff Law.  For the last decade
-or so Gerald Pfeifer has been leading the effort, but there are
+The web effort was originally led by Jeff Law.  For the last two
+decades or so Gerald Pfeifer has been leading the effort, but there are
 many
 <a href="https://gcc.gnu.org/onlinedocs/gcc/Contributors.html">contributors</a>.
 
 The web pages are under CVS control.
 The pages on gcc.gnu.org are updated directly after a
-change has been committed; www.gnu.org is updated once a day at 4:00 -0700
+change has been committed. www.gnu.org is updated once a day at 4:00 -0700
 (PDT).
 
 Please send feedback, problem reports and patches to our
-mailing lists.
+mailing lists, ideally putting the
+string "[wwwdocs]" at the beginning of the mail subject.
 
 
 
@@ -62,10 +63,8 @@
 
 Checking in a change
 
-The following is a very quick overview of how
-to check in a change.  We recommend you list files explicitly
-to avoid accidental checkins and prefer that each checkin be of a
-complete, single logical change.
+We recommend you list files explicitly to avoid accidental checkins
+and prefer that each checkin be of a complete, single logical change.
 
 
 Sync your sources with the master repository via "cvs


[PATCH] Make std::function tolerate semantically non-CopyConstructible objects

2018-05-09 Thread Jonathan Wakely

To satisfy the CopyConstructible requirement a callable object stored in
a std::function must behave the same when copied from a const or
non-const source. If copying a non-const object doesn't produce an
equivalent copy then the behaviour is undefined. But we can make our
std::function more tolerant of such objects by ensuring we always copy
from a const lvalue.

Additionally use an if constexpr statement in the _M_get_pointer
function to avoid unnecessary instantiations in the discarded branch.

* include/bits/std_function.h (_Base_manager::_M_get_pointer):
Use constexpr if in C++17 mode.
(_Base_manager::_M_clone(_Any_data&, const _Any_data&, true_type)):
Copy from const object.
* testsuite/20_util/function/cons/non_copyconstructible.cc: New.

Tested powerpc64le-linux, committed to trunk.


commit 98de94098559575d68a3a639b78c6c61475f1d9c
Author: Jonathan Wakely 
Date:   Wed May 9 13:41:25 2018 +0100

Make std::function tolerate semantically non-CopyConstructible objects

To satisfy the CopyConstructible requirement a callable object stored in
a std::function must behave the same when copied from a const or
non-const source. If copying a non-const object doesn't produce an
equivalent copy then the behaviour is undefined. But we can make our
std::function more tolerant of such objects by ensuring we always copy
from a const lvalue.

Additionally use an if constexpr statement in the _M_get_pointer
function to avoid unnecessary instantiations in the discarded branch.

* include/bits/std_function.h (_Base_manager::_M_get_pointer):
Use constexpr if in C++17 mode.
(_Base_manager::_M_clone(_Any_data&, const _Any_data&, true_type)):
Copy from const object.
* testsuite/20_util/function/cons/non_copyconstructible.cc: New.

diff --git a/libstdc++-v3/include/bits/std_function.h b/libstdc++-v3/include/bits/std_function.h
index 96261355a9d..ee94d1ca81e 100644
--- a/libstdc++-v3/include/bits/std_function.h
+++ b/libstdc++-v3/include/bits/std_function.h
@@ -131,8 +131,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class _Function_base
   {
   public:
-static const std::size_t _M_max_size = sizeof(_Nocopy_types);
-static const std::size_t _M_max_align = __alignof__(_Nocopy_types);
+static const size_t _M_max_size = sizeof(_Nocopy_types);
+static const size_t _M_max_align = __alignof__(_Nocopy_types);
 
 template
   class _Base_manager
@@ -150,10 +150,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static _Functor*
_M_get_pointer(const _Any_data& __source)
{
- const _Functor* __ptr =
-   __stored_locally? std::__addressof(__source._M_access<_Functor>())
-   /* have stored a pointer */ : __source._M_access<_Functor*>();
- return const_cast<_Functor*>(__ptr);
+ if _GLIBCXX17_CONSTEXPR (__stored_locally)
+   {
+ const _Functor& __f = __source._M_access<_Functor>();
+ return const_cast<_Functor*>(std::__addressof(__f));
+   }
+ else // have stored a pointer
+   return __source._M_access<_Functor*>();
}
 
// Clone a location-invariant function object that fits within
@@ -170,7 +173,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_clone(_Any_data& __dest, const _Any_data& __source, false_type)
{
  __dest._M_access<_Functor*>() =
-   new _Functor(*__source._M_access<_Functor*>());
+   new _Functor(*__source._M_access<const _Functor*>());
}
 
// Destroying a location-invariant object may still require
diff --git a/libstdc++-v3/testsuite/20_util/function/cons/non_copyconstructible.cc b/libstdc++-v3/testsuite/20_util/function/cons/non_copyconstructible.cc
new file mode 100644
index 000..d2a99925c58
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function/cons/non_copyconstructible.cc
@@ -0,0 +1,39 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <functional>
+
+// This type is not CopyConstructible because copying a non-const lvalue
+// will call the throwing constructor.
+struct A
+{
+  A() = default;
+  A(const A&) { } // not 

[PATCH] Fix BB scalar costing

2018-05-09 Thread Richard Biener

The following fixes the same issue with scalar BB costing as I fixed
earlier this year with the loop scalar costing.  We are currently
comparing apples and oranges on x86 where the add_stmt_cost hook
uses costs dependent on the stmt operation while the old hook
does not (cannot).

Fixed as follows.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

I am considering to backport this after some soaking/benchmarking.

Richard.

2018-05-09  Richard Biener  

* tree-vect-slp.c (vect_bb_slp_scalar_cost): Fill a cost
vector.
(vect_bb_vectorization_profitable_p): Adjust.  Compute
actual scalar cost using the cost vector and the add_stmt_cost
machinery.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 260072)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2886,18 +2886,17 @@ vect_slp_analyze_operations (vec_info *v
and return it.  Do not account defs that are marked in LIFE and
update LIFE according to uses of NODE.  */
 
-static unsigned
+static void 
 vect_bb_slp_scalar_cost (basic_block bb,
-slp_tree node, vec *life)
+slp_tree node, vec *life,
+stmt_vector_for_cost *cost_vec)
 {
-  unsigned scalar_cost = 0;
   unsigned i;
   gimple *stmt;
   slp_tree child;
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
 {
-  unsigned stmt_cost;
   ssa_op_iter op_iter;
   def_operand_p def_p;
   stmt_vec_info stmt_info;
@@ -2933,17 +2932,17 @@ vect_bb_slp_scalar_cost (basic_block bb,
   gimple_set_visited (stmt, true);
 
   stmt_info = vinfo_for_stmt (stmt);
+  vect_cost_for_stmt kind;
   if (STMT_VINFO_DATA_REF (stmt_info))
 {
   if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
-stmt_cost = vect_get_stmt_cost (scalar_load);
+   kind = scalar_load;
   else
-stmt_cost = vect_get_stmt_cost (scalar_store);
+   kind = scalar_store;
 }
   else
-stmt_cost = vect_get_stmt_cost (scalar_stmt);
-
-  scalar_cost += stmt_cost;
+   kind = scalar_stmt;
+  record_stmt_cost (cost_vec, 1, kind, stmt_info, 0, vect_body);
 }
 
   auto_vec<bool, 20> subtree_life;
@@ -2954,12 +2953,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
  /* Do not directly pass LIFE to the recursive call, copy it to
 confine changes in the callee to the current child/subtree.  */
  subtree_life.safe_splice (*life);
- scalar_cost += vect_bb_slp_scalar_cost (bb, child, &subtree_life);
+ vect_bb_slp_scalar_cost (bb, child, &subtree_life, cost_vec);
  subtree_life.truncate (0);
}
 }
-
-  return scalar_cost;
 }
 
 /* Check if vectorization of the basic block is profitable.  */
@@ -2974,14 +2971,30 @@ vect_bb_vectorization_profitable_p (bb_v
   unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0;
 
   /* Calculate scalar cost.  */
+  stmt_vector_for_cost scalar_costs;
+  scalar_costs.create (0);
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
 {
   auto_vec<bool, 20> life;
   life.safe_grow_cleared (SLP_INSTANCE_GROUP_SIZE (instance));
-  scalar_cost += vect_bb_slp_scalar_cost (BB_VINFO_BB (bb_vinfo),
- SLP_INSTANCE_TREE (instance),
- &life);
+  vect_bb_slp_scalar_cost (BB_VINFO_BB (bb_vinfo),
+  SLP_INSTANCE_TREE (instance),
+  &life, &scalar_costs);
+}
+  void *target_cost_data = init_cost (NULL);
+  stmt_info_for_cost *si;
+  FOR_EACH_VEC_ELT (scalar_costs, i, si)
+{
+  struct _stmt_vec_info *stmt_info
+ = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+  (void) add_stmt_cost (target_cost_data, si->count,
+   si->kind, stmt_info, si->misalign,
+   vect_body);
 }
+  scalar_costs.release ();
+  unsigned dummy;
+  finish_cost (target_cost_data, &dummy, &scalar_cost, &dummy);
+  destroy_cost_data (target_cost_data);
 
   /* Unset visited flag.  */
   for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;


Re: [PATCH 3/4] shrink-wrap: Improve spread_components (PR85645)

2018-05-09 Thread Segher Boessenkool
On Wed, May 09, 2018 at 09:33:30AM +0200, Eric Botcazou wrote:
> > Now, neither of the two branches needs to have LR restored at all,
> > because both of the branches end up in an infinite loop.
> > 
> > This patch makes spread_component return a boolean saying if anything
> > was changed, and if so, it is called again.  This obviously is finite
> > (there is a finite number of basic blocks, each with a finite number
> > of components, and spread_components can only assign more components
> > to a block, never less).  I also instrumented the code, and on a
> > bootstrap+regtest spread_components made changes a maximum of two
> > times.  Interestingly though it made changes on two iterations in
> > a third of the cases it did anything at all!
> 
> I don't know the code much so I don't see why this solves the problem.

When I wrote the code I thought it would reach the fixpoint after just
one iteration.  This is not true though, not e.g. in the motivating
example with no-return paths through the function.

It didn't generate wrong code btw, just pretty silly^W^Wnot terribly
good code.

> > 2018-05-08  Segher Boessenkool  
> > 
> > PR rtl-optimization/85645
> > * shrink-wrap.c (spread_components): Return a boolean saying if
> > anything was changed.
> > (try_shrink_wrapping_separate): Iterate spread_components until
> > nothing changes anymore.
> 
> OK if you add a comment in try_shrink_wrapping_separate with the rationale.

I made it say this:

  /* Try to minimize the number of saves and restores.  Do this as long as
 it changes anything.  This does not iterate more than a few times.  */
  int spread_times = 0;
  while (spread_components (components))
{
  spread_times++;

  if (dump_file)
fprintf (dump_file, "Now spread %d times.\n", spread_times);
}

Thanks for the quick reviews,


Segher


Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-09 Thread Luis Machado

Hi Kyrill,

On 05/08/2018 11:09 AM, Kyrill Tkachov wrote:

Hi Luis,

On 07/05/18 15:28, Luis Machado wrote:

Hi,

On 02/08/2018 10:45 AM, Luis Machado wrote:

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:
Thanks for the feedback Kyrill. I've adjusted the v2 patch based on 
your

suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.

Since this is ARM-specific and fairly specific, i wonder if it 
would be

reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the 
gcc-bugs list)
for examples of the ways that things can go wrong with any of the 
myriad of GCC components

and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a 
one-liner fix for
PR 84164 but it has expanded into a 3-patch series with a midend 
component and

target-specific changes for 2 ports.

These issues are very hard to catch during review and normal 
testing, and can sometimes take months of deep testing by
fuzzing and massive codebase rebuilds to expose, so the closer the 
commit is to a release
the higher the risk is that an obscure edge case will be unnoticed 
and unfixed in the release.


So the priority at this stage is to minimise the risk of 
destabilising the codebase,
as opposed to taking in new features and desirable performance 
improvements (like your patch!)


That is the rationale for delaying committing such changes until the 
start

of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for 
GCC 9.

This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it 
sounds very reasonable. I'll put the patch on hold until development 
is open again.


Regards,
Luis


With GCC 9 development open, i take it this patch is worth considering 
again?




Yes, I believe the latest version is at:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?

+(define_insn "*ashift<mode>_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+				      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
+				      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
+		     (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
+  ""
+  "sbfiz\\t%<w>0, %<w>1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+


Indeed.



Can you give a bit more information about what are the values for 
operands 2,3 and 4 in your example testcases?


For sbfiz32 we have 3, 0 and 19 respectively. For sbfiz64 we have 6, 0 
and 38.


I'm trying to understand why the value of operand 3 (the bit position 
the sign-extract starts from) doesn't get validated

in any way and doesn't play any role in the output...


This may be an oversight. It seems operand 3 will always be 0 in this 
particular case I'm covering (it starts from 0, gets shifted x bits to 
the left and then y < x bits to the right). The operation is essentially 
an ashift of the bitfield followed by a sign-extension of the msb of the 
bitfield being extracted.


Having a non-zero operand 3 from RTL means the shift amount won't 
translate directly to operand 3 of sbfiz (the position). Then we'd need 
to do a calculation where we take into account operand 3 from RTL.


I'm wondering when such a RTL pattern, with a non-zero operand 3, would 
be generated though.


Re: Handle vector boolean types when calculating the SLP unroll factor

2018-05-09 Thread Richard Biener
On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
>>  wrote:
>>> The SLP unrolling factor is calculated by finding the smallest
>>> scalar type for each SLP statement and taking the number of required
>>> lanes from the vector versions of those scalar types.  E.g. for an
>>> int32->int64 conversion, it's the vector of int32s rather than the
>>> vector of int64s that determines the unroll factor.
>>>
>>> We rely on tree-vect-patterns.c to replace boolean operations like:
>>>
>>>bool a, b, c;
>>>a = b & c;
>>>
>>> with integer operations of whatever the best size is in context.
>>> E.g. if b and c are fed by comparisons of ints, a, b and c will become
>>> the appropriate size for an int comparison.  For most targets this means
>>> that a, b and c will end up as int-sized themselves, but on targets like
>>> SVE and AVX512 with packed vector booleans, they'll instead become a
>>> small bitfield like :1, padded to a byte for memory purposes.
>>> The SLP code would then take these scalar types and try to calculate
>>> the vector type for them, causing the unroll factor to be much higher
>>> than necessary.
>>>
>>> This patch makes SLP use the cached vector boolean type if that's
>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> 2018-05-09  Richard Sandiford  
>>>
>>> gcc/
>>> * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New 
>>> function.
>>> (vect_build_slp_tree_1): Use it when calculating the unroll factor.
>>>
>>> gcc/testsuite/
>>> * gcc.target/aarch64/sve/vcond_10.c: New test.
>>> * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
>>> * gcc.target/aarch64/sve/vcond_11.c: Likewise.
>>> * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
>>>
>>> Index: gcc/tree-vect-slp.c
>>> ===
>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
>>>return true;
>>>  }
>>>
>>> +/* Return the vector type associated with the smallest scalar type in 
>>> STMT.  */
>>> +
>>> +static tree
>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)
>>> +{
>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>> +  if (vectype != NULL_TREE
>>> +  && VECTOR_BOOLEAN_TYPE_P (vectype))
>>
>> Hum.  At this point you can't really rely on vector types being set...
>
> Not for everything, but here we only care about the result of the
> pattern replacements, and pattern replacements do set the vector type
> up-front.  vect_determine_vectorization_factor (which runs earlier
> for loop vectorisation) also relies on this.
>
>>> +{
>>> +  /* The result of a vector boolean operation has the smallest scalar
>>> +type unless the statement is extending an even narrower boolean.  
>>> */
>>> +  if (!gimple_assign_cast_p (stmt))
>>> +   return vectype;
>>> +
>>> +  tree src = gimple_assign_rhs1 (stmt);
>>> +  gimple *def_stmt;
>>> +  enum vect_def_type dt;
>>> +  tree src_vectype = NULL_TREE;
>>> +  if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
>>> + &src_vectype)
>>> + && src_vectype
>>> + && VECTOR_BOOLEAN_TYPE_P (src_vectype))
>>> +   {
>>> + if (TYPE_PRECISION (TREE_TYPE (src_vectype))
>>> + < TYPE_PRECISION (TREE_TYPE (vectype)))
>>> +   return src_vectype;
>>> + return vectype;
>>> +   }
>>> +}
>>> +  HOST_WIDE_INT dummy;
>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
>>> +  return get_vectype_for_scalar_type (scalar_type);
>>> +}
>>> +
>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data
>>> permutation or are of unsupported types of operation.  Return
>>> true if they are, otherwise return false and indicate in *MATCHES
>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>>>enum tree_code first_cond_code = ERROR_MARK;
>>>tree lhs;
>>>bool need_same_oprnds = false;
>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>>>optab optab;
>>>int icode;
>>>machine_mode optab_op2_mode;
>>>machine_mode vec_mode;
>>> -  HOST_WIDE_INT dummy;
>>>gimple *first_load = NULL, *prev_first_load = NULL;
>>>
>>>/* For every stmt in NODE find its def stmt/s.  */
>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>>>   return false;
>>> }
>>>
>>> -  scalar_type = 

Re: [PATCH 1/3] Add PTWRITE builtins for x86

2018-05-09 Thread Uros Bizjak
On Wed, May 9, 2018 at 1:23 PM, Peryt, Sebastian
 wrote:
> I have rebased this patch to the latest trunk and addressed comments. Also, 
> there was a test in changelog,
> but not in the patch itself - this has been added.
>
> Is it ok for trunk and backport to GCC-8 after few days?
>
> gcc/
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_PTWRITE_SET,
> OPTION_MASK_ISA_PTWRITE_UNSET): New.
> (ix86_handle_option): Handle OPT_mptwrite.
> * config/i386/cpuid.h (bit_PTWRITE): Add.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> PTWRITE CPUID.
> * config/i386/i386-builtin.def (PTWRITE): Add PTWRITE.
> * config/i386/i386-c.c (ix86_target_macros_internal):
> Support __PTWRITE__.
> * config/i386/i386.c (ix86_target_string): Add -mptwrite.
> (ix86_valid_target_attribute_inner_p): Support ptwrite.
> (ix86_init_mmx_sse_builtins): Add edges detection for ptwrites
> generated by vartrace.
> * config/i386/i386.h (TARGET_PTWRITE): Add.
> (TARGET_PTWRITE_P): Add.
> * config/i386/i386.md: Add ptwrite.
> * config/i386/i386.opt: Add -mptwrite.
> * config/i386/immintrin.h (target):
> (_ptwrite64): Add.
> (_ptwrite32): Add.
> * doc/extend.texi: Document ptwrite builtins.
> * doc/invoke.texi: Document -mptwrite.
>
> gcc/testsuite/
>
> * gcc.target/i386/ptwrite-1.c: New test.

@@ -31325,7 +31329,21 @@ ix86_init_mmx_sse_builtins (void)
 continue;

   ftype = (enum ix86_builtin_func_type) d->flag;
-  def_builtin2 (d->mask, d->name, ftype, d->code);
+  decl = def_builtin2 (d->mask, d->name, ftype, d->code);
+
+  /* Avoid edges for ptwrites generated by vartrace pass.  */
+  if (decl)
+{
+  DECL_ATTRIBUTES (decl) = build_tree_list (get_identifier ("leaf"),
+NULL_TREE);
+  TREE_NOTHROW (decl) = 1;
+}
+  else
+{
+  ix86_builtins_isa[(int)d->code].leaf_p = true;
+  ix86_builtins_isa[(int)d->code].nothrow_p = true;
+}
+

Can you please explain what is the purpose of the above change?

Uros.


Re: Incremental LTO linking part 2: lto-plugin support

2018-05-09 Thread H.J. Lu
On Wed, May 9, 2018 at 1:25 AM, Jan Hubicka  wrote:
>> On Tue, 8 May 2018, Jan Hubicka wrote:
>>
>> > > On Tue, May 8, 2018 at 8:14 AM, Jan Hubicka  wrote:
>> > > > Hi,
>> > > > with lto, incremental linking can be meaningfully done in three ways:
>> > > >  1) read LTO file and produce non-LTO .o file
>> > > > this is current behaviour of gcc -r or ld -r with plugin
>> > > >  2) read LTO files and merge section for later LTO
>> > > > this is current behaviour of ld -r w/o plugin
>> > > >  3) read LTO files into the compiler, link them and produce
>> > > > incrementaly linked LTO object.
>> > > >
>> > > > 3 makes most sense and I am making it the new default for gcc -r. For 
>> > > > testing purposes
>> > > > and perhaps in order to have tool to turn LTO object into real object, 
>> > > > we want
>> > > > to have 1) available as well.  GCC currently have -flinker-output 
>> > > > option that
>> > > > decides between modes that is decided by linker plugin and can be 
>> > > > overwritten
>> > > > by user (I have forgot to document this).
>> > > >
>> > > > I am targeting for -flinker-output=rel to be incremental linking into 
>> > > > LTO
>> > > > and adding -flinker-output=nolto-rel for 1).
>> > > >
>> > > > The main limitation of 2 and 3 is that you can not link LTO and non-LTO
>> > > > object files together.  For 2, HJ's binutils patchset has support and I 
>> > > > think
>> > > > it can be extended to handle 3 as well. But with default binutils we 
>> > > > want
>> > > > to warn users.  This patch implements the warning (and prevents the
>> > > > linker plugin
>> > > > from adding redundant linker-output options).
>> > >
>> > >
>> > > My users/hjl/lto-mixed/master branch is quite flexible.  I can extend
>> > > it if needed.
>> >
>> > I think once the main patchset settles down we could add a way to 
>> > communicate
>> > to lto-plugin if combined lto+non-lto .o files are supported by linker and 
>> > silence
>> > the warning.
>>
>> How does the patchset deal with partially linking fat objects?  How
>
> Currently it will turn them into slim LTO merged object. I can add code path
> that will optimize them into binary. That will be additional fun because we 
> probably
> want to WPA them, but it should not be that hard to implement: WPA will 
> produce one
> object file with merged LTO data that will be passed to linker plus 
> partitions that will
> be turned to final binary.
>
>> do HJ's binutils deal with them when you consider a fat object partially
>> linked with a non-LTO object?
>

It should just work since the non-LTO object is stored in a special section.
Linker can merge the special sections from multiple input files for ld -r.
Let me know if you run into any issues.

-- 
H.J.


Re: [PATCH][i386] Adding CLDEMOTE instruction

2018-05-09 Thread Uros Bizjak
On Tue, May 8, 2018 at 1:58 PM, Peryt, Sebastian
 wrote:
> Sorry, forgot attachment.
>
> Sebastian
>
>
> -Original Message-
> From: Peryt, Sebastian
> Sent: Tuesday, May 8, 2018 1:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Uros Bizjak ; Kirill Yukhin ; 
> Peryt, Sebastian 
> Subject: [PATCH][i386] Adding CLDEMOTE instruction
>
> Hi,
>
> This patch adds support for CLDEMOTE instruction.
>
> Is it ok for trunk and, after a few days, for backport to GCC-8?
>
> 2018-05-08  Sebastian Peryt  
>
> gcc/
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_CLDEMOTE_SET,
> OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
> (ix86_handle_option): Handle -mcldemote.
> * config.gcc: New header.
> * config/i386/cldemoteintrin.h: New file.
> * config/i386/cpuid.h (bit_CLDEMOTE): New bit.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> -mcldemote.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> OPTION_MASK_ISA_CLDEMOTE.
> * config/i386/i386.c (ix86_target_string): Added -mcldemote.
> (ix86_valid_target_attribute_inner_p): Ditto.
> (enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
> (ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
> (ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
> * config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
> * config/i386/i386.md (UNSPECV_CLDEMOTE): New.
> (cldemote): New.
> * config/i386/i386.opt: Added -mcldemote.
> * config/i386/x86intrin.h: New header.
> * doc/invoke.texi: Added -mcldemote.
>
> 2018-05-08  Sebastian Peryt  
>
> gcc/testsuite/
>
> * gcc.target/i386/cldemote-1.c: New test.

OK for mainline.

Is there a compelling reason why we want this new feature in the gcc-8
release branch?

Thanks,
Uros.


Re: [PATCH][i386] Adding WAITPKG instructions

2018-05-09 Thread Uros Bizjak
On Tue, May 8, 2018 at 1:34 PM, Peryt, Sebastian
 wrote:
> Hi,
>
> This patch adds support for WAITPKG instructions.
>
> Is it ok for trunk and, after a few days, for backport to GCC-8?
>
> 2018-05-08  Sebastian Peryt  
>
> gcc/
>
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_WAITPKG_SET,
> OPTION_MASK_ISA_WAITPKG_UNSET): New defines.
> (ix86_handle_option): Handle -mwaitpkg.
> * config.gcc: New header.
> * config/i386/cpuid.h (bit_WAITPKG): New bit.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect -mwaitpkg.
> * config/i386/i386-builtin-types.def ((UINT8, UNSIGNED, UINT64)): New
> function type.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> OPTION_MASK_ISA_WAITPKG.
> * config/i386/i386.c (ix86_target_string): Added -mwaitpkg.
> (ix86_option_override_internal): Added PTA_WAITPKG.
> (ix86_valid_target_attribute_inner_p): Added -mwaitpkg.
> (enum ix86_builtins): Added IX86_BUILTIN_UMONITOR, 
> IX86_BUILTIN_UMWAIT,
> IX86_BUILTIN_TPAUSE.
> (ix86_init_mmx_sse_builtins): Define __builtin_ia32_umonitor,
> __builtin_ia32_umwait and __builtin_ia32_tpause.
> (ix86_expand_builtin): Expand IX86_BUILTIN_UMONITOR,
> IX86_BUILTIN_UMWAIT, IX86_BUILTIN_TPAUSE.
> * config/i386/i386.h (TARGET_WAITPKG, TARGET_WAITPKG_P): New.
> * config/i386/i386.opt: Added -mwaitpkg.
> * config/i386/sse.md (UNSPECV_UMWAIT, UNSPECV_UMONITOR,
> UNSPECV_TPAUSE): New.
> (umwait, umonitor_<mode>, tpause): New.
> * config/i386/waitpkgintrin.h: New file.
> * config/i386/x86intrin.h: New header.
> * doc/invoke.texi: Added -mwaitpkg.
>
> 2018-05-08  Sebastian Peryt  
>
> gcc/testsuite/
>
> * gcc.target/i386/tpause-1.c: New test.
> * gcc.target/i386/umonitor-1.c: New test.
>
> Thanks,
> Sebastian
>
>

+case IX86_BUILTIN_UMONITOR:
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  op0 = expand_normal (arg0);
+  if (!REG_P (op0))
+op0 = ix86_zero_extend_to_Pmode (op0);
+
+  emit_insn (ix86_gen_umonitor (op0));
+  return 0;

Please see how movdir64b handles its address operand. Also, do not use
global ix86_gen_monitor, just expand directly in the same way as
movdir64b.

+case IX86_BUILTIN_UMWAIT:
+case IX86_BUILTIN_TPAUSE:
+  rtx eax, edx, op1_lo, op1_hi;
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  arg1 = CALL_EXPR_ARG (exp, 1);
+  op0 = expand_normal (arg0);
+  op1 = expand_normal (arg1);
+  eax = gen_rtx_REG (SImode, AX_REG);
+  edx = gen_rtx_REG (SImode, DX_REG);
+  if (!REG_P (op0))
+op0 = copy_to_mode_reg (SImode, op0);
+  if (!REG_P (op1))
+op1 = copy_to_mode_reg (DImode, op1);
+  op1_lo = gen_lowpart (SImode, op1);
+  op1_hi = expand_shift (RSHIFT_EXPR, DImode, op1,
+ GET_MODE_BITSIZE (SImode), 0, 1);
+  op1_hi = convert_modes (SImode, DImode, op1_hi, 1);
+  emit_move_insn (eax, op1_lo);
+  emit_move_insn (edx, op1_hi);
+  emit_insn (fcode == IX86_BUILTIN_UMWAIT
+? gen_umwait (op0, eax, edx)
+: gen_tpause (op0, eax, edx));
+
+  /* Return current CF value.  */
+  op3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+  target = gen_rtx_LTU (QImode, op3, const0_rtx);
+
+  return target;

For the above code, please see how xsetbv expansion and patterns are
handling their input operands. There should be two patterns, one for
32bit and the other for 64bit targets. The patterns will need to set
FLAGS_REG, otherwise the test will be removed.

+(define_insn "umwait"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+ (use (match_operand:SI 1 "register_operand" "a"))
+ (use (match_operand:SI 2 "register_operand" "d"))]
+UNSPECV_UMWAIT)]
+  "TARGET_WAITPKG"
+  "umwait\t{%0}"
+  [(set_attr "length" "3")])

No need for "use" RTX here and in other patterns. You should also
remove {} from insn template, otherwise there will be no operand
printed in some asm dialect.

Uros.


Re: Handle vector boolean types when calculating the SLP unroll factor

2018-05-09 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
>  wrote:
>> The SLP unrolling factor is calculated by finding the smallest
>> scalar type for each SLP statement and taking the number of required
>> lanes from the vector versions of those scalar types.  E.g. for an
>> int32->int64 conversion, it's the vector of int32s rather than the
>> vector of int64s that determines the unroll factor.
>>
>> We rely on tree-vect-patterns.c to replace boolean operations like:
>>
>>bool a, b, c;
>>a = b & c;
>>
>> with integer operations of whatever the best size is in context.
>> E.g. if b and c are fed by comparisons of ints, a, b and c will become
>> the appropriate size for an int comparison.  For most targets this means
>> that a, b and c will end up as int-sized themselves, but on targets like
>> SVE and AVX512 with packed vector booleans, they'll instead become a
>> small bitfield like :1, padded to a byte for memory purposes.
>> The SLP code would then take these scalar types and try to calculate
>> the vector type for them, causing the unroll factor to be much higher
>> than necessary.
>>
>> This patch makes SLP use the cached vector boolean type if that's
>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2018-05-09  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New 
>> function.
>> (vect_build_slp_tree_1): Use it when calculating the unroll factor.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/vcond_10.c: New test.
>> * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
>> * gcc.target/aarch64/sve/vcond_11.c: Likewise.
>> * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
>>
>> Index: gcc/tree-vect-slp.c
>> ===
>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
>>return true;
>>  }
>>
>> +/* Return the vector type associated with the smallest scalar type in STMT. 
>>  */
>> +
>> +static tree
>> +get_vectype_for_smallest_scalar_type (gimple *stmt)
>> +{
>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +  if (vectype != NULL_TREE
>> +  && VECTOR_BOOLEAN_TYPE_P (vectype))
>
> Hum.  At this point you can't really rely on vector types being set...

Not for everything, but here we only care about the result of the
pattern replacements, and pattern replacements do set the vector type
up-front.  vect_determine_vectorization_factor (which runs earlier
for loop vectorisation) also relies on this.

>> +{
>> +  /* The result of a vector boolean operation has the smallest scalar
>> +type unless the statement is extending an even narrower boolean.  */
>> +  if (!gimple_assign_cast_p (stmt))
>> +   return vectype;
>> +
>> +  tree src = gimple_assign_rhs1 (stmt);
>> +  gimple *def_stmt;
>> +  enum vect_def_type dt;
>> +  tree src_vectype = NULL_TREE;
>> +  if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
>> + &src_vectype)
>> + && src_vectype
>> + && VECTOR_BOOLEAN_TYPE_P (src_vectype))
>> +   {
>> + if (TYPE_PRECISION (TREE_TYPE (src_vectype))
>> + < TYPE_PRECISION (TREE_TYPE (vectype)))
>> +   return src_vectype;
>> + return vectype;
>> +   }
>> +}
>> +  HOST_WIDE_INT dummy;
>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
>> +  return get_vectype_for_scalar_type (scalar_type);
>> +}
>> +
>>  /* Verify if the scalar stmts STMTS are isomorphic, require data
>> permutation or are of unsupported types of operation.  Return
>> true if they are, otherwise return false and indicate in *MATCHES
>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>>enum tree_code first_cond_code = ERROR_MARK;
>>tree lhs;
>>bool need_same_oprnds = false;
>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>>optab optab;
>>int icode;
>>machine_mode optab_op2_mode;
>>machine_mode vec_mode;
>> -  HOST_WIDE_INT dummy;
>>gimple *first_load = NULL, *prev_first_load = NULL;
>>
>>/* For every stmt in NODE find its def stmt/s.  */
>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>>   return false;
>> }
>>
>> -  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
>
> ... so I wonder how this goes wrong here.

It picks the right scalar type, but then we go on to use
get_vectype_for_scalar_type when get_mask_type_for_scalar_type
is what we actually want.  The 

RE: [PATCH 1/3] Add PTWRITE builtins for x86

2018-05-09 Thread Peryt, Sebastian
I have rebased this patch to the latest trunk and addressed comments. Also, 
there was a test in the changelog,
but not in the patch itself - this has been added.

Is it ok for trunk and backport to GCC-8 after a few days?

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PTWRITE_SET,
OPTION_MASK_ISA_PTWRITE_UNSET): New.
(ix86_handle_option): Handle OPT_mptwrite.
* config/i386/cpuid.h (bit_PTWRITE): Add.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PTWRITE CPUID.
* config/i386/i386-builtin.def (PTWRITE): Add PTWRITE.
* config/i386/i386-c.c (ix86_target_macros_internal):
Support __PTWRITE__.
* config/i386/i386.c (ix86_target_string): Add -mptwrite.
(ix86_valid_target_attribute_inner_p): Support ptwrite.
(ix86_init_mmx_sse_builtins): Add edges detection for ptwrites
generated by vartrace.
* config/i386/i386.h (TARGET_PTWRITE): Add.
(TARGET_PTWRITE_P): Add.
* config/i386/i386.md: Add ptwrite.
* config/i386/i386.opt: Add -mptwrite.
* config/i386/immintrin.h (target):
(_ptwrite64): Add.
(_ptwrite32): Add.
* doc/extend.texi: Document ptwrite builtins.
* doc/invoke.texi: Document -mptwrite.

gcc/testsuite/

* gcc.target/i386/ptwrite-1.c: New test.

Sebastian


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andi Kleen
> Sent: Monday, February 12, 2018 3:53 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Metzger, Markus T ; ubiz...@gmail.com;
> Andi Kleen 
> Subject: [PATCH 1/3] Add PTWRITE builtins for x86
> 
> From: Andi Kleen 
> 
> Add builtins/intrinsics for PTWRITE. PTWRITE is a new instruction on Intel 
> Cherry
> Trail that allows to write values into the Processor Trace log.
> 
> This is fairly straightforward, except I had to add isa2 support for variable
> number of operands.
> 
> gcc/:
> 
> 2018-02-10  Andi Kleen  
> 
>   * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_PTWRITE_SET):
>   (OPTION_MASK_ISA_PTWRITE_UNSET): New.
>   (ix86_handle_option): Handle OPT_mptwrite.
>   * config/i386/cpuid.h (bit_PTWRITE): Add.
>   * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>   PTWRITE CPUID.
>   * config/i386/i386-builtin.def (PTWRITE): Add PTWRITE.
>   * config/i386/i386-c.c (ix86_target_macros_internal):
>   Support __PTWRITE__.
>   * config/i386/i386.c (ix86_target_string): Add -mptwrite.
>   (ix86_valid_target_attribute_inner_p): Support ptwrite.
>   (BDESC_VERIFYS): Verify SPECIAL_ARGS2.
>   (ix86_init_mmx_sse_builtins): Handle special args2.
>   * config/i386/i386.h (TARGET_PTWRITE): Add.
>   (TARGET_PTWRITE_P): Add.
>   * config/i386/i386.md: Add ptwrite.
>   * config/i386/i386.opt: Add -mptwrite.
>   * config/i386/immintrin.h (target):
>   (_ptwrite_u64): Add.
>   (_ptwrite_u32): Add.
>   * doc/extend.texi: Document ptwrite builtins.
>   * doc/invoke.texi: Document -mptwrite.
> 
> gcc/testsuite/:
> 
> 2018-02-10  Andi Kleen  
> 
>   * gcc.target/i386/ptwrite1.c: New test.
>   * gcc.target/i386/ptwrite2.c: New test.


0001-PTWRITE-intrinsics.patch
Description: 0001-PTWRITE-intrinsics.patch


[PATCH, i386]: Implement usadv64qi

2018-05-09 Thread Uros Bizjak
This patch adds the usadv64qi expander, so the compiler is able to
vectorize with the 512-bit vpsadbw insn.

2018-05-09  Uros Bizjak  

PR target/85693
* config/i386/sse.md (usadv64qi): New expander.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ae6294e559c..0e625a4cc58 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10878,6 +10878,21 @@
   DONE;
 })
 
+(define_expand "usadv64qi"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V64QI 1 "register_operand")
+   (match_operand:V64QI 2 "nonimmediate_operand")
+   (match_operand:V16SI 3 "nonimmediate_operand")]
+  "TARGET_AVX512BW"
+{
+  rtx t1 = gen_reg_rtx (V8DImode);
+  rtx t2 = gen_reg_rtx (V16SImode);
+  emit_insn (gen_avx512f_psadbw (t1, operands[1], operands[2]));
+  convert_move (t2, t1, 0);
+  emit_insn (gen_addv16si3 (operands[0], t2, operands[3]));
+  DONE;
+})
+
 (define_insn "ashr<mode>3"
   [(set (match_operand:VI248_AVX512BW_1 0 "register_operand" "=v,v")
(ashiftrt:VI248_AVX512BW_1


Re: Handle vector boolean types when calculating the SLP unroll factor

2018-05-09 Thread Richard Biener
On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
 wrote:
> The SLP unrolling factor is calculated by finding the smallest
> scalar type for each SLP statement and taking the number of required
> lanes from the vector versions of those scalar types.  E.g. for an
> int32->int64 conversion, it's the vector of int32s rather than the
> vector of int64s that determines the unroll factor.
>
> We rely on tree-vect-patterns.c to replace boolean operations like:
>
>bool a, b, c;
>a = b & c;
>
> with integer operations of whatever the best size is in context.
> E.g. if b and c are fed by comparisons of ints, a, b and c will become
> the appropriate size for an int comparison.  For most targets this means
> that a, b and c will end up as int-sized themselves, but on targets like
> SVE and AVX512 with packed vector booleans, they'll instead become a
> small bitfield like :1, padded to a byte for memory purposes.
> The SLP code would then take these scalar types and try to calculate
> the vector type for them, causing the unroll factor to be much higher
> than necessary.
>
> This patch makes SLP use the cached vector boolean type if that's
> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-05-09  Richard Sandiford  
>
> gcc/
> * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New 
> function.
> (vect_build_slp_tree_1): Use it when calculating the unroll factor.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/vcond_10.c: New test.
> * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
> * gcc.target/aarch64/sve/vcond_11.c: Likewise.
> * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
>
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
>return true;
>  }
>
> +/* Return the vector type associated with the smallest scalar type in STMT.  
> */
> +
> +static tree
> +get_vectype_for_smallest_scalar_type (gimple *stmt)
> +{
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (vectype != NULL_TREE
> +  && VECTOR_BOOLEAN_TYPE_P (vectype))

Hum.  At this point you can't really rely on vector types being set...

> +{
> +  /* The result of a vector boolean operation has the smallest scalar
> +type unless the statement is extending an even narrower boolean.  */
> +  if (!gimple_assign_cast_p (stmt))
> +   return vectype;
> +
> +  tree src = gimple_assign_rhs1 (stmt);
> +  gimple *def_stmt;
> +  enum vect_def_type dt;
> +  tree src_vectype = NULL_TREE;
> +  if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
> + &src_vectype)
> + && src_vectype
> + && VECTOR_BOOLEAN_TYPE_P (src_vectype))
> +   {
> + if (TYPE_PRECISION (TREE_TYPE (src_vectype))
> + < TYPE_PRECISION (TREE_TYPE (vectype)))
> +   return src_vectype;
> + return vectype;
> +   }
> +}
> +  HOST_WIDE_INT dummy;
> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> +  return get_vectype_for_scalar_type (scalar_type);
> +}
> +
>  /* Verify if the scalar stmts STMTS are isomorphic, require data
> permutation or are of unsupported types of operation.  Return
> true if they are, otherwise return false and indicate in *MATCHES
> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>enum tree_code first_cond_code = ERROR_MARK;
>tree lhs;
>bool need_same_oprnds = false;
> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>optab optab;
>int icode;
>machine_mode optab_op2_mode;
>machine_mode vec_mode;
> -  HOST_WIDE_INT dummy;
>gimple *first_load = NULL, *prev_first_load = NULL;
>
>/* For every stmt in NODE find its def stmt/s.  */
> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>   return false;
> }
>
> -  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

... so I wonder how this goes wrong here.

I suppose we want to ignore vector booleans for the purpose of max_nunits
computation.  So isn't a better fix to simply "ignore" those in
vect_get_smallest_scalar_type instead?  I see that for intermediate
full-boolean operations like

  a = x[i] < 0;
  b = y[i] > 0;
  tem = a & b;

we want to ignore 'tem = a & b' fully here for the purpose of
vect_record_max_nunits.  So if scalar_type is a bitfield type
then skip it?

Richard.

> -  vectype = get_vectype_for_scalar_type (scalar_type);
> +  vectype = get_vectype_for_smallest_scalar_type (stmt);
>

Re: [og7] Update deviceptr handling in Fortran

2018-05-09 Thread Thomas Schwinge
Hi Cesar!

On Mon, 7 May 2018 08:49:26 -0700, Cesar Philippidis  
wrote:
> This patch teaches both the Fortran FE and the gimplifier how to only
> utilize one data mapping for OpenACC deviceptr clauses.  [...]

Thanks!  (I didn't verify your code changes.)


> In addition to XPASS'ing devicetpr-1.f90, this patch [...]

Apart from one remaining XFAIL for "-Os" (see PR80995), I now also see the
following XPASSes on my main development machine:

PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  (test for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
[-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test 
for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
[-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
(test for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
[-XFAIL:-]{+XPASS:+} libgomp.oacc-fortran/deviceptr-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  
(test for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  execution test
XFAIL: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  (test for excess errors)
PASS: libgomp.oacc-fortran/deviceptr-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test

> I've applied this patch to og7 [...]. It was tempting to remove the
> XFAIL from deviceptr-1.f90, but the test case still fails on at least
> one legacy driver.

That's surprising.  These XFAILs were because "OpenACC kernels construct
will be executed sequentially", so shouldn't have any relationship to
Nvidia driver versions.  If you identified such a problem (which versions
and hardware exactly?), that's a separate problem and needs to be filed
as a new issue, and the reference in the test case file updated.  So
please verify that, and/or alternatively remove the non-"-Os" XFAILs.


Also please verify and resolve the following regression introduced by
your patch:

PASS: c-c++-common/goacc/deviceptr-4.c (test for excess errors)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c scan-tree-dump-times 
gimple "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1

[-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++11  
scan-tree-dump-times gimple "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1
PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++11 (test for excess errors)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++14  
scan-tree-dump-times gimple "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1
PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++14 (test for excess errors)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/deviceptr-4.c  -std=c++98  
scan-tree-dump-times gimple "#pragma omp target oacc_parallel.*map\\(tofrom:a" 1
PASS: c-c++-common/goacc/deviceptr-4.c  -std=c++98 (test for excess errors)


Grüße
 Thomas


[nvptx, PR85626, committed] Make trap insn noreturn

2018-05-09 Thread Tom de Vries

Hi,

the nvptx trap* define_insns are implemented using the ptx insn 'trap'.
The ptx insn 'trap' may, however, return, and therefore the ptx insn
'exit' is needed after the 'trap'.


Fixed by attached patch.

Build x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom
[nvptx] Make trap insn noreturn

2018-05-09  Tom de Vries  

	PR target/85626
	* config/nvptx/nvptx.md (define_insn "trap", define_insn "trap_if_true")
	(define_insn "trap_if_false"): Add exit after trap.

---
 gcc/config/nvptx/nvptx.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 9754219..2988f5d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1101,14 +1101,14 @@
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 0))]
   ""
-  "trap;")
+  "trap; exit;")
 
 (define_insn "trap_if_true"
   [(trap_if (ne (match_operand:BI 0 "nvptx_register_operand" "R")
 		(const_int 0))
 	(const_int 0))]
   ""
-  "%j0 trap;"
+  "%j0 trap; %j0 exit;"
   [(set_attr "predicable" "false")])
 
 (define_insn "trap_if_false"
@@ -1116,7 +1116,7 @@
 		(const_int 0))
 	(const_int 0))]
   ""
-  "%J0 trap;"
+  "%J0 trap; %J0 exit;"
   [(set_attr "predicable" "false")])
 
  (define_expand "ctrap<mode>4"


Handle vector boolean types when calculating the SLP unroll factor

2018-05-09 Thread Richard Sandiford
The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types.  E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.

We rely on tree-vect-patterns.c to replace boolean operations like:

   bool a, b, c;
   a = b & c;

with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison.  For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.

This patch makes SLP use the cached vector boolean type if that's
appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

Richard


2018-05-09  Richard Sandiford  

gcc/
* tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.
(vect_build_slp_tree_1): Use it when calculating the unroll factor.

gcc/testsuite/
* gcc.target/aarch64/sve/vcond_10.c: New test.
* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
+++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
@@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
   return true;
 }
 
+/* Return the vector type associated with the smallest scalar type in STMT.  */
+
+static tree
+get_vectype_for_smallest_scalar_type (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (vectype != NULL_TREE
+  && VECTOR_BOOLEAN_TYPE_P (vectype))
+{
+  /* The result of a vector boolean operation has the smallest scalar
+type unless the statement is extending an even narrower boolean.  */
+  if (!gimple_assign_cast_p (stmt))
+   return vectype;
+
+  tree src = gimple_assign_rhs1 (stmt);
+  gimple *def_stmt;
+  enum vect_def_type dt;
+  tree src_vectype = NULL_TREE;
+  if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
+ &src_vectype)
+ && src_vectype
+ && VECTOR_BOOLEAN_TYPE_P (src_vectype))
+   {
+ if (TYPE_PRECISION (TREE_TYPE (src_vectype))
+ < TYPE_PRECISION (TREE_TYPE (vectype)))
+   return src_vectype;
+ return vectype;
+   }
+}
+  HOST_WIDE_INT dummy;
+  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
+  return get_vectype_for_scalar_type (scalar_type);
+}
+
 /* Verify if the scalar stmts STMTS are isomorphic, require data
permutation or are of unsupported types of operation.  Return
true if they are, otherwise return false and indicate in *MATCHES
@@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;
   machine_mode vec_mode;
-  HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
 
   /* For every stmt in NODE find its def stmt/s.  */
@@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
  return false;
}
 
-  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
-  vectype = get_vectype_for_scalar_type (scalar_type);
+  vectype = get_vectype_for_smallest_scalar_type (stmt);
   if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
   max_nunits))
{
  /* Fatal mismatch.  */
  matches[0] = false;
-  return false;
-}
+ return false;
+   }
 
   if (gcall *call_stmt = dyn_cast  (stmt))
{
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
===
--- /dev/null   2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c 2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include 
+
+#define DEF_LOOP(TYPE) \
+  void __attribute__ ((noinline, noclone))  

Re: Incremental LTO linking part 5: symtab and compilation driver support

2018-05-09 Thread Richard Biener
On Tue, 8 May 2018, Jan Hubicka wrote:

> Hi,
> this patch adds the symtab support for LTO incremental linking. Most of the
> code path is the same for both modes of incremental link except that we want
> to produce an LTO object file rather than compile down to assembly.
> 
> Only non-obvious changes are in ipa.c where I hit a bug where we stream in 
> initializers that are going to be eliminated from the symbol table for no
> good reasons.
> 
> Bootstrapped/regtested x86_64-linux with rest of the incremental link 
> patchset.
> 
> Honza
> 
>   * passes.c (ipa_write_summaries): Only modify statements if body
>   is in memory.
>   * cgraphunit.c (ipa_passes): Also produce intermediate code when
>   incrementally linking.
>   (ipa_passes): Likewise.
>   * lto-cgraph.c (lto_output_node): When incrementally linking do not
>   pass down resolution info.
>   * common.opt (flag_incremental_link): Update info.
>   * gcc.c (plugin specs): Turn flinker-output=* to
>   -plugin-opt=-linker-output-known
>   * toplev.c (compile_file): Also cut compilation when doing incremental
>   link.
>   * flag-types. (enum lto_partition_model): Add
>   LTO_LINKER_OUTPUT_NOLTOREL.
>   (invoke.texi): Add -flinker-output docs.
>   * ipa.c (symbol_table::remove_unreachable_nodes): Handle LTO incremental
>   link same way as WPA; do not stream in dead initializers.
> 
>   * lang.opt (lto_linker_output): Add nolto-rel.
>   * lto-lang.c (lto_post_options): Handle LTO_LINKER_OUTPUT_REL
>   and LTO_LINKER_OUTPUT_NOLTOREL.
>   (lto_init): Generate lto when doing incremental link.
>   * lto.c (lto_process_name): Add lto1-inclink.
> Index: cgraphunit.c
> ===
> --- cgraphunit.c  (revision 260042)
> +++ cgraphunit.c  (working copy)
> @@ -2452,8 +2452,10 @@
>if (flag_generate_lto || flag_generate_offload)
>  targetm.asm_out.lto_start ();
>  
> -  if (!in_lto_p)
> +  if (!in_lto_p || flag_incremental_link == 2)

Can we have an enum for flag_incremental_link pretty please?

Can't we arrange for flag_wpa to be set for this, and/or merge the
various flags into a more intelligent enum?

>  {
> +  if (!quiet_flag)
> + fprintf (stderr, "Streaming LTO\n");
>if (g->have_offload)
>   {
> section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> @@ -2472,7 +2474,9 @@
>if (flag_generate_lto || flag_generate_offload)
>  targetm.asm_out.lto_end ();
>  
> -  if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
> +  if (!flag_ltrans
> +  && ((in_lto_p && flag_incremental_link != 2)
> +   || !flag_lto || flag_fat_lto_objects))
>  execute_ipa_pass_list (passes->all_regular_ipa_passes);
>invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
>  
> @@ -2559,7 +2563,8 @@
>  
>/* Do nothing else if any IPA pass found errors or if we are just 
> streaming LTO.  */
>if (seen_error ()
> -  || (!in_lto_p && flag_lto && !flag_fat_lto_objects))
> +  || ((!in_lto_p || flag_incremental_link == 2)
> +   && flag_lto && !flag_fat_lto_objects))
>  {
>timevar_pop (TV_CGRAPHOPT);
>return;
> Index: common.opt
> ===
> --- common.opt(revision 260042)
> +++ common.opt(working copy)
> @@ -48,7 +48,8 @@
>  
>  ; This variable is set to non-0 only by LTO front-end.  1 indicates that
>  ; the output produced will be used for incremental linking (thus weak symbols
> -; can still be bound).
> +; can still be bound) and 2 indicates that the IL is going to be linked and
> +; output to an LTO object file.
>  Variable
>  int flag_incremental_link = 0
>  
> Index: flag-types.h
> ===
> --- flag-types.h  (revision 260042)
> +++ flag-types.h  (working copy)
> @@ -289,6 +289,7 @@
>  enum lto_linker_output {
>LTO_LINKER_OUTPUT_UNKNOWN,
>LTO_LINKER_OUTPUT_REL,
> +  LTO_LINKER_OUTPUT_NOLTOREL,
>LTO_LINKER_OUTPUT_DYN,
>LTO_LINKER_OUTPUT_PIE,
>LTO_LINKER_OUTPUT_EXEC
> Index: gcc.c
> ===
> --- gcc.c (revision 260042)
> +++ gcc.c (working copy)
> @@ -961,6 +961,7 @@
>  -plugin %(linker_plugin_file) \
>  -plugin-opt=%(lto_wrapper) \
>  -plugin-opt=-fresolution=%u.res \
> +%{flinker-output=*:-plugin-opt=-linker-output-known} \
>  
> %{!nostdlib:%{!nodefaultlibs:%:pass-through-libs(%(link_gcc_c_sequence))}} \
>  }" PLUGIN_COND_CLOSE
>  #else
> Index: ipa.c
> ===
> --- ipa.c (revision 260042)
> +++ ipa.c (working copy)
> @@ -130,9 +130,9 @@
>constant folding.  Keep references alive so partitioning
>knows about potential references.  */
> || (VAR_P 

Re: Incremental LTO linking part 4: lto-opts support

2018-05-09 Thread Richard Biener
On Tue, 8 May 2018, Jan Hubicka wrote:

> Hi,
> this patch prevents lto-opts from storing some flags that do not make
> sense to store, in particular -dumpdir and -fresolution.  These
> definitely should not be preserved from compile time to link time, and
> in the case of incremental linking they caused trouble with the wrong
> resolution file being used in some cases.
> 
> I guess this is just the tip of the iceberg - I think we should switch to
> whitelisting options that need saving rather than saving everything
> with a few exceptions.  This is, however, a separate issue.

We probably should strip all CL_OPTIMIZATION and CL_TARGET options (if
the target supports streaming target options and the option is saved,
unfortunately we don't record a CL_SAVED flag).

We should think about _not_ dropping diagnostic options and/or
putting those into the optimization nodes.

> Bootstrapped/regtested x86_64-linux, OK?
>   * lto-opts.c (lto_write_options): Skip OPT_dumpdir, OPT_fresolution_.
> Index: lto-opts.c
> ===
> --- lto-opts.c(revision 260042)
> +++ lto-opts.c(working copy)
> @@ -109,6 +109,8 @@
>   case OPT_SPECIAL_ignore:
>   case OPT_SPECIAL_program_name:
>   case OPT_SPECIAL_input_file:
> + case OPT_dumpdir:
> + case OPT_fresolution_:
> continue;

OK.

Richard.


Re: [PATCH] Fix PR c++/85400

2018-05-09 Thread Eric Botcazou
> So it isn't clear to me if a cxx_make_decl_one_only is the way to go.  Maybe
> doing the recalculation in comdat_linkage and maybe_make_one_only only
> would be sufficient.

Patch to that effect attached, tested on x86-64/Linux, OK for mainline?


2018-05-09  Eric Botcazou  

cp/
PR c++/85400
* decl2.c (adjust_var_decl_tls_model): New static function.
(comdat_linkage): Call it on a variable.
(maybe_make_one_only): Likewise.

c-family/
* c-attribs.c (handle_visibility_attribute): Do not set no_add_attrs.

-- 
Eric Botcazou

Index: cp/decl2.c
===
--- cp/decl2.c	(revision 259642)
+++ cp/decl2.c	(working copy)
@@ -1838,6 +1838,17 @@ mark_vtable_entries (tree decl)
 }
 }
 
+/* Adjust the TLS model on variable DECL if need be, typically after
+   the linkage of DECL has been modified.  */
+
+static void
+adjust_var_decl_tls_model (tree decl)
+{
+  if (CP_DECL_THREAD_LOCAL_P (decl)
+  && !lookup_attribute ("tls_model", DECL_ATTRIBUTES (decl)))
+set_decl_tls_model (decl, decl_default_tls_model (decl));
+}
+
 /* Set DECL up to have the closest approximation of "initialized common"
linkage available.  */
 
@@ -1888,6 +1899,9 @@ comdat_linkage (tree decl)
 
   if (TREE_PUBLIC (decl))
 DECL_COMDAT (decl) = 1;
+
+  if (VAR_P (decl))
+adjust_var_decl_tls_model (decl);
 }
 
 /* For win32 we also want to put explicit instantiations in
@@ -1926,6 +1940,8 @@ maybe_make_one_only (tree decl)
 	  /* Mark it needed so we don't forget to emit it.  */
   node->forced_by_abi = true;
 	  TREE_USED (decl) = 1;
+
+	  adjust_var_decl_tls_model (decl);
 	}
 }
 }
Index: c-family/c-attribs.c
===
--- c-family/c-attribs.c	(revision 259642)
+++ c-family/c-attribs.c	(working copy)
@@ -2299,14 +2299,13 @@ handle_visibility_attribute (tree *node,
 
 static tree
 handle_tls_model_attribute (tree *node, tree name, tree args,
-			int ARG_UNUSED (flags), bool *no_add_attrs)
+			int ARG_UNUSED (flags),
+			bool *ARG_UNUSED (no_add_attrs))
 {
   tree id;
   tree decl = *node;
   enum tls_model kind;
 
-  *no_add_attrs = true;
-
   if (!VAR_P (decl) || !DECL_THREAD_LOCAL_P (decl))
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);


Re: Incremental LTO linking part 3: lto-wrapper support

2018-05-09 Thread Richard Biener
On Tue, 8 May 2018, Jan Hubicka wrote:

> Hi,
> this patch makes lto-wrapper look for -flinker-output=rel and in this
> case configure lto1 in non-WHOPR mode and disable section renaming.
> 
> Bootstrapped/regtested x86_64-linux with rest of incremental link patchset.
> OK?
> 
>   * lto-wrapper.c (debug_objcopy): Add rename parameter; pass
>   it down to simple_object_copy_lto_debug_sections.
>   (run_gcc): Determine incremental LTO link time and configure
>   lto1 into non-wpa mode, disable renaming of debug sections.
> 
> Index: lto-wrapper.c
> ===
> --- lto-wrapper.c (revision 260042)
> +++ lto-wrapper.c (working copy)
> @@ -966,7 +966,7 @@
> is returned.  Return NULL on error.  */
>  
>  const char *
> -debug_objcopy (const char *infile)
> +debug_objcopy (const char *infile, bool rename)
>  {
>const char *outfile;
>const char *errmsg;
> @@ -1008,7 +1008,7 @@
>  }
>  
>outfile = make_temp_file ("debugobjtem");
> -  errmsg = simple_object_copy_lto_debug_sections (inobj, outfile, &err);
> +  errmsg = simple_object_copy_lto_debug_sections (inobj, outfile, &err, 
> rename);
>if (errmsg)
>  {
>unlink_if_ordinary (outfile);
> @@ -1056,6 +1056,7 @@
>bool have_offload = false;
>unsigned lto_argc = 0, ltoobj_argc = 0;
>char **lto_argv, **ltoobj_argv;
> +  bool linker_output_rel = false;
>bool skip_debug = false;
>unsigned n_debugobj;
>  
> @@ -1108,9 +1109,12 @@
> file_offset = (off_t) loffset;
>   }
>fd = open (filename, O_RDONLY | O_BINARY);
> +  /* Linker plugin passes -fresolution and -flinker-output options.  */
>if (fd == -1)
>   {
> lto_argv[lto_argc++] = argv[i];
> +   if (strcmp (argv[i], "-flinker-output=rel") == 0)
> + linker_output_rel = true;
> continue;
>   }

Why do you need this?

>  
> @@ -1175,6 +1179,11 @@
> lto_mode = LTO_MODE_WHOPR;
> break;
>  
> + case OPT_flinker_output_:
> +   linker_output_rel = !strcmp (option->arg, "rel");
> +   break;
> +
> +

And this?  It looks to me either should suffice and if not then
what about conflicting options here?

Otherwise looks ok.

Richard.

>   default:
> break;
>   }
> @@ -1191,6 +1200,9 @@
>fputc ('\n', stderr);
>  }
>  
> +  if (linker_output_rel)
> +no_partition = true;
> +
>if (no_partition)
>  {
>lto_mode = LTO_MODE_LTO;
> @@ -1435,7 +1447,7 @@
>  for (i = 0; i < ltoobj_argc; ++i)
>{
>   const char *tem;
> - if ((tem = debug_objcopy (ltoobj_argv[i])))
> + if ((tem = debug_objcopy (ltoobj_argv[i], !linker_output_rel)))
> {
>   obstack_ptr_grow (_obstack, tem);
>   n_debugobj++;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFA] Incremental LTO linking part 1: simple-object bits

2018-05-09 Thread Richard Biener
On Tue, 8 May 2018, Jan Hubicka wrote:

> Hi,
> for incremental linking of LTO objects we need to copy debug sections from
> source object files into the destination without renaming them from
> .gnu.debuglto into the standard debug sections (because they will again be
> LTO debug sections in the resulting object file).
> 
> I have discussed this with Richard on IRC and I hope it is fine to change the
> API here because lto-wrapper is the only user of this function.  I will send
> lto-wrapper support in separate patch.
> 
> I have lto-bootstrapped/regtested the whole incremental linking patchset on
> x86-64-linux with libbackend being incrementally linked and also experimented
> with extra testcases and tested that debugging works on resulting cc1 binary.
> OK?

Works for me.

Richard.

> Honza
> 
>   * simple-object.h (simple_object_copy_lto_debug_sections): Add rename
>   parameter.
>   * simple-object.c (handle_lto_debug_sections): Add rename parameter.
>   (handle_lto_debug_sections_rename): New function.
>   (handle_lto_debug_sections_norename): New function.
>   (simple_object_copy_lto_debug_sections): Add rename parameter.
> Index: include/simple-object.h
> ===
> --- include/simple-object.h   (revision 260042)
> +++ include/simple-object.h   (working copy)
> @@ -198,12 +198,15 @@
>  simple_object_release_write (simple_object_write *);
>  
>  /* Copy LTO debug sections from SRC_OBJECT to DEST.
> +   If RENAME is true, rename LTO debug section into debug section (i.e.
> +   when producing final binary) and if it is false, keep the sections with
> +   original names (when incrementally linking).
> If an error occurs, return the errno value in ERR and an error string.  */
>  
>  extern const char *
>  simple_object_copy_lto_debug_sections (simple_object_read *src_object,
>  const char *dest,
> -int *err);
> +int *err, int rename);
>  
>  #ifdef __cplusplus
>  }
> Index: libiberty/simple-object.c
> ===
> --- libiberty/simple-object.c (revision 260042)
> +++ libiberty/simple-object.c (working copy)
> @@ -251,12 +251,15 @@
>  }
>  
>  /* Callback to identify and rename LTO debug sections by name.
> -   Returns 1 if NAME is a LTO debug section, 0 if not.  */
> +   Returns non-NULL if NAME is a LTO debug section, NULL if not.
> +   If RENAME is true it will rename LTO debug sections to non-LTO
> +   ones.  */
>  
>  static char *
> -handle_lto_debug_sections (const char *name)
> +handle_lto_debug_sections (const char *name, int rename)
>  {
> -  char *newname = XCNEWVEC (char, strlen (name) + 1);
> +  char *newname = rename ? XCNEWVEC (char, strlen (name) + 1)
> +  : xstrdup (name);
>  
>/* ???  So we can't use .gnu.lto_ prefixed sections as the assembler
>   complains about bogus section flags.  Which means we need to arrange
> @@ -265,12 +268,14 @@
>/* Also include corresponding reloc sections.  */
>if (strncmp (name, ".rela", sizeof (".rela") - 1) == 0)
>  {
> -  strncpy (newname, name, sizeof (".rela") - 1);
> +  if (rename)
> +strncpy (newname, name, sizeof (".rela") - 1);
>name += sizeof (".rela") - 1;
>  }
>else if (strncmp (name, ".rel", sizeof (".rel") - 1) == 0)
>  {
> -  strncpy (newname, name, sizeof (".rel") - 1);
> +  if (rename)
> +strncpy (newname, name, sizeof (".rel") - 1);
>name += sizeof (".rel") - 1;
>  }
>/* ???  For now this handles both .gnu.lto_ and .gnu.debuglto_ prefixed
> @@ -277,10 +282,10 @@
>   sections.  */
>/* Copy LTO debug sections and rename them to their non-LTO name.  */
>if (strncmp (name, ".gnu.debuglto_", sizeof (".gnu.debuglto_") - 1) == 0)
> -return strcat (newname, name + sizeof (".gnu.debuglto_") - 1);
> +return rename ? strcat (newname, name + sizeof (".gnu.debuglto_") - 1) : 
> newname;
>else if (strncmp (name, ".gnu.lto_.debug_",
>   sizeof (".gnu.lto_.debug_") -1) == 0)
> -return strcat (newname, name + sizeof (".gnu.lto_") - 1);
> +return rename ? strcat (newname, name + sizeof (".gnu.lto_") - 1) : 
> newname;
>/* Copy over .note.GNU-stack section under the same name if present.  */
>else if (strcmp (name, ".note.GNU-stack") == 0)
>  return strcpy (newname, name);
> @@ -289,14 +294,31 @@
>   COMDAT sections in objects produced by GCC.  */
>else if (strcmp (name, ".comment") == 0)
>  return strcpy (newname, name);
> +  free (newname);
>return NULL;
>  }
>  
> +/* Wrapper for handle_lto_debug_sections.  */
> +
> +static char *
> +handle_lto_debug_sections_rename (const char *name)
> +{
> +  return handle_lto_debug_sections (name, 1);
> +}
> +
> +/* Wrapper for handle_lto_debug_sections.  */
> +
> +static char *
> 

Re: [gomp5] simd if/nontemporal clauses parsing and cancel if modifier

2018-05-09 Thread Richard Biener
On Fri, May 4, 2018 at 8:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch adds parsing of if and nontemporal clauses for simd construct
> and also adds parsing of (optional) cancel modifier for if clause on cancel
> directive.
>
> While nontemporal clause is just an optimization (we still want to use
> non-temporal stores (or even loads?) for those vars, what is the best way to
> do that?)

On GIMPLE we have gimple_assign_set_nontemporal_move so you could mark
all memory references involving those vars accordingly.  Not so easy if aliases
are involved I guess.  You'd need to apply this during lowering or
gimplification
then.  If you add some similar modifiers to GENERIC you could do it during
parsing.

> , simd if is not an optimization, if the expression evaluates to
> false at runtime, then we should just not vectorize; so probably we want to
> preserve it in some form until vectorization and include this condition next
> to where we emit checks for runtime aliasing or alignment etc.  Thoughts on
> how to do that?

Easiest would be to apply the versioning during OMP lowering/expansion.
You can then mark the 'else' loop as dont_vectorize.  Yes, this means we'll
get redundant versioning if the to be vectorized variant needs
versioning as well.
But we don't have any good infrastructure to do it in another way
(yet another IFN aka __builtin_vectorize_if () that the vectorizer could rely
on providing a scalar fallback?)

Richard.

> 2018-05-04  Jakub Jelinek  
>
> * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NONTEMPORAL.
> * tree.c (omp_clause_num_ops, omp_clause_code_name): Add nontemporal
> clause entries.
> (walk_tree_1): Handle OMP_CLAUSE_NONTEMPORAL.
> * gimplify.c (enum gimplify_omp_var_data): Add GOVD_NONTEMPORAL.
> (gimplify_scan_omp_clauses): Handle cancel and simd
> OMP_CLAUSE_IF_MODIFIERs.  Handle OMP_CLAUSE_NONTEMPORAL.
> (gimplify_adjust_omp_clauses_1): Ignore GOVD_NONTEMPORAL.
> (gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
> * omp-grid.c (grid_eliminate_combined_simd_part): Formatting fix.
> Fix comment typos.
> * tree-nested.c (convert_local_omp_clauses): Handle
> OMP_CLAUSE_NONTEMPORAL.
> (convert_nonlocal_omp_clauses): Likewise.  Remove useless test.
> * tree-pretty-print.c (dump_omp_clause): Handle 
> OMP_CLAUSE_NONTEMPORAL.
> Handle cancel and simd OMP_CLAUSE_IF_MODIFIERs.
> * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
> gcc/c-family/
> * c-omp.c (c_omp_split_clauses): Handle OMP_CLAUSE_NONTEMPORAL.  
> Handle
> splitting OMP_CLAUSE_IF also to OMP_SIMD.
> * c-pragma.h (enum pragma_omp_clause): Add
> PRAGMA_OMP_CLAUSE_NONTEMPORAL.
> gcc/c/
> * c-parser.c (c_parser_omp_clause_name): Handle nontemporal clause.
> (c_parser_omp_clause_if): Handle cancel and simd modifiers.
> (c_parser_omp_clause_nontemporal): New function.
> (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NONTEMPORAL.
> (OMP_SIMD_CLAUSE_MASK): Add if and nontemporal clauses.
> * c-typeck.c (c_finish_omp_cancel): Diagnose if clause with modifier
> other than cancel.
> (c_finish_omp_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
> gcc/cp/
> * parser.c (cp_parser_omp_clause_name): Handle nontemporal clause.
> (cp_parser_omp_clause_if): Handle cancel and simd modifiers.
> (cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NONTEMPORAL.
> (OMP_SIMD_CLAUSE_MASK): Add if and nontemporal clauses.
> * semantics.c (finish_omp_clauses): Diagnose if clause with modifier
> other than cancel.
> (finish_omp_cancel): Handle OMP_CLAUSE_NONTEMPORAL.
> * pt.c (tsubst_omp_clauses): Likewise.
> gcc/testsuite/
> * c-c++-common/gomp/if-1.c (foo): Add some further tests.
> * c-c++-common/gomp/if-2.c (foo): Likewise.  Expect slightly different
> diagnostics wording in one case.
> * c-c++-common/gomp/if-3.c: New test.
> * c-c++-common/gomp/nontemporal-1.c: New test.
> libgomp/
> * testsuite/libgomp.c/cancel-for-2.c (foo): Use cancel modifier
> in some cases.
>
> --- gcc/tree-core.h.jj  2018-05-02 17:29:55.902260817 +0200
> +++ gcc/tree-core.h 2018-05-04 15:13:22.384614499 +0200
> @@ -293,6 +293,9 @@ enum omp_clause_code {
>/* OpenMP clause: depend ({in,out,inout}:variable-list).  */
>OMP_CLAUSE_DEPEND,
>
> +  /* OpenMP clause: nontemporal (variable-list).  */
> +  OMP_CLAUSE_NONTEMPORAL,
> +
>/* OpenMP clause: uniform (argument-list).  */
>OMP_CLAUSE_UNIFORM,
>
> --- gcc/tree.c.jj   2018-04-30 13:49:44.692824652 +0200
> +++ gcc/tree.c  2018-05-04 19:08:55.309273302 +0200
> @@ -289,6 +289,7 @@ unsigned const char omp_clause_num_ops[]
>3, /* OMP_CLAUSE_LINEAR  */
>2, /* OMP_CLAUSE_ALIGNED  */
>  

Re: Add clobbers around IFN_LOAD/STORE_LANES

2018-05-09 Thread Richard Biener
On Tue, May 8, 2018 at 5:56 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
>>  wrote:
>>> We build up the input to IFN_STORE_LANES one vector at a time.
>>> In RTL, each of these vector assignments becomes a write to
>>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
>>> eventual input to the store lanes instruction.  The problem is
>>> that RTL isn't very good at tracking liveness when things are
>>> initialised piecemeal by subregs, so R tends to end up being
>>> live on all paths from the entry block to the store.  This in
>>> turn leads to unnecessary spilling around calls, as well as to
>>> excess register pressure in vector loops.
>>>
>>> This patch adds gimple clobbers to indicate the liveness of the
>>> IFN_STORE_LANES variable and makes sure that gimple clobbers are
>>> expanded to rtl clobbers where useful.  For consistency it also
>>> uses clobbers to mark the point at which an IFN_LOAD_LANES
>>> variable is no longer needed.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), aaarch64_be-elf
>>> and x86_64-linux-gnu.  OK to install?
>>
>> Minor comment inline.
>
> Thanks, fixed.
>
>> Also it looks like clobbers are at the moment all thrown away during
>> RTL expansion?  Do the clobbers we generate with this patch eventually
>> get collected somehow if they turn out to be no longer necessary?
>> How many of them do we generate?  I expect not many decls get
>> expanded to registers and if they are most of them are likely
>> not constructed piecemail  - thus, maybe we should restrict ourselves
>> to non-scalar typed lhs?  So, change it to
>>
>>   if (DECL_P (lhs)
>>   && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
>> single-element aggregates?
>>  || VECTOR_TYPE_P (TREE_TYPE (lhs))
>>  || COMPLEX_TYPE_P (TREE_TYPE (lhs
>
> How about instead deciding based on whether the pseudo register spans a
> single hard register or multiple hard registers, as per the patch below?
> The clobber is only useful if the pseudo register can be partially
> modified via subregs.
>
> This could potentially also help with any large single-element
> aggregrates that get broken down into word-sized subreg ops.

Yeah, that looks even better.

>> The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
>> you want to split this patch while we discuss the RTL bits.
>
> OK, thanks.  I'll keep it as one patch for applying purposes,
> but snipped the approved bits below.

This version is ok.

Thanks,
Richard.

> Richard
>
>
> 2018-05-08  Richard Sandiford  
>
> gcc/
> * cfgexpand.c (expand_clobber): New function.
> (expand_gimple_stmt_1): Use it.
>
> Index: gcc/cfgexpand.c
> ===
> --- gcc/cfgexpand.c 2018-05-08 16:50:31.815501502 +0100
> +++ gcc/cfgexpand.c 2018-05-08 16:50:31.997495804 +0100
> @@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds)
>  }
>  }
>
> +/* Expand a clobber of LHS.  If LHS is stored in a multi-part
> +   register, tell the rtl optimizers that its value is no longer
> +   needed.  */
> +
> +static void
> +expand_clobber (tree lhs)
> +{
> +  if (DECL_P (lhs))
> +{
> +  rtx decl_rtl = DECL_RTL_IF_SET (lhs);
> +  if (decl_rtl && REG_P (decl_rtl))
> +   {
> + machine_mode decl_mode = GET_MODE (decl_rtl);
> + if (maybe_gt (GET_MODE_SIZE (decl_mode),
> +   REGMODE_NATURAL_SIZE (decl_mode)))
> +   emit_clobber (decl_rtl);
> +   }
> +}
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
> STMT that doesn't require special handling for outgoing edges.  That
> is no tailcalls and no GIMPLE_COND.  */
> @@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt)
> if (TREE_CLOBBER_P (rhs))
>   /* This is a clobber to mark the going out of scope for
>  this LHS.  */
> - ;
> + expand_clobber (lhs);
> else
>   expand_assignment (lhs, rhs,
>  gimple_assign_nontemporal_move_p (


Re: [libstdc++, PATCH] PR libstdc++/83140 - assoc_legendre returns negated value when m is odd.

2018-05-09 Thread Jonathan Wakely

On 07/05/18 12:39 -0400, Ed Smith-Rowland wrote:

All,

We were using a different convention for P_l^m assoc_legendre(int l,
int m, FloatTp x) - the so-called Condon-Shortley convention, which
includes (-1)^m.  This unfortunately is common.

This factor is taken out to match the standard.  The underlying
__detail code has an arg that allows you to flip this - mostly to
highlight the subtle difference.

The related sph_legendre is unaffected by this (our impl and the 
standard include the C-S phase).


OK for trunk and branches?

Ed






2018-05-07  Edward Smith-Rowland  <3dw...@verizon.net>

PR libstdc++/83140 - assoc_legendre returns negated value when m is odd
* include/tr1/legendre_function.tcc (__assoc_legendre_p): Add __phase
argument defaulted to +1.  Doxy comments on same.
* testsuite/special_functions/02_assoc_legendre/
check_assoc_legendre.cc: Regen.
* testsuite/tr1/5_numerical_facilities/special_functions/
02_assoc_legendre/check_tr1_assoc_legendre.cc: Regen.




Index: include/tr1/legendre_function.tcc
===
--- include/tr1/legendre_function.tcc   (revision 259973)
+++ include/tr1/legendre_function.tcc   (working copy)
@@ -65,7 +65,7 @@
  namespace __detail
  {
/**
- *   @brief  Return the Legendre polynomial by recursion on order
+ *   @brief  Return the Legendre polynomial by recursion on degree
 *   @f$ l @f$.
 *
 *   The Legendre function of @f$ l @f$ and @f$ x @f$,
@@ -74,7 +74,7 @@
 * P_l(x) = \frac{1}{2^l l!}\frac{d^l}{dx^l}(x^2 - 1)^{l}
 *   @f]
 *
- *   @param  l  The order of the Legendre polynomial.  @f$l >= 0@f$.
+ *   @param  l  The degree of the Legendre polynomial.  @f$l >= 0@f$.
 *   @param  x  The argument of the Legendre polynomial.  @f$|x| <= 1@f$.
 */
template
@@ -127,16 +127,19 @@
 * P_l^m(x) = (1 - x^2)^{m/2}\frac{d^m}{dx^m}P_l(x)
 *   @f]
 *
- *   @param  l  The order of the associated Legendre function.
+ *   @param  l  The degree of the associated Legendre function.
 *  @f$ l >= 0 @f$.
 *   @param  m  The order of the associated Legendre function.
 *  @f$ m <= l @f$.
 *   @param  x  The argument of the associated Legendre function.
 *  @f$ |x| <= 1 @f$.
+ *   @param  phase  The phase of the associated Legendre function.
+ *  Use -1 for the Condon-Shortley phase convention.
 */
template
_Tp
-__assoc_legendre_p(unsigned int __l, unsigned int __m, _Tp __x)
+__assoc_legendre_p(unsigned int __l, unsigned int __m, _Tp __x,
+  _Tp __phase = _Tp{+1})


This list-init isn't valid for C++98 i.e. when used via .
GCC seems to allow it, but Clang won't.

We could consider dropping the TR1 support, and just provide these
functions for ISO/IEC 29124:2010 in C++11 (or later) and for C++17.
But that decision should be taken separately, and should only happen
on trunk anyway so we need to use _Tp(+1) here.

OK for trunk with _Tp(+1) instead of _Tp{+1}.

Do we want to change the result of these functions on the branches?
How likely is it that changing it will affect somebody's calculations
in a way that they don't expect from a minor release on a branch?




Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-09 Thread Jakub Jelinek
On Tue, May 08, 2018 at 01:25:35PM +0200, Allan Sandfeld Jensen wrote:
> 2018-05-08 Allan Sandfeld Jensen 

2 spaces between date and name and two spaces between name and email
address.

> gcc/
> 
> PR tree-optimization/85692
> * tree-ssa-forwprop.c (simplify_vector_constructor): Try two
> source permute as well.
> 
> gcc/testsuite
> 
> * gcc.target/i386/pr85692.c: Test two simply constructions are
> detected as permute instructions.

Just
* gcc.target/i386/pr85692.c: New test.
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c 
> b/gcc/testsuite/gcc.target/i386/pr85692.c
> new file mode 100644
> index 000..322c1050161
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr85692.c
> @@ -0,0 +1,18 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -msse4.1" } */
> +/* { dg-final { scan-assembler "unpcklps" } } */
> +/* { dg-final { scan-assembler "blendps" } } */
> +/* { dg-final { scan-assembler-not "shufps" } } */
> +/* { dg-final { scan-assembler-not "unpckhps" } } */
> +
> +typedef float v4sf __attribute__ ((vector_size (16)));
> +
> +v4sf unpcklps(v4sf a, v4sf b)
> +{
> +return v4sf{a[0],b[0],a[1],b[1]};

Though, not really sure if this has been tested at all.
The above is valid only in C++ (and only C++11 and above), while the
test is compiled as C and thus has to fail.

In C one should use e.g.
return (v4sf){a[0],b[0],a[1],b[1]};
instead (i.e. a compound literal).

> @@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
>elem_type = TREE_TYPE (type);
>elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>  
> -  vec_perm_builder sel (nelts, nelts, 1);
> -  orig = NULL;
> +  vec_perm_builder sel (nelts, 2, nelts);

Why this change?  I admit the vec_perm_builder arguments are confusing, but
I think the product of the second and third arguments is the number of
indices being pushed into the vector, so I think (nelts, nelts, 1) is right.

> @@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator 
> *gsi)
>   return false;
>op1 = gimple_assign_rhs1 (def_stmt);
>ref = TREE_OPERAND (op1, 0);
> -  if (orig)
> +  if (orig1)
>   {
> -   if (ref != orig)
> - return false;
> +   if (ref == orig1 || orig2)
> + {
> +   if (ref != orig1 && ref != orig2)
> + return false;
> + }
> +   else
> + {
> +   if (TREE_CODE (ref) != SSA_NAME)
> + return false;
> +   if (! VECTOR_TYPE_P (TREE_TYPE (ref))
> +   || ! useless_type_conversion_p (TREE_TYPE (op1),
> +   TREE_TYPE (TREE_TYPE (ref
> + return false;
> +   if (TREE_TYPE (orig1) != TREE_TYPE (ref))
> + return false;

I think even different type is acceptable here, as long as its conversion to
orig1's type is useless.

Furthermore, I think the way you wrote the patch with 2 variables rather
than an array of 2 elements means too much duplication, this else block
is a duplication of the else block below.  See the patch I've added to the
PR (and sorry for missing your patch first, the PR wasn't ASSIGNED and there
was no link to gcc-patches for it).

> @@ -2125,15 +2147,14 @@ simplify_vector_constructor (gimple_stmt_iterator 
> *gsi)
>   return false;
>op2 = vec_perm_indices_to_tree (mask_type, indices);
>if (conv_code == ERROR_MARK)
> - gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
> + gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig1, orig2, op2);
>else
>   {
> gimple *perm
> - = gimple_build_assign (make_ssa_name (TREE_TYPE (orig)),
> -VEC_PERM_EXPR, orig, orig, op2);
> -   orig = gimple_assign_lhs (perm);
> + = gimple_build_assign (make_ssa_name (TREE_TYPE (orig1)),
> +VEC_PERM_EXPR, orig1, orig2, op2);
> gsi_insert_before (gsi, perm, GSI_SAME_STMT);
> -   gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
> +   gimple_assign_set_rhs_with_ops (gsi, conv_code, gimple_assign_lhs 
> (perm),

Too long line.

Jakub


Re: Debug Mode ENH 3/4: Add backtrace

2018-05-09 Thread Jonathan Wakely

On 08/05/18 16:51 -0700, Ian Lance Taylor via libstdc++ wrote:

On Tue, May 8, 2018 at 12:54 PM, François Dumont  wrote:


I'll go with this version for now but I'll look into libbacktrace.

It will perhaps be the occasion to play with autoconf and related tools to
find out if I can use libbacktrace.


In GCC libgo and libgfortran already use libbacktrace, so there are
good examples to copy.


And if there are any concerns about adding an extra dependency to
libstdc++.so or increasing the size of libstdc++.a, we could
conditionally use libbacktrace for the unoptimized versions of libstdc++
installed into $libdir/debug/ when --enable-libstdcxx-debug is used
(which is orthogonal to the Debug Mode).

That would allow getting nicer backtraces by linking to the debug
libs, without adding an unconditional dependency that is only used by
code compiled with _GLIBCXX_DEBUG.




Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-09 Thread Jakub Jelinek
On Tue, May 08, 2018 at 11:28:18PM -0400, Jason Merrill wrote:
> Maybe add a type parameter that defaults to size_type_node...
> 
> >
> > --- gcc/c/c-fold.c.jj   2018-01-17 22:00:12.310228253 +0100
> > +++ gcc/c/c-fold.c  2018-05-08 21:52:43.303940175 +0200
> > @@ -473,7 +473,8 @@ c_fully_fold_internal (tree expr, bool i
> >   && (op1 = get_base_address (op0)) != NULL_TREE
> >   && INDIRECT_REF_P (op1)
> >   && TREE_CONSTANT (TREE_OPERAND (op1, 0)))
> > -   ret = fold_convert_loc (loc, TREE_TYPE (expr), fold_offsetof_1 
> > (op0));
> > +   ret = fold_convert_loc (loc, TREE_TYPE (expr),
> > +   fold_offsetof_1 (TREE_TYPE (expr), op0));
> 
> ...and then this can be
> 
>   fold_offsetof (op0, TREE_TYPE (expr))

Like this then?

2018-05-09  Jakub Jelinek  

PR c++/85662
* c-common.h (fold_offsetof_1): Removed.
(fold_offsetof): Add TYPE argument defaulted to size_type_node and
CTX argument defaulted to ERROR_MARK.
* c-common.c (fold_offsetof_1): Renamed to ...
(fold_offsetof): ... this.  Remove wrapper function.  Add TYPE
argument, if it is not a pointer type, convert the pointer constant
to TYPE and use size_binop with PLUS_EXPR instead of
fold_build_pointer_plus.  Adjust recursive calls.

* c-fold.c (c_fully_fold_internal): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (expr) as TYPE to it.
* c-typeck.c (build_unary_op): Use fold_offsetof rather than
fold_offsetof_1, pass argtype as TYPE to it.

* cp-gimplify.c (cp_fold): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (x) as TYPE to it.

* g++.dg/ext/offsetof2.C: New test.

--- gcc/c-family/c-common.h.jj  2018-05-06 23:12:49.185619717 +0200
+++ gcc/c-family/c-common.h 2018-05-09 10:28:20.149559265 +0200
@@ -1033,8 +1033,8 @@ extern bool c_dump_tree (void *, tree);
 
 extern void verify_sequence_points (tree);
 
-extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
-extern tree fold_offsetof (tree);
+extern tree fold_offsetof (tree, tree = size_type_node,
+  tree_code ctx = ERROR_MARK);
 
 extern int complete_array_type (tree *, tree, bool);
 
--- gcc/c-family/c-common.c.jj  2018-05-06 23:12:49.135619681 +0200
+++ gcc/c-family/c-common.c 2018-05-09 10:29:34.481650988 +0200
@@ -6168,10 +6168,12 @@ c_common_to_target_charset (HOST_WIDE_IN
 
 /* Fold an offsetof-like expression.  EXPR is a nested sequence of component
references with an INDIRECT_REF of a constant at the bottom; much like the
-   traditional rendering of offsetof as a macro.  Return the folded result.  */
+   traditional rendering of offsetof as a macro.  TYPE is the desired type of
+   the whole expression to which it will be converted afterwards.
+   Return the folded result.  */
 
 tree
-fold_offsetof_1 (tree expr, enum tree_code ctx)
+fold_offsetof (tree expr, tree type, enum tree_code ctx)
 {
   tree base, off, t;
   tree_code code = TREE_CODE (expr);
@@ -6196,10 +6198,12 @@ fold_offsetof_1 (tree expr, enum tree_co
   error ("cannot apply %<offsetof%> to a non constant address");
  return error_mark_node;
}
+  if (!POINTER_TYPE_P (type))
+   return convert (type, TREE_OPERAND (expr, 0));
   return TREE_OPERAND (expr, 0);
 
 case COMPONENT_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
   if (base == error_mark_node)
return base;
 
@@ -6216,7 +6220,7 @@ fold_offsetof_1 (tree expr, enum tree_co
   break;
 
 case ARRAY_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof (TREE_OPERAND (expr, 0), type, code);
   if (base == error_mark_node)
return base;
 
@@ -6273,23 +6277,16 @@ fold_offsetof_1 (tree expr, enum tree_co
   /* Handle static members of volatile structs.  */
   t = TREE_OPERAND (expr, 1);
   gcc_checking_assert (VAR_P (get_base_address (t)));
-  return fold_offsetof_1 (t);
+  return fold_offsetof (t, type);
 
 default:
   gcc_unreachable ();
 }
 
+  if (!POINTER_TYPE_P (type))
+return size_binop (PLUS_EXPR, base, convert (type, off));
   return fold_build_pointer_plus (base, off);
 }
-
-/* Likewise, but convert it to the return type of offsetof.  */
-
-tree
-fold_offsetof (tree expr)
-{
-  return convert (size_type_node, fold_offsetof_1 (expr));
-}
-
 
 /* *PTYPE is an incomplete array.  Complete it with a domain based on
INITIAL_VALUE.  If INITIAL_VALUE is not present, use 1 if DO_DEFAULT
--- gcc/c/c-fold.c.jj   2018-01-17 22:00:12.310228253 +0100
+++ gcc/c/c-fold.c  2018-05-09 10:30:04.185687645 +0200
@@ -473,7 +473,8 @@ c_fully_fold_internal (tree expr, bool i
  && (op1 = get_base_address (op0)) != NULL_TREE
  && INDIRECT_REF_P (op1)
 

Re: Incremental LTO linking part 2: lto-plugin support

2018-05-09 Thread Jan Hubicka
> On Tue, 8 May 2018, Jan Hubicka wrote:
> 
> > > On Tue, May 8, 2018 at 8:14 AM, Jan Hubicka  wrote:
> > > > Hi,
> > > > with lto, incremental linking can be meaningfully done in three ways:
> > > >  1) read LTO file and produce non-LTO .o file
> > > > this is current behaviour of gcc -r or ld -r with plugin
> > > >  2) read LTO files and merge section for later LTO
> > > > this is current behaviour of ld -r w/o plugin
> > > >  3) read LTO files into the compiler, link them and produce
> > > > incrementaly linked LTO object.
> > > >
> > > > 3 makes the most sense and I am making it the new default for gcc -r.
> > > > For testing purposes
> > > > and perhaps in order to have a tool to turn an LTO object into a real
> > > > object, we want to have 1) available as well.  GCC currently has a
> > > > -flinker-output option that decides between modes; it is set by the
> > > > linker plugin and can be overridden by the user (I had forgotten to
> > > > document this).
> > > >
> > > > I am targeting for -flinker-output=rel to be incremental linking into 
> > > > LTO
> > > > and adding -flinker-output=nolto-rel for 1).
> > > >
> > > > The main limitation of 2 and 3 is that you cannot link LTO and non-LTO
> > > > object files together.  For 2, HJ's binutils patchset has support and I
> > > > think it can be extended to handle 3 as well.  But with default binutils
> > > > we want to warn users.  This patch implements the warning (and prevents
> > > > the linker plugin from adding redundant linker-output options).
> > > 
> > > 
> > > My users/hjl/lto-mixed/master branch is quite flexible.  I can extend
> > > it if needed.
> > 
> > I think once the main patchset settles down we could add a way to
> > communicate to lto-plugin whether combined LTO + non-LTO .o files are
> > supported by the linker, and silence the warning.
> 
> How does the patchset deal with partially linking fat objects?  How

Currently it will turn them into a slim LTO merged object.  I can add a code
path that will optimize them into a binary.  That will be additional fun
because we probably want to WPA them, but it should not be that hard to
implement: WPA will produce one object file with merged LTO data that will be
passed to the linker, plus partitions that will be turned into the final
binary.

> do HJs binutils deal with them when you consider a fat object partially
> linked with a non-LTO object?

HJ?
Honza
> 
> Richard.
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [patch] Fix PR rtl-optimization/85638

2018-05-09 Thread Eric Botcazou
> 2018-05-07  Eric Botcazou  
> 
>   PR rtl-optimization/85638
>   * bb-reorder.c: Include common/common-target.h.
>   (create_forwarder_block): New function extracted from...
>   (fix_up_crossing_landing_pad): ...here.  Rename into...
>   (dw2_fix_up_crossing_landing_pad): ...this.  Call 
> create_forwarder_block.
>   (sjlj_fix_up_crossing_landing_pad): New function.
>   (find_rarely_executed_basic_blocks_and_crossing_edges): In SJLJ mode, 
> call
>   sjlj_fix_up_crossing_landing_pad if there are incoming EH edges from 
> both
>   partitions and exit the loop after one iteration.

Not much interest so I have self-approved and installed it on both branches.

-- 
Eric Botcazou


Re: Incremental LTO linking part 2: lto-plugin support

2018-05-09 Thread Richard Biener
On Tue, 8 May 2018, Jan Hubicka wrote:

> > On Tue, May 8, 2018 at 8:14 AM, Jan Hubicka  wrote:
> > > Hi,
> > > with lto, incremental linking can be meaningfully done in three ways:
> > >  1) read LTO file and produce non-LTO .o file
> > > this is current behaviour of gcc -r or ld -r with plugin
> > >  2) read LTO files and merge section for later LTO
> > > this is current behaviour of ld -r w/o plugin
> > >  3) read LTO files into the compiler, link them and produce
> > > incrementaly linked LTO object.
> > >
> > > 3 makes the most sense and I am making it the new default for gcc -r.
> > > For testing purposes, and perhaps in order to have a tool to turn an LTO
> > > object into a real object, we want to have 1) available as well.  GCC
> > > currently has a -flinker-output option that decides between modes; it is
> > > set by the linker plugin and can be overridden by the user (I had
> > > forgotten to document this).
> > >
> > > I am targeting for -flinker-output=rel to be incremental linking into LTO
> > > and adding -flinker-output=nolto-rel for 1).
> > >
> > > The main limitation of 2 and 3 is that you cannot link LTO and non-LTO
> > > object files together.  For 2, HJ's binutils patchset has support and I
> > > think it can be extended to handle 3 as well.  But with default binutils
> > > we want to warn users.  This patch implements the warning (and prevents
> > > the linker plugin from adding redundant linker-output options).
> > 
> > 
> > My users/hjl/lto-mixed/master branch is quite flexible.  I can extend
> > it if needed.
> 
> I think once the main patchset settles down we could add a way to
> communicate to lto-plugin whether combined LTO + non-LTO .o files are
> supported by the linker, and silence the warning.

How does the patchset deal with partially linking fat objects?  How
do HJs binutils deal with them when you consider a fat object partially
linked with a non-LTO object?

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 3/4] shrink-wrap: Improve spread_components (PR85645)

2018-05-09 Thread Eric Botcazou
> Now, neither of the two branches needs to have LR restored at all,
> because both of the branches end up in an infinite loop.
> 
> This patch makes spread_components return a boolean saying if anything
> was changed, and if so, it is called again.  This obviously is finite
> (there is a finite number of basic blocks, each with a finite number
> of components, and spread_components can only assign more components
> to a block, never less).  I also instrumented the code, and on a
> bootstrap+regtest spread_components made changes a maximum of two
> times.  Interestingly though it made changes on two iterations in
> a third of the cases it did anything at all!

I don't know this code well, so I don't see why this solves the problem.

> 2018-05-08  Segher Boessenkool  
> 
>   PR rtl-optimization/85645
>   * shrink-wrap.c (spread_components): Return a boolean saying if
>   anything was changed.
>   (try_shrink_wrapping_separate): Iterate spread_components until
>   nothing changes anymore.

OK if you add a comment in try_shrink_wrapping_separate with the rationale.

-- 
Eric Botcazou


Re: [PATCH 2/4] regrename: Don't rename the dest of a REG_CFA_REGISTER (PR85645)

2018-05-09 Thread Eric Botcazou
> 2018-05-08  Segher Boessenkool  
> 
>   PR rtl-optimization/85645
>   * regrename.c (build_def_use): Also kill the chains that include the
>   destination of a REG_CFA_REGISTER note.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)

2018-05-09 Thread Eric Botcazou
> 2018-05-08  Segher Boessenkool  
> 
>   PR rtl-optimization/85645
	* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
>   insn that has a REG_CFA_REGISTER note.

OK, thanks.

-- 
Eric Botcazou