[PATCH] Document atomic fetch and nand

2019-01-08 Thread Sebastian Huber
Copy code example for fetch and nand from "Legacy __sync Built-in
Functions for Atomic Memory Access" to "Built-in Functions for Memory
Model Aware Atomic Operations".

gcc/

* doc/extend.texi (Built-in Functions for Memory Model Aware
Atomic Operations): Document atomic fetch and nand.
---
 gcc/doc/extend.texi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7f33be4f29c..f2a61e4c4f8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11008,6 +11008,7 @@ they are not scaled by the size of the type to which the pointer points.
 
 @smallexample
 @{ *ptr @var{op}= val; return *ptr; @}
+@{ *ptr = ~(*ptr & val); return *ptr; @} // nand
 @end smallexample
 
 The object pointed to by the first argument must be of integer or pointer
@@ -11029,6 +11030,7 @@ the type to which the pointer points.
 
 @smallexample
 @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}
+@{ tmp = *ptr; *ptr = ~(*ptr & val); return tmp; @} // nand
 @end smallexample
 
 The same constraints on arguments apply as for the corresponding
-- 
2.16.4
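
For readers who want to see the documented semantics in action, here is a minimal sketch, assuming GCC or a compatible compiler that provides the `__atomic` built-ins. The wrapper names (`fetch_nand`, `nand_fetch`, `nand_self_check`) and the operand values are illustrative only and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the value *ptr held before the update, i.e.
   { tmp = *ptr; *ptr = ~(*ptr & val); return tmp; }  */
static uint32_t
fetch_nand (uint32_t *ptr, uint32_t val)
{
  return __atomic_fetch_nand (ptr, val, __ATOMIC_SEQ_CST);
}

/* Returns the value *ptr holds after the update, i.e.
   { *ptr = ~(*ptr & val); return *ptr; }  */
static uint32_t
nand_fetch (uint32_t *ptr, uint32_t val)
{
  return __atomic_nand_fetch (ptr, val, __ATOMIC_SEQ_CST);
}

/* Illustrative self-check; the operand values are arbitrary.  */
static void
nand_self_check (void)
{
  uint32_t a = 0xF0, b = 0xF0;

  assert (fetch_nand (&a, 0x3C) == 0xF0);         /* old value */
  assert (a == 0xFFFFFFCFu);                      /* ~(0xF0 & 0x3C) */
  assert (nand_fetch (&b, 0x3C) == 0xFFFFFFCFu);  /* new value */
}
```

The only difference between the two calls is which value is returned; the stored result is the same in both cases.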



Re: Make the vectoriser drop to strided accesses for stores with gaps

2019-01-08 Thread H.J. Lu
On Fri, Jul 20, 2018 at 3:57 AM Richard Sandiford
 wrote:
>
> We could vectorise:
>
>  for (...)
>{
>  a[0] = ...;
>  a[1] = ...;
>  a[2] = ...;
>  a[3] = ...;
>  a += stride;
>}
>
> (including the case when stride == 8) but not:
>
>  for (...)
>{
>  a[0] = ...;
>  a[1] = ...;
>  a[2] = ...;
>  a[3] = ...;
>  a += 8;
>}
>
> (where the stride is always 8).  The former was treated as a "grouped
> and strided" store, while the latter was treated as grouped store with
> gaps, which we don't support.
>
> This patch makes us treat groups of stores with gaps at the end as
> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P
> and all vector uses of DR_STEP to see whether there were any hard-baked
> assumptions, but couldn't see any.  I wondered whether we should relax:
>
>   /* We do not have to consider dependences between accesses that belong
>  to the same group, unless the stride could be smaller than the
>  group size.  */
>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>   && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>   == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))
>   && !STMT_VINFO_STRIDED_P (stmtinfo_a))
> return false;
>
> for cases in which the step is constant and the absolute step is known
> to be greater than the group size, but data dependence analysis should
> already return chrec_known for those cases.
>
> The new test is a version of vect-avg-15.c with the variable step
> replaced by a constant one.
>
> A natural follow-on would be to do the same for groups with gaps in
> the middle:
>
>   /* Check that the distance between two accesses is equal to the type
>  size. Otherwise, we have gaps.  */
>   diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
>   - TREE_INT_CST_LOW (prev_init)) / type_size;
>   if (diff != 1)
> {
>   [...]
>   if (DR_IS_WRITE (data_ref))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "interleaved store with gaps\n");
>   return false;
> }
>
> But I think we should do that separately and see what the fallout
> from this change is first.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-07-20  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
> grouped stores with gaps to a strided group.
>

This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214

H.J.
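
As a concrete rendering of the two loop shapes quoted above, the following sketch may help. The function names and the stored values are made up for illustration; only the store pattern (four consecutive stores, then a pointer bump) comes from the original message.

```c
#include <stddef.h>

/* Stride known only at run time: the form described above as a
   "grouped and strided" store.  */
static void
store_variable_stride (int *a, size_t n, size_t stride)
{
  for (size_t i = 0; i < n; i++)
    {
      a[0] = (int) i;
      a[1] = (int) i + 1;
      a[2] = (int) i + 2;
      a[3] = (int) i + 3;
      a += stride;
    }
}

/* Stride fixed at 8: a group of four stores followed by a gap of
   four untouched elements, i.e. a grouped store with a gap at the end.  */
static void
store_constant_stride (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[0] = (int) i;
      a[1] = (int) i + 1;
      a[2] = (int) i + 2;
      a[3] = (int) i + 3;
      a += 8;
    }
}
```

Calling the first function with `stride == 8` writes exactly the same elements as the second, which is why treating the second form as a strided group is a natural extension.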


[committed][libgomp, testsuite, openacc] Don't use const int for dimensions

2019-01-08 Thread Tom de Vries
Hi,

Const int is handled differently at -O0 for -xc and -xc++, which can cause noise
in testsuite/libgomp.oacc-c-c++-common test-cases (which are both run for c and
c++) if const int is used for launch dimensions.

Fix this by using #defines instead.

Committed to trunk.

Thanks,
- Tom
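
One general difference between the two languages, which may or may not be the exact mechanism behind the noise described above, is that a `const int` object is not an integer constant expression in standard C, whereas it is usable as a constant in C++. A macro expands to a literal in both. A minimal sketch, reusing the names `ng`, `nw` and `vl` from the patch; the array `lanes` and the function `lane_count` are hypothetical:

```c
#include <stddef.h>

#define ng 8
#define nw 4
#define vl 32

/* A file-scope array bound must be an integer constant expression
   in C.  The macros qualify; `const int ng = 8;` would not in
   standard C, although it would be accepted in C++.  */
static int lanes[ng * nw * vl];

/* Number of elements in the array sized by the three macros.  */
static size_t
lane_count (void)
{
  return sizeof lanes / sizeof lanes[0];
}
```

With macros, both the C and the C++ front end see the same literal values, so the two runs of each test case start from the same input.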

[libgomp, testsuite, openacc] Don't use const int for dimensions

2019-01-08  Tom de Vries  

PR target/88756
* testsuite/libgomp.oacc-c-c++-common/reduction-1.c (ng, nw, vl): Use
#define instead of "const int".
* testsuite/libgomp.oacc-c-c++-common/reduction-2.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-3.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-4.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c (ng, nw, vl): Same.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c | 6 +++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c | 6 +++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c | 6 +++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c | 6 +++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c | 6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
index e8a8911faeb..3fe9ae65309 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
@@ -8,9 +8,9 @@
 #include 
 #include "reduction.h"
 
-const int ng = 8;
-const int nw = 4;
-const int vl = 32;
+#define ng 8
+#define nw 4
+#define vl 32
 
 static void
 test_reductions (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
index d19b1c825ca..83986abf7e0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
@@ -8,9 +8,9 @@
 #include 
 #include "reduction.h"
 
-const int ng = 8;
-const int nw = 4;
-const int vl = 32;
+#define ng 8
+#define nw 4
+#define vl 32
 
 static void
 test_reductions (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
index 1b948bef5a0..3ac0f996a86 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
@@ -8,9 +8,9 @@
 #include 
 #include "reduction.h"
 
-const int ng = 8;
-const int nw = 4;
-const int vl = 32;
+#define ng 8
+#define nw 4
+#define vl 32
 
 static void
 test_reductions (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
index 79355eded80..b8fa954cfee 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
@@ -9,9 +9,9 @@
 #include 
 #include "reduction.h"
 
-const int ng = 8;
-const int nw = 4;
-const int vl = 32;
+#define ng 8
+#define nw 4
+#define vl 32
 
 static void
 test_reductions (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index 46b553a61ff..215d919a103 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -9,9 +9,9 @@
 #include 
 #include 
 
-const int ng = 8;
-const int nw = 4;
-const int vl = 32;
+#define ng 8
+#define nw 4
+#define vl 32
 
 const int n = 100;
 


[committed][nvptx, libgomp] Don't launch with num_workers == 0

2019-01-08 Thread Tom de Vries
Hi,

When using a compiler built with:
...
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE
+#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE
...
and running the libgomp testsuite, we run into an execution failure in
parallel-loop-1.c, due to a cuda launch failure:
...
  nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \
vectors=1024

libgomp: cuLaunchKernel error: invalid argument
...
because workers == 0.

The workers variable is set to 0 here in nvptx_exec:
...
workers = blocks / actual_vectors;
...
because actual_vectors is 1024, and blocks is 768:
...
cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768
...

Fix this by ensuring that workers is at least one.

Committed to trunk.

Thanks,
- Tom
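
The arithmetic behind the failure and the fix can be sketched in isolation. `compute_workers` is a hypothetical helper, not a function in plugin-nvptx.c, and the local `MAX` macro is defined here only to keep the sketch self-contained; the numbers are the ones reported above.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper mirroring the two lines in nvptx_exec:
   integer division truncates, so 768 / 1024 yields 0, and the
   clamp raises that to the minimum legal value of 1.  */
static int
compute_workers (int blocks, int actual_vectors)
{
  assert (actual_vectors > 0);
  int workers = blocks / actual_vectors;
  return MAX (workers, 1);
}
```

With `blocks == 768` and `actual_vectors == 1024` the helper returns 1 instead of 0, while larger block counts are unaffected by the clamp.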

[nvptx, libgomp] Don't launch with num_workers == 0

2019-01-08  Tom de Vries  

* plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least
one worker.

---
 libgomp/plugin/plugin-nvptx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 572d9ef8d5c..60553bdf3bd 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1272,6 +1272,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
  ? vectors
  : dims[GOMP_DIM_VECTOR]);
workers = blocks / actual_vectors;
+   workers = MAX (workers, 1);
  }
 
for (i = 0; i != GOMP_DIM_MAX; i++)


Go patch committed: Use int type for len and cap in slice value

2019-01-08 Thread Ian Lance Taylor
This patch by Cherry Zhang fixes the Go frontend to use the int type
for the len & cap arguments to a slice value.  Bootstrapped and ran
Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 267661)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-085ef4556ec810a5a9c422e7b86d98441dc92e86
+960637781ca9546ea2db913e48afd7eccbdadfa9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 267660)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7821,8 +7821,10 @@ Builtin_call_expression::lower_make(Stat
   cap_arg);
   mem = Expression::make_unsafe_cast(Type::make_pointer_type(et), mem,
 loc);
-  call = Expression::make_slice_value(type, mem, len_arg->copy(),
- cap_arg->copy(), loc);
+  Type* int_type = Type::lookup_integer_type("int");
+  len_arg = Expression::make_cast(int_type, len_arg->copy(), loc);
+  cap_arg = Expression::make_cast(int_type, cap_arg->copy(), loc);
+  call = Expression::make_slice_value(type, mem, len_arg, cap_arg, loc);
 }
   else if (is_map)
 {


Re: Remove overall growth from badness metrics

2019-01-08 Thread Qing Zhao


> On Jan 8, 2019, at 11:53 AM, Jan Hubicka  wrote:
> 
>>> 
>>> In general this parameter affects primarily -O3 builds, because -O2
>>> hardly hits the limit. From -O3 only programs with very large units are
>>> affected (-O2 units hit the limit only if you do have a lot of inline
>>> hints in the code).
>> I don't quite understand this: what's the major difference for inlining
>> between -O3 and -O2?
>> (I see -finline-functions is enabled for both -O3 and -O2.)
> 
> -O2 has -finline-small-functions where we inline only when function is
> declared inline or code size is expected to shrink after the inlining.
> -O3 has -finline-functions where we auto-inline a lot more.

It looks like our current documentation has a bug in the following:

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html 


-finline-functions
Consider all functions for inlining, even if they are not declared inline. The 
compiler heuristically decides which functions are worth integrating in this 
way.
If all calls to a given function are integrated, and the function is declared 
static, then the function is normally not output as assembler code in its own 
right.
Enabled at levels -O2, -O3, -Os. Also enabled by -fprofile-use and 
-fauto-profile.

It clearly states that -finline-functions is enabled at -O2, -O3 and -Os.

And I checked the GCC 9 source code, opts.c:

/* -O3 and -Os optimizations.  */
/* Inlining of functions reducing size is a good idea with -Os
   regardless of them being declared inline.  */
{ OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },

It looks like -finline-functions is ONLY enabled at -O3 and -Os, not at -O2.
(However, I am confused: why should -finline-functions be enabled for -Os?)

>> 
>>> 
>>> In my test bed this included Firefox with or without LTO because they do
>>> "poor man's" LTO by #including multiple .cpp files into a single unified
>>> source which makes average units large.  Also tramp3d and DLV from our C++
>>> benchmark are affected.
>>> 
>>> I have some data on Firefox and I will build the remaining ones:
>> In the following, is the data for code size? Is the optimization level -O3?
>> What does PGO mean?
> 
> Those are sizes of libxul, which is the largest library of Firefox.
> PGO is profile guided optimization.

Okay.  I see. 

It looks like for LTO the code size increase with profiling is much smaller than
that without profiling when growth is increased from 20% to 40%.

For non-LTO, the code size increase is minimal when growth is increased from 20%
to 40%.

However, I don't quite understand the last column. Could you please explain
the last column (-finline-functions) a little?

>>> 
>>> growth         LTO+PGO    PGO        LTO        none       -finline-functions
>>> 20 (default)   83752215   94390023   93085455   103437191  94351191
>>> 40             85299111   97220935   101600151  108910311  115311719
>>> clang  111520431 114863807 108437807
>>> 
>>> Build times are within noise of my setup, but they are less pronounced
>>> than the code size difference. I think at most 1 minute out of 100.
>>> Note that Firefox consists of 6% Rust code that is not built by GCC and
>>> building that consumes over half of the build time.
>>> 
>>> The problem I am trying to solve here is to get consistent LTO
>>> performance improvements compared to non-LTO. Currently there are
>>> some regressions:
>>> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=b6ba1ebfe913d152989495d8cb450bce02f27d44=try=c7bd18804e328ed490eab707072b3cf59da91042=1=1=1
>>> All those regressions go away with the limit increase.
>> 
>> 
>>> I tracked them down to the fact that we do not inline some very small
>>> functions already (such as IsHTMLWhitespace).  In the GCC 5 timeframe I
>>> tuned this parameter to 20% based on Firefox LTO benchmarks but I was
>>> not that serious about performance since my setup was not giving very
>>> reproducible results for sub 5% differences on tp5o. Since we plan to
>>> enable LTO by default for Tumbleweed I need to find something that does
>>> not cause too many regressions while keeping code size advantage of
>>> non-LTO.
>> 
>> From my understanding, the performance regression of LTO relative to
>> non-LTO is caused by some small but important functions no longer being
>> inlined with LTO: more functions are eligible for inlining under LTO,
>> so the original value of inline-unit-growth becomes relatively smaller.
> 
> Yes, with whole program optimization most function calls are inlinable,
> while in a normal non-LTO build most function calls are external.
> Since there are a lot of small functions called cross-module in modern
> C++ programs it simply makes the inliner run out of the limits before
> making some of the useful inline decisions.
> 
> I was poking about this for a while, but did not really have very good
> testcases available making it 

C++ PATCH to add test for c++/88744

2019-01-08 Thread Marek Polacek
I got confused and opened this PR thinking we've got another problem here.
Turns out things work as expected with the recent patch of mine.  Well, at
least we have another testcase.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2019-01-08  Marek Polacek  

PR c++/88744
* g++.dg/cpp2a/nontype-class12.C: New test.

diff --git gcc/testsuite/g++.dg/cpp2a/nontype-class12.C gcc/testsuite/g++.dg/cpp2a/nontype-class12.C
new file mode 100644
index 000..11f8c12f3ff
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/nontype-class12.C
@@ -0,0 +1,23 @@
+// PR c++/88744
+// { dg-do compile { target c++2a } }
+
+#define SA(X) static_assert((X),#X)
+
+struct S {
+  int a;
+  int b;
+  constexpr S(int a_, int b_) : a{a_}, b{b_} { }
+};
+
+template
+struct X {
+  static constexpr int i = s.a;
+  static constexpr int j = s.b;
+};
+X x; // ok, X<{1, 2}>
+X<{3, 4}> x2;
+
+SA (x.i == 1);
+SA (x.j == 2);
+SA (x2.i == 3);
+SA (x2.j == 4);


Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-08 Thread Steve Ellcey
On Mon, 2019-01-07 at 17:38 +, Richard Sandiford wrote:
> 
> Yeah, this was the kind of thing I had in mind, thanks.

Here is an updated version of the patch.  I bootstrapped and tested
on aarch64 and x86.  I didn't test the other platforms where I changed
the arguments to hard_regno_call_part_clobbered but I think they should
be OK.  I believe I addressed all the issues you brought up.  The ones
I am least confident of are the lra-lives.c changes.  I think they are
right and testing had no regressions, but they are probably the changes
that need to be checked most closely.

Steve Ellcey
sell...@marvell.com
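
To make the intent of the aarch64 hook change concrete, here is a simplified, stand-alone model of the new predicate. It uses plain booleans and integers rather than GCC's `rtx_insn`, `machine_mode` or `poly_int` types, so every name in it is hypothetical; only the 8-byte and 16-byte thresholds are taken from the patch.

```c
#include <stdbool.h>

/* Simplified model, not GCC code:
   is_fp_reg  stands in for FP_REGNUM_P (regno),
   mode_size  stands in for GET_MODE_SIZE (mode), in bytes,
   simd_call  stands in for aarch64_simd_call_p (insn).  */
static bool
part_clobbered_model (bool is_fp_reg, unsigned mode_size, bool simd_call)
{
  /* A normal callee preserves only the low 8 bytes of an FP register;
     a SIMD-ABI callee preserves the low 16 bytes.  */
  unsigned preserved_bytes = simd_call ? 16 : 8;
  return is_fp_reg && mode_size > preserved_bytes;
}
```

Under this model a 16-byte value held in an FP register counts as partially clobbered across a normal call but not across a call to a SIMD function, which is the distinction the extra insn argument is meant to expose.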


2019-01-08  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* ira-conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto.
* lra-int.h (struct lra_reg): Add call_insn field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Call targetm.return_call_with_max_clobbers.
Add argument to targetm.hard_regno_call_part_clobbered call.
(process_bb_lives): Use new target function
targetm.return_call_with_max_clobbers to set call_insn.
Pass call_insn to check_pseudos_live_through_calls.
Modify if to check targetm.return_call_with_max_clobbers.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c45243..2063292 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1644,14 +1644,51 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+	machine_mode mode)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16: 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1));
+  if (call_2 == NULL_RTX || aarch64_simd_call_p (call_2))
+return call_1;
+  else
+return call_2;
 }
 
 /* Implement REGMODE_NATURAL_SIZE.  */
@@ -18764,6 +18801,10 @@ aarch64_libgcc_floating_mode_supported_p
 

Re: ISO_Fortran_binding patch

2019-01-08 Thread Thomas Koenig

Hi Paul,


This is an updated version of the earlier patch. The main addition is
a second testcase that checks the errors emitted by the CFI API
functions.


I notice that the header file ISO_Fortran_binding.h is found twice
in the patch.

Is there any particular reason why you do not want to use

! { dg-additional-options "-I $srcdir/../../libgfortran" }

in the test cases, and have it only once in the source trees?

However, I have no real strong opinion on this matter, if you
want to keep it as submitted, it is also fine.

Therefore: OK for trunk, and thanks a lot for the patch!

Documentation we can add at a later date, I think.

Regards

Thomas


[PATCH] Pretty printer test fixes and improvements

2019-01-08 Thread Jonathan Wakely

Test that StdUniquePtrPrinter correctly prints std::unique_ptr objects
using the old layout, prior to the PR libstdc++/77990 changes.

The printer test for a valueless std::variant started to fail because
the PR libstdc++/87431 fix meant it no longer became valueless. Change
the test to use a type that is not trivially copyable, so that the
exception causes it to become valueless.

* testsuite/libstdc++-prettyprinters/compat.cc: Test printer support
for old std::unique_ptr layout.
* testsuite/libstdc++-prettyprinters/cxx17.cc: Fix std::variant test
to become valueless. Add filesystem::path tests.

Tested x86_64-linux, committed to trunk.

commit 3291fa129196c251cf05641ae6f420464d96864d
Author: Jonathan Wakely 
Date:   Tue Jan 8 23:09:39 2019 +

Pretty printer test fixes and improvements

Test that StdUniquePtrPrinter correctly prints std::unique_ptr objects
using the old layout, prior to the PR libstdc++/77990 changes.

The printer test for a valueless std::variant started to fail because
the PR libstdc++/87431 fix meant it no longer became valueless. Change
the test to use a type that is not trivially copyable, so that the
exception causes it to become valueless.

* testsuite/libstdc++-prettyprinters/compat.cc: Test printer support
for old std::unique_ptr layout.
* testsuite/libstdc++-prettyprinters/cxx17.cc: Fix std::variant test
to become valueless. Add filesystem::path tests.

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
index 39271ddaf27..a538b854038 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
@@ -23,6 +23,22 @@
 
 namespace std
 {
+  template
+struct tuple
+{
+  T _M_head_impl;
+};
+
+  template struct default_delete { };
+
+  template>
+struct unique_ptr
+{
+  unique_ptr(T* p) { _M_t._M_head_impl = p; }
+
+  tuple _M_t;
+};
+
   // Old representation of std::optional, before GCC 9
   template
 struct _Optional_payload
@@ -58,6 +74,12 @@ namespace std
 int
 main()
 {
+  struct datum { };
+  std::unique_ptr uptr (new datum);
+// { dg-final { regexp-test uptr {std::unique_ptr.datum. = {get\(\) = 0x.*}} } 
}
+  std::unique_ptr  = uptr;
+// { dg-final { regexp-test ruptr {std::unique_ptr.datum. = {get\(\) = 0x.*}} 
} }
+
   using std::optional;
 
   optional o;
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
index 7e6f45b7b26..c550cbd61bd 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
@@ -22,6 +22,7 @@
 // Type printers only recognize the old std::string for now.
 #define _GLIBCXX_USE_CXX11_ABI 0
 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,11 @@ using std::unordered_set;
 using std::shared_ptr;
 using std::weak_ptr;
 
+struct X {
+  X(int) { }
+  X(const X&) { } // not trivially-copyable
+};
+
 int
 main()
 {
@@ -84,11 +90,11 @@ main()
 // { dg-final { note-test v0 {std::variant 
[index 0] = {0}} } }
   variant v1{ 0.5f };
 // { dg-final { note-test v1 {std::variant 
[index 0] = {0.5}} } }
-  variant v2;
+  variant v2;
   try {
 v2.emplace<1>(S());
   } catch (int) { }
-// { dg-final { note-test v2 {std::variant [no 
contained value]} } }
+// { dg-final { note-test v2 {std::variant [no 
contained value]} } }
   variant v3{ 3 };
 // { dg-final { note-test v3 {std::variant 
[index 1] = {3}} } }
   variant v4{ str };
@@ -118,6 +124,13 @@ main()
 // { dg-final { regexp-test q {std::shared_ptr.int \[2\]. \(use count 2, weak 
count 1\) = {get\(\) = 0x.*}} } }
 // { dg-final { regexp-test wq {std::weak_ptr.int \[2\]. \(use count 2, weak 
count 1\) = {get\(\) = 0x.*}} } }
 
+  std::filesystem::path p0;
+// { dg-final { note-test p0 {filesystem::path ""} } }
+  std::filesystem::path p1("filename");
+// { dg-final { note-test p1 {filesystem::path "filename"} } }
+  std::filesystem::path p2("/dir/.");
+// { dg-final { note-test p2 {filesystem::path "/dir/file" = {[root-directory] 
= "/", [1] = "dir", [2] = "."}} } }
+
   std::cout << "\n";
   return 0;// Mark SPOT
 }


[patch, fortran] Fix PR 68426, simplification of SPREAD

2019-01-08 Thread Thomas Koenig

Hello world,

the attached patch fixes the PR by simply having gfc_simplify_spread
also do its job when there's an EXPR_STRUCTURE.  The code to correctly
handle that case was already in place, it just was not run for that
case.

Regression-tested. OK for trunk?

Regards

Thomas

2019-01-09  Thomas Koenig  

PR fortran/68426
* simplify.c (gfc_simplify_spread): Also simplify if the
type of source is an EXPR_STRUCTURE.

2019-01-09  Thomas Koenig  

PR fortran/68426
* gfortran.dg/spread_simplify_1.f90: New test.
Index: simplify.c
===
--- simplify.c	(Revision 267737)
+++ simplify.c	(Arbeitskopie)
@@ -7572,7 +7572,8 @@ gfc_simplify_spread (gfc_expr *source, gfc_expr *d
 	return NULL;
 }
 
-  if (source->expr_type == EXPR_CONSTANT)
+  if (source->expr_type == EXPR_CONSTANT
+  || source->expr_type == EXPR_STRUCTURE)
 {
   gcc_assert (dim == 0);
 
! { dg-do  run  }
! PR 68426 - simplification used to fail.
  module m
implicit none
type t
  integer :: i
end type t
type(t), dimension(2), parameter :: a1  = (/ t(1), t(2) /)
type(t), dimension(1), parameter :: c = spread ( a1(1), 1, 1 )
  end module m


program main
  use m
  if (c(1)%i /= 1) stop 1
end program main


[PATCH] PR libstdc++/87855 fix optional for types with non-trivial copy/move

2019-01-08 Thread Jonathan Wakely

When the contained value is not trivially copy (or move) constructible
the union's copy (or move) constructor will be deleted, and so the
_Optional_payload delegating constructors are invalid. G++ fails to
diagnose this because it incorrectly performs copy elision in the
delegating constructors. Clang does diagnose it (llvm.org/PR40245).

The solution is to avoid performing any copy (or move) when the
contained value's copy (or move) constructor isn't trivial. Instead the
contained value can be constructed by calling _M_construct. This is OK,
because the relevant constructor doesn't need to be constexpr when the
contained value isn't trivially copy (or move) constructible.

Additionally, this patch removes a lot of code duplication in the
_Optional_payload partial specializations and the _Optional_base partial
specialization, by hoisting it into common base classes.

The Python pretty printer for std::optional needs to be adjusted to
support the new layout. Retain support for the old layout, and add a
test to verify that the support still works.

PR libstdc++/87855
* include/std/optional (_Optional_payload_base): New class template
for common code hoisted from _Optional_payload specializations. Use
a template for the union, to allow a partial specialization for
types with non-trivial destructors. Add constructors for in-place
initialization to the union.
(_Optional_payload(bool, const _Optional_payload&)): Use _M_construct
to perform non-trivial copy construction, instead of relying on
non-standard copy elision in a delegating constructor.
(_Optional_payload(bool, _Optional_payload&&)): Likewise for
non-trivial move construction.
(_Optional_payload): Derive from _Optional_payload_base and use it
for everything except the non-trivial assignment operators, which are
defined as needed.
(_Optional_payload): Derive from the specialization
_Optional_payload and add a destructor.
(_Optional_base_impl::_M_destruct, _Optional_base_impl::_M_reset):
Forward to corresponding members of _Optional_payload.
(_Optional_base_impl::_M_is_engaged, _Optional_base_impl::_M_get):
Hoist common members from _Optional_base.
(_Optional_base): Make all members and base class public.
(_Optional_base::_M_get, _Optional_base::_M_is_engaged): Move to
_Optional_base_impl.
* python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Add
support for new std::optional layout.
* testsuite/libstdc++-prettyprinters/compat.cc: New test.

Tested x86_64-linux. Ville already ack'd this, so I'm committing it to trunk.

commit 5b7409eb9081b9c3efced73af2d386926b9b3dfe
Author: Jonathan Wakely 
Date:   Tue Jan 8 13:12:32 2019 +

PR libstdc++/87855 fix optional for types with non-trivial copy/move

When the contained value is not trivially copy (or move) constructible
the union's copy (or move) constructor will be deleted, and so the
_Optional_payload delegating constructors are invalid. G++ fails to
diagnose this because it incorrectly performs copy elision in the
delegating constructors. Clang does diagnose it (llvm.org/PR40245).

The solution is to avoid performing any copy (or move) when the
contained value's copy (or move) constructor isn't trivial. Instead the
contained value can be constructed by calling _M_construct. This is OK,
because the relevant constructor doesn't need to be constexpr when the
contained value isn't trivially copy (or move) constructible.

Additionally, this patch removes a lot of code duplication in the
_Optional_payload partial specializations and the _Optional_base partial
specialization, by hoisting it into common base classes.

The Python pretty printer for std::optional needs to be adjusted to
support the new layout. Retain support for the old layout, and add a
test to verify that the support still works.

PR libstdc++/87855
* include/std/optional (_Optional_payload_base): New class template
for common code hoisted from _Optional_payload specializations. Use
a template for the union, to allow a partial specialization for
types with non-trivial destructors. Add constructors for in-place
initialization to the union.
(_Optional_payload(bool, const _Optional_payload&)): Use _M_construct
to perform non-trivial copy construction, instead of relying on
non-standard copy elision in a delegating constructor.
(_Optional_payload(bool, _Optional_payload&&)): Likewise for
non-trivial move construction.
(_Optional_payload): Derive from _Optional_payload_base and use it
for everything except the non-trivial assignment operators, which are
defined as needed.

Re: [nvptx] vector length patch series

2019-01-08 Thread Tom de Vries
On 14-12-18 20:58, Tom de Vries wrote:
> 0016-nvptx-Add-vector_length-128-testcases.patch

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c

> +gentest (test1, "acc parallel loop gang vector_length (128)",
> +"acc loop vector reduction(+:t1) reduction(-:t2)")

With this I run into PR70895 - "OpenACC: loop reduction does not work.
Output is zero".

Making the implicit firstprivate explicit fixes that.

Same problem and solution for gemm.f90.

Thanks,
- Tom


Re: [RFC] moving assemble_start_function / assemble_end_function to output_mi_thunk

2019-01-08 Thread Max Filippov
Sorry, wrong list, meant to send to g...@gcc.gnu.org

-- 
Thanks.
-- Max


[PATCH] Fix the dynamic_align_addr stuff partially (PR rtl-optimization/88331)

2019-01-08 Thread Jakub Jelinek
Hi!

The following patch attempts to fix one of the several issues with
the r266345 commit.  assign_stack_local isn't called just during expansion,
but also during RA, at which point we can't just randomly emit insns in the
middle of nowhere hoping it will be emitted in some insn sequence.

This patch just reverts the behavior for the over-aligned temporaries to
what we did before if it isn't during RA.  If needed, for GCC10 we can come
up with a way to tell the RA that it should realign, but IMHO it isn't
really needed right now.
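
For readers unfamiliar with what "aligning the slot dynamically" involves, here is a generic sketch of the arithmetic, assuming a power-of-two alignment. It is not the code in function.c; align_up and padded_size are illustrative names. The slot is over-allocated by align - 1 bytes and the usable address is rounded up at run time, which is why extra instructions are needed at the point of use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr bool is_power_of_two(std::uintptr_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Round addr up to the next multiple of align (align is a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
  assert(is_power_of_two(align));
  return (addr + align - 1) & ~(align - 1);
}

// Bytes to reserve so that an aligned block of `size` bytes always fits,
// whatever the alignment of the start of the reservation.
inline std::size_t padded_size(std::size_t size, std::size_t align) {
  assert(is_power_of_two(align));
  return size + align - 1;
}
```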

Bootstrapped/regtested on x86_64-linux, i686-linux and powerpc64le-linux, ok
for trunk?

2019-01-08  Jakub Jelinek  

PR rtl-optimization/88331
* function.c (assign_stack_local_1): Don't set dynamic_align_addr if
not currently_expanding_to_rtl.

* gcc.target/i386/pr88331.c: New test.

--- gcc/function.c.jj   2019-01-01 12:37:16.550984910 +0100
+++ gcc/function.c  2019-01-08 16:51:02.868722422 +0100
@@ -400,7 +400,9 @@ assign_stack_local_1 (machine_mode mode,
 {
   /* If the required alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT and
 it is not OK to reduce it.  Align the slot dynamically.  */
-  if (mode == BLKmode && (kind & ASLK_REDUCE_ALIGN) == 0)
+  if (mode == BLKmode
+ && (kind & ASLK_REDUCE_ALIGN) == 0
+ && currently_expanding_to_rtl)
dynamic_align_addr = true;
   else
{
--- gcc/testsuite/gcc.target/i386/pr88331.c.jj  2019-01-08 17:31:46.504540870 +0100
+++ gcc/testsuite/gcc.target/i386/pr88331.c 2019-01-08 17:31:17.547014393 +0100
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/88331 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=core-avx2" } */
+
+int b, d, e, g, i, j, l, m;
+int *c, *h, *n, *o;
+long f, k;
+
+void
+foo (void)
+{
+  long p = i;
+  int *a = o;
+  while (p)
+{
+  n = (int *) (__UINTPTR_TYPE__) a[0];
+  for (; f; f += 4)
+   for (; m <= d;)
+ {
+   for (; g <= e; ++g)
+ l = (int) (__UINTPTR_TYPE__) (n + l);
+   c[m] = (int) (__UINTPTR_TYPE__) n;
+ }
+}
+  int q = 0;
+  k = 0;
+  for (; k < j; k++)
+q += o[k] * h[k];
+  b = q;
+}

Jakub


[RFC] moving assemble_start_function / assemble_end_function to output_mi_thunk

2019-01-08 Thread Max Filippov
Hello,

I'm implementing MI thunk generation for the xtensa target and I've got
an issue that when my code generates a constant it is missing in the
resulting assembly. This happens because a constant pool output happens
inside the assemble_start_function, which is called before the thunk
function body has a chance to be generated. The following patch moves
assemble_start_function / assemble_end_function pair to the backend for
all targets that define TARGET_ASM_OUTPUT_MI_THUNK to allow calling
assemble_start_function after the function body is ready.

Is it OK, or should I try to fix it differently?

---8<---
From bad901880a3f9fc69726aa082e2b2c674bacca94 Mon Sep 17 00:00:00 2001
From: Max Filippov 
Date: Mon, 7 Jan 2019 18:22:12 -0800
Subject: [PATCH] gcc: move assemble_start_function / assemble_end_function to
 output_mi_thunk

Let backends call assemble_start_function after they have generated
thunk function body so that a constant pool could be output if it is
required.

gcc/
2019-01-08  Max Filippov  

* cgraphunit.c (cgraph_node::expand_thunk): Remove
assemble_start_function and assemble_end_function calls.
* config/alpha/alpha.c (alpha_output_mi_thunk_osf): Call
assemble_start_function and assemble_end_function.
* config/arc/arc.c (arc_output_mi_thunk): Likewise.
* config/arm/arm.c (arm_output_mi_thunk): Likewise.
* config/bfin/bfin.c (bfin_output_mi_thunk): Likewise.
* config/c6x/c6x.c (c6x_output_mi_thunk): Likewise.
* config/cris/cris.c (cris_asm_output_mi_thunk): Likewise.
* config/csky/csky.c (csky_output_mi_thunk): Likewise.
* config/epiphany/epiphany.c (epiphany_output_mi_thunk): Likewise.
* config/frv/frv.c (frv_asm_output_mi_thunk): Likewise.
* config/i386/i386.c (x86_output_mi_thunk): Likewise.
* config/ia64/ia64.c (ia64_output_mi_thunk): Likewise.
* config/m68k/m68k.c (m68k_output_mi_thunk): Likewise.
* config/microblaze/microblaze.c (microblaze_asm_output_mi_thunk):
Likewise.
* config/mips/mips.c (mips_output_mi_thunk): Likewise.
* config/mmix/mmix.c (mmix_asm_output_mi_thunk): Likewise.
* config/mn10300/mn10300.c (mn10300_asm_output_mi_thunk): Likewise.
* config/nds32/nds32.c (nds32_asm_output_mi_thunk): Likewise.
* config/nios2/nios2.c (nios2_asm_output_mi_thunk): Likewise.
* config/or1k/or1k.c (or1k_output_mi_thunk): Likewise.
* config/pa/pa.c (pa_asm_output_mi_thunk): Likewise.
* config/riscv/riscv.c (riscv_output_mi_thunk): Likewise.
* config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise.
* config/s390/s390.c (s390_output_mi_thunk): Likewise.
* config/sh/sh.c (sh_output_mi_thunk): Likewise.
* config/sparc/sparc.c (sparc_output_mi_thunk): Likewise.
* config/spu/spu.c (spu_output_mi_thunk): Likewise.
* config/stormy16/stormy16.c (xstormy16_asm_output_mi_thunk):
Likewise.
* config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise.
* config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise.
* config/vax/vax.c (vax_output_mi_thunk): Likewise.
---
 gcc/cgraphunit.c   | 4 
 gcc/config/alpha/alpha.c   | 3 +++
 gcc/config/arc/arc.c   | 4 
 gcc/config/arm/arm.c   | 4 
 gcc/config/bfin/bfin.c | 3 +++
 gcc/config/c6x/c6x.c   | 3 +++
 gcc/config/cris/cris.c | 4 
 gcc/config/csky/csky.c | 3 +++
 gcc/config/epiphany/epiphany.c | 3 +++
 gcc/config/frv/frv.c   | 4 
 gcc/config/i386/i386.c | 5 -
 gcc/config/ia64/ia64.c | 3 +++
 gcc/config/m68k/m68k.c | 3 +++
 gcc/config/microblaze/microblaze.c | 3 +++
 gcc/config/mips/mips.c | 3 +++
 gcc/config/mmix/mmix.c | 6 +-
 gcc/config/mn10300/mn10300.c   | 3 +++
 gcc/config/nds32/nds32.c   | 3 +++
 gcc/config/nios2/nios2.c   | 3 +++
 gcc/config/or1k/or1k.c | 5 -
 gcc/config/pa/pa.c | 3 +++
 gcc/config/riscv/riscv.c   | 3 +++
 gcc/config/rs6000/rs6000.c | 3 +++
 gcc/config/s390/s390.c | 3 +++
 gcc/config/sh/sh.c | 3 +++
 gcc/config/sparc/sparc.c   | 3 +++
 gcc/config/spu/spu.c   | 3 +++
 gcc/config/stormy16/stormy16.c | 3 +++
 gcc/config/tilegx/tilegx.c | 3 +++
 gcc/config/tilepro/tilepro.c   | 3 +++
 gcc/config/vax/vax.c   | 4 
 31 files changed, 99 insertions(+), 7 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index e6b1296abfb5..1c070ee95cd7 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1786,7 +1786,6 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
   && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
  

[PATCH] Fix up initializer_each_zero_or_onep (PR middle-end/88758)

2019-01-08 Thread Jakub Jelinek
Hi!

As mentioned in the PR, if a VECTOR_CST is not VECTOR_CST_STEPPED_P,
it is sufficient to recurse just on all the encoded elt, because if
the vector has more elts than encoded, all the remaining ones are equal to
the last one.
But, if it is stepped, there is the possibility that while the penultimate
and last encoded elt could be zero or one, the first non-encoded one could
be two or minus one.

The patch as committed is doing:
unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (expr);
if (VECTOR_CST_STEPPED_P (expr)
&& !TYPE_VECTOR_SUBPARTS (TREE_TYPE (expr)).is_constant (&nelts))
  return false;
so if it is stepped, it updates nelts to the subparts count and thus
attempts to verify all elts rather than just the encoded ones.  But that
fails because VECTOR_CST_ENCODED_ELT can't really access the non-encoded
ones.  The following patch fixes it by using vector_cst_elt there instead,
for the encoded elt it just returns VECTOR_CST_ENCODED_ELT immediately and
for the next value it will likely fail the predicate.
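
To make the stepped case concrete, here is a simplified model, assuming a single pattern with three encoded elements {e0, e1, e2} whose later elements continue the series with step e2 - e1. The names Encoded, stepped_elt and each_zero_or_one are hypothetical and this is not GCC's tree code. Checking only the encoded elements accepts {0, 0, 1} and {1, 1, 0}, although the first non-encoded element is 2 and -1 respectively.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a stepped encoding with one pattern.
using Encoded = std::array<long, 3>;

// Element i of the full vector: the encoded value if present, otherwise
// the series continued from e2 with step e2 - e1.
long stepped_elt(const Encoded& enc, std::size_t i) {
  if (i < enc.size())
    return enc[i];
  const long step = enc[2] - enc[1];
  return enc[2] + static_cast<long>(i - 2) * step;
}

// True if the first nelts elements of the full vector are all 0 or 1.
bool each_zero_or_one(const Encoded& enc, std::size_t nelts) {
  assert(nelts >= enc.size());
  for (std::size_t i = 0; i < nelts; ++i) {
    const long v = stepped_elt(enc, i);
    if (v != 0 && v != 1)
      return false;
  }
  return true;
}
```

Passing the encoded count as nelts reproduces the insufficient check; passing the full element count walks into the first non-encoded element and rejects the vector.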

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-08  Jakub Jelinek  

PR middle-end/88758
* tree.c (initializer_each_zero_or_onep) : Use
vector_cst_elt instead of VECTOR_CST_ENCODED_ELT.

--- gcc/tree.c.jj   2019-01-07 17:59:22.883951743 +0100
+++ gcc/tree.c  2019-01-08 18:15:01.956087119 +0100
@@ -11255,7 +11255,7 @@ initializer_each_zero_or_onep (const_tre
 
for (unsigned int i = 0; i < nelts; ++i)
  {
-   tree elt = VECTOR_CST_ENCODED_ELT (expr, i);
+   tree elt = vector_cst_elt (expr, i);
if (!initializer_each_zero_or_onep (elt))
  return false;
  }

Jakub


Re: C++ PATCH for c++/88538 - braced-init-list in template-argument-list

2019-01-08 Thread Marek Polacek
On Tue, Jan 08, 2019 at 03:45:54PM -0500, Jason Merrill wrote:
> On 1/8/19 10:42 AM, Marek Polacek wrote:
> > @@ -17020,6 +17020,18 @@ cp_parser_template_argument (cp_parser* parser)
> >   argument = cp_parser_constant_expression (parser);
> > else
> >   {
> > +  /* In C++20, we can encounter a braced-init-list.  */
> > +  if (cxx_dialect >= cxx2a
> > + && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> > +   {
> > + cp_parser_parse_tentatively (parser);
> 
> Hmm, I wonder if we would get better diagnostics for an ill-formed
> braced-init-list without tentative parsing here.  OK either way.

We get the same diagnostics with or without tentative parsing.  But since
we've checked that the next following token is {, I think it would be
cleaner to just parse it for real.

This is what I'm about to commit then.  Thanks for the reviews!

2019-01-08  Marek Polacek  

PR c++/88538 - braced-init-list in template-argument-list.
* parser.c (cp_parser_template_argument): Handle braced-init-list when
in C++20.

* g++.dg/cpp2a/nontype-class11.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index ca75c010e22..f441943dc8e 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -17026,6 +17026,14 @@ cp_parser_template_argument (cp_parser* parser)
 argument = cp_parser_constant_expression (parser);
   else
 {
+  /* In C++20, we can encounter a braced-init-list.  */
+  if (cxx_dialect >= cxx2a
+ && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+   {
+ bool expr_non_constant_p;
+ return cp_parser_braced_list (parser, &expr_non_constant_p);
+   }
+
   /* With C++17 generalized non-type template arguments we need to handle
 lvalue constant expressions, too.  */
   argument = cp_parser_assignment_expression (parser);
diff --git gcc/testsuite/g++.dg/cpp2a/nontype-class11.C gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
new file mode 100644
index 000..8a06d23904b
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
@@ -0,0 +1,21 @@
+// PR c++/88538
+// { dg-do compile { target c++2a } }
+
+struct S {
+  unsigned a;
+  unsigned b;
+  constexpr S(unsigned _a, unsigned _b) noexcept: a{_a}, b{_b} { }
+};
+
+template 
+void fnc()
+{
+}
+
+template struct X { };
+
+void f()
+{
+  fnc<{10,20}>();
+  X<{1, 2}> x;
+}


[PATCH v2] ARM: add test case for -masm-syntax-unified (PR88648)

2019-01-08 Thread Stefan Agner
Add a test case to check whether -masm-syntax-unified is indeed
emitting the inline assembler with .syntax unified.

gcc/testsuite/ChangeLog
* gcc.target/arm/pr88648-asm-syntax-unified.c: add test to
  check if -masm-syntax-unified gets applied properly
---
 .../gcc.target/arm/pr88648-asm-syntax-unified.c| 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c

diff --git a/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
new file mode 100644
index 000..251b4d5bc9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
@@ -0,0 +1,14 @@
+/* Test for unified syntax assembly generation.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-add-options arm_arch_v7a } */
+/* { dg-options "-marm -march=armv7-a -masm-syntax-unified" } */
+
+void test ()
+{
+  asm("nop");
+}
+
+/* { dg-final { scan-assembler-times {\.syntax\sunified} 3 } } */
+/* { dg-final { scan-assembler-not {\.syntax\sdivided} } } */
+
-- 
2.20.1



Re: [PATCH] ARM: add test case for -masm-syntax-unified (PR88648)

2019-01-08 Thread Stefan Agner
On 08.01.2019 10:35, Kyrill  Tkachov wrote:
> Hi Stefan,
> 
> On 02/01/19 21:47, Stefan Agner wrote:
>> Add a test case to check whether -masm-syntax-unified is indeed
>> emitting the inline assembler with .syntax unified.
> 
> Can you please provide a ChangeLog entry for this change.
> 
>> ---
>>  .../gcc.target/arm/pr88648-asm-syntax-unified.c| 14 ++
>>  1 file changed, 14 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c 
>> b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
>> new file mode 100644
>> index 000..2bd9d891b9e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
>> @@ -0,0 +1,14 @@
>> +/* Test for unified syntax assembly generation.  */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_arch_v7a_ok } */
>> +/* { dg-add-options arm_arch_v7a } */
>> +/* { dg-options "-marm -march=armv7-a -masm-syntax-unified" } */
>> +
>> +void test ()
>> +{
>> +  asm("nop");
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\.syntax\sunified} 3 } } */
>> +/* { dg-final { scan-assembler-times {\.syntax\sdivided} 0 } } */
>> +
> 
> Please use scan-assembler-not here to check for the absence of the
> ".syntax divided".

Ok, will send a v2 soon.

> 
> Looks ok to me otherwise.
> Do you need someone to commit these for you?

Yes please, I do not have commit rights.

--
Stefan


Re: [PATCH] Fix gcc.target/powerpc/pr88457.c testcase (PR target/88457)

2019-01-08 Thread Segher Boessenkool
On Tue, Jan 08, 2019 at 09:39:21PM +0100, Jakub Jelinek wrote:
> Hi!
> 
> I've noticed this testcase FAILs, there are many reasons for that:
> 1) hardcoded use of -m32, which isn't really available on powerpc64le
> 2) -c in dg-options
> 3) use of target_clones attribute without ppc_cpu_supports_hw effective
>target
> Also, there is no need to use both -mcpu=power7 -mcpu=e300c3, either of
> those is enough to get the ICE if the ira-colors.c change is reverted.
> For 1), while the testcase doesn't ICE unless ilp32 with the fix reverted,
> it compiles just fine with current trunk, so I see no reason to limit it to
> ilp32 only.
> 
> Regtested on powerpc64le-linux, ok for trunk?

Yes please.  Thanks!


Segher


> 2019-01-08  Jakub Jelinek  
> 
>   PR target/88457
>   * gcc.target/powerpc/pr88457.c: Remove -m32, -c and -mcpu=e300c3 from
>   dg-options.  Require ppc_cpu_supports_hw effective target instead of
>   powerpc64*-*-*.


Re: [PATCH] Fix power8 non-delegitimized UNSPEC UNSPEC_FUSION_GPR messages

2019-01-08 Thread Segher Boessenkool
On Tue, Jan 08, 2019 at 09:34:31PM +0100, Jakub Jelinek wrote:
> My recent changes to UNSPEC handling in dwarf2out.c apparently broke
> (non-release checking) regtest on powerpc*, a lot of
> note: non-delegitimized UNSPEC UNSPEC_FUSION_GPR (73) found in variable 
> location
> messages are emitted while compiling pretty much anything with -mcpu=power8
> or later.
> 
> The problem is that debug insns say some variable lives in the result of
> UNSPEC_FUSION_GPR and var-tracking manages to optimize the operand of that
> UNSPEC from a complex MEM into a LABEL_REF.  For UNSPECs that don't have any
> constant arguments dwarf2out is silent as before, but for those that have
> CONSTANT_P operands it hints that it would be nice to delegitimize those.
> 
> Apparently UNSPEC_FUSION_GPR wraps a MEM that isn't valid in a normal insn,
> as it includes both the high and lo_sum parts, but otherwise it is something
> that is completely ok for debug info purposes.
> So, this patch delegitimizes this UNSPEC to its argument.
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay for trunk.  Thanks!


Segher


> 2019-01-08  Jakub Jelinek  
> 
>   * config/rs6000/rs6000.c (rs6000_delegitimize_address): Delegitimize
>   UNSPEC_FUSION_GPR to its argument.  Formatting fixes.


Re: [PATCH] PR fortran/69101 -- IEEE_SELECTED_REAL_KIND part I

2019-01-08 Thread Thomas Koenig

Hi Steve,


Well, that was quick.  Moving code around is problematic.


Thanks for checking. The patch is OK for trunk.

Regards

Thomas


Re: [PATCH][jit] Add thread-local globals to the libgccjit frontend

2019-01-08 Thread Marc Nieper-Wißkirchen

[...]


2019-01-08  Marc Nieper-Wißkirchen  

  * docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
  * docs/topics/expressions.rst (Global variables): Add
  documentation of gcc_jit_lvalue_set_bool_thread_local.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-playback.c: Include "varasm.h".
Within namespace gcc::jit::playback...
(context::new_global): Add "thread_local_p" param and use it
to set DECL_TLS_MODEL.
* jit-playback.h: Within namespace gcc::jit::playback...
(context::new_global): Add "thread_local_p" param.
* jit-recording.c: Within namespace gcc::jit::recording...
(global::replay_into): Provide m_thread_local to call to
new_global.
(global::write_reproducer): Call write_reproducer_thread_local.
(global::write_reproducer_thread_local): New method.
* jit-recording.h: Within namespace gcc::jit::recording...
(lvalue::dyn_cast_global): New virtual function.
(global::m_thread_local): New field.
* libgccjit.c (gcc_jit_lvalue_set_bool_thread_local): New
function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_lvalue_set_bool_thread_local): New
macro.
(gcc_jit_lvalue_set_bool_thread_local): New function.
* libgccjit.map (LIBGCCJIT_ABI_11): New.
(gcc_jit_lvalue_set_bool_thread_local): Add.
* ../testsuite/jit.dg/all-non-failing-tests.h: Include new
test.
* ../testsuite/jit.dg/jit.exp: Load pthread for tests involving
thread-local globals.
* ../testsuite/jit.dg/test-thread-local.c: New test case for
thread-local globals.


BTW, the convention here is to split out the ChangeLog entries by
directory based on the presence of ChangeLog files.

There's a gcc/jit/ChangeLog and a gcc/testsuite/ChangeLog, so for this
patch there should be two sets of ChangeLog entries, one for each of
these, with the paths expressed relative to the directory holding the
ChangeLog.

So the testsuite entries would go into gcc/testsuite/ChangeLog, and
look like:

* jit.dg/all-non-failing-tests.h: Include new test.

...etc.



Thanks for explaining the policy.  Corrected ChangeLogs:

gcc/jit/ChangeLog
=

2019-01-08  Marc Nieper-Wißkirchen  

* docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
* docs/topics/expressions.rst (Global variables): Add
documentation of gcc_jit_lvalue_set_bool_thread_local.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-playback.c: Include "varasm.h".
Within namespace gcc::jit::playback...
(context::new_global): Add "thread_local_p" param and use it
to set DECL_TLS_MODEL.
* jit-playback.h: Within namespace gcc::jit::playback...
(context::new_global): Add "thread_local_p" param.
* jit-recording.c: Within namespace gcc::jit::recording...
(global::replay_into): Provide m_thread_local to call to
new_global.
(global::write_reproducer): Call write_reproducer_thread_local.
(global::write_reproducer_thread_local): New method.
* jit-recording.h: Within namespace gcc::jit::recording...
(lvalue::dyn_cast_global): New virtual function.
(global::m_thread_local): New field.
* libgccjit.c (gcc_jit_lvalue_set_bool_thread_local): New
function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_lvalue_set_bool_thread_local): New
macro.
(gcc_jit_lvalue_set_bool_thread_local): New function.
* libgccjit.map (LIBGCCJIT_ABI_11): New.
(gcc_jit_lvalue_set_bool_thread_local): Add.

gcc/testsuite/ChangeLog
===

2019-01-08  Marc Nieper-Wißkirchen  

* jit.dg/all-non-failing-tests.h: Include new test.
* jit.dg/jit.exp: Load pthread for tests involving
thread-local globals.
* jit.dg/test-thread-local.c: New test case for
thread-local globals.

[...]


diff --git a/gcc/testsuite/jit.dg/test-thread-local.c
b/gcc/testsuite/jit.dg/test-thread-local.c
new file mode 100644
index 0..287ba85e4
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-thread-local.c
@@ -0,0 +1,99 @@
+#include 
+#include 
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  /* Let's try to inject the equivalent of:
+
+  static thread_local int tl;
+  set_tl (int v)
+  {
+tl = v;
+  }
+
+  int
+  get_tl (void)
+  {
+return tl;
+  }


Thanks for posting this.

This test is OK as far as it goes, but there are some gaps in test
coverage: the test verifies that jit-generated code can create a new
thread-local variable, and writes to it, but it doesn't seem to test
reading from it; e.g. there doesn't seem to be a CHECK_VALUE that the
thread-local var is 43 after the set_tl (43);


The test does check reading the 

Re: [patch, rfa] Fix PR other/16615, change "can not" to "cannot" throughout docs and code

2019-01-08 Thread Sandra Loosemore

On 12/31/18 4:23 PM, Joseph Myers wrote:

On Fri, 28 Dec 2018, Sandra Loosemore wrote:


Taken individually, all these changes probably qualify as obvious, but given
how extensive they are and how many files are touched, I thought it would be
good to get a sanity check on methodology before checking in the whole pile.
E.g. are there other files that should be excluded from the recipe for part 1?


gcc/go/gofrontend/ and libgo/ have their own commit processes involving
applying fixes to another repository first (and no ChangeLogs).
Unfortunately the URL to that repository in gcc/go/README.gcc and
libgo/README.gcc no longer works.

I think the same applies to gcc/d/dmd/, though I'm not sure the process
there is documented anywhere.  Likewise for parts of libphobos (although
that has a ChangeLog for the GCC-local build machinery).

The .po files come verbatim from the Translation Project; it's useless to
patch those locally in GCC since the changes will be overwritten by the
next copy from the TP.  gettext's fuzzy match machinery should suffice to
help translators match their old translation of a "can not" message to the
new "cannot" message.


OK, it seems simplest to exclude changes to files in all those 
subdirectories that are maintained outside the normal GCC processes.



I don't really think patching such issues in testcases is useful;
certainly not for the ACATS tests coming from an upstream source.
(Testsuite .exp files would be more like normal sources in this regard and
should get spelling fixes etc., but I don't see any such files affected by
your changes.)


It turned out to be necessary to touch several of the testcases to match 
changes to diagnostic messages in dg-error, etc; otherwise those tests 
would FAIL.  But in the new iteration of the patch set I've excluded any 
other changes to comments, etc in test cases.



libtool.m4 and ltmain.sh come from upstream libtool; this seems like an
issue that should be fixed there (if still present, our libtool version is
very old), not patched locally in GCC.  Not patching libtool.m4 locally
might eliminate at least some of the configure regenerations.


OK, I've skipped those files too, and since the only configure changes 
were those resulting from changing libtool.m4 it's no longer necessary to 
regenerate any of them.



For tm.texi, make sure the changes exactly correspond to the result of
regenerating given the target.def changes.


Done.


For changes to include/ and libiberty/, it's helpful to apply them to
binutils-gdb at the same time as GCC (the shared files and directories
aren't always exactly in sync between the two repositories, but should be,
and I'd consider copying a change from one to the other to be obvious).


I've split changes to those directories out into a separate patch so it 
can be applied to the binutils-gdb repository more easily.


Here are the ChangeLogs for the new patch series.  Parts 1, 2, and 3 are 
mechanically generated via sed (etc).  Part 4 is the hand-edited 
changes, identical to part 2 of the previous series except for excluding 
a couple files.  Part 5 is the regenerated gcc.pot file.  Does this 
version look OK to commit?
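
For illustration only, the mechanical replacement amounts to something like the sketch below. The series itself was generated with sed; replace_all and fix_can_not are hypothetical names, and handling the capitalized form is an assumption. A blind replacement like this would also rewrite phrases such as "can not only", so the output still needs review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Replace every non-overlapping occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
  assert(!from.empty());
  std::size_t pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // continue after the inserted text
  }
  return text;
}

// Apply the "can not" -> "cannot" rewrite, preserving capitalization.
std::string fix_can_not(std::string text) {
  text = replace_all(std::move(text), "can not", "cannot");
  text = replace_all(std::move(text), "Can not", "Cannot");
  return text;
}
```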


-Sandra
2019-01-07  Sandra Loosemore  

	PR other/16615

	contrib/
	* mklog: Mechanically replace "can not" with "cannot".

	gcc/
	* Makefile.in: Mechanically replace "can not" with "cannot".
	* alias.c: Likewise.
	* builtins.c: Likewise.
	* calls.c: Likewise.
	* cgraph.c: Likewise.
	* cgraph.h: Likewise.
	* cgraphclones.c: Likewise.
	* cgraphunit.c: Likewise.
	* combine-stack-adj.c: Likewise.
	* combine.c: Likewise.
	* common/config/i386/i386-common.c: Likewise.
	* config/aarch64/aarch64.c: Likewise.
	* config/alpha/sync.md: Likewise.
	* config/arc/arc.c: Likewise.
	* config/arc/predicates.md: Likewise.
	* config/arm/arm-c.c: Likewise.
	* config/arm/arm.c: Likewise.
	* config/arm/arm.h: Likewise.
	* config/arm/arm.md: Likewise.
	* config/arm/cortex-r4f.md: Likewise.
	* config/csky/csky.c: Likewise.
	* config/csky/csky.h: Likewise.
	* config/darwin-f.c: Likewise.
	* config/epiphany/epiphany.md: Likewise.
	* config/i386/i386.c: Likewise.
	* config/i386/sol2.h: Likewise.
	* config/m68k/m68k.c: Likewise.
	* config/mcore/mcore.h: Likewise.
	* config/microblaze/microblaze.md: Likewise.
	* config/mips/20kc.md: Likewise.
	* config/mips/sb1.md: Likewise.
	* config/nds32/nds32.c: Likewise.
	* config/nds32/predicates.md: Likewise.
	* config/pa/pa.c: Likewise.
	* config/rs6000/e300c2c3.md: Likewise.
	* config/rs6000/rs6000.c: Likewise.
	* config/s390/s390.h: Likewise.
	* config/sh/sh.c: Likewise.
	* config/sh/sh.md: Likewise.
	* config/spu/vmx2spu.h: Likewise.
	* cprop.c: Likewise.
	* dbxout.c: Likewise.
	* df-scan.c: Likewise.
	* doc/cfg.texi: Likewise.
	* doc/extend.texi: Likewise.
	* doc/fragments.texi: Likewise.
	* doc/gty.texi: Likewise.
	* doc/invoke.texi: Likewise.
	* doc/lto.texi: Likewise.
	* doc/md.texi: Likewise.
	* doc/objc.texi: Likewise.
	* doc/rtl.texi: Likewise.
	* doc/tm.texi: Likewise.
	* 

Re: [C++ PATCH] [PR86648] use auto identifier for class placeholder templates

2019-01-08 Thread Jason Merrill

On 12/28/18 2:45 PM, Alexandre Oliva wrote:


dwarf2out recognizes unspecified auto types by the identifier.  C++
template class placeholders are unspecified auto types that take the
identifier of the class rather than those used by preexisting auto
types, so dwarf2out ICEs when it finds one of those.  Alas, they may
be visible to dwarf2out, since the types of e.g. static data members
of templates are only deduced at member instantiation, i.e., if the
data member is actually referenced, but the data member is added as a
field, still with unspecified auto placeholder type, when the
enclosing class is instantiated.

I've changed placeholder creator to use an auto identifier instead,
which allowed dropping the placeholder test in C++'s is_auto (alas, it
can't be used in dwarf2out, think LTO).  To avoid losing information
in error messages and dumps and whatnot, I've added code to recognize
placeholders for template classes say A and print them out as
A<...auto...>.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


for  gcc/cp/ChangeLog

PR c++/86648
 * pt.c (make_template_placeholder): Use auto_identifier.
 (is_auto): Drop CLASS_PLACEHOLDER_TEMPLATE test.
 * error.c (dump_type): Handle template placeholders.
 * cxx-pretty-print.c (pp_cxx_unqualified_id): Likewise.

for  gcc/testsuite/ChangeLog

PR c++/86648
 * g++.dg/cpp1z/pr86648.C: New.

---
  gcc/cp/cxx-pretty-print.c|   10 +-
  gcc/cp/error.c   |8 
  gcc/cp/pt.c  |5 ++---
  gcc/testsuite/g++.dg/cpp1z/pr86648.C |5 +
  4 files changed, 24 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/pr86648.C

diff --git a/gcc/cp/cxx-pretty-print.c b/gcc/cp/cxx-pretty-print.c
index b79ff5137aa1..c173760f0425 100644
--- a/gcc/cp/cxx-pretty-print.c
+++ b/gcc/cp/cxx-pretty-print.c
@@ -187,7 +187,15 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t)
  
  case TEMPLATE_TYPE_PARM:

  case TEMPLATE_TEMPLATE_PARM:
-  if (TYPE_IDENTIFIER (t))
+  if (template_placeholder_p (t)
+ && DECL_P (CLASS_PLACEHOLDER_TEMPLATE (t))
+ && TYPE_IDENTIFIER (TREE_TYPE (CLASS_PLACEHOLDER_TEMPLATE (t


Are these extra checks needed?  I would expect them to be true whenever 
template_placeholder_p is.


Jason


Re: [C++ Patch] Fix three additional locations

2019-01-08 Thread Jason Merrill

On 1/8/19 8:13 AM, Paolo Carlini wrote:

Hi,

a few additional easy to fix cases where we thought that a plain error 
was good enough. Tested x86_64-linux.


Thanks, Paolo.




OK.


Re: C++ PATCH for c++/88538 - braced-init-list in template-argument-list

2019-01-08 Thread Jason Merrill

On 1/8/19 10:42 AM, Marek Polacek wrote:

On Mon, Jan 07, 2019 at 09:52:55PM -0500, Jason Merrill wrote:

On 1/7/19 6:56 PM, Marek Polacek wrote:

At the risk of seeming overly eager, I thought it would be reasonable to
go with the following: enabling braced-init-list as a template-argument.
As the discussion on the reflector clearly indicates, this was the intent
from the get-go.

I know, it's not a regression.  But I restricted the change to C++20, and it
should strictly allow code that wasn't accepted before -- when a template
argument starts with {.  Perhaps we could even drop the C++20 check.

What's your preference?


Let's keep the C++20 check for now at least.  I'd suggest moving the change
further down, with this code:


Okay.  I've experimented with checking expr_non_constant_p, but this version
gives better diagnostics.

Bootstrapped/regtested running on x86_64-linux, ok for trunk if it passes?

2019-01-08  Marek Polacek  

PR c++/88538 - braced-init-list in template-argument-list.
* parser.c (cp_parser_template_argument): Handle braced-init-list when
in C++20.

* g++.dg/cpp2a/nontype-class11.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index bca1739ace3..87f37d8ab2b 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -17020,6 +17020,18 @@ cp_parser_template_argument (cp_parser* parser)
  argument = cp_parser_constant_expression (parser);
else
  {
+  /* In C++20, we can encounter a braced-init-list.  */
+  if (cxx_dialect >= cxx2a
+ && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+   {
+ cp_parser_parse_tentatively (parser);


Hmm, I wonder if we would get better diagnostics for an ill-formed 
braced-init-list without tentative parsing here.  OK either way.


Jason


[PATCH] Fix gcc.target/powerpc/pr88457.c testcase (PR target/88457)

2019-01-08 Thread Jakub Jelinek
Hi!

I've noticed this testcase FAILs, there are many reasons for that:
1) hardcoded use of -m32, which isn't really available on powerpc64le
2) -c in dg-options
3) use of target_clones attribute without ppc_cpu_supports_hw effective
   target
Also, there is no need to use both -mcpu=power7 -mcpu=e300c3, either of
those is enough to get the ICE if the ira-colors.c change is reverted.
For 1), while the testcase doesn't ICE unless ilp32 with the fix reverted,
it compiles just fine with current trunk, so I see no reason to limit it to
ilp32 only.

Regtested on powerpc64le-linux, ok for trunk?

2019-01-08  Jakub Jelinek  

PR target/88457
* gcc.target/powerpc/pr88457.c: Remove -m32, -c and -mcpu=e300c3 from
dg-options.  Require ppc_cpu_supports_hw effective target instead of
powerpc64*-*-*.

--- gcc/testsuite/gcc.target/powerpc/pr88457.c.jj   2018-12-21 00:40:46.882973124 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr88457.c  2019-01-08 18:46:13.633442809 +0100
@@ -1,5 +1,6 @@
-/* { dg-do compile  { target { powerpc64*-*-* } } } */
-/* { dg-options "-m32 -mcpu=power7 -O1 -fexpensive-optimizations --param 
ira-max-conflict-table-size=0 --param max-cse-insns=3 -c -mcpu=e300c3" } */
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_cpu_supports_hw } */
+/* { dg-options "-mcpu=power7 -O1 -fexpensive-optimizations --param 
ira-max-conflict-table-size=0 --param max-cse-insns=3" } */
 
 __attribute__((target_clones("cpu=power9,default")))
 long mod_func (long a, long b)

Jakub


[PATCH] Fix power8 non-delegitimized UNSPEC UNSPEC_FUSION_GPR messages

2019-01-08 Thread Jakub Jelinek
Hi!

My recent changes to UNSPEC handling in dwarf2out.c apparently broke
(non-release checking) regtest on powerpc*, a lot of
note: non-delegitimized UNSPEC UNSPEC_FUSION_GPR (73) found in variable location
messages are emitted while compiling pretty much anything with -mcpu=power8
or later.

The problem is that debug insns say some variable lives in the result of
UNSPEC_FUSION_GPR and var-tracking manages to optimize the operand of that
UNSPEC from a complex MEM into a LABEL_REF.  For UNSPECs that don't have any
constant arguments dwarf2out is silent as before, but for those that have
CONSTANT_P operands it hints that it would be nice to delegitimize those.

Apparently UNSPEC_FUSION_GPR wraps a MEM that isn't valid in a normal insn,
as it includes both the high and lo_sum parts, but otherwise it is something
that is completely ok for debug info purposes.
So, this patch delegitimizes this UNSPEC to its argument.

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2019-01-08  Jakub Jelinek  

* config/rs6000/rs6000.c (rs6000_delegitimize_address): Delegitimize
UNSPEC_FUSION_GPR to its argument.  Formatting fixes.

--- gcc/config/rs6000/rs6000.c.jj   2019-01-01 12:37:44.885520010 +0100
+++ gcc/config/rs6000/rs6000.c  2019-01-08 13:22:48.846995153 +0100
@@ -8388,14 +8388,17 @@ rs6000_delegitimize_address (rtx orig_x)
 {
   rtx x, y, offset;
 
+  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
+orig_x = XVECEXP (orig_x, 0, 0);
+
   orig_x = delegitimize_mem_from_attrs (orig_x);
+
   x = orig_x;
   if (MEM_P (x))
 x = XEXP (x, 0);
 
   y = x;
-  if (TARGET_CMODEL != CMODEL_SMALL
-  && GET_CODE (y) == LO_SUM)
+  if (TARGET_CMODEL != CMODEL_SMALL && GET_CODE (y) == LO_SUM)
 y = XEXP (y, 1);
 
   offset = NULL_RTX;
@@ -8407,8 +8410,7 @@ rs6000_delegitimize_address (rtx orig_x)
   y = XEXP (y, 0);
 }
 
-  if (GET_CODE (y) == UNSPEC
-  && XINT (y, 1) == UNSPEC_TOCREL)
+  if (GET_CODE (y) == UNSPEC && XINT (y, 1) == UNSPEC_TOCREL)
 {
   y = XVECEXP (y, 0, 0);
 
@@ -8435,8 +8437,7 @@ rs6000_delegitimize_address (rtx orig_x)
   && GET_CODE (XEXP (orig_x, 1)) == CONST)
 {
   y = XEXP (XEXP (orig_x, 1), 0);
-  if (GET_CODE (y) == UNSPEC
- && XINT (y, 1) == UNSPEC_MACHOPIC_OFFSET)
+  if (GET_CODE (y) == UNSPEC && XINT (y, 1) == UNSPEC_MACHOPIC_OFFSET)
return XVECEXP (y, 0, 0);
 }
 


Jakub


Re: [Patch, Fortran] PR 88047: [9 Regression] ICE in gfc_find_vtab, at fortran/class.c:2843

2019-01-08 Thread Janus Weil
On Tue, Jan 8, 2019 at 16:28, Steve Kargl wrote:
>
> On Tue, Jan 08, 2019 at 10:11:33AM +0100, Janus Weil wrote:
> >
> > the attached patch is close to obvious and fixes another small
> > ICE-on-invalid regression. Since there was a bit of discussion in the
> > PR, I am submitting it for approval instead of just committing as
> > obvious.
> >
> > Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
> >
>
> OK.

Thanks! Committed as r267735.

Cheers,
Janus


[SPARC] Fix a couple of PRs

2019-01-08 Thread Eric Botcazou
Although they are already fixed, the first one because it didn't fail for me 
and the second one thanks to Jakub's patch.

Tested on SPARC/Solaris 11, applied on the mainline.


2019-01-08  Eric Botcazou  

PR bootstrap/88721
* config/sparc/sparc.c (function_arg_slotno): Set *PPREGNO & *PPADDING
to -1 on entry.

PR debug/88723
* config/sparc/sparc.c (sparc_delegitimize_address): Deal with naked
UNSPECs and UNSPEC_MOVE_GOTDATA specifically.

-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 267574)
+++ config/sparc/sparc.c	(working copy)
@@ -4949,12 +4949,19 @@ sparc_delegitimize_address (rtx x)
 {
   x = delegitimize_mem_from_attrs (x);
 
-  if (GET_CODE (x) == LO_SUM && GET_CODE (XEXP (x, 1)) == UNSPEC)
-switch (XINT (XEXP (x, 1), 1))
+  if (GET_CODE (x) == LO_SUM)
+x = XEXP (x, 1);
+
+  if (GET_CODE (x) == UNSPEC)
+switch (XINT (x, 1))
   {
   case UNSPEC_MOVE_PIC:
   case UNSPEC_TLSLE:
-	x = XVECEXP (XEXP (x, 1), 0, 0);
+	x = XVECEXP (x, 0, 0);
+	gcc_assert (GET_CODE (x) == SYMBOL_REF);
+	break;
+  case UNSPEC_MOVE_GOTDATA:
+	x = XVECEXP (x, 0, 2);
 	gcc_assert (GET_CODE (x) == SYMBOL_REF);
 	break;
   default:
@@ -6873,6 +6880,10 @@ function_arg_slotno (const struct sparc_
   int slotno = cum->words, regno;
   enum mode_class mclass = GET_MODE_CLASS (mode);
 
+  /* Silence warnings in the callers.  */
+  *pregno = -1;
+  *ppadding = -1;
+
   if (type && TREE_ADDRESSABLE (type))
 return -1;
 

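The function_arg_slotno hunk above gives both out-parameters a defined value before any early return, so callers never read an uninitialized variable. A minimal standalone sketch of that pattern follows; the function name lookup_slot, its parameters and the register numbering are made up for illustration and are not the SPARC back end's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a routine with two out-parameters and an
   early-exit path; only the initialization pattern mirrors the patch.  */
int
lookup_slot (int words, bool addressable, int *pregno, int *ppadding)
{
  /* Define both outputs on entry, before any early return.  */
  *pregno = -1;
  *ppadding = -1;

  if (addressable)
    return -1;              /* early exit: outputs already hold -1 */

  *pregno = 8 + words;      /* made-up register numbering */
  *ppadding = words & 1;    /* made-up padding rule */
  return words;
}
```

With this shape, a caller that inspects the outputs after a failing call sees -1 rather than an indeterminate value, which is also what lets the compiler drop the maybe-uninitialized warning.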

Re: Remove overall growth from badness metrics

2019-01-08 Thread Jan Hubicka
> > 
> > In general this parameter affects primarily -O3 builds, because -O2
> > hardly hits the limit. Of the -O3 builds, only programs with very large units are
> > affected (-O2 units hit the limit only if you have a lot of inline
> > hints in the code).
> I don't quite understand here: what's the major difference for inlining between
> -O3 and -O2?
> (I see -finline-functions is enabled for both O3 and O2).

-O2 has -finline-small-functions where we inline only when function is
declared inline or code size is expected to shrink after the inlining.
-O3 has -finline-functions where we auto-inline a lot more.
> 
> > 
> > In my test bed this included Firefox with or without LTO because they do
> > "poor man's" LTO by #including multiple .cpp files into a single unified
> > source, which makes average units large.  Also tramp3d and DLV from our C++
> > benchmarks are affected.
> > 
> > I have some data on Firefox and I will build the remaining ones:
> In the following, are the data for code size? Is the optimization level O3?
> What does PGO mean?

Those are sizes of libxul, which is the largest library of Firefox.
PGO is profile guided optimization.
> > 
> > growth         LTO+PGO    PGO       LTO        none       -finline-functions
> > 20 (default)   83752215   94390023  93085455   103437191  94351191
> > 40             85299111   97220935  101600151  108910311  115311719
> > clang  111520431 114863807 108437807
> > 
> > Build times are within noise of my setup, but they are less pronounced
> > than the code size difference. I think at most 1 minute out of 100.
> > Note that Firefox consists of 6% Rust code that is not built by GCC,
> > and building that consumes over half of the build time.
> > 
> > The problem I am trying to solve here is to get consistent LTO
> > performance improvements compared to non-LTO. Currently there are
> > some regressions:
> > https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=b6ba1ebfe913d152989495d8cb450bce02f27d44=try=c7bd18804e328ed490eab707072b3cf59da91042=1=1=1
> > All those regressions go away with the limit increase.
> 
> 
> > I tracked them down to the fact that we do not inline some very small
> > functions already (such as IsHTMLWhitespace).  In the GCC 5 timeframe I
> > tuned this parameter to 20% based on Firefox LTO benchmarks, but I was
> > not that serious about performance since my setup was not giving very
> > reproducible results for sub-5% differences on tp5o. Since we plan to
> > enable LTO by default for Tumbleweed I need to find something that does
> > not cause too many regressions while keeping the code size advantage of
> > non-LTO.
> 
> From my understanding, the performance regression of LTO relative to non-LTO is
> caused by some small and important functions no longer being inlined with LTO:
> because more functions are eligible for inlining under LTO, the original value of
> inline-unit-growth becomes relatively smaller.

Yes, with whole-program optimization most function calls are inlinable,
while in a normal non-LTO build most function calls are external.
Since there are a lot of small functions called cross-module in modern
C++ programs, the inliner simply runs out of the limits before
making some of the useful inline decisions.

I was poking about this for a while, but did not really have very good
testcases available making it difficult to judge code size/performance
tradeoffs here.  With Firefox I can measure things better now and
it is clear that 20% growth is just too small. It is small even with
profile feedback where compiler knows quite well what calls to inline
and more so without.
> 
> While increasing the value of inline-unit-growth for LTO is one approach to
> resolve this issue, adjusting the sorting heuristic so that those important and
> smaller routines get a higher priority for inlining might be
> another and better approach?

Yes, I have also reworked the inline metrics somewhat and spent quite
some time looking into dumps to see that it behaves reasonably.  There
were two ages-old bugs I fixed in the last two weeks, and I also added some
extra tricks like penalizing cross-module inlines some time ago. Given
the fact that even with profile feedback I am not able to sort the
priority queue well and neither can Clang do the job, I think it is good
motivation to adjust the parameter, which I had set somewhat arbitrarily
at a time I was not able to test it well.

Honza
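
To put the libxul sizes quoted above in relative terms, here is a small sketch that computes the size change when inline-unit-growth goes from 20 to 40. The helper names are made up for illustration; the byte counts are the LTO+PGO and LTO columns of the table in this thread.

```c
/* Relative size change, in percent, when a binary grows from BEFORE to
   AFTER bytes.  */
double
percent_change (long before, long after)
{
  return 100.0 * (double) (after - before) / (double) before;
}

/* libxul sizes quoted in this thread for inline-unit-growth 20 and 40.  */
static const long lto_pgo_20 = 83752215, lto_pgo_40 = 85299111;
static const long lto_20 = 93085455, lto_40 = 101600151;

/* Size change for the LTO+PGO build.  */
double
lto_pgo_growth (void)
{
  return percent_change (lto_pgo_20, lto_pgo_40);
}

/* Size change for the LTO build without profile feedback.  */
double
lto_growth (void)
{
  return percent_change (lto_20, lto_40);
}
```

By this arithmetic the LTO+PGO build grows by a bit under 2% and the plain LTO build by roughly 9%, which fits the remark that the limit matters more when the inliner has no profile to guide it.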


Re: [PATCH, og8] Add OpenACC 2.6 `acc_get_property' support

2019-01-08 Thread Thomas Schwinge
Hi Maciej!

On Mon, 3 Dec 2018 16:51:14 +, "Maciej W. Rozycki"  
wrote:
> Add generic support for the OpenACC 2.6 `acc_get_property' and 
> `acc_get_property_string' routines, as well as full handlers for the 
> host and the NVPTX offload targets and a minimal handler for the HSA 
> offload target.

..., but a similar minimal handler for the Intel MIC offload plugin
missing.  ;-) (... which "for reasons" doesn't live in "libgomp/plugin/",
next to the other plugins.)

To fix that, I pushed the attached to openacc-gcc-8-branch, with the code
copied and adjusted from your minimal handler for the HSA offload target,
but also with an additional TODO added.


Regards
 Thomas


From c632bd83096f1a4e4ed59161797087ff800e4c23 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 8 Jan 2019 15:21:35 +0100
Subject: [PATCH] Add OpenACC 2.6 `acc_get_property' support: restore Intel MIC
 offloading

The "OpenACC 2.6 `acc_get_property' support" changes regressed the relevant
libgomp OpenMP execution test cases to no longer consider Intel MIC offloading
because of:

libgomp: while loading libgomp-plugin-intelmic.so.1: [...]/libgomp-plugin-intelmic.so.1: undefined symbol: GOMP_OFFLOAD_get_property

	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property):
	New function.
---
 liboffloadmic/ChangeLog.openacc   | 10 +
 .../plugin/libgomp-plugin-intelmic.cpp| 21 +++
 2 files changed, 31 insertions(+)
 create mode 100644 liboffloadmic/ChangeLog.openacc

diff --git a/liboffloadmic/ChangeLog.openacc b/liboffloadmic/ChangeLog.openacc
new file mode 100644
index ..2e666da2ce0c
--- /dev/null
+++ b/liboffloadmic/ChangeLog.openacc
@@ -0,0 +1,10 @@
+2019-01-08  Thomas Schwinge  
+
+	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property):
+	New function.
+
+Copyright (C) 2019 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index d1678d0514e9..f74941c2b549 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -174,6 +174,27 @@ GOMP_OFFLOAD_get_num_devices (void)
   return num_devices;
 }
 
+extern "C" union gomp_device_property_value
+GOMP_OFFLOAD_get_property (int n, int prop)
+{
+  union gomp_device_property_value nullval = { .val = 0 };
+
+  if (n >= num_devices)
+{
+  GOMP_PLUGIN_error
+	("Request for a property of a non-existing Intel MIC device %i", n);
+  return nullval;
+}
+
+  switch (prop)
+{
+case GOMP_DEVICE_PROPERTY_VENDOR:
+  return (union gomp_device_property_value) { .ptr = /* TODO: "error: invalid conversion from 'const void*' to 'void*' [-fpermissive]" */ (char *) "Intel" };
+default:
+  return nullval;
+}
+}
+
 static bool
 offload (const char *file, uint64_t line, int device, const char *name,
 	 int num_vars, VarDesc *vars, const void **async_data)
-- 
2.17.1



Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-08 Thread Uros Bizjak
On Tue, Jan 8, 2019 at 12:43 PM Jakub Jelinek  wrote:
>
> On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> > FLD from memory in SF and DFmode is considered a conversion, and
> > converts sNaN to NaN (and emits #IA exception). But sNaN handling is
> > already busted in the compiler as RA is free to spill the register in
> > non-XFmode. IMO, the peephole2 pattern is no worse than the current
> > situation.
>
> Ok.
>
> > At least for x86, there are no SUBREGs after reload, otherwise other
> > parts of the compiler would break.
>
> The new patch would really handle even a SUBREG there...
>
> > > I don't see how, that would mean I'd have to write two peephole2s instead of
> > > one.  It tries to deal with two different cases, one is where the temporary
> > > reg is dead, in that case we can optimize away both the load or store, the
> > > second case is where the temporary reg isn't dead, in that case we can
> > > optimize away the store, but not the load.  With the optimizing away of both
> > > load and store I was just trying to do a cheap DCE there.
> >
> > I didn't realize this is an optimization, a comment would be welcome here.
>
> Ugh, except that it doesn't work.  peep2_reg_dead_p (1, operands[0])
> is not what I meant, that is always false, as the register must be live in
> between the first and second instruction.  I meant
> peep2_reg_dead_p (2, operands[0]), the register dead at the end of the
> second instruction, except we don't really support
> define_split/define_peephole2 splitting into zero instructions, DONE; in
> that case returns NULL like FAIL; does.  So, let's just wait for DCE to
> finish it up.
>
> Here is what I'll bootstrap/regtest then.  Added also
> reg_overlap_mentioned_p, in case there is e.g.
>   movl (%eax,%edx), %eax
>   movl %eax, (%eax,%edx)

I doubt this would *ever* happen, but ... OK.

> or similar and as I said earlier, explicit match_operand so that I can
> check MEM_VOLATILE_P on both MEMs.
>
> 2019-01-08  Jakub Jelinek  
>
> PR rtl-optimization/79593
> * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

OK for mainline.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-01-07 23:54:54.494800693 +0100
> +++ gcc/config/i386/i386.md 2019-01-08 12:34:18.916832780 +0100
> @@ -18740,6 +18740,18 @@ (define_peephole2
>const0_rtx);
>  })
>
> +;; Attempt to optimize away memory stores of values the memory already
> +;; has.  See PR79593.
> +(define_peephole2
> +  [(set (match_operand 0 "register_operand")
> +(match_operand 1 "memory_operand"))
> +   (set (match_operand 2 "memory_operand") (match_dup 0))]
> +  "!MEM_VOLATILE_P (operands[1])
> +   && !MEM_VOLATILE_P (operands[2])
> +   && rtx_equal_p (operands[1], operands[2])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 1))])
> +
>  ;; Attempt to always use XOR for zeroing registers (including FP modes).
>  (define_peephole2
>[(set (match_operand 0 "general_reg_operand")
>
>
> Jakub
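
The four conditions in the peephole2 above can be modelled outside the compiler. The sketch below is a simplified, hypothetical model (the struct mem_ref and all function names are invented here, and a memory operand is reduced to base plus index register); it is not GCC's RTL API. It shows why the reg_overlap_mentioned_p check matters: in "movl (%eax,%edx), %eax; movl %eax, (%eax,%edx)" the two addresses look identical, but the load has changed %eax, so the store targets a different location.

```c
#include <stdbool.h>

/* Hypothetical, simplified memory operand: base + index register.  */
struct mem_ref
{
  int base_reg;
  int index_reg;
  bool is_volatile;
};

/* Stand-in for rtx_equal_p on the two memory operands.  */
static bool
same_address (const struct mem_ref *a, const struct mem_ref *b)
{
  return a->base_reg == b->base_reg && a->index_reg == b->index_reg;
}

/* Stand-in for reg_overlap_mentioned_p (reg, mem).  */
static bool
reg_used_in_address (int reg, const struct mem_ref *m)
{
  return reg == m->base_reg || reg == m->index_reg;
}

/* For the pair "reg = load; store = reg": true if the store writes back
   what the memory already holds and can therefore be dropped.  */
bool
store_is_redundant (int reg, const struct mem_ref *load,
                    const struct mem_ref *store)
{
  return !load->is_volatile
         && !store->is_volatile
         && same_address (load, store)
         && !reg_used_in_address (reg, store);
}
```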


Re: V2 [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-08 Thread H.J. Lu
On Tue, Jan 8, 2019 at 9:29 AM Uros Bizjak  wrote:
>
> On Tue, Jan 8, 2019 at 5:17 PM H.J. Lu  wrote:
> >
> > On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak  wrote:
> > >
> > > On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu  wrote:
> > > >
> > > > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
> > > > > >
> > > > > > There is no need to generate vzeroupper if caller uses upper bits of
> > > > > > AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> > > > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
> > > > > >
> > > > > > Tested on i686 and x86-64 with and without --with-arch=native.
> > > > > >
> > > > > > OK for trunk?
> > > > >
> > > > > In principle OK, but I think we don't have to cache the result of
> > > > > ix86_avx_u128_mode_entry. Simply call the function from
> > > > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
> > > > > afford to re-call it one more time per function.
> > > >
> > > > Do we really need ix86_avx_u128_mode_entry?  We can just
> > > > set entry state to AVX_U128_CLEAN and set exit state to
> > > > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes
> > > > AVX/AVX512 registers to callee.
> > > >
> > > > Does this patch look OK?
> > >
> > > No, the compiler is then free to move optimal insertion point at the
> > > beginning of the function.
> > >
> >
> > Here is the updated patch.  OK for trunk?
>
> OK with the comment fix.
>
> Thanks,
> Uros.
>
> -  return AVX_U128_CLEAN;
> +  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
>
> s/Entry/Exit/
>
> + modes used in function arguments.  */
>
> ... , otherwise return AVX_U128_CLEAN.
>
> +  return ix86_avx_u128_mode_entry ();
>  }

This is what I am checking in.

Thanks.

-- 
H.J.
From 315e6eadf7021748de375c59da9cf451351c9597 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jan 2019 06:56:44 -0800
Subject: [PATCH] x86: Don't generate vzeroupper if caller passes AVX/AVX512
 registers

There is no need to generate vzeroupper if caller passes arguments in
AVX/AVX512 registers.

Tested on i686 and x86-64 with and without --with-arch=native.

gcc/

	PR target/88717
	* config/i386/i386.c (ix86_avx_u128_mode_exit): Call
	ix86_avx_u128_mode_entry.

gcc/testsuite/

	PR target/88717
	* gcc.target/i386/pr88717.c: New test.
---
 gcc/config/i386/i386.c  |  5 -
 gcc/testsuite/gcc.target/i386/pr88717.c | 24 
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..bd48e080f46 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19137,7 +19137,10 @@ ix86_avx_u128_mode_exit (void)
   if (reg && ix86_check_avx_upper_register (reg))
 return AVX_U128_DIRTY;
 
-  return AVX_U128_CLEAN;
+  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
+ modes used in function arguments, otherwise return AVX_U128_CLEAN.
+   */
+  return ix86_avx_u128_mode_entry ();
 }
 
 /* Return a mode that ENTITY is assumed to be
diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c
new file mode 100644
index 000..01680998f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88717.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
+
+#include <immintrin.h>
+
+__m128
+foo1 (__m256 x)
+{
+  return _mm256_castps256_ps128 (x);
+}
+
+void
+foo2 (float *p, __m256 x)
+{
+  *p = ((__v8sf)x)[0];
+}
+
+void
+foo3 (float *p, __m512 x)
+{
+  *p = ((__v16sf)x)[0];
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.20.1



Re: V2 [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-08 Thread Uros Bizjak
On Tue, Jan 8, 2019 at 5:17 PM H.J. Lu  wrote:
>
> On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak  wrote:
> >
> > On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu  wrote:
> > >
> > > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
> > > > >
> > > > > There is no need to generate vzeroupper if caller uses upper bits of
> > > > > AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> > > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
> > > > >
> > > > > Tested on i686 and x86-64 with and without --with-arch=native.
> > > > >
> > > > > OK for trunk?
> > > >
> > > > In principle OK, but I think we don't have to cache the result of
> > > > ix86_avx_u128_mode_entry. Simply call the function from
> > > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
> > > > afford to re-call it one more time per function.
> > >
> > > Do we really need ix86_avx_u128_mode_entry?  We can just
> > > set entry state to AVX_U128_CLEAN and set exit state to
> > > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes
> > > AVX/AVX512 registers to callee.
> > >
> > > Does this patch look OK?
> >
> > No, the compiler is then free to move optimal insertion point at the
> > beginning of the function.
> >
>
> Here is the updated patch.  OK for trunk?

OK with the comment fix.

Thanks,
Uros.

-  return AVX_U128_CLEAN;
+  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit

s/Entry/Exit/

+ modes used in function arguments.  */

... , otherwise return AVX_U128_CLEAN.

+  return ix86_avx_u128_mode_entry ();
 }


Re: [PATCH][jit] Add thread-local globals to the libgccjit frontend

2019-01-08 Thread David Malcolm
On Tue, 2019-01-08 at 14:31 +0100, Marc Nieper-Wißkirchen wrote:
> Dear David,
> 
> thank you very much for your timely response and for taking a thorough
> look at my proposed patch.
> 
> On 07.01.19 at 21:34, David Malcolm wrote:
> 
> > Have you done the legal paperwork with the FSF for contributing to
> > GCC?
> >   See https://gcc.gnu.org/contribute.html#legal
> 
> Not yet; this is my first patch I would like to contribute to GCC.
> You should have received a private email to get the legal matters done.

Thanks; I've replied to that.

[...]


> I have applied your recent patch. With the patch, there are no more 
> failures.

Excellent.

[...]

> # of expected passes            10394


[...]


> 2019-01-08  Marc Nieper-Wißkirchen  
> 
>  * docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
>  * docs/topics/expressions.rst (Global variables): Add
>  documentation of gcc_jit_lvalue_set_bool_thread_local.
>   * docs/_build/texinfo/libgccjit.texi: Regenerate.
>   * jit-playback.c: Include "varasm.h".
>   Within namespace gcc::jit::playback...
>   (context::new_global) Add "thread_local_p" param and use it
>   to set DECL_TLS_MODEL.
>   * jit-playback.h: Within namespace gcc::jit::playback...
>   (context::new_global): Add "thread_local_p" param.
>   * jit-recording.c: Within namespace gcc::jit::recording...
>   (global::replay_into): Provide m_thread_local to call to
>   new_global.
>   (global::write_reproducer): Call write_reproducer_thread_local.
>   (global::write_reproducer_thread_local): New method.
>   * jit-recording.h: Within namespace gcc::jit::recording...
>   (lvalue::dyn_cast_global): New virtual function.
>   (global::m_thread_local): New field.
>   * libgccjit.c (gcc_jit_lvalue_set_bool_thread_local): New
>   function.
>   * libgccjit.h
>   (LIBGCCJIT_HAVE_gcc_jit_lvalue_set_bool_thread_local): New
>   macro.
>   (gcc_jit_lvalue_set_bool_thread_local): New function.
>   * libgccjit.map (LIBGCCJIT_ABI_11): New.
>   (gcc_jit_lvalue_set_bool_thread_local): Add.
>   * ../testsuite/jit.dg/all-non-failing-tests.h: Include new
> test.
>   * ../testsuite/jit.dg/jit.exp: Load pthread for tests involving
>   thread-local globals.
>   * ../testsuite/jit.dg/test-thread-local.c: New test case for
>   thread-local globals.

BTW, the convention here is to split out the ChangeLog entries by
directory based on the presence of ChangeLog files.

There's a gcc/jit/ChangeLog and a gcc/testsuite/ChangeLog, so for this
patch there should be two sets of ChangeLog entries, one for each of
these, with the paths expressed relative to the directory holding the
ChangeLog.

So the testsuite entries would go into gcc/testsuite/ChangeLog, and
look like:

* jit.dg/all-non-failing-tests.h: Include new test.

...etc.

[...]

> diff --git a/gcc/testsuite/jit.dg/test-thread-local.c
> b/gcc/testsuite/jit.dg/test-thread-local.c
> new file mode 100644
> index 0..287ba85e4
> --- /dev/null
> +++ b/gcc/testsuite/jit.dg/test-thread-local.c
> @@ -0,0 +1,99 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "libgccjit.h"
> +
> +#include "harness.h"
> +
> +void
> +create_code (gcc_jit_context *ctxt, void *user_data)
> +{
> +  /* Let's try to inject the equivalent of:
> +
> +  static thread_local int tl;
> +  set_tl (int v)
> +  {
> +tl = v;
> +  }
> +
> +  int
> +  get_tl (void)
> +  {
> +return tl;
> +  }

Thanks for posting this.  

This test is OK as far as it goes, but there are some gaps in test
coverage: the test verifies that jit-generated code can create a new
thread-local variable, and writes to it, but it doesn't seem to test
reading from it; e.g. there doesn't seem to be a CHECK_VALUE that the
thread-local var is 43 after the set_tl (43);

It would probably also be good to verify that jit-generated code can
work with a thread-local variable declared and defined in C-generated
code.

Thinking aloud, how about the following changes:
- split out the thread-local vars, so that there's e.g. an
    extern thread_local int tl_c;
  in the C code and the "tl" becomes "tl_jit", and access it via a
  GCC_JIT_GLOBAL_EXTERNAL, or somesuch.
- in the test code, run two new threads, passing them two different
"int" values; verify that the set_tl(value); can be matched with a
CHECK_VALUE get_tl() == the value for that thread on the two threads.

...or somesuch, though the patch is in pretty good shape already.
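
The two-thread check suggested above could look roughly like the following in plain C. This is only a sketch: it uses an ordinary C11 _Thread_local variable in place of the jit-created global, and the struct worker_arg and check_two_threads names are invented here. A real jit test would obtain set_tl and get_tl from the compiled result and use the harness's CHECK_VALUE instead of a return value.

```c
#include <pthread.h>
#include <stddef.h>

/* Plain C stand-in for the jit-created thread-local global.  */
static _Thread_local int tl;

void set_tl (int v) { tl = v; }
int get_tl (void) { return tl; }

struct worker_arg
{
  int value;  /* value this thread stores */
  int seen;   /* value this thread reads back */
};

static void *
worker (void *p)
{
  struct worker_arg *arg = p;
  set_tl (arg->value);
  arg->seen = get_tl ();
  return NULL;
}

/* Hypothetical helper: returns 1 if both worker threads read back their
   own value and the calling thread's copy is left untouched.  */
int
check_two_threads (int main_value, int a, int b)
{
  struct worker_arg args[2] = { { a, -1 }, { b, -1 } };
  pthread_t threads[2];
  int started = 0;

  set_tl (main_value);
  for (int i = 0; i < 2; i++)
    {
      if (pthread_create (&threads[i], NULL, worker, &args[i]) != 0)
        break;
      started++;
    }
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);

  return started == 2
         && args[0].seen == a
         && args[1].seen == b
         && get_tl () == main_value;
}
```

Checking the calling thread's copy after the join is what separates a real thread-local from a shared global: with a shared variable the workers would overwrite main_value.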

Dave

> +   */
> +  gcc_jit_type *the_type =
> +gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
> +  gcc_jit_type *void_type =
> +gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID);
> +
> +  gcc_jit_lvalue *tl =
> +gcc_jit_context_new_global (ctxt, NULL, GCC_JIT_GLOBAL_INTERNAL,
> + the_type, "tl");
> +  gcc_jit_lvalue_set_bool_thread_local (tl, 1);
> +
> +  gcc_jit_param *v =
> +  

Re: Remove overall growth from badness metrics

2019-01-08 Thread Qing Zhao


> On Jan 8, 2019, at 5:46 AM, Jan Hubicka  wrote:
> 
>>> I plan to commit the patch tomorrow after re-testing everything after
>>> the bugfixes from today and yesterday.  In addition to this have found
>>> that current inline-unit-growth is too small for LTO of large programs
>>> (especially Firefox:) and there are important improvements when
>>> increased from 20 to 30 or 40.  I am re-running C++ benchmarks and other
>>> tests to decide about precise setting.  Finally I plan to increase
>>> the new parameters for bit more inlining at -O2 and -Os.
>> 
>> Usually increasing these parameters might increase the compilation time and 
>> the 
>> final code size, do you have any data for compilation time and code size 
>> impact from
>> these parameter change?
> 
> Yes, currently LNT is down because some machines apparently ran out of
> disk space after christmas, so I can not show you data on that, but I
> can show Firefox.  Will make summary of LNT too once it restarts.
Okay, thanks.
> 
> In general this parameter affects primarily -O3 builds, because -O2
> hardly hits the limit. Of the -O3 builds, only programs with very large units are
> affected (-O2 units hit the limit only if you have a lot of inline
> hints in the code).
I don't quite understand here: what's the major difference for inlining between
-O3 and -O2?
(I see -finline-functions is enabled for both O3 and O2).

> 
> In my test bed this included Firefox with or without LTO because they do
> "poor man's" LTO by #including multiple .cpp files into a single unified
> source, which makes average units large.  Also tramp3d and DLV from our C++
> benchmarks are affected.
> 
> I have some data on Firefox and I will build the remaining ones:
In the following, are the data for code size? Is the optimization level O3?
What does PGO mean?
> 
> growth         LTO+PGO    PGO       LTO        none       -finline-functions
> 20 (default)   83752215   94390023  93085455   103437191  94351191
> 40             85299111   97220935  101600151  108910311  115311719
> clang  111520431 114863807 108437807
> 
> Build times are within noise of my setup, but they are less pronounced
> than the code size difference. I think at most 1 minute out of 100.
> Note that Firefox consists of 6% Rust code that is not built by GCC,
> and building that consumes over half of the build time.
> 
> The problem I am trying to solve here is to get consistent LTO
> performance improvements compared to non-LTO. Currently there are
> some regressions:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=b6ba1ebfe913d152989495d8cb450bce02f27d44=try=c7bd18804e328ed490eab707072b3cf59da91042=1=1=1
> All those regressions go away with the limit increase.


> I tracked them down to the fact that we do not inline some very small
> functions already (such as IsHTMLWhitespace).  In the GCC 5 timeframe I
> tuned this parameter to 20% based on Firefox LTO benchmarks, but I was
> not that serious about performance since my setup was not giving very
> reproducible results for sub-5% differences on tp5o. Since we plan to
> enable LTO by default for Tumbleweed I need to find something that does
> not cause too many regressions while keeping the code size advantage of
> non-LTO.

From my understanding, the performance regression of LTO relative to non-LTO is
caused by some small and important functions no longer being inlined with LTO:
because more functions are eligible for inlining under LTO, the original value of
inline-unit-growth becomes relatively smaller.

While increasing the value of inline-unit-growth for LTO is one approach to
resolve this issue, adjusting the sorting heuristic so that those important and
smaller routines get a higher priority for inlining might be
another and better approach?

Qing

> 
> Honza
>> 
>> thanks.
>> 
>> Qing
>>> 
>>> Bootstrapped/regtested x86_64-linux, will commit it tomorrow.
>>> 
>>> * ipa-inline.c (edge_badness): Do not account overall_growth into
>>> badness metrics.
>>> Index: ipa-inline.c
>>> ===
>>> --- ipa-inline.c(revision 267612)
>>> +++ ipa-inline.c(working copy)
>>> @@ -1082,8 +1082,8 @@ edge_badness (struct cgraph_edge *edge,
>>>  /* When profile is available. Compute badness as:
>>> 
>>>              time_saved * caller_count
>>> - goodness =  ---------------------------------------------------
>>> -             growth_of_caller * overall_growth * combined_size
>>> + goodness =  --------------------------------
>>> +             growth_of_caller * combined_size
>>> 
>>> badness = - goodness
>>> 
>>> @@ -1094,7 +1094,6 @@ edge_badness (struct cgraph_edge *edge,
>>>|| caller->count.ipa ().nonzero_p ())
>>>{
>>>  sreal numerator, denominator;
>>> -  int overall_growth;
>>>  sreal inlined_time = compute_inlined_call_time (edge, edge_time);
>>> 
>>>  numerator = (compute_uninlined_call_time (edge, unspec_edge_time)

Re: [Aarch64][SVE] Add copysign and xorsign support

2019-01-08 Thread Wilco Dijkstra
Hi Alejandro,

+emit_move_insn (mask,
+   aarch64_simd_gen_const_vector_dup (mode,
+  HOST_WIDE_INT_M1U
+  << bits));
+
+emit_insn (gen_and3 (sign, arg2, mask));

Is there a reason to emit separate moves and then require the optimizer
to combine them? The result of aarch64_simd_gen_const_vector_dup
can be used directly in the gen_and for all supported floating point types.

Cheers,
Wilco

[Aarch64][SVE] Add copysign and xorsign support

2019-01-08 Thread Alejandro Martinez Vicente
Hi,

This patch adds support for copysign and xorsign builtins to SVE. With the new
expands, they can be vectorized using bitwise logical operations.
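
As a scalar illustration of the bitwise formulation (not the SVE expander itself), the sketch below implements both operations on a single float by masking the sign bit. It assumes IEEE 754 binary32 floats; the helper names copysign_bits and xorsign_bits are made up for this example, and xorsign (x, y) is taken to mean x * copysign (1.0, y).

```c
#include <stdint.h>
#include <string.h>

#define SIGN_MASK UINT32_C (0x80000000)

/* Reinterpret a float as its 32-bit pattern and back, without aliasing
   problems.  Assumes float is IEEE 754 binary32.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* copysign: magnitude of X, sign of Y.  One AND to isolate the sign,
   one AND to clear X's sign, one OR to combine.  */
float
copysign_bits (float x, float y)
{
  uint32_t sign = float_bits (y) & SIGN_MASK;
  return bits_float ((float_bits (x) & ~SIGN_MASK) | sign);
}

/* xorsign: flip X's sign when Y is negative.  One AND and one XOR.  */
float
xorsign_bits (float x, float y)
{
  uint32_t sign = float_bits (y) & SIGN_MASK;
  return bits_float (float_bits (x) ^ sign);
}
```

A vectorized version applies the same AND/OR/XOR sequence to every lane, with the sign mask broadcast across the vector.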

I tested this patch on an aarch64 machine by bootstrapping the compiler and
running the checks.

Alejandro

gcc/Changelog:

2019-01-08  Alejandro Martinez  

* config/aarch64/aarch64-sve.md (copysign3): New define_expand.
(xorsign3): Likewise.

gcc/testsuite/Changelog:
 
2019-01-08  Alejandro Martinez  

* gcc.target/aarch64/sve/copysign_1.c: New test for SVE vectorized
copysign.
* gcc.target/aarch64/sve/copysign_1_run.c: Likewise.
* gcc.target/aarch64/sve/xorsign_1.c: New test for SVE vectorized
xorsign.
* gcc.target/aarch64/sve/xorsign_1_run.c: Likewise.



copysign.patch
Description: copysign.patch


V2 [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-08 Thread H.J. Lu
On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak  wrote:
>
> On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu  wrote:
> >
> > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak  wrote:
> > >
> > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
> > > >
> > > > There is no need to generate vzeroupper if caller uses upper bits of
> > > > AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
> > > >
> > > > Tested on i686 and x86-64 with and without --with-arch=native.
> > > >
> > > > OK for trunk?
> > >
> > > In principle OK, but I think we don't have to cache the result of
> > > ix86_avx_u128_mode_entry. Simply call the function from
> > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
> > > afford to re-call it one more time per function.
> >
> > Do we really need ix86_avx_u128_mode_entry?  We can just
> > set entry state to AVX_U128_CLEAN and set exit state to
> > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes
> > AVX/AVX512 registers to callee.
> >
> > Does this patch look OK?
>
> No, the compiler is then free to move the optimal insertion point to the
> beginning of the function.
>

Here is the updated patch.  OK for trunk?

Thanks.

-- 
H.J.
From 702ece14923f9922be5a6ed835a8efbe24e890ba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jan 2019 06:56:44 -0800
Subject: [PATCH] x86: Don't generate vzeroupper if caller passes AVX/AVX512
 registers

There is no need to generate vzeroupper if caller passes arguments in
AVX/AVX512 registers.

Tested on i686 and x86-64 with and without --with-arch=native.

gcc/

	PR target/88717
	* config/i386/i386.c (ix86_avx_u128_mode_exit): Call
	ix86_avx_u128_mode_entry.

gcc/testsuite/

	PR target/88717
	* gcc.target/i386/pr88717.c: New test.
---
 gcc/config/i386/i386.c  |  4 +++-
 gcc/testsuite/gcc.target/i386/pr88717.c | 24 
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..7d82a241143 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19137,7 +19137,9 @@ ix86_avx_u128_mode_exit (void)
   if (reg && ix86_check_avx_upper_register (reg))
 return AVX_U128_DIRTY;
 
-  return AVX_U128_CLEAN;
+  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
+ modes used in function arguments.  */
+  return ix86_avx_u128_mode_entry ();
 }
 
 /* Return a mode that ENTITY is assumed to be
diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c
new file mode 100644
index 000..01680998f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88717.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
+
+#include <immintrin.h>
+
+__m128
+foo1 (__m256 x)
+{
+  return _mm256_castps256_ps128 (x);
+}
+
+void
+foo2 (float *p, __m256 x)
+{
+  *p = ((__v8sf)x)[0];
+}
+
+void
+foo3 (float *p, __m512 x)
+{
+  *p = ((__v16sf)x)[0];
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.20.1
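
For readers who want to see the rule in the patch above in isolation, here is
a minimal standalone C model.  It is only a sketch: struct fn_model and
uses_upper are hypothetical stand-ins for the RTL checks done by
ix86_check_avx_upper_register, and register widths are given directly in
bits.

```c
#include <stdbool.h>
#include <stddef.h>

enum avx_u128_state { AVX_U128_CLEAN, AVX_U128_DIRTY };

/* Hypothetical description of a function: the width in bits of the
   register holding the return value (0 if none) and of each argument.  */
struct fn_model
{
  unsigned ret_bits;
  const unsigned *arg_bits;
  size_t nargs;
};

/* Stand-in for the upper-register check: 256-bit and 512-bit values
   touch the upper halves of the AVX registers.  */
static bool
uses_upper (unsigned bits)
{
  return bits >= 256;
}

/* Entry mode: dirty if any incoming argument is 256 or 512 bits wide.  */
enum avx_u128_state
mode_entry (const struct fn_model *fn)
{
  for (size_t i = 0; i < fn->nargs; i++)
    if (uses_upper (fn->arg_bits[i]))
      return AVX_U128_DIRTY;
  return AVX_U128_CLEAN;
}

/* Exit mode: dirty if the return value is wide, otherwise whatever
   the entry mode was.  */
enum avx_u128_state
mode_exit (const struct fn_model *fn)
{
  if (uses_upper (fn->ret_bits))
    return AVX_U128_DIRTY;
  return mode_entry (fn);
}
```

In this model foo1 from the test (128-bit return, one 256-bit argument) exits
in the dirty state, so no vzeroupper is needed before returning.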



Re: C++ PATCH for c++/88538 - braced-init-list in template-argument-list

2019-01-08 Thread Marek Polacek
On Mon, Jan 07, 2019 at 09:52:55PM -0500, Jason Merrill wrote:
> On 1/7/19 6:56 PM, Marek Polacek wrote:
> > At the risk of seeming overly eager, I thought it would be reasonable to
> > go with the following: enabling braced-init-list as a template-argument.
> > As the discussion on the reflector clearly indicates, this was the intent
> > from the get-go.
> > 
> > I know, it's not a regression.  But I restricted the change to C++20, and it
> > should strictly allow code that wasn't accepted before -- when a template
> > argument starts with {.  Perhaps we could even drop the C++20 check.
> > 
> > What's your preference?
> 
> Let's keep the C++20 check for now at least.  I'd suggest moving the change
> further down, with this code:

Okay.  I've experimented with checking expr_non_constant_p, but this version
gives better diagnostics.

Bootstrapped/regtested running on x86_64-linux, ok for trunk if it passes?

2019-01-08  Marek Polacek  

PR c++/88538 - braced-init-list in template-argument-list.
* parser.c (cp_parser_template_argument): Handle braced-init-list when
in C++20.

* g++.dg/cpp2a/nontype-class11.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index bca1739ace3..87f37d8ab2b 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -17020,6 +17020,18 @@ cp_parser_template_argument (cp_parser* parser)
 argument = cp_parser_constant_expression (parser);
   else
 {
+  /* In C++20, we can encounter a braced-init-list.  */
+  if (cxx_dialect >= cxx2a
+ && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+   {
+ cp_parser_parse_tentatively (parser);
+ bool expr_non_constant_p;
+ argument = cp_parser_braced_list (parser, &expr_non_constant_p);
+ if (cp_parser_parse_definitely (parser))
+   /* Yup, it was a braced-init-list.  */
+   return argument;
+   }
+
   /* With C++17 generalized non-type template arguments we need to handle
 lvalue constant expressions, too.  */
   argument = cp_parser_assignment_expression (parser);
diff --git gcc/testsuite/g++.dg/cpp2a/nontype-class11.C 
gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
new file mode 100644
index 000..8a06d23904b
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
@@ -0,0 +1,21 @@
+// PR c++/88538
+// { dg-do compile { target c++2a } }
+
+struct S {
+  unsigned a;
+  unsigned b;
+  constexpr S(unsigned _a, unsigned _b) noexcept: a{_a}, b{_b} { }
+};
+
+template <S>
+void fnc()
+{
+}
+
+template<S> struct X { };
+
+void f()
+{
+  fnc<{10,20}>();
+  X<{1, 2}> x;
+}


Re: [Patch, Fortran] PR 88047: [9 Regression] ICE in gfc_find_vtab, at fortran/class.c:2843

2019-01-08 Thread Steve Kargl
On Tue, Jan 08, 2019 at 10:11:33AM +0100, Janus Weil wrote:
> 
> the attached patch is close to obvious and fixes another small
> ICE-on-invalid regression. Since there was a bit of discussion in the
> PR, I am submitting it for approval instead of just committing as
> obvious.
> 
> Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
> 

OK.

-- 
Steve


Re: [PATCH, GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical subregs

2019-01-08 Thread Andre Vieira (lists)




On 07/01/2019 22:50, Jeff Law wrote:

On 1/7/19 7:42 AM, Andre Vieira (lists) wrote:

Hi,

This patch fixes the way 'uses_hard_regs_p' handles paradoxical subregs.
The function is supposed to detect whether a register access of 'x'
overlaps with 'set'.  For SUBREGs it should check whether any register of
the full multi-register group overlaps with 'set'.  The former behavior was
to grab the wider mode of the inner/outer registers of a SUBREG and the
inner register, and check all registers from the inner register onwards
for the given width.  For normal SUBREGs this gives you the full
register; for paradoxical SUBREGs, however, it may give you the wrong set
of registers if the index is not the first of the multi-register set.

The original error reported in PR target/86487 can no longer be
reproduced with the given test; this is due to an unrelated code-gen
change.  Regardless, I believe this should still be fixed, as it is simply
wrong behavior by uses_hard_regs_p which may be triggered by a different
test-case or by future changes to the compiler.  It is also worth
pointing out that this isn't actually a 'target' issue, as this code could
potentially hit any other target using paradoxical SUBREGs.  Should I
change the Bugzilla ticket to reflect that this is actually a
target-agnostic issue in RTL?

There is a gotcha here: I don't know what would happen if you hit the
cases of get_hard_regno where it would return -1 (quoting the comment
above that function: "If X is not a register or a subreg of a register,
return -1."), though I think if we are hitting this then things must have
gone wrong before?

Bootstrapped on aarch64, arm and x86, no regressions.

Is this OK for trunk?


gcc/ChangeLog:
2019-01-07 Andre Vieira  


     PR target/86487
     * lra-constraints.c(uses_hard_regs_p): Fix handling of
paradoxical SUBREGS.

But doesn't wider_subreg_mode give us the wider of the two modes here
and we use that wider mode when we call overlaps_hard_reg_set_p which
should ultimately check all the registers in the paradoxical.

I must be missing something here?!?

jeff



Hi Jeff,

It does give us the wider of the two modes, but we also then grab the 
"subreg" of the paradoxical subreg.  If you look at the first example 
case of the bugzilla ticket, for an older gcc (say gcc-8) and the 
options provided (using big-endian), it will generate the following subreg:

(subreg:DI (reg:SI 122) 0)

This paradoxical subreg represents a register pair r0-r1, where because 
of big-endian and subreg index 0, r1 is the value we care about and r0 is
the one we say "it can be whatever" about by using this paradoxical subreg.


When 'uses_hard_regs_p' sees this as a subreg, it sets 'mode' to the 
wider mode, i.e. DImode, but it also sets 'x' to the subreg's inner
register, i.e. 'reg:SI 122', for which get_hard_regno correctly returns
'r1'.  But if you now pass 'overlaps_hard_reg_set_p' DImode and 'r1', it
will check whether 'set' contains either of 'r1-r2', and not 'r1-r0'.
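
To make the off-by-one-register effect concrete, here is a small C model of
the overlap check.  It is a sketch only: hard_reg_set as a plain bitmask and
the function overlaps are hypothetical simplifications of HARD_REG_SET and
overlaps_hard_reg_set_p.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a hard register set: bit N stands for rN.  */
typedef uint32_t hard_reg_set;

/* Return true if any of the NREGS consecutive registers starting at
   REGNO is a member of SET.  */
bool
overlaps (hard_reg_set set, unsigned regno, unsigned nregs)
{
  for (unsigned i = 0; i < nregs; i++)
    if (set & (UINT32_C (1) << (regno + i)))
      return true;
  return false;
}
```

For the big-endian example above, the paradoxical DImode subreg occupies
r0-r1 and the inner SImode register is r1.  With set = { r0 },
overlaps (set, 1, 2) inspects r1-r2 and misses the overlap, while
overlaps (set, 0, 2) inspects r0-r1 and finds it.  With set = { r2 } the
first call reports an overlap that does not exist.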


To reproduce this again I now applied this patch to GCC 8 and found an 
issue with it: 'REG_P (x)' returns false if x is a 'SUBREG'.  So I will 
need to change the later check to also include 'SUBREG_P (x)'.  I guess I 
was testing with too new a version of gcc, one that didn't lead to the
bogus register allocation...


Which really encourages me to add some sort of testcase, but I'd very 
much like to come up with a less flaky one.  We basically need to force 
the generation of a paradoxical subreg 'x', where 'get_hard_regno 
(SUBREG_REG (x)) != get_hard_regno (x)'.  This will cause 
'uses_hard_regs_p' to give you a wrong answer.


Suggestions welcome!!

Cheers,
Andre


Re: [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-08 Thread Uros Bizjak
On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu  wrote:
>
> On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak  wrote:
> >
> > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
> > >
> > > There is no need to generate vzeroupper if caller uses upper bits of
> > > AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
> > >
> > > Tested on i686 and x86-64 with and without --with-arch=native.
> > >
> > > OK for trunk?
> >
> > In principle OK, but I think we don't have to cache the result of
> > ix86_avx_u128_mode_entry. Simply call the function from
> > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
> > afford to re-call it one more time per function.
>
> Do we really need ix86_avx_u128_mode_entry?  We can just
> set entry state to AVX_U128_CLEAN and set exit state to
> AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes
> AVX/AVX512 registers to callee.
>
> Does this patch look OK?

No, the compiler is then free to move the optimal insertion point to the
beginning of the function.

Uros.

> Thanks.
>
> H.J.
> --
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d01278d866f..1ac89fd2eb5 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19087,25 +19087,6 @@ ix86_dirflag_mode_entry (void)
>return X86_DIRFLAG_RESET;
>  }
>
> -static int
> -ix86_avx_u128_mode_entry (void)
> -{
> -  tree arg;
> -
> -  /* Entry mode is set to AVX_U128_DIRTY if there are
> - 256bit or 512bit modes used in function arguments.  */
> -  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
> -   arg = TREE_CHAIN (arg))
> -{
> -  rtx incoming = DECL_INCOMING_RTL (arg);
> -
> -  if (incoming && ix86_check_avx_upper_register (incoming))
> - return AVX_U128_DIRTY;
> -}
> -
> -  return AVX_U128_CLEAN;
> -}
> -
>  /* Return a mode that ENTITY is assumed to be
> switched to at function entry.  */
>
> @@ -19117,7 +19098,7 @@ ix86_mode_entry (int entity)
>  case X86_DIRFLAG:
>return ix86_dirflag_mode_entry ();
>  case AVX_U128:
> -  return ix86_avx_u128_mode_entry ();
> +  return AVX_U128_CLEAN;
>  case I387_TRUNC:
>  case I387_FLOOR:
>  case I387_CEIL:
> @@ -19130,13 +19111,24 @@ ix86_mode_entry (int entity)
>  static int
>  ix86_avx_u128_mode_exit (void)
>  {
> +  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
> + modes used in function arguments or function return.  */
>rtx reg = crtl->return_rtx;
>
> -  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit
> - or 512 bit modes used in the function return register. */
>if (reg && ix86_check_avx_upper_register (reg))
>  return AVX_U128_DIRTY;
>
> +  tree arg;
> +
> +  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
> +   arg = TREE_CHAIN (arg))
> +{
> +  rtx incoming = DECL_INCOMING_RTL (arg);
> +
> +  if (incoming && ix86_check_avx_upper_register (incoming))
> + return AVX_U128_DIRTY;
> +}
> +
>return AVX_U128_CLEAN;
>  }


Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
On 1/8/19 2:57 PM, Martin Liška wrote:
> I'll re-run the bootstrap with the hunk.

Bootstraps, thus I'm going to install it.

Martin


Re: [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-08 Thread H.J. Lu
On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak  wrote:
>
> On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
> >
> > There is no need to generate vzeroupper if caller uses upper bits of
> > AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
> >
> > Tested on i686 and x86-64 with and without --with-arch=native.
> >
> > OK for trunk?
>
> In principle OK, but I think we don't have to cache the result of
> ix86_avx_u128_mode_entry. Simply call the function from
> ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
> afford to re-call it one more time per function.

Do we really need ix86_avx_u128_mode_entry?  We can just
set entry state to AVX_U128_CLEAN and set exit state to
AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes
AVX/AVX512 registers to callee.

Does this patch look OK?

Thanks.

H.J.
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..1ac89fd2eb5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19087,25 +19087,6 @@ ix86_dirflag_mode_entry (void)
   return X86_DIRFLAG_RESET;
 }

-static int
-ix86_avx_u128_mode_entry (void)
-{
-  tree arg;
-
-  /* Entry mode is set to AVX_U128_DIRTY if there are
- 256bit or 512bit modes used in function arguments.  */
-  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
-   arg = TREE_CHAIN (arg))
-{
-  rtx incoming = DECL_INCOMING_RTL (arg);
-
-  if (incoming && ix86_check_avx_upper_register (incoming))
- return AVX_U128_DIRTY;
-}
-
-  return AVX_U128_CLEAN;
-}
-
 /* Return a mode that ENTITY is assumed to be
switched to at function entry.  */

@@ -19117,7 +19098,7 @@ ix86_mode_entry (int entity)
 case X86_DIRFLAG:
   return ix86_dirflag_mode_entry ();
 case AVX_U128:
-  return ix86_avx_u128_mode_entry ();
+  return AVX_U128_CLEAN;
 case I387_TRUNC:
 case I387_FLOOR:
 case I387_CEIL:
@@ -19130,13 +19111,24 @@ ix86_mode_entry (int entity)
 static int
 ix86_avx_u128_mode_exit (void)
 {
+  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
+ modes used in function arguments or function return.  */
   rtx reg = crtl->return_rtx;

-  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit
- or 512 bit modes used in the function return register. */
   if (reg && ix86_check_avx_upper_register (reg))
 return AVX_U128_DIRTY;

+  tree arg;
+
+  for (arg = DECL_ARGUMENTS (current_function_decl); arg;
+   arg = TREE_CHAIN (arg))
+{
+  rtx incoming = DECL_INCOMING_RTL (arg);
+
+  if (incoming && ix86_check_avx_upper_register (incoming))
+ return AVX_U128_DIRTY;
+}
+
   return AVX_U128_CLEAN;
 }


Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
On 1/8/19 3:14 PM, Jakub Jelinek wrote:
> On Tue, Jan 08, 2019 at 02:56:44PM +0100, Martin Liška wrote:
>> --- a/gcc/tree-switch-conversion.c
>> +++ b/gcc/tree-switch-conversion.c
>> @@ -100,6 +100,7 @@ switch_conversion::collect (gswitch *swtch)
>>max_case = gimple_switch_label (swtch, branch_num - 1);
>>  
>>m_range_min = CASE_LOW (min_case);
>> +  gcc_assert (operand_equal_p (TYPE_SIZE (TREE_TYPE (m_range_min)), 
>> TYPE_SIZE (TREE_TYPE (m_index_expr)), 0));
>>if (CASE_HIGH (max_case) != NULL_TREE)
>>  m_range_max = CASE_HIGH (max_case);
>>else
>>
>> and I haven't triggered the assert.
>>
>>>
>>> With using just the constructor elt type, do you count on the analysis to
>>> fail if starting with casting the index to the elt type (or unsigned variant
>>> thereof) affects the computation?
>>
>> So hopefully the situation can't happen. Note that if it happens we should 
>> not
>> generate wrong-code, but we miss an opportunity.
> 
> The situation can happen very easily, just use
> int foo (long long x) { int ret; switch (x) { case 1234567LL: ret = 123; 
> break; ... } }
> 
> What I was wondering if doing the computation in the wider (index) type and 
> then
> casting to the narrower (ctor value) type could ever optimize something
> that doing it on the narrower type can't.
> 
> Say, if index type is unsigned int and elt0 type is unsigned char, if
> (a * i + b) % 256 could be the ctor sequence, but one couldn't find
> c, d in [0, 255] that (c * (i % 256) + d) % 256 == (a * i + b) % 256.
> But don't c = a % 256 and d = b % 256 satisfy that?

Yep, that can be an improvement. I can return to it in the next stage1.

Martin

> 
>   Jakub
> 



Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 02:56:44PM +0100, Martin Liška wrote:
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -100,6 +100,7 @@ switch_conversion::collect (gswitch *swtch)
>max_case = gimple_switch_label (swtch, branch_num - 1);
>  
>m_range_min = CASE_LOW (min_case);
> +  gcc_assert (operand_equal_p (TYPE_SIZE (TREE_TYPE (m_range_min)), 
> TYPE_SIZE (TREE_TYPE (m_index_expr)), 0));
>if (CASE_HIGH (max_case) != NULL_TREE)
>  m_range_max = CASE_HIGH (max_case);
>else
> 
> and I haven't triggered the assert.
> 
> > 
> > With using just the constructor elt type, do you count on the analysis to
> > fail if starting with casting the index to the elt type (or unsigned variant
> > thereof) affects the computation?
> 
> So hopefully the situation can't happen. Note that if it happens we should not
> generate wrong-code, but we miss an opportunity.

The situation can happen very easily, just use
int foo (long long x) { int ret; switch (x) { case 1234567LL: ret = 123; break; 
... } }
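
Spelled out as a complete function, purely for illustration: every case after
the first, and the default, are hypothetical and chosen so that the stored
values form a linear sequence.

```c
/* The switch index has type long long (wider), while the values
   assigned to RET have type int (narrower).  */
int
foo (long long x)
{
  int ret;
  switch (x)
    {
    case 1234567LL: ret = 123; break;
    case 1234568LL: ret = 124; break;	/* hypothetical */
    case 1234569LL: ret = 125; break;	/* hypothetical */
    case 1234570LL: ret = 126; break;	/* hypothetical */
    default: ret = 0; break;		/* hypothetical */
    }
  return ret;
}
```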

What I was wondering is whether doing the computation in the wider (index)
type and then casting to the narrower (ctor value) type could ever optimize
something that doing it in the narrower type can't.

Say, if index type is unsigned int and elt0 type is unsigned char, if
(a * i + b) % 256 could be the ctor sequence, but one couldn't find
c, d in [0, 255] such that (c * (i % 256) + d) % 256 == (a * i + b) % 256.
But don't c = a % 256 and d = b % 256 satisfy that?
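
A quick brute-force check of that identity, as a sketch; the function name
and the idea of scanning a range of i are just for this illustration.  It
relies on unsigned int arithmetic wrapping modulo 2^32, which 256 divides.

```c
#include <stdbool.h>
#include <stdint.h>

/* For the given A and B, check for every i in [0, LIMIT) that reducing
   the coefficients and the index modulo 256 first gives the same
   unsigned char value as computing in the wide type.  */
bool
narrow_matches_wide (uint32_t a, uint32_t b, uint32_t limit)
{
  uint8_t c = (uint8_t) (a % 256);
  uint8_t d = (uint8_t) (b % 256);

  for (uint32_t i = 0; i < limit; i++)
    {
      /* Wide computation; wraps modulo 2^32 before the reduction.  */
      uint8_t wide = (uint8_t) ((a * i + b) % 256);
      /* Narrow computation using only values in [0, 255].  */
      uint8_t narrow = (uint8_t) ((c * (i % 256) + d) % 256);

      if (wide != narrow)
	return false;
    }
  return true;
}
```

Since reduction modulo 256 commutes with addition and multiplication, the
function returns true for any a, b and limit.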

Jakub


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v4)

2019-01-08 Thread Chung-Lin Tang

On 2019/1/7 10:15 AM, Thomas Schwinge wrote:

Well, the "Properly handle wait clause with no arguments" changes still
need to be completed and go in first (to avoid introducing regressions),
and then I will have to see your whole set of changes that you intend to
commit: the bits you've incrementally posted still don't include several
of the changes I suggested and provided patches for (again, to avoid
introducing regressions).


I'll look at that state again.


But GCC now is in "regression and documentation fixes mode", so I fear
that it's too late now?


Maybe...I don't know.


--- oacc-async.c(revision 267507)
+++ oacc-async.c(working copy)
@@ -62,12 +158,10 @@ acc_wait (int async)
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
+  thr->dev->openacc.async.synchronize_func (aq);

Have to check the result here?  Like you're doing here, for example:


  acc_wait_async (int async1, int async2)
  {
+  if (!thr->dev->openacc.async.synchronize_func (aq1))
+gomp_fatal ("wait on %d failed", async1);
+  if (!thr->dev->openacc.async.serialize_func (aq1, aq2))
+gomp_fatal ("ordering of async ids %d and %d failed", async1, async2);
--- oacc-parallel.c (revision 267507)
+++ oacc-parallel.c (working copy)
@@ -521,17 +500,22 @@ goacc_wait (int async, int num_waits, va_list *ap)
if (async == acc_async_sync)
-   acc_wait (qid);
+   acc_dev->openacc.async.synchronize_func (aq);

Likewise?


Oh okay, I forgot about those sites.
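
The pattern being asked for, as a minimal standalone C sketch.  Everything
here is a hypothetical stand-in: struct queue, struct async_hooks and
mock_synchronize are not the real libgomp types, and where the quoted hunk
calls gomp_fatal this sketch just prints a message and returns false.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an async queue.  */
struct queue
{
  bool healthy;
};

/* Hypothetical stand-in for the plugin hook table; the hook reports
   success or failure through its return value.  */
struct async_hooks
{
  bool (*synchronize_func) (struct queue *);
};

/* Mock hook: succeeds only on a healthy queue.  */
bool
mock_synchronize (struct queue *q)
{
  return q->healthy;
}

/* Call the hook and check its result instead of ignoring it.  */
bool
checked_wait (const struct async_hooks *hooks, struct queue *aq, int async)
{
  if (!hooks->synchronize_func (aq))
    {
      fprintf (stderr, "wait on %d failed\n", async);
      return false;
    }
  return true;
}
```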



Also, I had to apply additional changes as attached, to make this build.



Oh I had those changes, but forgot to update the other patches. I'll resend 
those later too.

Thanks,
Chung-Lin


Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch)

2019-01-08 Thread Tamar Christina
Hi Marc,

> >>> + (nop:type (op (convert:ty1 @1) (convert:ty2 @2)
> >>
> >> Please don't use 'nop' directly, use 'convert' instead. This line is very
> >> suspicious, both arguments of op should have the same type. Specifying the
> >> outertype should be unnecessary, it is always 'type'. And if necessary, I
> >> expect '(convert:ty1 @1)' is the same as '{ arg0; }'.
> >>
> >
> > Ah I wasn't aware I could use arg0 here. I've updated the patch, though I 
> > don't
> > really find this clearer.
> 
> > + (convert (op (convert:ty1 { arg0; }) (convert:ty2 { arg1; })
> 
> I think you misunderstood my point. What you wrote is equivalent to:
> 
>   (convert (op { arg0; } { arg1; }
> 
> since arg0 already has type ty1. And I am complaining that both arguments 
> to op must have the same type, but you are creating one of type ty1 and 
> one of type ty2, which doesn't clearly indicate that ty1==ty2.
>

Ah ok, I've reverted the previous changes and added a types_match on ty1 and 
ty2.
 
> Maybe experiment with
> (long double)some_float * (long double)some_double
> cast to either float or double.
> 

I did, and none of them were an issue:

All of these worked fine and did the operation as expected as DFmode
now, whereas before all except the first would have used TFmode.

double foo (float a, double b)
{
  return (long double)a * (long double)b;
}

double bar (float a, double b)
{
  float x = (long double)a;
  return x * (long double)b;
}

void foo_ (double b, double *c)
{
  float a = (double)3.0d;
  long double e = (long double)a * (long double)b;
  *c = (double)e;
}

> SCALAR_FLOAT_TYPE_P may be safer than FLOAT_TYPE_P.

Hmm, I can't think of any reason why these rules shouldn't apply to vector or
complex float modes.  The old code in convert.c already used FLOAT_TYPE_P.

I've attached the updated patch.

Thanks,
Tamar

> 
> -- 
> Marc Glisse

-- 
diff --git a/gcc/convert.c b/gcc/convert.c
index 1a3353c870768a33fe22480ec97c7d3e0c504075..a16b7af0ec54693eb4f1e3a110aabc1aa18eb8df 100644
--- a/gcc/convert.c
+++ b/gcc/convert.c
@@ -295,92 +295,6 @@ convert_to_real_1 (tree type, tree expr, bool fold_p)
 	  return build1 (TREE_CODE (expr), type, arg);
 	}
 	  break;
-	/* Convert (outertype)((innertype0)a+(innertype1)b)
-	   into ((newtype)a+(newtype)b) where newtype
-	   is the widest mode from all of these.  */
-	case PLUS_EXPR:
-	case MINUS_EXPR:
-	case MULT_EXPR:
-	case RDIV_EXPR:
-	   {
-	 tree arg0 = strip_float_extensions (TREE_OPERAND (expr, 0));
-	 tree arg1 = strip_float_extensions (TREE_OPERAND (expr, 1));
-
-	 if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-		 && FLOAT_TYPE_P (TREE_TYPE (arg1))
-		 && DECIMAL_FLOAT_TYPE_P (itype) == DECIMAL_FLOAT_TYPE_P (type))
-	   {
-		  tree newtype = type;
-
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == SDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == SDmode
-		  || TYPE_MODE (type) == SDmode)
-		newtype = dfloat32_type_node;
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == DDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == DDmode
-		  || TYPE_MODE (type) == DDmode)
-		newtype = dfloat64_type_node;
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == TDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == TDmode
-		  || TYPE_MODE (type) == TDmode)
-newtype = dfloat128_type_node;
-		  if (newtype == dfloat32_type_node
-		  || newtype == dfloat64_type_node
-		  || newtype == dfloat128_type_node)
-		{
-		  expr = build2 (TREE_CODE (expr), newtype,
- convert_to_real_1 (newtype, arg0,
-			fold_p),
- convert_to_real_1 (newtype, arg1,
-			fold_p));
-		  if (newtype == type)
-			return expr;
-		  break;
-		}
-
-		  if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (newtype))
-		newtype = TREE_TYPE (arg0);
-		  if (TYPE_PRECISION (TREE_TYPE (arg1)) > TYPE_PRECISION (newtype))
-		newtype = TREE_TYPE (arg1);
-		  /* Sometimes this transformation is safe (cannot
-		 change results through affecting double rounding
-		 cases) and sometimes it is not.  If NEWTYPE is
-		 wider than TYPE, e.g. (float)((long double)double
-		 + (long double)double) converted to
-		 (float)(double + double), the transformation is
-		 unsafe regardless of the details of the types
-		 involved; double rounding can arise if the result
-		 of NEWTYPE arithmetic is a NEWTYPE value half way
-		 between two representable TYPE values but the
-		 exact value is sufficiently different (in the
-		 right direction) for this difference to be
-		 visible in ITYPE arithmetic.  If NEWTYPE is the
-		 same as TYPE, however, the transformation may be
-		 safe depending on the types involved: it is safe
-		 if the ITYPE has strictly more than twice as many
-		 mantissa bits as TYPE, can represent infinities
-		 and NaNs if the TYPE can, and has sufficient
-		 exponent range for the product or ratio of two
-		 values representable in the TYPE to be 

Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
On 1/8/19 2:50 PM, Jakub Jelinek wrote:
> On Tue, Jan 08, 2019 at 02:09:28PM +0100, Martin Liška wrote:
>> Changed that. I verified that tree-ssa.exp tests work fine. May I install 
>> the hunk you suggested
>> without testing?
> 
> I think better would be commit without that hunk and when you bootstrap next
> time, test that and commit (preapproved) that hunk.

I'll re-run the bootstrap with the hunk.

> 
> That said, have you looked at the other mail I've posted (TYPE_PRECISION
> (index_type) > TYPE_PRECISION (elt0))?
> The way contains_linear_function_p is written it does the checks in the type
> that will do it at runtime, so I'm just wondering if there are cases where
> doing it in the wider type could optimize something that the narrower one
> can't.  Perhaps there are none.

I replied to that in the previous email.

Martin

> 
>   Jakub
> 



Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
On 1/8/19 2:08 PM, Jakub Jelinek wrote:
> On Tue, Jan 08, 2019 at 01:52:35PM +0100, Jakub Jelinek wrote:
>>> 2019-01-08  Martin Liska  
>>>
>>> PR tree-optimization/88753
>>> * tree-switch-conversion.c (switch_conversion::build_one_array):
>>> Come up with local variable constructor.  Convert first to
>>> type of constructor values.
>>
>> Why is the testcase missing?
> 
> Oh, one more thing.  What happens if the index is wider (higher precision)
> than the type of the constructor elts?  At least for the two_value_replacement
> optimization in phiopt, I'm using the wider of the two types for the
> arithmetics (conditionally unsigned if not proven not to overflow).
> Shouldn't that be the case in this optimization too?

I've been running GCC bootstrap for quite some time with:

diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index c3f2baf39d7..c774350b497 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -100,6 +100,7 @@ switch_conversion::collect (gswitch *swtch)
   max_case = gimple_switch_label (swtch, branch_num - 1);
 
   m_range_min = CASE_LOW (min_case);
+  gcc_assert (operand_equal_p (TYPE_SIZE (TREE_TYPE (m_range_min)), TYPE_SIZE 
(TREE_TYPE (m_index_expr)), 0));
   if (CASE_HIGH (max_case) != NULL_TREE)
 m_range_max = CASE_HIGH (max_case);
   else

and I haven't triggered the assert.

> 
> With using just the constructor elt type, do you count on the analysis to
> fail if starting with casting the index to the elt type (or unsigned variant
> thereof) affects the computation?

So hopefully the situation can't happen. Note that if it happens we should not
generate wrong-code, but we miss an opportunity.

Martin

> 
>   Jakub
> 



Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 02:09:28PM +0100, Martin Liška wrote:
> Changed that. I verified that tree-ssa.exp tests work fine. May I install the 
> hunk you suggested
> without testing?

I think better would be commit without that hunk and when you bootstrap next
time, test that and commit (preapproved) that hunk.

That said, have you looked at the other mail I've posted (TYPE_PRECISION
(index_type) > TYPE_PRECISION (elt0))?
The way contains_linear_function_p is written it does the checks in the type
that will do it at runtime, so I'm just wondering if there are cases where
doing it in the wider type could optimize something that the narrower one
can't.  Perhaps there are none.

Jakub


Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 01:30:45PM +, Julian Brown wrote:
> commit 2ee3f8d09a7b2af6c9ba29cdd8e8587db1946c0b
> Author: Julian Brown 
> Date:   Wed Dec 19 05:01:58 2018 -0800
> 
> Add testcase from PR71959
> 
>   libgomp/
> 
>   PR lto/71959
>   * testsuite/libgomp.oacc-c++/pr71959-aux.cc: New.
>   * testsuite/libgomp.oacc-c++/pr71959.C: New.

Ok, thanks.

Jakub


Re: [PATCH 2/2] libgomp: Reduce copy and paste for RTEMS

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 02:28:35PM +0100, Sebastian Huber wrote:
> libgomp/
> 
>   * config/rtems/bar.c: Include "../linux/bar.c" and delete copy
>   and paste code.

>  libgomp/config/rtems/bar.c | 183 
> +
>  1 file changed, 2 insertions(+), 181 deletions(-)
> 
> diff --git a/libgomp/config/rtems/bar.c b/libgomp/config/rtems/bar.c
> index 8f92d60b325..c16c763b954 100644
> --- a/libgomp/config/rtems/bar.c
> +++ b/libgomp/config/rtems/bar.c
> @@ -29,6 +29,7 @@
>  
>  #include "libgomp.h"
>  #include "bar.h"
> +#define GOMP_WAIT_H 1

Maybe move the above define right before the following line.

> +#include "../linux/bar.c"

Ok with that change.

Jakub


Re: [PATCH 1/2] libgomp: Avoid complex dependencies for RTEMS

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 02:28:34PM +0100, Sebastian Huber wrote:
> libgomp/
> 
>   * config/rtems/affinity-fmt.c: New file.  Include affinity-fmt.c,
>   undefining HAVE_GETPID and HAVE_GETHOSTNAME, and mapping fwrite to
>   write.

Ok for trunk, thanks.

Jakub


Re: [PATCH][jit] Add thread-local globals to the libgccjit frontend

2019-01-08 Thread Marc Nieper-Wißkirchen

Dear David,

thank you very much for your timely response and for taking a thorough
look at my proposed patch.


Am 07.01.19 um 21:34 schrieb David Malcolm:


Have you done the legal paperwork with the FSF for contributing to GCC?
  See https://gcc.gnu.org/contribute.html#legal


Not yet; this is my first patch I would like to contribute to GCC. You 
should have received a private email to get the legal matters done.



ChangeLog

2019-01-05  Marc Nieper-Wißkirchen  

 * docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
 * docs/topics/expressions.rst (Global variables): Add
 documentation of gcc_jit_lvalue_set_bool_thread_local.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-playback.c: Include "varasm.h".
Within namespace gcc::jit::playback...
(context::new_global) Add "thread_local_p" param and use it
to set DECL_TLS_MODEL.
* jit-playback.h: Within namespace gcc::jit::playback...
(context::new_global: Add "thread_local_p" param.
* jit-recording.c: Within namespace gcc::jit::recording...
(global::replay_into): Provide m_thread_local to call to
new_global.
(global::write_reproducer): Call write_reproducer_thread_local.
(global::write_reproducer_thread_local): New method.
* jit-recording.h: Within namespace gcc::jit::recording...
(lvalue::dyn_cast_global): New virtual function.
(global::m_thread_local): New field.
* libgccjit.c (gcc_jit_lvalue_set_bool_thread_local): New
function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_lvalue_set_bool_thread_local): New
macro.
(gcc_jit_lvalue_set_bool_thread_local): New function.
* libgccjit.map (LIBGCCJIT_ABI_11): New.
(gcc_jit_lvalue_set_bool_thread_local): Add.

Testing

The result of `make check-jit' is (the failing test in
`test-sum-squares.c` is also failing without this patch on my
machine):

Native configuration is x86_64-pc-linux-gnu

[...]


FAIL:  test-combination.c.exe iteration 1 of 5:
verify_code_sum_of_squares: dump_vrp1: actual: "
FAIL: test-combination.c.exe killed: 20233 exp6 0 0 CHILDKILLED
SIGABRT SIGABRT
FAIL:  test-sum-of-squares.c.exe iteration 1 of 5: verify_code:
dump_vrp1: actual: "
FAIL: test-sum-of-squares.c.exe killed: 22698 exp6 0 0 CHILDKILLED
SIGABRT SIGABRT
FAIL:  test-threads.c.exe: verify_code_sum_of_squares: dump_vrp1:
actual: "
FAIL: test-threads.c.exe killed: 22840 exp6 0 0 CHILDKILLED SIGABRT
SIGABRT

That one's failing for me as well.  I'm investigating (I've filed it as
PR jit/88747).


I have applied your recent patch. With the patch, there are no more 
failures.


Including the new test case for thread-local globals (see below), the 
final output of `make check-jit' is now as follows:


Test run by mnieper on Tue Jan  8 14:18:27 2019

Native configuration is x86_64-pc-linux-gnu

        === jit tests ===

Schedule of variations:

    unix

Running target unix

Using /usr/share/dejagnu/baseboards/unix.exp as board description file for 
target.

Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.

Using /home/mnieper/gcc/src/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.

Running /home/mnieper/gcc/src/gcc/testsuite/jit.dg/jit.exp ...

        === jit Summary ===

# of expected passes        10394


diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index bf02e1258..c2654ff09 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -224,6 +224,13 @@
  #undef create_code
  #undef verify_code

+/* test-factorial-must-tail-call.c */

Looks like a cut-and-paste error: presumably the above comment is meant to
refer to the new test...


Yep.


+#define create_code create_code_thread_local
+#define verify_code verify_code_thread_local
+#include "test-thread-local.c"

...but it looks like the new test file is missing from the patch.


My fault. I supplied the wrong arguments to `git diff'.

At the end of this email, you'll find the updated ChangeLog and the 
amended patch.


Thanks again,

Marc

2019-01-08  Marc Nieper-Wißkirchen  

* docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
* docs/topics/expressions.rst (Global variables): Add
documentation of gcc_jit_lvalue_set_bool_thread_local.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-playback.c: Include "varasm.h".
Within namespace gcc::jit::playback...
(context::new_global) Add "thread_local_p" param and use it
to set DECL_TLS_MODEL.
* jit-playback.h: Within namespace gcc::jit::playback...
(context::new_global): Add "thread_local_p" param.
* jit-recording.c: Within namespace gcc::jit::recording...
(global::replay_into): Provide m_thread_local to call to
new_global.
(global::write_reproducer): Call write_reproducer_thread_local.
(global::write_reproducer_thread_local): New method.
* jit-recording.h: Within namespace gcc::jit::recording...

Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2019-01-08 Thread Julian Brown
On Sat, 22 Dec 2018 15:09:34 +
Iain Sandoe  wrote:

> Hi Julian,
> 
> > On 21 Dec 2018, at 16:47, Julian Brown 
> > wrote: 
> 
> > On Fri, 21 Dec 2018 14:31:19 +0100
> > Jakub Jelinek  wrote:
> >   
> >> On Fri, Dec 21, 2018 at 01:23:03PM +, Julian Brown wrote:  
> >>> 2018-xx-yy  Nathan Sidwell
> >   
> 
> >>>   * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> >>>   * testsuite/libgomp.oacc-c++/pr71959.C: New.
> >>   
> >>> +void apply (int (*fn)(), Iter out) asm
> >>> ("_ZN5Apply5applyEPFivE4Iter");
> >> 
> >> Will this work even on targets that use _ or other symbol
> >> prefixes?  
> > 
> > I'd guess so, else there would be no portable way of using "asm" to
> > write pre-mangled C++ names. The only existing similar uses I could
> > find in the testsuite are for the ifunc attribute, not asm, though
> > (e.g. g++.dg/ext/attr-ifunc-*.C).  
> 
> It won’t work on such targets (e.g. Darwin)
> … but it’s not too hard to make it happen (see, for example,
> gcc.dg/memcmp-1.c)
> 
> One just has to remember that __USER_LABEL_PREFIX__ is a token, not a
> string.
> 
> so .. in the example above…
> 
> #define STR1(X) #X
> #define STR2(X) STR1(X)
> 
> ….
> 
>  asm(STR2(__USER_LABEL_PREFIX__) "_ZN5Apply5applyEPFivE4Iter");

Thanks! I've amended the test to use this technique (though I can't
easily test on Darwin, so this is "best effort").
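
For reference, the two-level stringification that makes this work can be
sketched as follows.  This is a standalone illustration, not part of the
patch: one_level, two_level, prefixed, check_expansion and demo_symbol are
hypothetical names, and the #ifndef fallback assumes an empty prefix on
compilers that do not predefine the macro.

```c
#include <assert.h>
#include <string.h>

#define STR1(X) #X
#define STR2(X) STR1(X)

/* Assumption for this sketch: a compiler that does not predefine
   __USER_LABEL_PREFIX__ is treated as having an empty prefix.  */
#ifndef __USER_LABEL_PREFIX__
#define __USER_LABEL_PREFIX__
#endif

/* Adjacent string literals are concatenated, so this yields the
   prefix followed by the name as a single literal.  */
#define LABEL(X) STR2(__USER_LABEL_PREFIX__) X

/* One level: the operand of # is not macro-expanded.  */
const char *one_level (void) { return STR1(__USER_LABEL_PREFIX__); }

/* Two levels: the argument is expanded first, then stringified.  */
const char *two_level (void) { return STR2(__USER_LABEL_PREFIX__); }

/* demo_symbol is a made-up name used only for illustration.  */
const char *prefixed (void) { return LABEL ("demo_symbol"); }

/* Self-check: only the two-level form sees the macro's value.  */
void check_expansion (void)
{
  assert (strcmp (one_level (), "__USER_LABEL_PREFIX__") == 0);
  assert (strcmp (two_level (), "__USER_LABEL_PREFIX__") != 0);
}
```

On a target with an empty prefix, prefixed () should return just the name;
on a target with a "_" prefix it should return the name with a leading
underscore.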

> > Anyway, OpenACC is only useful for a handful of targets at present,
> > neither of which use special symbol prefixes AFAIK.  
> 
> I have hopes of one day getting offloading to work on Darwin (the
> only limitation is developer time, not technical feasibility) .. 

Is this OK now (for stage 4)?

Thanks,

Julian
commit 2ee3f8d09a7b2af6c9ba29cdd8e8587db1946c0b
Author: Julian Brown 
Date:   Wed Dec 19 05:01:58 2018 -0800

Add testcase from PR71959

	libgomp/

	PR lto/71959
	* testsuite/libgomp.oacc-c++/pr71959-aux.cc: New.
	* testsuite/libgomp.oacc-c++/pr71959.C: New.

diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc b/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc
new file mode 100644
index 000..10a6eeb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959-aux.cc
@@ -0,0 +1,35 @@
+// { dg-do compile }
+
+#define STR1(X) #X
+#define STR2(X) STR1(X)
+#define LABEL(X) STR2(__USER_LABEL_PREFIX__) X
+
+struct Iter
+{
+  int *cursor;
+
+  void ctor (int *cursor_) asm (LABEL ("_ZN4IterC1EPi"));
+  int *point () const asm (LABEL ("_ZNK4Iter5pointEv"));
+};
+
+#pragma acc routine
+void Iter::ctor (int *cursor_)
+{
+  cursor = cursor_;
+}
+
+#pragma acc routine
+int *Iter::point () const
+{
+  return cursor;
+}
+
+void apply (int (*fn)(), Iter out) asm (LABEL ("_ZN5Apply5applyEPFivE4Iter"));
+
+#pragma acc routine
+void apply (int (*fn)(), struct Iter out)
+{ *out.point() = fn (); }
+
+extern "C" void __gxx_personality_v0 ()
+{
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
new file mode 100644
index 000..bf27a75
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
@@ -0,0 +1,31 @@
+// { dg-additional-sources "pr71959-aux.cc" }
+
+// PR lto/71959 ICEd LTO due to mismatch between writing & reading behaviour
+
+struct Iter
+{
+  int *cursor;
+
+  Iter(int *cursor_) : cursor(cursor_) {}
+
+  int *point() const { return cursor; }
+};
+
+#pragma acc routine seq
+int one () { return 1; }
+
+struct Apply
+{
+  static void apply (int (*fn)(), Iter out)
+  { *out.point() = fn (); }
+};
+
+int main ()
+{
+  int x;
+
+#pragma acc parallel copyout(x)
+  Apply::apply (one, Iter (&x));
+
+  return x != 1;
+}


[PATCH 2/2] libgomp: Reduce copy and paste for RTEMS

2019-01-08 Thread Sebastian Huber
libgomp/

* config/rtems/bar.c: Include "../linux/bar.c" and delete copy
and paste code.
---
 libgomp/config/rtems/bar.c | 183 +
 1 file changed, 2 insertions(+), 181 deletions(-)

diff --git a/libgomp/config/rtems/bar.c b/libgomp/config/rtems/bar.c
index 8f92d60b325..c16c763b954 100644
--- a/libgomp/config/rtems/bar.c
+++ b/libgomp/config/rtems/bar.c
@@ -29,6 +29,7 @@
 
 #include "libgomp.h"
 #include "bar.h"
+#define GOMP_WAIT_H 1
 #include 
 
 static gomp_barrier_t *
@@ -72,184 +73,4 @@ do_wait (int *addr, int val)
 futex_wait (addr, val);
 }
 
-/* Everything below this point should be identical to the Linux
-   implementation.  */
-
-void
-gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
-{
-  if (__builtin_expect (state & BAR_WAS_LAST, 0))
-{
-  /* Next time we'll be awaiting TOTAL threads again.  */
-  bar->awaited = bar->total;
-  __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
-   MEMMODEL_RELEASE);
-  futex_wake ((int *) &bar->generation, INT_MAX);
-}
-  else
-{
-  do
-   do_wait ((int *) &bar->generation, state);
-  while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
-}
-}
-
-void
-gomp_barrier_wait (gomp_barrier_t *bar)
-{
-  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
-}
-
-/* Like gomp_barrier_wait, except that if the encountering thread
-   is not the last one to hit the barrier, it returns immediately.
-   The intended usage is that a thread which intends to gomp_barrier_destroy
-   this barrier calls gomp_barrier_wait, while all other threads
-   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
-   the barrier can be safely destroyed.  */
-
-void
-gomp_barrier_wait_last (gomp_barrier_t *bar)
-{
-  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
-  if (state & BAR_WAS_LAST)
-gomp_barrier_wait_end (bar, state);
-}
-
-void
-gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
-{
-  futex_wake ((int *) &bar->generation, count == 0 ? INT_MAX : count);
-}
-
-void
-gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
-{
-  unsigned int generation, gen;
-
-  if (__builtin_expect (state & BAR_WAS_LAST, 0))
-{
-  /* Next time we'll be awaiting TOTAL threads again.  */
-  struct gomp_thread *thr = gomp_thread ();
-  struct gomp_team *team = thr->ts.team;
-
-  bar->awaited = bar->total;
-  team->work_share_cancelled = 0;
-  if (__builtin_expect (team->task_count, 0))
-   {
- gomp_barrier_handle_tasks (state);
- state &= ~BAR_WAS_LAST;
-   }
-  else
-   {
- state &= ~BAR_CANCELLED;
- state += BAR_INCR - BAR_WAS_LAST;
- __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
- futex_wake ((int *) &bar->generation, INT_MAX);
- return;
-   }
-}
-
-  generation = state;
-  state &= ~BAR_CANCELLED;
-  do
-{
-  do_wait ((int *) &bar->generation, generation);
-  gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
-  if (__builtin_expect (gen & BAR_TASK_PENDING, 0))
-   {
- gomp_barrier_handle_tasks (state);
- gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
-   }
-  generation |= gen & BAR_WAITING_FOR_TASK;
-}
-  while (gen != state + BAR_INCR);
-}
-
-void
-gomp_team_barrier_wait (gomp_barrier_t *bar)
-{
-  gomp_team_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
-}
-
-void
-gomp_team_barrier_wait_final (gomp_barrier_t *bar)
-{
-  gomp_barrier_state_t state = gomp_barrier_wait_final_start (bar);
-  if (__builtin_expect (state & BAR_WAS_LAST, 0))
-bar->awaited_final = bar->total;
-  gomp_team_barrier_wait_end (bar, state);
-}
-
-bool
-gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
-  gomp_barrier_state_t state)
-{
-  unsigned int generation, gen;
-
-  if (__builtin_expect (state & BAR_WAS_LAST, 0))
-{
-  /* Next time we'll be awaiting TOTAL threads again.  */
-  /* BAR_CANCELLED should never be set in state here, because
-cancellation means that at least one of the threads has been
-cancelled, thus on a cancellable barrier we should never see
-all threads to arrive.  */
-  struct gomp_thread *thr = gomp_thread ();
-  struct gomp_team *team = thr->ts.team;
-
-  bar->awaited = bar->total;
-  team->work_share_cancelled = 0;
-  if (__builtin_expect (team->task_count, 0))
-   {
- gomp_barrier_handle_tasks (state);
- state &= ~BAR_WAS_LAST;
-   }
-  else
-   {
- state += BAR_INCR - BAR_WAS_LAST;
- __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
- futex_wake ((int *) &bar->generation, INT_MAX);
- return false;
-   }
-}
-
-  if (__builtin_expect (state & BAR_CANCELLED, 0))
-return true;
-
-  generation = state;
-  do
-{
-  do_wait ((int *) 

[PATCH 1/2] libgomp: Avoid complex dependencies for RTEMS

2019-01-08 Thread Sebastian Huber
libgomp/

* config/rtems/affinity-fmt.c: New file.  Include affinity-fmt.c,
undefining HAVE_GETPID and HAVE_GETHOSTNAME, and mapping fwrite to
write.
---
 libgomp/config/rtems/affinity-fmt.c | 49 +
 1 file changed, 49 insertions(+)
 create mode 100644 libgomp/config/rtems/affinity-fmt.c

diff --git a/libgomp/config/rtems/affinity-fmt.c 
b/libgomp/config/rtems/affinity-fmt.c
new file mode 100644
index 000..e4e14c163e9
--- /dev/null
+++ b/libgomp/config/rtems/affinity-fmt.c
@@ -0,0 +1,49 @@
+/* Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libgomp.h"
+#include 
+#include 
+#include 
+#ifdef HAVE_UNISTD_H
+#include <unistd.h>
+#endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>  /* For PRIx64.  */
+#endif
+#ifdef HAVE_UNAME
+#include <sys/utsname.h>
+#endif
+
+/* The HAVE_GETPID and HAVE_GETHOSTNAME configure tests are passing for RTEMS,
+   but the extra information they give are of little value for the user.
+   Override the configure test results here.  */
+#undef HAVE_GETPID
+#undef HAVE_GETHOSTNAME
+
+/* Avoid the complex fwrite() in favour of the simple write().  */
+#undef fwrite
+#define fwrite(ptr, size, nmemb, stream) write (1, (ptr), (nmemb) * (size))
+
+#include "../../affinity-fmt.c"
-- 
2.16.4



[C++ Patch] Fix three additional locations

2019-01-08 Thread Paolo Carlini

Hi,

a few additional easy to fix cases where we thought that a plain error 
was good enough. Tested x86_64-linux.


Thanks, Paolo.



/cp
2019-01-08  Paolo Carlini  

* decl.c (grok_reference_init): Improve error location.
(grokdeclarator): Likewise, improve two locations.

/testsuite
2019-01-08  Paolo Carlini  

* g++.dg/diagnostic/constexpr2.C: New.
* g++.dg/diagnostic/ref3.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 267675)
+++ cp/decl.c   (working copy)
@@ -5357,7 +5357,8 @@ grok_reference_init (tree decl, tree type, tree in
   if ((DECL_LANG_SPECIFIC (decl) == 0
   || DECL_IN_AGGR_P (decl) == 0)
  && ! DECL_THIS_EXTERN (decl))
-   error ("%qD declared as reference but not initialized", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD declared as reference but not initialized", decl);
   return NULL_TREE;
 }
 
@@ -12517,8 +12518,9 @@ grokdeclarator (const cp_declarator *declarator,
unqualified_id);
else if (constexpr_p && !initialized)
  {
-   error ("%<constexpr%> static data member %qD must have an "
-  "initializer", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%<constexpr%> static data member %qD must "
+ "have an initializer", decl);
constexpr_p = false;
  }
 
@@ -12756,8 +12758,9 @@ grokdeclarator (const cp_declarator *declarator,
  }
else if (constexpr_p && DECL_EXTERNAL (decl))
  {
-   error ("declaration of %<constexpr%> variable %qD "
-  "is not a definition", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "declaration of %<constexpr%> variable %qD "
+ "is not a definition", decl);
constexpr_p = false;
  }
 
Index: testsuite/g++.dg/diagnostic/constexpr2.C
===
--- testsuite/g++.dg/diagnostic/constexpr2.C(nonexistent)
+++ testsuite/g++.dg/diagnostic/constexpr2.C(working copy)
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++11 } }
+
+extern constexpr int i __attribute__((unused));  // { dg-error "22:declaration of .constexpr. variable .i." }
+
+struct S
+{
+  constexpr static int i __attribute__((unused));  // { dg-error "24:.constexpr. static data member .i." }
+};
Index: testsuite/g++.dg/diagnostic/ref3.C
===
--- testsuite/g++.dg/diagnostic/ref3.C  (nonexistent)
+++ testsuite/g++.dg/diagnostic/ref3.C  (working copy)
@@ -0,0 +1 @@
+int& i __attribute__((unused));  // { dg-error "6:.i. declared as reference" }


Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
On 1/8/19 1:52 PM, Jakub Jelinek wrote:
> On Tue, Jan 08, 2019 at 01:40:41PM +0100, Martin Liška wrote:
>> As seen in the PR, when doing switch convert linear transformation,
>> one needs to first convert to unsigned type for constructor values.
>>
>> Patch survives regression tests and bootstrap on x86_64-linux-gnu.
>> Ready for trunk?
>> Thanks,
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-01-08  Martin Liska  
>>
>>  PR tree-optimization/88753
>>  * tree-switch-conversion.c (switch_conversion::build_one_array):
>>  Come up with local variable constructor.  Convert first to
>>  type of constructor values.
> 
> Why is the testcase missing?

Because I forgot to call git add ;)

> 
> Otherwise LGTM.
> 
> Also, note contains_linear_function_p
>   wide_int range_min = wi::to_wide (fold_convert (TREE_TYPE (elt0),
>   m_range_min));
> could be written on wide_ints directly:
>   wide_int range_min
> = wide_int::from (wi::to_wide (m_range_min),
> TYPE_PRECISION (TREE_TYPE (elt0)),
> TYPE_SIGN (TREE_TYPE (m_range_min)));
> 
>   Jakub
> 

Changed that. I verified that tree-ssa.exp tests work fine. May I install the 
hunk you suggested
without testing?

Martin
From 52f8961361f894cca35c616a38486f69c48c194c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 8 Jan 2019 12:56:03 +0100
Subject: [PATCH] Use proper type in linear transformation in
 tree-switch-conversion (PR tree-optimization/88753).

gcc/ChangeLog:

2019-01-08  Martin Liska  

	PR tree-optimization/88753
	* tree-switch-conversion.c (switch_conversion::build_one_array):
	Come up with local variable constructor.  Convert first to
	type of constructor values.

gcc/testsuite/ChangeLog:

2019-01-08  Martin Liska  

	* gcc.dg/tree-ssa/pr88753.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr88753.c | 57 +
 gcc/tree-switch-conversion.c| 17 +---
 2 files changed, 67 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88753.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88753.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88753.c
new file mode 100644
index 000..eaefc38962f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88753.c
@@ -0,0 +1,57 @@
+/* PR tree-optimization/88753 */
+/* { dg-options "-O2 -fdump-tree-switchconv" } */
+/* { dg-do run { target nonpic } } */
+
+typedef unsigned short int uint16_t;
+typedef unsigned char uint8_t;
+
+uint16_t length;
+uint16_t enc_method_global;
+
+uint8_t
+__attribute__((noipa))
+_zip_buffer_get_8(int buffer)
+{
+  return buffer;
+}
+
+  int
+  __attribute__((noipa))
+foo(int v)
+{
+  uint16_t enc_method;
+  switch (_zip_buffer_get_8(v)) {
+case 1:
+  enc_method = 0x0101;
+  break;
+case 2:
+  enc_method = 0x0102;
+  break;
+case 3:
+  enc_method = 0x0103;
+  break;
+default:
+  __builtin_abort ();
+  }
+
+  enc_method_global = enc_method;
+}
+
+int main(int argc, char **argv)
+{
+  foo (1);
+  if (enc_method_global != 0x0101)
+__builtin_abort ();
+
+  foo (2);
+  if (enc_method_global != 0x0102)
+__builtin_abort ();
+
+  foo (3);
+  if (enc_method_global != 0x0103)
+__builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Linear transformation with A = 1 and B = 256" 1 "switchconv" } } */
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 614c450dd4d..c3f2baf39d7 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -473,8 +473,10 @@ switch_conversion::contains_linear_function_p (vec *vec,
   if (TREE_CODE (elt0) != INTEGER_CST || TREE_CODE (elt1) != INTEGER_CST)
 return false;
 
-  wide_int range_min = wi::to_wide (fold_convert (TREE_TYPE (elt0),
-		  m_range_min));
+  wide_int range_min
+= wide_int::from (wi::to_wide (m_range_min),
+		  TYPE_PRECISION (TREE_TYPE (elt0)),
+		  TYPE_SIGN (TREE_TYPE (m_range_min)));
   wide_int y1 = wi::to_wide (elt0);
   wide_int y2 = wi::to_wide (elt1);
   wide_int a = y2 - y1;
@@ -600,9 +602,9 @@ switch_conversion::build_one_array (int num, tree arr_index_type,
   name = copy_ssa_name (PHI_RESULT (phi));
   m_target_inbound_names[num] = name;
 
+  vec<constructor_elt, va_gc> *constructor = m_constructors[num];
   wide_int coeff_a, coeff_b;
-  bool linear_p = contains_linear_function_p (m_constructors[num], &coeff_a,
-					      &coeff_b);
+  bool linear_p = contains_linear_function_p (constructor, &coeff_a, &coeff_b);
   if (linear_p)
 {
   if (dump_file && coeff_a.to_uhwi () > 0)
@@ -610,7 +612,8 @@ switch_conversion::build_one_array (int num, tree arr_index_type,
 		 " and B = %" PRId64 "\n", coeff_a.to_shwi (),
 		 coeff_b.to_shwi ());
 
-  tree t = unsigned_type_for (TREE_TYPE (m_index_expr));
+  /* We must use type of constructor values.  */
+  tree t = unsigned_type_for (TREE_TYPE ((*constructor)[0].value));
   gimple_seq seq = NULL;
   tree tmp = gimple_convert (, t, 

Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 01:52:35PM +0100, Jakub Jelinek wrote:
> > 2019-01-08  Martin Liska  
> > 
> > PR tree-optimization/88753
> > * tree-switch-conversion.c (switch_conversion::build_one_array):
> > Come up with local variable constructor.  Convert first to
> > type of constructor values.
> 
> Why is the testcase missing?

Oh, one more thing.  What happens if the index is wider (higher precision)
than the type of the constructor elts?  At least for the two_value_replacement
optimization in phiopt, I'm using the wider of the two types for the
arithmetics (conditionally unsigned if not proven not to overflow).
Shouldn't that be the case in this optimization too?

With using just the constructor elt type, do you count on the analysis to
fail if starting with casting the index to the elt type (or unsigned variant
thereof) affects the computation?

Jakub


[PATCH] Fix PR88739

2019-01-08 Thread Richard Biener


The following fixes PR88739, VN figuring equivalences that later
lead to code insertions for code hoisting using a possibly overflowing
computation that wasn't done unconditionally before.
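
As a rough illustration of the kind of code involved (a sketch only: the
constant LIMIT and the names guarded_diff and check_guarded_diff are
assumptions, not taken from the patch), consider two arms that compute the
same value but differ in overflow behaviour:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant for this sketch.  */
#define LIMIT 0x7FFFFFFF

int32_t guarded_diff (uint32_t x)
{
  if (x > LIMIT)
    /* Unsigned subtraction: wrapping is well defined.  */
    return (int32_t) (x - (uint32_t) LIMIT);
  else
    /* Signed subtraction: cannot overflow here because x <= LIMIT,
       but it could if it were executed for larger x.  */
    return (int32_t) x - LIMIT;
}

/* Self-check on one value from each arm.  */
void check_guarded_diff (void)
{
  assert (guarded_diff (0x800003f8u) == 0x3f9);
  assert (guarded_diff (0u) == -LIMIT);
}
```

If the two subtractions are given the same value number, hoisting the
signed form above the branch would evaluate it for inputs where it was
never evaluated before and where it can overflow.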

It is now possible to verify this with the new VN scheme.  Still
a more complete fix is pending (there's a related testcase and
the underlying issue is how PRE keeps track of expressions).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

From 6d63150293c634cdea4e220a4163cf00f91318bf Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Tue, 8 Jan 2019 10:24:25 +0100
Subject: [PATCH] fix-pr86554

PR tree-optimization/86554
* tree-ssa-sccvn.c (eliminate_dom_walker, rpo_elim,
rpo_avail): Move earlier.
(visit_nary_op): When value-numbering to expressions
with different overflow behavior make sure there's an
available expression on the path.

* gcc.dg/torture/pr86554-1.c: New testcase.
* gcc.dg/torture/pr86554-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/torture/pr86554-1.c 
b/gcc/testsuite/gcc.dg/torture/pr86554-1.c
new file mode 100644
index 000..64f851e896e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr86554-1.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+struct foo
+{
+  unsigned x;
+};
+typedef struct foo foo;
+
+static inline int zot(foo *f)
+{
+  int ret;
+
+  if (f->x > 0x7FFF)
+ret = (int)(f->x - 0x7FFF);
+  else
+ret = (int)f->x - 0x7FFF;
+  return ret;
+}
+
+void __attribute__((noinline,noclone)) bar(foo *f)
+{
+  int ret = zot(f);
+  volatile int x = ret;
+  if (ret < 1)
+__builtin_abort ();
+}
+
+int main()
+{
+  foo f;
+  f.x = 0x83f8;
+
+  bar(&f);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr86554-2.c 
b/gcc/testsuite/gcc.dg/torture/pr86554-2.c
new file mode 100644
index 000..9e57a9ca725
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr86554-2.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
+
+struct s { __INT64_TYPE__ e; };
+
+static void f (struct s *ps)
+{
+  volatile __INT64_TYPE__ m = 9223372036854775807;
+  const char *str = "11E";
+  int r;
+  __INT64_TYPE__ sum;
+
+  ps->e = 0;
+
+  for (;;)
+{
+  if (*str++ != '1')
+   break;
+  ps->e ++;
+}
+
+  r = 1;
+  sum = m;
+
+  if (sum >= 0 && ps->e >= 0)
+{
+  __UINT64_TYPE__ uc;
+  uc = (__UINT64_TYPE__) sum + (__UINT64_TYPE__) ps->e;
+  if (uc > 9223372036854775807)
+   r = 2;
+  else
+   sum = 17;
+}
+  else
+sum = sum + ps->e;
+
+  if (sum != 9223372036854775807)
+__builtin_abort ();
+  if (r != 2)
+__builtin_abort ();
+  ps->e = sum;
+}
+
+int main (void)
+{
+  struct s s;
+  f ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 2cebb6b37dc..ff54a66534e 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1865,6 +1865,86 @@ vn_nary_simplify (vn_nary_op_t nary)
   return vn_nary_build_or_lookup_1 (, false);
 }
 
+/* Elimination engine.  */
+
+class eliminate_dom_walker : public dom_walker
+{
+public:
+  eliminate_dom_walker (cdi_direction, bitmap);
+  ~eliminate_dom_walker ();
+
+  virtual edge before_dom_children (basic_block);
+  virtual void after_dom_children (basic_block);
+
+  virtual tree eliminate_avail (basic_block, tree op);
+  virtual void eliminate_push_avail (basic_block, tree op);
+  tree eliminate_insert (basic_block, gimple_stmt_iterator *gsi, tree val);
+
+  void eliminate_stmt (basic_block, gimple_stmt_iterator *);
+
+  unsigned eliminate_cleanup (bool region_p = false);
+
+  bool do_pre;
+  unsigned int el_todo;
+  unsigned int eliminations;
+  unsigned int insertions;
+
+  /* SSA names that had their defs inserted by PRE if do_pre.  */
+  bitmap inserted_exprs;
+
+  /* Blocks with statements that have had their EH properties changed.  */
+  bitmap need_eh_cleanup;
+
+  /* Blocks with statements that have had their AB properties changed.  */
+  bitmap need_ab_cleanup;
+
+  /* Local state for the eliminate domwalk.  */
+  auto_vec to_remove;
+  auto_vec to_fixup;
+  auto_vec avail;
+  auto_vec avail_stack;
+};
+
+/* Adaptor to the elimination engine using RPO availability.  */
+
+class rpo_elim : public eliminate_dom_walker
+{
+public:
+  rpo_elim(basic_block entry_)
+: eliminate_dom_walker (CDI_DOMINATORS, NULL), entry (entry_) {}
+  ~rpo_elim();
+
+  virtual tree eliminate_avail (basic_block, tree op);
+
+  virtual void eliminate_push_avail (basic_block, tree);
+
+  basic_block entry;
+  /* Instead of having a local availability lattice for each
+ basic-block and availability at X defined as union of
+ the local availabilities at X and its dominators we're
+ turning this upside down and track availability per
+ value given values are usually made available at very
+ few points (at least one).
+ So we have a value -> vec map where
+ LOCATION is specifying the basic-block LEADER is made
+ 

Re: [PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 01:40:41PM +0100, Martin Liška wrote:
> As seen in the PR, when doing switch convert linear transformation,
> one needs to first convert to unsigned type for constructor values.
> 
> Patch survives regression tests and bootstrap on x86_64-linux-gnu.
> Ready for trunk?
> Thanks,
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2019-01-08  Martin Liska  
> 
>   PR tree-optimization/88753
>   * tree-switch-conversion.c (switch_conversion::build_one_array):
>   Come up with local variable constructor.  Convert first to
>   type of constructor values.

Why is the testcase missing?

Otherwise LGTM.

Also, note contains_linear_function_p
  wide_int range_min = wi::to_wide (fold_convert (TREE_TYPE (elt0),
  m_range_min));
could be written on wide_ints directly:
  wide_int range_min
= wide_int::from (wi::to_wide (m_range_min),
  TYPE_PRECISION (TREE_TYPE (elt0)),
  TYPE_SIGN (TREE_TYPE (m_range_min)));

Jakub


[PATCH] Use proper type in linear transformation in tree-switch-conversion (PR tree-optimization/88753).

2019-01-08 Thread Martin Liška
Hi.

As seen in the PR, when doing switch convert linear transformation,
one needs to first convert to unsigned type for constructor values.
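
A minimal sketch of why the type matters, assuming an 8-bit index and
16-bit values such as a switch mapping 1, 2, 3 to 0x0101, 0x0102, 0x0103
(value = A * index + B with A = 1 and B = 256).  The function names are
hypothetical and this is plain C, not the GIMPLE the pass generates:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed coefficients of the linear function value = A * index + B.  */
enum { COEFF_A = 1, COEFF_B = 256 };

/* Arithmetic in the unsigned type of the values (16 bits here):
   the index is converted first, then multiplied and added.  */
uint16_t linear_in_value_type (uint8_t index)
{
  uint16_t idx = (uint16_t) index;
  return (uint16_t) (COEFF_A * idx + COEFF_B);
}

/* Arithmetic in the unsigned type of the index (8 bits here):
   B = 256 does not fit, so the high byte of the result is lost.  */
uint16_t linear_in_index_type (uint8_t index)
{
  uint8_t a = (uint8_t) COEFF_A;
  uint8_t b = (uint8_t) COEFF_B;	/* 256 wraps to 0.  */
  return (uint8_t) (a * index + b);
}

/* Self-check for index 1.  */
void check_linear (void)
{
  assert (linear_in_value_type (1) == 0x0101);
  assert (linear_in_index_type (1) == 0x0001);
}
```

Only the first variant reproduces the table values; the second truncates
them to the width of the index.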

Patch survives regression tests and bootstrap on x86_64-linux-gnu.
Ready for trunk?
Thanks,
Martin


gcc/ChangeLog:

2019-01-08  Martin Liska  

PR tree-optimization/88753
* tree-switch-conversion.c (switch_conversion::build_one_array):
Come up with local variable constructor.  Convert first to
type of constructor values.
---
 gcc/tree-switch-conversion.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)


diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 614c450dd4d..de41050f317 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -600,9 +600,9 @@ switch_conversion::build_one_array (int num, tree arr_index_type,
   name = copy_ssa_name (PHI_RESULT (phi));
   m_target_inbound_names[num] = name;
 
+  vec<constructor_elt, va_gc> *constructor = m_constructors[num];
   wide_int coeff_a, coeff_b;
-  bool linear_p = contains_linear_function_p (m_constructors[num], &coeff_a,
-					      &coeff_b);
+  bool linear_p = contains_linear_function_p (constructor, &coeff_a, &coeff_b);
   if (linear_p)
 {
   if (dump_file && coeff_a.to_uhwi () > 0)
@@ -610,7 +610,8 @@ switch_conversion::build_one_array (int num, tree arr_index_type,
 		 " and B = %" PRId64 "\n", coeff_a.to_shwi (),
 		 coeff_b.to_shwi ());
 
-  tree t = unsigned_type_for (TREE_TYPE (m_index_expr));
+  /* We must use type of constructor values.  */
+  tree t = unsigned_type_for (TREE_TYPE ((*constructor)[0].value));
   gimple_seq seq = NULL;
       tree tmp = gimple_convert (&seq, t, m_index_expr);
       tree tmp2 = gimple_build (&seq, MULT_EXPR, t,
@@ -633,10 +634,10 @@ switch_conversion::build_one_array (int num, tree arr_index_type,
 	  unsigned int i;
 	  constructor_elt *elt;
 
-	  FOR_EACH_VEC_SAFE_ELT (m_constructors[num], i, elt)
+	  FOR_EACH_VEC_SAFE_ELT (constructor, i, elt)
 	elt->value = fold_convert (value_type, elt->value);
 	}
-  ctor = build_constructor (array_type, m_constructors[num]);
+  ctor = build_constructor (array_type, constructor);
   TREE_CONSTANT (ctor) = true;
   TREE_STATIC (ctor) = true;
 



Re: add tsv110 pipeline scheduling

2019-01-08 Thread Kyrill Tkachov

Hi Wuyuan,

Thanks for pinging.
Some comments inline

On 08/01/19 11:23, wuyuan (E) wrote:

Hi Maintainers,
  I submitted a tsv110 pipeline patch on the 20th of last month. Have
you reviewed the patch? I look forward to your reply.
Best Regards,
Wuyuan

2019-1-8   wuyuan  



Please use the date format 2019-01-08.

Also, only two spaces between date and your name.
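
As a small illustration of the header layout being asked for, here is a C sketch that formats such a line. The function name is hypothetical, and the address in angle brackets after a second pair of spaces follows the usual GNU ChangeLog convention rather than anything stated in this message.

```c
#include <stdio.h>

/* Write a ChangeLog header line into BUF: zero-padded ISO date, two
   spaces, the author's name, two spaces, the address in angle
   brackets.  Returns whatever snprintf returns.  */
static int
format_changelog_header (char *buf, size_t size, int year, int month,
			 int day, const char *name, const char *email)
{
  return snprintf (buf, size, "%04d-%02d-%02d  %s  <%s>",
		   year, month, day, name, email);
}
```

Passing 2019, 1, 8 produces a line starting with "2019-01-08" followed by exactly two spaces, instead of "2019-1-8" followed by three.
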


* config/aarch64/aarch64-cores.def: New CPU.


This should be
* config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.


* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md : tsv110.md   new file


This should be:
* config/aarch64/tsv110.md: New file.




diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 AARCH64_CORE("ares",  ares, cortexa57, 8_2A, 
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | 
AARCH64_FL_PROFILE, cortexa72, 0x41, 0xd0c, -1)

 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A, AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)

 /* ARMv8.4-A Architecture Processors.  */

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md old 
mode 100644 new mode 100755 index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")

 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md new 
file mode 100644 index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   

Re: [PATCH] Export explicit instantiations for C++17 members of std::string

2019-01-08 Thread Rainer Orth
Hi Jonathan,

> On 06/01/19 17:59 +0100, Rainer Orth wrote:
>>Hi Jonathan,
>>
>>> The C++17 standard added some new members to std::basic_string, which
>>> were not previously instantiated in the library. This meant that the
>>> extern template declarations had to be disabled for C++17 mode. With
>>> this patch the new members are instantiated in the library and so the
>>> explicit instantiation declarations can be used for C++17.
>>>
>>> The new members added by C++2a are still not exported, and so the
>>> explicit instantiation declarations are still disabled for C++2a.
>>>
>>> * config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Make patterns less greedy
>>> for const member functions of std::basic_string.
>>> (GLIBCXX_3.4.26): Export member functions of std::basic_string added
>>> in C++17.
>>> * include/bits/basic_string.h (basic_string(__sv_wrapper, const A&)):
>>> Make non-standard constructor private.
>>> [!_GLIBCXX_USE_CXX11_ABI] (basic_string(__sv_wrapper, const A&)):
>>> Likewise.
>>> * include/bits/basic_string.tcc (std::string, std::wstring): Declare
>>> explicit instantiations for C++17 as well as earlier dialects.
>>> * src/c++17/Makefile.am: Add new source files.
>>> * src/c++17/Makefile.in: Regenerate.
>>> * src/c++17/cow-string-inst.cc: New file defining explicit
>>> instantiations for basic_string member functions added in C++17.
>>> * src/c++17/string-inst.cc: Likewise.
>>>
>>> Tested powerpc64le-linux, committed to trunk.
>>
>>this patch broke Solaris bootstrap:
>>
>>ld: fatal: libstdc++-symbols.ver-sun: 6705: symbol
>> 'std::basic_string, std::allocator
>> >::operator std::basic_string_view >()
>> const': symbol version conflict
>>ld: fatal: libstdc++-symbols.ver-sun: 6707: symbol
>> 'std::basic_string,
>> std::allocator >::operator std::basic_string_view> std::char_traits >() const': symbol version conflict
>>ld: fatal: libstdc++-symbols.ver-sun: 6712: symbol
>> 'std::basic_string, std::allocator
>> >::data()': symbol version conflict
>>ld: fatal: libstdc++-symbols.ver-sun: 6714: symbol
>> 'std::basic_string,
>> std::allocator >::data()': symbol version conflict
>>ld: fatal: libstdc++-symbols.ver-sun: 6723: symbol
>> 'std::__cxx11::basic_string,
>> std::allocator >::_S_to_string_view(std::basic_string_view> std::char_traits >)': symbol version conflict
>>ld: fatal: libstdc++-symbols.ver-sun: 6724: symbol
>> 'std::__cxx11::basic_string,
>> std::allocator
>> >::_S_to_string_view(std::basic_string_view> std::char_traits >)': symbol version conflict
>>collect2: error: ld returned 1 exit status
>>make[6]: *** [Makefile:696: libstdc++.la] Error 1
>
> Sorry :-(
>
> [...]
>
>>The following patch allowed the build to finish.
>
> OK for trunk, thanks.

installed now with the following ChangeLog entry:

2019-01-07  Rainer Orth  

* config/abi/pre/gnu.ver (GLIBCXX_3.4): Tighten existing patterns.
(GLIBCXX_3.4.21): Likewise.

> I'll make sure to run my script to check for such conflicts before
> adding any more symbols.

Thanks for double-checking.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-08 Thread Richard Sandiford
Bernd Edlinger  writes:
> On 1/7/19 10:23 AM, Jakub Jelinek wrote:
>> On Sun, Dec 16, 2018 at 06:13:57PM +0200, Dimitar Dimitrov wrote:
>>> -  /* Clobbering the STACK POINTER register is an error.  */
>>> +  /* Clobbered STACK POINTER register is not saved/restored by GCC,
>>> + which is often unexpected by users.  See PR52813.  */
>>>if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
>>>  {
>>> -  error ("Stack Pointer register clobbered by %qs in %<asm%>", 
>>> regname);
>>> +  warning (0, "Stack Pointer register clobbered by %qs in %<asm%>",
>>> +  regname);
>>> +  warning (0, "GCC has always ignored Stack Pointer %<asm%> clobbers");
>> 
>> Why do we write Stack Pointer rather than stack pointer?  That is really
>> weird.  The second warning would be a note based on the first one, i.e.
>> if (warning ()) note ();
>> and better have some -W* option to silence the warning.
>> 
>
> Yes, thanks for this suggestion.
>
> Meanwhile I found out, that the stack clobber has only been ignored up to
> gcc-5 (at least with lra targets, not really sure about reload targets).
> From gcc-6 on, with the exception of PR arm/77904 which was a regression due
> to the underlying lra change, but fixed later, and back-ported to gcc-6.3.0,
> this works for all targets I tried so far.
>
> To me, it starts to look like a rather unique and useful feature, that I would
> like to keep working.

Not sure what you mean by "unique".  But forcing a frame is a bit of
a slippery concept.  Force it where?  For the asm only, or the whole
function?  This depends on optimisation and hasn't been consistent
across GCC versions, since it depends on the shrink-wrapping
optimisation.  (There was a similar controversy a while ago about
to what extent -fno-omit-frame-pointer should "force a frame".)

The effect on the redzone seems like something that should be specified
explicitly rather than as an (accidental?) side effect of listing the
sp in the clobber list.  Maybe this would be another use for the "asm
attributes" proposal.  "noreturn" was another attribute suggested on
IRC yesterday.

But either way, the general feeling seems to be that going straight to a
hard error is too harsh, since there's quite a bit of existing code that
has the clobber.  This patch implements the compromise discussed on IRC
yesterday of making it a -Wdeprecated warning instead.
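
For readers unfamiliar with clobber lists, a minimal sketch of an asm statement whose clobbers are unproblematic, with the deprecated form noted only in a comment. The function name is hypothetical, the asm is x86-64 specific, and other targets simply fall back to plain C.

```c
/* Add two ints.  On x86-64 with GCC-style extended asm the addition
   is done by an addl instruction; elsewhere plain C is used.  */
static int
add_via_asm (int a, int b)
{
#if defined (__x86_64__) && defined (__GNUC__)
  int result = a;
  /* "cc" is a legitimate clobber because addl changes the flags.
     Adding "rsp" to this list is the kind of clobber that the patch
     turns into a -Wdeprecated warning.  */
  __asm__ ("addl %1, %0" : "+r" (result) : "r" (b) : "cc");
  return result;
#else
  return a + b;
#endif
}
```

The asm leaves the stack pointer untouched, so nothing about the stack needs to be declared.
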

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Richard

Dimitar: sorry for the run-around on this patch, and thanks for the
submission.  It looks from all the controversy like it was a
long-festering PR for a reason. :-/


2019-01-07  Richard Sandiford  

gcc/
PR inline-asm/52813
* doc/extend.texi: Document that listing the stack pointer in the
clobber list of an asm is a deprecated feature.
* common.opt (Wdeprecated): Moved from c-family/c.opt.
* cfgexpand.c (asm_clobber_reg_is_valid): Issue a -Wdeprecated
warning instead of an error for clobbers of the stack pointer.
Add a note explaining why.

gcc/c-family/
PR inline-asm/52813
* c.opt (Wdeprecated): Move documentation and variable to common.opt.

gcc/d/
PR inline-asm/52813
* lang.opt (Wdeprecated): Reference common.opt instead of c.opt.

gcc/testsuite/
PR inline-asm/52813
* gcc.target/i386/pr52813.c (test1): Turn the diagnostic into a
-Wdeprecated warning and expect a following note:.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi 2019-01-07 12:14:31.699490485 +
+++ gcc/doc/extend.texi 2019-01-08 11:40:20.807906878 +
@@ -9441,6 +9441,15 @@ When the compiler selects which register
 operands, it does not use any of the clobbered registers. As a result, 
 clobbered registers are available for any use in the assembler code.
 
+Another restriction is that the clobber list should not contain the
+stack pointer register.  This is because the compiler requires the
+value of the stack pointer to be the same after an @code{asm}
+statement as it was on entry to the statement.  However, previous
+versions of GCC did not enforce this rule and allowed the stack
+pointer to appear in the list, with unclear semantics.  This behavior
+is deprecated and listing the stack pointer may become an error in
+future versions of GCC@.
+
 Here is a realistic example for the VAX showing the use of clobbered 
 registers: 
 
Index: gcc/common.opt
===
--- gcc/common.opt  2019-01-04 11:39:27.178246751 +
+++ gcc/common.opt  2019-01-08 11:40:20.803906912 +
@@ -579,6 +579,10 @@ Wattribute-warning
 Common Var(warn_attribute_warning) Init(1) Warning
 Warn about uses of __attribute__((warning)) declarations.
 
+Wdeprecated
+Common Var(warn_deprecated) Init(1) Warning
+Warn if a deprecated compiler feature, class, method, or field is used.
+
 

Re: [C++ Patch] Fix four more locations

2019-01-08 Thread Paolo Carlini

Hi,

On 08/01/19 11:24, Christophe Lyon wrote:

On Mon, 7 Jan 2019 at 18:08, Paolo Carlini  wrote:

Hi,

should be straightforward material. Tested x86_64-linux, as usual.

Thanks, Paolo.

/


Hi,

The new g++.dg/diagnostic/thread1.C passes on aarch64, but fails on arm:
FAIL: g++.dg/diagnostic/thread1.C  -std=c++14  (test for errors, line 13)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++17  (test for errors, line 13)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++17 (test for excess errors)

the logs say:
/gcc/testsuite/g++.dg/diagnostic/thread1.C:13:12: error: non-local
variable 's' declared '__thread' needs dynamic initialization

I don't know why the error message does not match?


Evidently on some targets that variable also needs dynamic 
initialization, the issue isn't just that it has a non-trivial 
destructor. Thus, I'm simply going to shorten by one word the expected 
string.


Thanks, Paolo.



Re: Remove overall growth from badness metrics

2019-01-08 Thread Jan Hubicka
> > I plan to commit the patch tomorrow after re-testing everything after
> > the bugfixes from today and yesterday.  In addition to this have found
> > that current inline-unit-growth is too small for LTO of large programs
> > (especially Firefox:) and there are important improvements when
> > increased from 20 to 30 or 40.  I am re-running C++ benchmarks and other
> > tests to decide about precise setting.  Finally I plan to increase
> > the new parameters for bit more inlining at -O2 and -Os.
> 
> Usually increasing these parameters might increase the compilation time and 
> the 
> final code size, do you have any data for compilation time and code size 
> impact from
> these parameter change?

Yes, currently LNT is down because some machines apparently ran out of
disk space after Christmas, so I cannot show you data on that, but I
can show Firefox.  I will make a summary of LNT too once it restarts.

In general this parameter affects primarily -O3 builds, because -O2
hardly hits the limit. With -O3, only programs with very large units
are affected (-O2 units hit the limit only if you have a lot of inline
hints in the code).

In my test bed this included Firefox with or without LTO, because they
do "poor man's" LTO by #including multiple .cpp files into a single
unified source, which makes average units large.  tramp3d and DLV from
our C++ benchmarks are also affected.

I have some data on Firefox and I will build the remaining ones:

 growth         LTO+PGO    PGO        LTO        none       -finline-functions
 20 (default)   83752215   94390023   93085455   103437191  94351191
 40             85299111   97220935   101600151  108910311  115311719
 clang  111520431114863807 108437807

Build times are within noise of my setup, but they are less pronounced
than the code size difference. I think at most 1 minute out of 100.
Note that Firefox consists of 6% Rust code that is not built by GCC and
and building that consumes over half of the build time.
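
To put the table in relative terms, a small C sketch that computes the size change between two settings. The helper name is hypothetical, and reading the table columns as LTO+PGO, PGO, LTO, none and -finline-functions is an assumption about the flattened header.

```c
/* Relative size change, in percent, going from OLD_SIZE to NEW_SIZE.  */
static double
size_growth_percent (unsigned long old_size, unsigned long new_size)
{
  return 100.0 * ((double) new_size - (double) old_size)
	 / (double) old_size;
}
```

Under that reading, going from 20 to 40 grows the LTO+PGO build (83752215 to 85299111) by roughly 1.8%, and the LTO build (93085455 to 101600151) by roughly 9.1%.
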

The problem I am trying to solve here is to get consistent LTO
performance improvements compared to non-LTO. Currently there are
some regressions:
 
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=b6ba1ebfe913d152989495d8cb450bce02f27d44=try=c7bd18804e328ed490eab707072b3cf59da91042=1=1=1
All those regressions go away with the limit increase.

I tracked them down to the fact that we do not inline some very small
functions already (such as IsHTMLWhitespace).  In the GCC 5 timeframe I
tuned this parameter to 20% based on Firefox LTO benchmarks, but I was
not that serious about performance since my setup was not giving very
reproducible results for sub-5% differences on tp5o. Since we plan to
enable LTO by default for Tumbleweed, I need to find something that
does not cause too many regressions while keeping the code size
advantage of non-LTO.

Honza
> 
> thanks.
> 
> Qing
> > 
> > Bootstrapped/regtested x86_64-linux, will commit it tomorrow.
> > 
> > * ipa-inline.c (edge_badness): Do not account overall_growth into
> > badness metrics.
> > Index: ipa-inline.c
> > ===
> > --- ipa-inline.c(revision 267612)
> > +++ ipa-inline.c(working copy)
> > @@ -1082,8 +1082,8 @@ edge_badness (struct cgraph_edge *edge,
> >   /* When profile is available. Compute badness as:
> > 
> >                 time_saved * caller_count
> > - goodness =  -------------------------------------------------
> > -              growth_of_caller * overall_growth * combined_size
> > + goodness =  --------------------------------
> > +              growth_of_caller * combined_size
> > 
> >  badness = - goodness
> > 
> > @@ -1094,7 +1094,6 @@ edge_badness (struct cgraph_edge *edge,
> >|| caller->count.ipa ().nonzero_p ())
> > {
> >   sreal numerator, denominator;
> > -  int overall_growth;
> >   sreal inlined_time = compute_inlined_call_time (edge, edge_time);
> > 
> >   numerator = (compute_uninlined_call_time (edge, unspec_edge_time)
> > @@ -1106,73 +1105,6 @@ edge_badness (struct cgraph_edge *edge,
> >   else if (caller->count.ipa ().initialized_p ())
> > numerator = numerator >> 11;
> >   denominator = growth;
> > -
> > -  overall_growth = callee_info->growth;
> > -
> > -  /* Look for inliner wrappers of the form:
> > -
> > -inline_caller ()
> > -  {
> > -do_fast_job...
> > -if (need_more_work)
> > -  noninline_callee ();
> > -  }
> > -Withhout panilizing this case, we usually inline noninline_callee
> > -into the inline_caller because overall_growth is small preventing
> > -further inlining of inline_caller.
> > -
> > -Penalize only callgraph edges to functions with small overall
> > -growth ...
> > -   */
> > -  if (growth > overall_growth
> > - /* ... and having only one caller which is not inlined ... */
> > - && callee_info->single_caller
> > - && 
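
The goodness formula in the comment touched by this hunk can be sketched in plain C as follows. This is only an illustration with hypothetical function names and double arithmetic; the real edge_badness works on sreal values and applies further adjustments not shown here.

```c
/* Goodness as described by the old comment: overall_growth is part
   of the denominator.  */
static double
goodness_old (double time_saved, double caller_count,
	      double growth_of_caller, double overall_growth,
	      double combined_size)
{
  return (time_saved * caller_count)
	 / (growth_of_caller * overall_growth * combined_size);
}

/* Goodness as described by the new comment: overall_growth dropped.  */
static double
goodness_new (double time_saved, double caller_count,
	      double growth_of_caller, double combined_size)
{
  return (time_saved * caller_count)
	 / (growth_of_caller * combined_size);
}

/* Badness is simply the negated goodness.  */
static double
badness_from_goodness (double goodness)
{
  return -goodness;
}
```

For example, with time_saved = 100, caller_count = 10, growth_of_caller = 5, overall_growth = 4 and combined_size = 10, the old form gives a goodness of 5 and the new form gives 20, so an edge to a callee with large overall growth is no longer penalised for it.
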

Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> FLD from memory in SF and DFmode is considered a conversion, and
> converts sNaN to NaN (and emits #IA exception). But sNaN handling is
> already busted in the compiler as RA is free to spill the register in
> non-XFmode. IMO, the peephole2 pattern is no worse than the current
> situation.

Ok.

> At least for x86, there are no SUBREGs after reload, otherwise other
> parts of the compiler would break.

The new patch would really handle even a SUBREG there...

> > I don't see how, that would mean I'd have to write two peephole2s instead of
> > one.  It tries to deal with two different cases, one is where the temporary
> > reg is dead, in that case we can optimize away both the load or store, the
> > second case is where the temporary reg isn't dead, in that case we can
> > optimize away the store, but not the load.  With the optimizing away of both
> > load and store I was just trying to do a cheap DCE there.
> 
> I didn't realize this is an optimization, a comment would be welcome here.

Ugh, except that it doesn't work.  peep2_reg_dead_p (1, operands[0])
is not what I meant, that is always false, as the register must be live in
between the first and second instruction.  I meant
peep2_reg_dead_p (2, operands[0]), the register dead at the end of the
second instruction, except we don't really support
define_split/define_peephole2 splitting into zero instructions, DONE; in
that case returns NULL like FAIL; does.  So, let's just wait for DCE to
finish it up.

Here is what I'll bootstrap/regtest then.  Added also
reg_overlap_mentioned_p, in case there is e.g.
  movl (%eax,%edx), %eax
  movl %eax, (%eax,%edx)
or similar and as I said earlier, explicit match_operand so that I can
check MEM_VOLATILE_P on both MEMs.
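
The conditions described above can be modelled with a toy C sketch. The struct and function names are hypothetical and register numbers are arbitrary integers; the point is only to show why the store in "reg = mem; mem = reg" is redundant unless a memory reference is volatile or the loaded register takes part in the address.

```c
#include <stdbool.h>

/* Toy memory operand: base and index register numbers plus a
   volatile flag.  */
struct mem_ref
{
  int base_reg;
  int index_reg;
  bool is_volatile;
};

/* Address comparison that, like the rtx_equal_p check discussed in
   this thread, does not look at the volatile flag.  */
static bool
same_address (const struct mem_ref *a, const struct mem_ref *b)
{
  return a->base_reg == b->base_reg && a->index_reg == b->index_reg;
}

static bool
reg_used_in_address (int reg, const struct mem_ref *m)
{
  return reg == m->base_reg || reg == m->index_reg;
}

/* For "reg = *src; *dst = reg", return true if the store can go.  */
static bool
store_is_redundant (int reg, const struct mem_ref *src,
		    const struct mem_ref *dst)
{
  return !src->is_volatile
	 && !dst->is_volatile
	 && same_address (src, dst)
	 && !reg_used_in_address (reg, dst);
}
```

In the movl example above the loaded register is also the base register, so after the load the second address no longer names the same location and the store has to stay.
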

2019-01-08  Jakub Jelinek  

PR rtl-optimization/79593
* config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

--- gcc/config/i386/i386.md.jj  2019-01-07 23:54:54.494800693 +0100
+++ gcc/config/i386/i386.md 2019-01-08 12:34:18.916832780 +0100
@@ -18740,6 +18740,18 @@ (define_peephole2
   const0_rtx);
 })
 
+;; Attempt to optimize away memory stores of values the memory already
+;; has.  See PR79593.
+(define_peephole2
+  [(set (match_operand 0 "register_operand")
+(match_operand 1 "memory_operand"))
+   (set (match_operand 2 "memory_operand") (match_dup 0))]
+  "!MEM_VOLATILE_P (operands[1])
+   && !MEM_VOLATILE_P (operands[2])
+   && rtx_equal_p (operands[1], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Attempt to always use XOR for zeroing registers (including FP modes).
 (define_peephole2
   [(set (match_operand 0 "general_reg_operand")


Jakub


Re: [PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key

2019-01-08 Thread Sam Tebbs

On 1/7/19 6:28 PM, James Greenhalgh wrote:
> On Fri, Dec 21, 2018 at 09:00:10AM -0600, Sam Tebbs wrote:
>> On 11/9/18 11:04 AM, Sam Tebbs wrote:
>   
> 
>
>> Attached is an improved patch with "hint" removed from the test scans,
>> pauth_hint_num_a and pauth_hint_num_b merged into pauth_hint_num and the
>> "gcc_assert (cfun->machine->frame.laid_out)" removal reverted since it was
>> an unnecessary change.
>>
>> OK for trunk?
> While the AArch64 parts look OK to me and are buried behind an option so are
> relatively safe even though we're late in development, you'll need someone
> else to approve the libgcc changes. Especially as you change a generic
> routine with an undocumented (?) AArch64-specific change.
>
> Thanks,
> James

Thanks James, CC'ing Ian Lance Taylor.

The documentation relevant to the libgcc change is expected to be 
published in the near future.
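
As a rough idea of what the libgcc part amounts to (the ChangeLog quoted below mentions a check for 'B' in the augmentation string), here is a simplified C sketch. The function name is hypothetical and the real unwinder walks the augmentation string character by character rather than calling strchr.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if a CIE augmentation string carries the 'B' marker,
   taken here to mean that return addresses are signed with the B key.
   Simplified illustration, not the libgcc implementation.  */
static bool
augmentation_has_b_key (const char *augmentation)
{
  return augmentation != NULL && strchr (augmentation, 'B') != NULL;
}
```

A string such as "zRB" would be reported as B-key signed, while "zR" would not.
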

>
>> gcc/
>> 2018-12-21  Sam Tebbs
>>
>>  * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
>>  AARCH64_PAUTH_BUILTIN_AUTIB1716 and AARCH64_PAUTH_BUILTIN_PACIB1716.
>>  * config/aarch64/aarch64-builtins.c (aarch64_init_pauth_hint_builtins):
>>  Add autib1716 and pacib1716 initialisation.
>>  * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Add checks
>>  for autib1716 and pacib1716.
>>  * config/aarch64/aarch64-protos.h (aarch64_key_type,
>>  aarch64_post_cfi_startproc): Define.
>>  * config/aarch64/aarch64-protos.h (aarch64_ra_sign_key): Define extern.
>>  * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): Add
>>  check for b-key.
>>  * config/aarch64/aarch64.c (aarch64_ra_sign_key,
>>  aarch64_post_cfi_startproc, aarch64_handle_pac_ret_b_key): Define.
>>  * config/aarch64/aarch64.h (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>  * config/aarch64/aarch64.c (aarch64_pac_ret_subtypes): Add "b-key".
>>  * config/aarch64/aarch64.md (unspec): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_AUTIASP, UNSPEC_AUTIBSP, UNSPEC_PACIA1716,
>>  UNSPEC_PACIB1716, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/aarch64.md (do_return): Add check for b-key.
>>  * config/aarch64/aarch64.md (sp): Replace
>>  pauth_hint_num_a with pauth_hint_num.
>>  * config/aarch64/aarch64.md (1716): Replace
>>  pauth_hint_num_a with pauth_hint_num.
>>  * config/aarch64/aarch64.opt (msign-return-address=): Deprecate.
>>  * config/aarch64/iterators.md (PAUTH_LR_SP): Add UNSPEC_AUTIASP,
>>  UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/iterators.md (PAUTH_17_16): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716.
>>  * config/aarch64/iterators.md (pauth_mnem_prefix): Add UNSPEC_AUTIA1716,
>>  UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716, UNSPEC_AUTIASP,
>>  UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>  * config/aarch64/iterators.md (pauth_hint_num_a): Replace
>>  UNSPEC_PACI1716 and UNSPEC_AUTI1716 with UNSPEC_PACIA1716 and
>>  UNSPEC_AUTIA1716 respectively.
>>  * config/aarch64/iterators.md (pauth_hint_num_a): Rename to 
>> pauth_hint_num
>>  and add UNSPEC_PACIBSP, UNSPEC_AUTIBSP, UNSPEC_PACIB1716, 
>> UNSPEC_AUTIB1716.
>>
>> gcc/testsuite
>> 2018-12-21  Sam Tebbs
>>
>>  * gcc.target/aarch64/return_address_sign_1.c (dg-final): Replace
>>  "autiasp" and "paciasp" with "hint\t29 // autisp" and
>>  "hint\t25 // pacisp" respectively.
>>  * gcc.target/aarch64/return_address_sign_2.c (dg-final): Replace
>>  "paciasp" with "hint\t25 // pacisp".
>>  * gcc.target/aarch64/return_address_sign_3.c (dg-final): Replace
>>  "paciasp" and "autiasp" with "pacisp" and "autisp" respectively.
>>  * gcc.target/aarch64/return_address_sign_b_1.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_2.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_3.c: New file.
>>  * gcc.target/aarch64/return_address_sign_b_exception.c: New file.
>>  * gcc.target/aarch64/return_address_sign_builtin.c: New file
>>
>> libgcc/
>> 2018-12-21  Sam Tebbs
>>
>>  * config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key): New
>>  function.
>>  * config/aarch64/aarch64-unwind.h (aarch64_post_extract_frame_addr,
>>  aarch64_post_frob_eh_handler_addr): Add check for b-key.
>>  * unwind-dw2-fde.c (get_cie_encoding): Add check for 'B' in augmentation
>>  string.
>>  * unwind-dw2.c (extract_cie_info): Add check for 'B' in augmentation
>>  string.
>>


Re: add tsv110 pipeline scheduling

2019-01-08 Thread wuyuan (E)
Hi Maintainers,
  I submitted a tsv110 pipeline patch on the 20th of last month. Have
you reviewed the patch? I look forward to your reply.


 
Best Regards,


   
Wuyuan

2019-1-8   wuyuan  

* config/aarch64/aarch64-cores.def: New CPU.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md : tsv110.md   new file


diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_PROFILE, cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md old 
mode 100644 new mode 100755 index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md new 
file mode 100644 index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, 

Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-08 Thread Uros Bizjak
On Tue, Jan 8, 2019 at 10:27 AM Jakub Jelinek  wrote:
>
> On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote:
> > Is there a reason stack registers are excluded? Before stackreg pass,
> > these registers are just like other hard registers.
>
> I was just afraid of those, after stack pass removing a store but not
> load would result in the stack getting out of sync, and I wasn't sure if the
> stack pass will tolerate a RA chosen stack reg not being touched anymore.

It does. There is a stack compensating code that handles these situations.

> Also, doesn't loading a float/double/long double into a stack reg and
> storing again canonicalize it in any way?  Say NaNs?  Or unnormal or
> pseudo-denormal long doubles etc.?  Tried that now with a sNaN and the
> load/store didn't change it, but haven't tried with exceptions enabled to
> see if it would raise one.

FLD from memory in SF and DFmode is considered a conversion, and
converts sNaN to NaN (and emits #IA exception). But sNaN handling is
already busted in the compiler as RA is free to spill the register in
non-XFmode. IMO, the peephole2 pattern is no worse than the current
situation.

> If that is fine, I'll leave that out.

I think we are good to do so.

> Looking around, I was using a wrong macro anyway and am surprised nothing
> complained, it should have been STACK_REG_P rather than STACK_REGNO_P.
>
> > Other that that, there is no need for REG_P predicate; after reload we
> > don't have subregs and register_operand will match only hard regs.
>
> Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if
> reload_completed, just disallows subregs of mem after that.

At least for x86, there are no SUBREGs after reload, otherwise other
parts of the compiler would break.

> > Also, please put peep2_reg_dead_p predicate in the pattern predicate.
>
> I don't see how, that would mean I'd have to write two peephole2s instead of
> one.  It tries to deal with two different cases, one is where the temporary
> reg is dead, in that case we can optimize away both the load or store, the
> second case is where the temporary reg isn't dead, in that case we can
> optimize away the store, but not the load.  With the optimizing away of both
> load and store I was just trying to do a cheap DCE there.

I didn't realize this is an optimization; a comment would be welcome here.

Uros.

> Looking around more, I actually think I need to replace
> (match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p
> (operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because
> apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit.
>
> > > 2019-01-07  Jakub Jelinek  
> > >
> > > PR rtl-optimization/79593
> > > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.
> > >
> > > --- gcc/config/i386/i386.md.jj  2019-01-01 12:37:31.564738571 +0100
> > > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100
> > > @@ -18740,6 +18740,21 @@ (define_peephole2
> > >const0_rtx);
> > >  })
> > >
> > > +;; Attempt to optimize away memory stores of values the memory already
> > > +;; has.  See PR79593.
> > > +(define_peephole2
> > > +  [(set (match_operand 0 "register_operand")
> > > +(match_operand 1 "memory_operand"))
> > > +   (set (match_dup 1) (match_dup 0))]
> > > +  "REG_P (operands[0])
> > > +   && !STACK_REGNO_P (operands[0])
> > > +   && !MEM_VOLATILE_P (operands[1])"
> > > +  [(set (match_dup 0) (match_dup 1))]
> > > +{
> > > +  if (peep2_reg_dead_p (1, operands[0]))
> > > +DONE;
> > > +})
> > > +
> > >  ;; Attempt to always use XOR for zeroing registers (including FP modes).
> > >  (define_peephole2
> > >[(set (match_operand 0 "general_reg_operand")
>
> Jakub


Backports to 8.x

2019-01-08 Thread Jakub Jelinek
Hi!

I've backported the following patches, bootstrapped/regtested on
x86_64-linux and i686-linux and committed to gcc-8-branch.

Jakub
2019-01-08  Jakub Jelinek  

Backported from mainline
2018-11-13  Jakub Jelinek  

PR tree-optimization/87898
* omp-simd-clone.c (ipa_simd_modify_function_body): Remove debug stmts
where the first argument was changed into a non-decl.

* gcc.dg/gomp/pr87898.c: New test.

--- gcc/omp-simd-clone.c(revision 266092)
+++ gcc/omp-simd-clone.c(revision 266093)
@@ -1014,6 +1011,21 @@ ipa_simd_modify_function_body (struct cg
  if (info.modified)
{
  update_stmt (stmt);
+ /* If the above changed the var of a debug bind into something
+different, remove the debug stmt.  We could also for all the
+replaced parameters add VAR_DECLs for debug info purposes,
+add debug stmts for those to be the simd array accesses and
+replace debug stmt var operand with that var.  Debugging of
+vectorized loops doesn't work too well, so don't bother for
+now.  */
+ if ((gimple_debug_bind_p (stmt)
+  && !DECL_P (gimple_debug_bind_get_var (stmt)))
+ || (gimple_debug_source_bind_p (stmt)
+ && !DECL_P (gimple_debug_source_bind_get_var (stmt))))
+   {
+ gsi_remove (&gsi, true);
+ continue;
+   }
  if (maybe_clean_eh_stmt (stmt))
gimple_purge_dead_eh_edges (gimple_bb (stmt));
}
--- gcc/testsuite/gcc.dg/gomp/pr87898.c (nonexistent)
+++ gcc/testsuite/gcc.dg/gomp/pr87898.c (revision 266093)
@@ -0,0 +1,10 @@
+/* PR tree-optimization/87898 */
+/* { dg-do compile { target fgraphite } } */
+/* { dg-options "-O1 -floop-parallelize-all -fopenmp -ftree-parallelize-loops=2 -g" } */
+
+#pragma omp declare simd
+void
+foo (int x)
+{
+  x = 0;
+}
2019-01-08  Jakub Jelinek  

Backported from mainline
2018-11-15  Jakub Jelinek  

PR rtl-optimization/88018
* cfgrtl.c (fixup_abnormal_edges): Guard moving insns to fallthru edge
on the presence of fallthru edge, rather than if it is a USE or not.

* g++.dg/tsan/pr88018.C: New test.

--- gcc/cfgrtl.c(revision 266173)
+++ gcc/cfgrtl.c(revision 266174)
@@ -3332,8 +3332,15 @@ fixup_abnormal_edges (void)
 If it's placed after a trapping call (i.e. that
 call is the last insn anyway), we have no fallthru
 edge.  Simply delete this use and don't try to insert
-on the non-existent edge.  */
- if (GET_CODE (PATTERN (insn)) != USE)
+on the non-existent edge.
+Similarly, sometimes a call that can throw is
+followed in the source with __builtin_unreachable (),
+meaning that there is UB if the call returns rather
+than throws.  If there weren't any instructions
+following such calls before, supposedly even the ones
+we've deleted aren't significant and can be
+removed.  */
+ if (e)
{
  /* We're not deleting it, we're moving it.  */
  insn->set_undeleted ();
--- gcc/testsuite/g++.dg/tsan/pr88018.C (nonexistent)
+++ gcc/testsuite/g++.dg/tsan/pr88018.C (revision 266236)
@@ -0,0 +1,6 @@
+// PR rtl-optimization/88018
+// { dg-do compile }
+// { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } }
+// { dg-options "-fsanitize=thread -fno-ipa-pure-const -O1 -fno-inline-functions-called-once -w" }
+
+#include "../pr69667.C"
2019-01-08  Jakub Jelinek  

Backported from mainline
2018-11-16  Jakub Jelinek  

PR rtl-optimization/87475
* cfgrtl.c (patch_jump_insn): Allow redirection failure for
CROSSING_JUMP_P insns.
(cfg_layout_redirect_edge_and_branch): Don't ICE if ret is NULL.

* g++.dg/opt/pr87475.C: New test.

--- gcc/cfgrtl.c(revision 266218)
+++ gcc/cfgrtl.c(revision 266219)
@@ -1268,11 +1268,13 @@ patch_jump_insn (rtx_insn *insn, rtx_ins
 
  /* If the substitution doesn't succeed, die.  This can happen
 if the back end emitted unrecognizable instructions or if
-target is exit block on some arches.  */
+target is exit block on some arches.  Or for crossing
+jumps.  */
  if (!redirect_jump (as_a <rtx_jump_insn *> (insn),
  block_label (new_bb), 0))
{
- gcc_assert (new_bb == EXIT_BLOCK_PTR_FOR_FN (cfun));
+ gcc_assert (new_bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
+ || CROSSING_JUMP_P 

Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2019-01-08 Thread Sam Tebbs
On 1/7/19 6:11 PM, James Greenhalgh wrote:

> On Thu, Dec 20, 2018 at 10:38:42AM -0600, Sam Tebbs wrote:
>> On 11/22/18 4:54 PM, Sam Tebbs wrote:
> 
>
>> Hi all,
>>
>> Attached is an updated patch with branch_protec_type renamed to
>> branch_protect_type, some unneeded ATTRIBUTE_USED removed and an added
>> use of ARRAY_SIZE.
>>
>> Below is the updated changelog.
>>
>> OK for trunk? I have committed the preceding patch in the series.
>
> OK. Please get this in soon as we really want to be closing down for Stage 4
> (and fix a few bugs in return :-) ).
>
> Thanks,
> James
Thanks James, committed as r267717.
>
>> gcc/ChangeLog:
>>
>> 2018-12-20  Sam Tebbs
>>
>>  * config/aarch64/aarch64.c (BRANCH_PROTECT_STR_MAX,
>>  aarch64_parse_branch_protection,
>>  struct aarch64_branch_protect_type,
>>  aarch64_handle_no_branch_protection,
>>  aarch64_handle_standard_branch_protection,
>>  aarch64_validate_mbranch_protection,
>>  aarch64_handle_pac_ret_protection,
>>  aarch64_handle_attr_branch_protection,
>>  accepted_branch_protection_string,
>>  aarch64_pac_ret_subtypes,
>>  aarch64_branch_protect_types,
>>  aarch64_handle_pac_ret_leaf): Define.
>>  (aarch64_override_options_after_change_1): Add check for
>>  accepted_branch_protection_string.
>>  (aarch64_override_options): Add check for
>>  accepted_branch_protection_string.
>>  (aarch64_option_save): Save accepted_branch_protection_string.
>>  (aarch64_option_restore): Save
>>  accepted_branch_protection_string.
>>  * config/aarch64/aarch64.c (aarch64_attributes): Add branch-protection.
>>  * config/aarch64/aarch64.opt: Add mbranch-protection. Deprecate
>>  msign-return-address.
>>  * doc/invoke.texi: Add mbranch-protection.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-12-20  Sam Tebbs
>>
>>  * (gcc.target/aarch64/return_address_sign_1.c,
>>  gcc.target/aarch64/return_address_sign_2.c,
>>  gcc.target/aarch64/return_address_sign_3.c (__attribute__)): Change
>>  option to -mbranch-protection.
>>  * gcc.target/aarch64/(branch-protection-option.c,
>>  branch-protection-option-2.c, branch-protection-attr.c,
>>  branch-protection-attr-2.c): New file.
>>


Re: [C++ Patch] Fix four more locations

2019-01-08 Thread Christophe Lyon
On Mon, 7 Jan 2019 at 18:08, Paolo Carlini  wrote:
>
> Hi,
>
> should be straightforward material. Tested x86_64-linux, as usual.
>
> Thanks, Paolo.
>
> /
>

Hi,

The new g++.dg/diagnostic/thread1.C passes on aarch64, but fails on arm:
FAIL: g++.dg/diagnostic/thread1.C  -std=c++14  (test for errors, line 13)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++17  (test for errors, line 13)
FAIL: g++.dg/diagnostic/thread1.C  -std=c++17 (test for excess errors)

the logs say:
/gcc/testsuite/g++.dg/diagnostic/thread1.C:13:12: error: non-local
variable 's' declared '__thread' needs dynamic initialization

I don't know why the error message does not match.

Christophe


Re: [PATCH 2/2] PR libstdc++/86756 Move rest of std::filesystem to libstdc++.so

2019-01-08 Thread Christophe Lyon
On Mon, 7 Jan 2019 at 15:14, Christophe Lyon  wrote:
>
> On Mon, 7 Jan 2019 at 13:39, Jonathan Wakely  wrote:
> >
> > On 07/01/19 09:48 +, Jonathan Wakely wrote:
> > >On 07/01/19 10:24 +0100, Christophe Lyon wrote:
> > >>Hi Jonathan
> > >>
> > >>On Sun, 6 Jan 2019 at 23:37, Jonathan Wakely  wrote:
> > >>>
> > >>>Move std::filesystem directory iterators and operations from
> > >>>libstdc++fs.a to main libstdc++ library. These components have many
> > >>>dependencies on OS support, which is not available on all targets. Some
> > >>>additional autoconf checks and conditional compilation is needed to
> > >>>ensure the files will build for all targets. Previously this code was
> > >>>not compiled without --enable-libstdcxx-filesystem-ts but the C++17
> > >>>components should be available for all hosted builds.
> > >>>
> > >>>The tests for these components no longer need to link to libstdc++fs.a,
> > >>>but are not expected to pass on all targets. To avoid numerous failures
> > >>>on targets which are not expected to pass the tests (due to missing OS
> > >>>functionality) leave the dg-require-filesystem-ts directives in place
> > >>>for now. This will ensure the tests only run for builds where the
> > >>>filesystem-ts library is built, which presumably means some level of OS
> > >>>support is present.
> > >>>
> > >>>
> > >>>Tested x86_64-linux (old/new string ABIs, 32/64 bit), x86_64-w64-mingw32.
> > >>>
> > >>>Committed to trunk.
> > >>>
> > >>
> > >>After this commit (r267616), I've noticed build failures for my
> > >>newlib-based toolchains:
> > >>aarch64-elf, arm-eabi:
> > >>
> > >>In file included from
> > >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:57:
> > >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/../filesystem/ops-common.h:142:11:
> > >>error: '::truncate' has not been declared
> > >> 142 |   using ::truncate;
> > >> |   ^~~~
> > >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:
> > >>In function 'void std::filesystem::resize_file(const
> > >>std::filesystem::__cxx11::path&, uintmax_t, std::error_code&)':
> > >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:1274:19:
> > >>error: 'truncate' is not a member of 'posix'
> > >>1274 |   else if (posix::truncate(p.c_str(), size))
> > >> |   ^~~~
> > >>make[5]: *** [fs_ops.lo] Error 1
> > >>
> > >>I'm not sure if there's an obvious fix? Note that I'm using a rather
> > >>old newlib version, if that matters.
> > >
> > >That's probably the reason, as I didn't see this in my tests with
> > >newlib builds.
> > >
> > >The fix is to add yet another autoconf check and guard the uses of
> > >truncate with a _GLIBCXX_USE_TRUNCATE macro. I'll do that now ...
> >
> >
> > Should be fixed with this patch, committed to trunk as r267647.
> >
>
> Yes, it works. Thanks!
>

Hi Jonathan,

So... this was a confirmation that the GCC build succeeded, not that
the tests pass :)

And there are actually a couple new errors with my newlib-based toolchains:
FAIL: 27_io/filesystem/operations/all.cc (test for excess errors)
FAIL: 27_io/filesystem/operations/resize_file.cc (test for excess errors)
FAIL: 27_io/filesystem/path/generation/normal2.cc (test for excess errors)
which are also UNRESOLVED, because of link-time undefined references to
chdir, chmod, mkdir, pathconf and getcwd.

On aarch64, I'm seeing an additional:
FAIL: 27_io/filesystem/path/compare/strings.cc execution test
because:
/libstdc++-v3/testsuite/27_io/filesystem/path/compare/strings.cc:39:
void test01(): Assertion 'p.compare(p0) == p.compare(s0)' failed.


Christophe


Re: [PATCH] ARM: fix -masm-syntax-unified (PR88648)

2019-01-08 Thread Kyrill Tkachov



On 08/01/19 09:33, Kyrill Tkachov wrote:

Hi Stefan,

On 01/01/19 23:34, Stefan Agner wrote:
> This allows unified asm syntax to be used when compiling for the ARM
> instruction set. This matches the documentation and seems to be what the
> initial patch was intended to do when the flag was added.
> ---
>  gcc/config/arm/arm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3419b6bd0f8..67b2b199f3f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3095,7 +3095,8 @@ arm_option_override_internal (struct gcc_options *opts,
>
>/* Thumb2 inline assembly code should always use unified syntax.
>   This will apply to ARM and Thumb1 eventually.  */
> -  opts->x_inline_asm_unified = TARGET_THUMB2_P (opts->x_target_flags);
> +  if (TARGET_THUMB2_P (opts->x_target_flags))
> +opts->x_inline_asm_unified = true;

This looks right to me and is the logic we had in GCC 5.
How has this patch been tested?



For the avoidance of doubt, I mean that your patch is correct :)
(not that the existing code is right).
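
To make the difference concrete, here is a small C sketch of the two versions of the assignment. The struct and function names are hypothetical stand-ins, not GCC's struct gcc_options or its macros; the field thumb2 models TARGET_THUMB2_P and inline_asm_unified models the state set by -masm-syntax-unified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant option state.  */
struct opts_sketch {
    bool thumb2;             /* models TARGET_THUMB2_P (...) */
    bool inline_asm_unified; /* models the -masm-syntax-unified setting */
};

/* Existing code: unconditional assignment.  In ARM mode this overwrites
   a user request for unified syntax with false.  */
void override_before(struct opts_sketch *o)
{
    o->inline_asm_unified = o->thumb2;
}

/* Patched code: only force unified syntax on for Thumb-2 and otherwise
   leave the user's choice alone.  */
void override_after(struct opts_sketch *o)
{
    if (o->thumb2)
        o->inline_asm_unified = true;
}

/* Walk through the ARM-mode case where the two versions disagree.  */
void demo(void)
{
    struct opts_sketch arm = { .thumb2 = false, .inline_asm_unified = true };
    override_before(&arm);
    assert(!arm.inline_asm_unified);   /* the user's request was lost */

    arm.inline_asm_unified = true;
    override_after(&arm);
    assert(arm.inline_asm_unified);    /* the user's request survives */
}
```

For Thumb-2 both versions end up with unified syntax enabled, so the behaviour only changes for ARM (and Thumb-1) when the flag was given explicitly.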


Can you please provide a ChangeLog entry for this patch[1].

Thanks,
Kyrill

[1] https://gcc.gnu.org/contribute.html

>
>  #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
>SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
> --
> 2.20.1
>





Re: [PATCH] ARM: add test case for -masm-syntax-unified (PR88648)

2019-01-08 Thread Kyrill Tkachov

Hi Stefan,

On 02/01/19 21:47, Stefan Agner wrote:

Add a test case to check whether -masm-syntax-unified is indeed
emitting the inline assembler with .syntax unified.


Can you please provide a ChangeLog entry for this change.


---
 .../gcc.target/arm/pr88648-asm-syntax-unified.c| 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c

diff --git a/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
new file mode 100644
index 000..2bd9d891b9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
@@ -0,0 +1,14 @@
+/* Test for unified syntax assembly generation.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-add-options arm_arch_v7a } */
+/* { dg-options "-marm -march=armv7-a -masm-syntax-unified" } */
+
+void test ()
+{
+  asm("nop");
+}
+
+/* { dg-final { scan-assembler-times {\.syntax\sunified} 3 } } */
+/* { dg-final { scan-assembler-times {\.syntax\sdivided} 0 } } */
+


Please use scan-assembler-not here to check for the absence of the ".syntax 
divided".
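
Concretely, the two dg-final lines at the end of the test could look like the following sketch. scan-assembler-not takes the same regexp as before but no count; the first line is unchanged from the posted patch.

```c
/* { dg-final { scan-assembler-times {\.syntax\sunified} 3 } } */
/* { dg-final { scan-assembler-not {\.syntax\sdivided} } } */
```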

Looks ok to me otherwise.
Do you need someone to commit these for you?

Thanks,
Kyrill


--
2.20.1





Re: [PATCH] ARM: fix -masm-syntax-unified (PR88648)

2019-01-08 Thread Kyrill Tkachov

Hi Stefan,

On 01/01/19 23:34, Stefan Agner wrote:

This allows unified asm syntax to be used when compiling for the ARM
instruction set. This matches the documentation and seems to be what the
initial patch was intended to do when the flag was added.
---
 gcc/config/arm/arm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3419b6bd0f8..67b2b199f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3095,7 +3095,8 @@ arm_option_override_internal (struct gcc_options *opts,

   /* Thumb2 inline assembly code should always use unified syntax.
  This will apply to ARM and Thumb1 eventually.  */
-  opts->x_inline_asm_unified = TARGET_THUMB2_P (opts->x_target_flags);
+  if (TARGET_THUMB2_P (opts->x_target_flags))
+opts->x_inline_asm_unified = true;


This looks right to me and is the logic we had in GCC 5.
How has this patch been tested?

Can you please provide a ChangeLog entry for this patch[1].

Thanks,
Kyrill

[1] https://gcc.gnu.org/contribute.html



 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
--
2.20.1





Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-08 Thread Jakub Jelinek
On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote:
> Is there a reason stack registers are excluded? Before stackreg pass,
> these registers are just like other hard registers.

I was just afraid of those, after stack pass removing a store but not
load would result in the stack getting out of sync, and I wasn't sure if the
stack pass will tolerate a RA chosen stack reg not being touched anymore.

Also, doesn't loading a float/double/long double into a stack reg and
storing again canonicalize it in any way?  Say NaNs?  Or unnormal or
pseudo-denormal long doubles etc.?  Tried that now with a sNaN and the
load/store didn't change it, but haven't tried with exceptions enabled to
see if it would raise one.

If that is fine, I'll leave that out.

Looking around, I was using a wrong macro anyway and am surprised nothing
complained, it should have been STACK_REG_P rather than STACK_REGNO_P.

> Other than that, there is no need for REG_P predicate; after reload we
> don't have subregs and register_operand will match only hard regs.

Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if
reload_completed, just disallows subregs of mem after that.

> Also, please put peep2_reg_dead_p predicate in the pattern predicate.

I don't see how, that would mean I'd have to write two peephole2s instead of
one.  It tries to deal with two different cases, one is where the temporary
reg is dead, in that case we can optimize away both the load and the store, the
second case is where the temporary reg isn't dead, in that case we can
optimize away the store, but not the load.  With the optimizing away of both
load and store I was just trying to do a cheap DCE there.
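
The two cases can be sketched outside of RTL as a toy model in C. Everything here (the insn struct, the enum, the function name) is a hypothetical miniature IR for illustration, not GCC's data structures; reg_dead_after plays the role of the peep2_reg_dead_p check and the is_volatile test mirrors the !MEM_VOLATILE_P condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature IR, not GCC's RTL.  */
enum op { OP_LOAD, OP_STORE };

struct insn {
    enum op op;
    int reg;          /* register number */
    int mem;          /* abstract id of the memory location */
    bool is_volatile; /* access to a volatile location */
};

/* Inspect a pair of insns.  If it has the shape "reg = mem; mem = reg"
   on the same non-volatile location, return how many leading insns of
   the pair must be kept:
     2  no match, keep both;
     1  store removed, load kept (register still live afterwards);
     0  both removed (register dead afterwards, a cheap DCE).  */
size_t peephole_load_store(const struct insn pair[2], bool reg_dead_after)
{
    assert(pair != NULL);
    const struct insn *load = &pair[0];
    const struct insn *store = &pair[1];

    if (load->op != OP_LOAD || store->op != OP_STORE
        || load->reg != store->reg || load->mem != store->mem
        || load->is_volatile || store->is_volatile)
        return 2;

    /* The store writes back what memory already holds, so it always goes.  */
    if (reg_dead_after)
        return 0;   /* nobody reads the register: drop the load too */
    return 1;       /* keep only the load */
}
```

With a single pattern the decision between the last two outcomes has to be made in the preparation statement, which is why the liveness check is not part of the pattern condition.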

Looking around more, I actually think I need to replace
(match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p
(operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because
apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit.

> > 2019-01-07  Jakub Jelinek  
> >
> > PR rtl-optimization/79593
> > * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.
> >
> > --- gcc/config/i386/i386.md.jj  2019-01-01 12:37:31.564738571 +0100
> > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100
> > @@ -18740,6 +18740,21 @@ (define_peephole2
> >const0_rtx);
> >  })
> >
> > +;; Attempt to optimize away memory stores of values the memory already
> > +;; has.  See PR79593.
> > +(define_peephole2
> > +  [(set (match_operand 0 "register_operand")
> > +(match_operand 1 "memory_operand"))
> > +   (set (match_dup 1) (match_dup 0))]
> > +  "REG_P (operands[0])
> > +   && !STACK_REGNO_P (operands[0])
> > +   && !MEM_VOLATILE_P (operands[1])"
> > +  [(set (match_dup 0) (match_dup 1))]
> > +{
> > +  if (peep2_reg_dead_p (1, operands[0]))
> > +DONE;
> > +})
> > +
> >  ;; Attempt to always use XOR for zeroing registers (including FP modes).
> >  (define_peephole2
> >[(set (match_operand 0 "general_reg_operand")

Jakub


[Patch, Fortran] PR 88047: [9 Regression] ICE in gfc_find_vtab, at fortran/class.c:2843

2019-01-08 Thread Janus Weil
Hi all,

the attached patch is close to obvious and fixes another small
ICE-on-invalid regression. Since there was a bit of discussion in the
PR, I am submitting it for approval instead of just committing as
obvious.

Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
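
The shape of the fix is a plain null guard on a chained dereference. A minimal C sketch, using hypothetical and heavily simplified stand-ins for the gfortran symbol and component structures (none of these type or function names exist in the front end):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified stand-ins, not gfortran's real structures.  */
struct derived_sketch { const char *name; };
struct component_sketch { struct derived_sketch *derived; };
struct class_container_sketch { struct component_sketch *components; };

/* Return the declared type wrapped by a class container, or NULL when
   error recovery on invalid code left the container without usable
   components.  */
struct derived_sketch *
declared_type_or_null(const struct class_container_sketch *container)
{
    assert(container != NULL);
    if (container->components && container->components->derived)
        return container->components->derived;
    return NULL;
}
```

Callers then have to cope with a NULL result instead of crashing on the dereference.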

Cheers,
Janus
diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index ba95a26e6ae..4d50f880d38 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,9 @@
+2019-01-08  Janus Weil  
+
+	PR fortran/88047
+	* class.c (gfc_find_vtab): For polymorphic typespecs, the components of
+	the class container may not be available (in case of invalid code).
+
 2019-01-07  Thomas Koenig  
 	Harald Anlauf 
 	Tobias Burnus 
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 77f0fca9385..8809b5b5b6e 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -2846,7 +2846,10 @@ gfc_find_vtab (gfc_typespec *ts)
 case BT_DERIVED:
   return gfc_find_derived_vtab (ts->u.derived);
 case BT_CLASS:
-  return gfc_find_derived_vtab (ts->u.derived->components->ts.u.derived);
+  if (ts->u.derived->components && ts->u.derived->components->ts.u.derived)
+	return gfc_find_derived_vtab (ts->u.derived->components->ts.u.derived);
+  else
+	return NULL;
 default:
   return find_intrinsic_vtab (ts);
 }
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 23de2ea6f0b..45279ccae0c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-01-08  Janus Weil  
+
+	PR fortran/88047
+	* gfortran.dg/class_69.f90: New test case.
+
 2019-01-07  David Malcolm  
 
 	PR jit/88747
diff --git a/gcc/testsuite/gfortran.dg/class_69.f90 b/gcc/testsuite/gfortran.dg/class_69.f90
new file mode 100644
index 000..e45e03528b7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_69.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+!
+! PR 88047: [9 Regression] ICE in gfc_find_vtab, at fortran/class.c:2843
+!
+! Contributed by G. Steinmetz 
+
+subroutine sub_a
+   type t
+   end type
+   class(t) :: x(2)   ! { dg-error "must be dummy, allocatable or pointer" }
+   class(t), parameter :: a(2) = t()  ! { dg-error "cannot have the PARAMETER attribute" }
+   x = a  ! { dg-error "Nonallocatable variable must not be polymorphic in intrinsic assignment" }
+end
+
+subroutine sub_b
+   type t
+  integer :: n
+   end type
+   class(t) :: a, x   ! { dg-error "must be dummy, allocatable or pointer" }
+   x = a  ! { dg-error "Nonallocatable variable must not be polymorphic in intrinsic assignment" }
+end


Re: [PATCH] Fix PR88611

2019-01-08 Thread Janne Blomqvist
On Tue, Jan 8, 2019 at 10:18 AM Richard Biener  wrote:

>
> This is about the Fortran FE creating global variable initializers
> with wrong type (integer type rather than pointer type) for
> ISOCBINDING_NULL_* initializers.  The patch simplifies the logic
> in gfc_conv_initializer to directly create the expected GENERIC
> rather than trying to use the scalarizer.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
>
> Thanks,
> Richard.
>
> 2019-01-08  Richard Biener  
>
> PR fortran/88611
> * trans-expr.c (gfc_conv_initializer): For ISOCBINDING_NULL_*
> directly build the expected GENERIC tree.
>
> * gfortran.dg/pr88611.f90: New testcase.
>
> Index: gcc/fortran/trans-expr.c
> ===
> --- gcc/fortran/trans-expr.c(revision 267671)
> +++ gcc/fortran/trans-expr.c(working copy)
> @@ -7086,19 +7086,12 @@ gfc_conv_initializer (gfc_expr * expr, g
>if (expr != NULL && expr->ts.type == BT_DERIVED
>&& expr->ts.is_iso_c && expr->ts.u.derived)
>  {
> -  gfc_symbol *derived = expr->ts.u.derived;
> -
> -  /* The derived symbol has already been converted to a (void *).  Use
> -its kind.  */
> -  if (derived->ts.kind == 0)
> -   derived->ts.kind = gfc_default_integer_kind;
> -  expr = gfc_get_int_expr (derived->ts.kind, NULL, 0);
> -  expr->ts.f90_type = derived->ts.f90_type;
> -
> -  gfc_init_se (&se, NULL);
> -  gfc_conv_constant (&se, expr);
> -  gcc_assert (TREE_CODE (se.expr) != CONSTRUCTOR);
> -  return se.expr;
> +  if (TREE_CODE (type) == ARRAY_TYPE)
> +   return build_constructor (type, NULL);
> +  else if (POINTER_TYPE_P (type))
> +   return build_int_cst (type, 0);
> +  else
> +   gcc_unreachable ();
>  }
>
>if (array && !procptr)
> Index: gcc/testsuite/gfortran.dg/pr88611.f90
> ===
> --- gcc/testsuite/gfortran.dg/pr88611.f90   (nonexistent)
> +++ gcc/testsuite/gfortran.dg/pr88611.f90   (working copy)
> @@ -0,0 +1,14 @@
> +! { dg-do run }
> +! { dg-options "-fdefault-integer-8 -fno-tree-forwprop -O3 -fno-tree-ccp" }
> +! PR 82869
> +! A temp variable of type logical was incorrectly transferred
> +! to the I/O library as a logical type of a different kind.
> +program pr82869_8
> +  use, intrinsic :: iso_c_binding
> +  type(c_ptr) :: p = c_null_ptr
> +  character(len=4) :: s
> +  write (s, *) c_associated(p), c_associated(c_null_ptr)
> +  if (s /= ' F F') then
> + STOP 1
> +  end if
> +end program pr82869_8
>


Ok, thanks.

-- 
Janne Blomqvist


[PATCH] Fix PR88611

2019-01-08 Thread Richard Biener


This is about the Fortran FE creating global variable initializers
with wrong type (integer type rather than pointer type) for
ISOCBINDING_NULL_* initializers.  The patch simplifies the logic
in gfc_conv_initializer to directly create the expected GENERIC
rather than trying to use the scalarizer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2019-01-08  Richard Biener  

PR fortran/88611
* trans-expr.c (gfc_conv_initializer): For ISOCBINDING_NULL_*
directly build the expected GENERIC tree.

* gfortran.dg/pr88611.f90: New testcase.

Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c(revision 267671)
+++ gcc/fortran/trans-expr.c(working copy)
@@ -7086,19 +7086,12 @@ gfc_conv_initializer (gfc_expr * expr, g
   if (expr != NULL && expr->ts.type == BT_DERIVED
   && expr->ts.is_iso_c && expr->ts.u.derived)
 {
-  gfc_symbol *derived = expr->ts.u.derived;
-
-  /* The derived symbol has already been converted to a (void *).  Use
-its kind.  */
-  if (derived->ts.kind == 0)
-   derived->ts.kind = gfc_default_integer_kind;
-  expr = gfc_get_int_expr (derived->ts.kind, NULL, 0);
-  expr->ts.f90_type = derived->ts.f90_type;
-
-  gfc_init_se (&se, NULL);
-  gfc_conv_constant (&se, expr);
-  gcc_assert (TREE_CODE (se.expr) != CONSTRUCTOR);
-  return se.expr;
+  if (TREE_CODE (type) == ARRAY_TYPE)
+   return build_constructor (type, NULL);
+  else if (POINTER_TYPE_P (type))
+   return build_int_cst (type, 0);
+  else
+   gcc_unreachable ();
 }
 
   if (array && !procptr)
Index: gcc/testsuite/gfortran.dg/pr88611.f90
===
--- gcc/testsuite/gfortran.dg/pr88611.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88611.f90   (working copy)
@@ -0,0 +1,14 @@
+! { dg-do run }
+! { dg-options "-fdefault-integer-8 -fno-tree-forwprop -O3 -fno-tree-ccp" }
+! PR 82869
+! A temp variable of type logical was incorrectly transferred
+! to the I/O library as a logical type of a different kind.
+program pr82869_8
+  use, intrinsic :: iso_c_binding
+  type(c_ptr) :: p = c_null_ptr
+  character(len=4) :: s
+  write (s, *) c_associated(p), c_associated(c_null_ptr)
+  if (s /= ' F F') then
+ STOP 1
+  end if
+end program pr82869_8