[PATCH, RFC] Fix PR62147 by passing finiteness information to RTL phase

2019-06-24 Thread Kewen.Lin
Hi all,

As described in PR62147, in some cases loop iv analysis is unable to identify
a loop as finite even though the loop actually is finite and the middle-end
has already proved it.  This prevents doloop_optimize and results in worse code.

This patch leverages the existing middle-end function finite_loop_p and
records the finiteness information in the loop structure, so that loop iv
analysis can later make use of the saved information.

It's based on two observations:
  1) the loop structure for a specific loop is shared between the middle-end
     and the back-end.
  2) once a specific loop is known to be finite, it never becomes infinite.

As a GCC newbie, I'm not sure whether these two observations hold in all
cases.  Please feel free to correct me if anything is missing.

By the way, I also took a look at how the loop constraint LOOP_C_FINITE is
used.  I don't think it's suitable for this purpose: it's mainly set by the
vectorizer and tells niter and scev to treat a loop as finite.  The original
patch has the words "constraint flag is mainly set by consumers and affects
certain semantics of niter analyzer APIs".

Bootstrapped and regression testing passed on powerpc64le-unknown-linux-gnu.


Thanks,
Kewen

---

gcc/ChangeLog

2019-06-25  Kewen Lin  

PR target/62147
* cfgloop.c (alloc_loop): Init new field.
* cfgloop.h (struct loop): New field.
* cfgloopmanip.c (copy_loop_info): Copy new field.
* tree-ssa-loop-niter.c (finite_loop_p): Rename to...
(test_and_update_loop_finite): ...this.  Test and update the new field.
* tree-ssa-loop-niter.h (finite_loop_p): Rename to...
(test_and_update_loop_finite): ...this.
* ipa-pure-const.c (analyze_function): Adjust to use the renamed
function.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
* tree-ssa-loop-im.c (fill_always_executed_in_1): Likewise.
* loop-iv.c (find_simple_exit): Update finiteness from the new field.
* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
(rewrite_use_compare): Update new field.
(tree_ssa_iv_optimize_loop): Init new field and call
test_and_update_loop_finite if set.

gcc/testsuite/ChangeLog

2019-06-25  Kewen Lin  

PR target/62147
* gcc.target/powerpc/pr62147.c: New test.




diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index f64326b944e..89e4dd069ac 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -355,6 +355,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->is_finite = false;
   return loop;
 }

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 2f8ab106d03..08ec930597e 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -224,6 +224,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;

+  /* True if middle-end analysis ensures this loop is finite.  */
+  unsigned is_finite : 1;
+
   /* The number of times to unroll the loop.  0 means no information given,
  just do what we always do.  A value of 1 means do not unroll the loop.
  A value of USHRT_MAX means unroll with no specific unrolling factor.
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 50250ec4d7c..f1033d11865 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1026,6 +1026,7 @@ copy_loop_info (struct loop *loop, struct loop *target)
   target->in_oacc_kernels_region = loop->in_oacc_kernels_region;
   target->unroll = loop->unroll;
   target->owned_clique = loop->owned_clique;
+  target->is_finite = loop->is_finite;
 }

 /* Copies copy of LOOP as subloop of TARGET loop, placing newly
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index bb561d00853..f7282adc5b1 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1089,7 +1089,7 @@ end:
  struct loop *loop;
  scev_initialize ();
  FOR_EACH_LOOP (loop, 0)
-   if (!finite_loop_p (loop))
+   if (!test_and_update_loop_finite (loop))
  {
if (dump_file)
  fprintf (dump_file, "cannot prove finiteness of "
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 82b4bdb1523..a6bc82975cc 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2997,6 +2997,19 @@ find_simple_exit (struct loop *loop, struct niter_desc *desc)
fprintf (dump_file, "Loop %d is not simple.\n", loop->num);
 }

+  /* Fix up the finiteness if possible.  We can only do it for single exit,
+ since the loop is finite, but it's possible that we predicate one loop
+ exit to be finite which can not be determined as finite in middle-end as
+ well.  It results in incorrect predicate information on the exit condition
+ expression.  For example, if says [(int) _1 + -8, + , -8] != 0 finite,
+ it means _1 can exa

libgo patch committed: memequal and memclrNoHeapPointers nosplit

2019-06-24 Thread Ian Lance Taylor
This patch by Cherry Zhang changes libgo to mark the memequal and
memclrNoHeapPointers functions as nosplit.  They are wrappers of libc
functions that use no stack.  Mark them nosplit so the linker won't
patch them to call __morestack_non_split.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272624)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-89b442a0100286ee569b8d2562ce1b2ea602f7e7
+a857aad2f3994e6fa42a6fc65330e65d209597a0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-memclr.c
===
--- libgo/runtime/go-memclr.c   (revision 272608)
+++ libgo/runtime/go-memclr.c   (working copy)
@@ -7,7 +7,8 @@
 #include "runtime.h"
 
 void memclrNoHeapPointers(void *, uintptr)
-  __asm__ (GOSYM_PREFIX "runtime.memclrNoHeapPointers");
+  __asm__ (GOSYM_PREFIX "runtime.memclrNoHeapPointers")
+  __attribute__ ((no_split_stack));
 
 void
 memclrNoHeapPointers (void *p1, uintptr len)
Index: libgo/runtime/go-memequal.c
===
--- libgo/runtime/go-memequal.c (revision 272608)
+++ libgo/runtime/go-memequal.c (working copy)
@@ -7,7 +7,8 @@
 #include "runtime.h"
 
 _Bool memequal (void *, void *, uintptr)
-  __asm__ (GOSYM_PREFIX "runtime.memequal");
+  __asm__ (GOSYM_PREFIX "runtime.memequal")
+  __attribute__ ((no_split_stack));
 
 _Bool
 memequal (void *p1, void *p2, uintptr len)


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-24 Thread Hongtao Liu
On Sat, Jun 22, 2019 at 3:38 PM Uros Bizjak  wrote:
>
> On Fri, Jun 21, 2019 at 8:38 PM H.J. Lu  wrote:
>
> > > > > > > > > > >> > > +/* Register pair.  */
> > > > > > > > > > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> > > > > > > > > > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I think
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > INT_MODE (P2QI, 16);
> > > > > > > > > > >> > > INT_MODE (P2HI, 32);
> > > > > Why does P2QI need 16 bytes and not 2 bytes?
> > > > > Same question for P2HI.
> > > >
> > > > Because we made a mistake. It should be 2 and 4, since these arguments
> > > Then it will run into an internal compiler error when building libgcc.
> > > I'm still investigating it.
> > > > are bytes, not bits.
> >
> > I don't think we can have 2 integer modes with the same number of bytes since
> > it breaks things like
> >
> > scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
> >
> > We can get
> >
> > (gdb) p mode
> > $2 = {m_mode = E_SImode}
> > (gdb) p wider_mode
> > $3 = {m_mode = E_P2HImode}
> > (gdb)
> >
> > Neither middle-end nor backend support it.
>
> Ouch... It looks like we hit a limitation of the middle end (which should
> at least warn/error out if two modes of the same width are declared).
>
> OTOH, we can't solve this problem by using two HI/QImode registers,
> since a consecutive register pair has to be allocated.  It is also not
> possible to overload an existing SI/HImode mode with different
> requirements w.r.t. register pair allocation (e.g. sometimes the whole
> register is allocated, and sometimes a register pair is allocated).
>
> I think we have to invent something like SPECIAL_INT_MODE, which would
> avoid mode promotion functionality (basically, it should not be listed
> in mode_wider and similar arrays).  This would prevent mode promotion
> issues, while still allowing a mode that has the same width as an
> existing mode but with special properties.
>
> I'm adding Jeff and Jakub to the discussion about SPECIAL_INT_MODE.
>
> Uros.

A patch from H.J. using PARTIAL_INT_MODE fixed this issue.

+/* Register pair.  */
+PARTIAL_INT_MODE (HI, 16, P2QI);
+PARTIAL_INT_MODE (SI, 32, P2HI);
+

Here is updated patch.
-- 
BR,
Hongtao
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 271984)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,53 @@
+2019-06-06  Hongtao Liu  
+	H.J. Lu  
+	Olga Makhotina  
+
+	* common/config/i386/i386-common.c
+	(OPTION_MASK_ISA_AVX512VP2INTERSECT_SET,
+	OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET): New macros.
+	(OPTION_MASK_ISA2_AVX512F_UNSET): Add
+	OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET.
+	(ix86_handle_option): Handle -mavx512vp2intersect.
+	* config/i386/avx512vp2intersectintrin.h: New.
+	* config/i386/avx512vp2intersectvlintrin.h: New.
+	* config/i386/cpuid.h (bit_AVX512VP2INTERSECT): New.
+	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
+	AVX512VP2INTERSECT.
+	* config/i386/i386-builtin-types.def: Add new types.
+	* config/i386/i386-builtin.def: Add new builtins.
+	* config/i386/i386-builtins.c: (enum processor_features): Add
+	F_AVX512VP2INTERSECT.
+	(static const _isa_names_table isa_names_table): Ditto.
+	* config/i386/i386-c.c (ix86_target_macros_internal): Define
+	__AVX512VP2INTERSECT__.
+	* config/i386/i386-expand.c (ix86_expand_builtin): Expand
+	IX86_BUILTIN_2INTERSECTD512, IX86_BUILTIN_2INTERSECTQ512,
+	IX86_BUILTIN_2INTERSECTD256, IX86_BUILTIN_2INTERSECTQ256,
+	IX86_BUILTIN_2INTERSECTD128, IX86_BUILTIN_2INTERSECTQ128.
+	* config/i386/i386-modes.def (P2QI, P2HI): New modes.
+	* config/i386/i386-options.c (ix86_target_string): Add
+	-mavx512vp2intersect.
+	(ix86_option_override_internal): Handle AVX512VP2INTERSECT.
+	* config/i386/i386.c (ix86_hard_regno_nregs): Allocate two regs for
+	P2HImode and P2QImode.
+	(ix86_hard_regno_mode_ok): Register pair only starts at even hardreg
+	number for P2QImode and P2HImode.
+	* config/i386/i386.h (TARGET_AVX512VP2INTERSECT,
+	TARGET_AVX512VP2INTERSECT_P): New.
+	(PTA_AVX512VP2INTERSECT): Ditto.
+	* config/i386/i386.opt: Add -mavx512vp2intersect.
+	* config/i386/immintrin.h: Include avx512vp2intersectintrin.h and
+	avx512vp2intersectvlintrin.h.
+	* config/i386/sse.md (define_c_enum "unspec"): Add UNSPEC_VP2INTERSECT.
+	(define_mode_iterator VI48_AVX512VP2VL): New.
+	(avx512vp2intersect_2intersect,
+	avx512vp2intersect_2intersectv16si): New define_insn patterns.
+	(*vec_extractp2hi, *vec_extractp2qi): New define_insn_and_split
+	patterns.
+	* config.gcc: Add avx512vp2intersectvlintrin.h and
+	avx512vp2intersectintrin.h to extra_headers.
+	* doc/invoke.texi: Document -mavx512vp2intersect.
+
 2019-06-05  Hongtao Liu  
 
 	* config/i386/sse.md (define_mode_suffix vecmemsuffix): New.
Index: gcc/common/config/i386/i386-common.c
===
--- gcc

Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-24 Thread Martin Sebor

On 6/24/19 5:59 PM, Jeff Law wrote:

On 6/24/19 5:50 PM, Martin Sebor wrote:

The strlen enhancement committed in r263018 to handle multi-character
assignments extended the handle_char_store() function to handle such
stores via MEM_REFs.  Prior to that the function only dealt with
single-char stores.  The enhancement neglected to constrain a case
in the function that assumed the function's previous constraint.
As a result, when the original optimization takes place with
a multi-character store, the function computes the wrong string
length.

The attached patch adds the missing constraint.

Martin

gcc-90989.diff

PR tree-optimization/90989 - incorrect strlen result after second strcpy into
the same destination

gcc/ChangeLog:

* tree-ssa-strlen.c (handle_char_store): Constrain a single character
optimization to just single character stores.

gcc/testsuite/ChangeLog:

* gcc.dg/strlenopt-26.c: Exit with test result status.
* gcc.dg/strlenopt-67.c: New test.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c   (revision 272618)
+++ gcc/tree-ssa-strlen.c   (working copy)
@@ -3462,34 +3462,38 @@ handle_char_store (gimple_stmt_iterator *gsi)
  return false;
}
}
-  /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
-and if we aren't storing '\0', we know that the length of the
-string and any other zero terminated string in memory remains
-the same.  In that case we move to the next gimple statement and
-return to signal the caller that it shouldn't invalidate anything.
  
-	 This is benefical for cases like:

+  if (cmp > 0
+ && storing_nonzero_p
+ && TREE_CODE (TREE_TYPE (rhs)) == INTEGER_TYPE)

I'm not sure I follow why checking for TREE_CODE (TREE_TYPE (rhs)) ==
INTEGER_TYPE helps here.  If you need to check that we're storing bytes,
then don't you need to check the size, not just the TREE_CODE of the type?


handle_char_store is only called for single-character assignments
or MEM_REF assignments to/from arrays.  The type of the RHS is only
integer when storing a single character.

To determine the number of characters, the subsequent handling in
the block below calls get_min_string_length which also relies on
INTEGRAL_TYPE_P to get the "length" of a single character (zero
or 1).

Martin


[RFA] Handle _CHK builtins in tree-ssa-dse.c

2019-06-24 Thread Jeff Law
These are some minor improvements to tree-ssa-dse.c; in particular, the
patch adds handling of the _CHK variants of the supported functions
(memcpy, memmove, memset).  It's just something I noticed while poking
at PR 90883.

These don't trigger during bootstraps, but do in the testsuite.   The
tests that were changed were verified to make sure that the
removal/trimming of *_CHK calls were correct.  For example, some tests
in the c-torture/builtins directory do things like

  __builtin_memset_chk (...);
  abort ();

Since the memory locations are never read, DSE just removes the call.
This happened with enough regularity in the c-torture/execute/builtins
tests that I changed the .exp driver to add the -fno-tree-dse flag.  In
the other cases I just changed the affected tests.

Bootstrapped and regression tested on x86_64, ppc64le and sparc64; others
will follow (aarch, ppc64, i686, etc.).  I've also built toolchains &
libraries and regression tested on a wide variety of other platforms as
well.


OK for the trunk?

jeff
* tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle _chk variants of
memcpy, memmove and memset builtins.
(maybe_trim_memstar_call): Likewise.

* gcc.c-torture/execute/builtins/builtins.exp: Add -fno-tree-dse
as DSE compromises several of these tests.
* gcc.dg/builtin-stringop-chk-1.c: Similarly.
* gcc.dg/memcpy-2.c: Similarly.
* gcc.dg/pr40340-1.c: Similarly.
* gcc.dg/pr40340-2.c: Similarly.
* gcc.dg/pr40340-5.c: Similarly.

diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index aa998f416c7..82e6dbfeb96 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -98,6 +98,9 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
  case BUILT_IN_MEMCPY:
  case BUILT_IN_MEMMOVE:
  case BUILT_IN_MEMSET:
+ case BUILT_IN_MEMCPY_CHK:
+ case BUILT_IN_MEMMOVE_CHK:
+ case BUILT_IN_MEMSET_CHK:
{
  tree size = NULL_TREE;
  if (gimple_call_num_args (stmt) == 3)
@@ -434,6 +437,8 @@ maybe_trim_memstar_call (ao_ref *ref, sbitmap live, gimple *stmt)
 {
 case BUILT_IN_MEMCPY:
 case BUILT_IN_MEMMOVE:
+case BUILT_IN_MEMCPY_CHK:
+case BUILT_IN_MEMMOVE_CHK:
   {
int head_trim, tail_trim;
compute_trims (ref, live, &head_trim, &tail_trim, stmt);
@@ -455,6 +460,7 @@ maybe_trim_memstar_call (ao_ref *ref, sbitmap live, gimple *stmt)
   }
 
 case BUILT_IN_MEMSET:
+case BUILT_IN_MEMSET_CHK:
   {
int head_trim, tail_trim;
compute_trims (ref, live, &head_trim, &tail_trim, stmt);
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp b/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
index 5e899d5e31e..fb9d3ece181 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
@@ -37,7 +37,7 @@ load_lib c-torture.exp
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-set additional_flags "-fno-tree-loop-distribute-patterns -fno-tracer"
+set additional_flags "-fno-tree-dse -fno-tree-loop-distribute-patterns -fno-tracer"
 if [istarget "powerpc-*-darwin*"] {
lappend additional_flags "-Wl,-multiply_defined,suppress"
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
index afd07ddd08d..40cfa047290 100644
--- a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
@@ -1,7 +1,7 @@
 /* Test whether buffer overflow warnings for __*_chk builtins
are emitted properly.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-format -std=gnu99 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wno-format -std=gnu99 -ftrack-macro-expansion=0 -fno-tree-dse" } */
 // { dg-skip-if "packed attribute missing for t" { "epiphany-*-*" } }
 
 extern void abort (void);
diff --git a/gcc/testsuite/gcc.dg/memcpy-2.c b/gcc/testsuite/gcc.dg/memcpy-2.c
index 7f839d27abd..6ad887416e3 100644
--- a/gcc/testsuite/gcc.dg/memcpy-2.c
+++ b/gcc/testsuite/gcc.dg/memcpy-2.c
@@ -1,6 +1,6 @@
 /* PR middle-end/38454 */
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fno-tree-dse" } */
 
 typedef __SIZE_TYPE__ size_t;
 
diff --git a/gcc/testsuite/gcc.dg/pr40340-1.c b/gcc/testsuite/gcc.dg/pr40340-1.c
index 8fbb206a21e..6307e064c3d 100644
--- a/gcc/testsuite/gcc.dg/pr40340-1.c
+++ b/gcc/testsuite/gcc.dg/pr40340-1.c
@@ -1,6 +1,6 @@
 /* PR middle-end/40340 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wall -Wno-system-headers" } */
+/* { dg-options "-O2 -Wall -Wno-system-headers -fno-tree-dse" } */
 
 #include "pr40340.h"
 
diff --git a/gcc/testsuite/gcc.dg/pr40340-2.c b/gcc/testsuite/gcc.dg/pr40340-2.c
index 10083acd102..ea76e10082d 100644
--- a/gcc/testsuite/gcc.dg/pr40340-2.c
+++ b/gcc/testsuite/gcc.dg/pr40340-2.c
@@ -1,6 +1,6 @@
 /* PR middle-end/40340 */
 /* { dg-do com

Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-24 Thread Jeff Law
On 6/24/19 5:50 PM, Martin Sebor wrote:
> The strlen enhancement committed in r263018 to handle multi-character
> assignments extended the handle_char_store() function to handle such
> stores via MEM_REFs.  Prior to that the function only dealt with
> single-char stores.  The enhancement neglected to constrain a case
> in the function that assumed the function's previous constraint.
> As a result, when the original optimization takes place with
> a multi-character store, the function computes the wrong string
> length.
> 
> The attached patch adds the missing constraint.
> 
> Martin
> 
> gcc-90989.diff
> 
> PR tree-optimization/90989 - incorrect strlen result after second strcpy into
> the same destination
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-strlen.c (handle_char_store): Constrain a single character
>   optimization to just single character stores.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/strlenopt-26.c: Exit with test result status.
>   * gcc.dg/strlenopt-67.c: New test.
> 
> Index: gcc/tree-ssa-strlen.c
> ===
> --- gcc/tree-ssa-strlen.c (revision 272618)
> +++ gcc/tree-ssa-strlen.c (working copy)
> @@ -3462,34 +3462,38 @@ handle_char_store (gimple_stmt_iterator *gsi)
> return false;
>   }
>   }
> -  /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
> -  and if we aren't storing '\0', we know that the length of the
> -  string and any other zero terminated string in memory remains
> -  the same.  In that case we move to the next gimple statement and
> -  return to signal the caller that it shouldn't invalidate anything.
>  
> -  This is benefical for cases like:
> +  if (cmp > 0
> +   && storing_nonzero_p
> +   && TREE_CODE (TREE_TYPE (rhs)) == INTEGER_TYPE)
I'm not sure I follow why checking for TREE_CODE (TREE_TYPE (rhs)) ==
INTEGER_TYPE helps here.  If you need to check that we're storing bytes,
then don't you need to check the size, not just the TREE_CODE of the type?




Jeff


Re: [PATCH][MSP430] Implement alternate "__intN__" form of "__intN" type

2019-06-24 Thread Jeff Law
On 6/24/19 4:25 AM, Jozef Lawrynowicz wrote:
> The MSP430 target in the large memory model uses the (non-ISO) __int20 type 
> for
> SIZE_TYPE and PTRDIFF_TYPE.
> The preprocessor therefore expands a builtin such as __SIZE_TYPE__ to
> "__int20 unsigned" in user code.
> When compiling with the "-pedantic-errors" flag, the use of any of these
> builtin macros results in an error of the form:
> 
>> tester.c:4:9: error: ISO C does not support '__int20' types [-Wpedantic]
> Since -pedantic-errors is often passed as a default flag in the testsuite,
> there are hundreds of false failures when testing with -mlarge, caused by this
> ISO C error.
> 
> The attached patch implements a new builtin type, "__intN__". Apart from the
> name of the type, it is identical and shares RIDs with the corresponding
> "__intN".
> 
> This means the ISO C pedantic warnings can be disabled for __intN__ types,
> but otherwise these types can be used in place of __intN without any other
> changes to behaviour.
> 
> By replacing "__int20" with "__int20__" in the definition of SIZE_TYPE and
> PTRDIFF_TYPE in msp430.h, the following builtin macros can now be used in a
> program compiled with -pedantic-errors, without causing ISO C errors:
>   __SIZE_TYPE__
>   __INTPTR_TYPE__
>   __UINTPTR_TYPE__
>   __PTRDIFF_TYPE__
> 
> Successfully bootstrapped and regtested on x86_64-pc-linux-gnu.
> Successfully regtested for msp430-elf. Additionally, this fixes many tests:
>   332 FAIL->PASS
>   52  UNTESTED->PASS
>   29  FAIL->UNSUPPORTED (test previously failed to compile, now too big to 
> link)
> 
> Ok for trunk?
> 
> There is a patch to Newlib's "_intsup.h" required to support __int20__ that I
> will submit to that mailing list before applying this patch, if this patch is
> accepted.
> 
> 
> 0001-Implement-alternate-__intN__-form-of-__intN-type.patch
> 
> From 61dfff1b6b3fcaa9f31341ee47623100505bf2e8 Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Wed, 12 Jun 2019 10:40:00 +0100
> Subject: [PATCH] Implement alternate "__intN__" form of "__intN" type
> 
> gcc/ChangeLog:
> 
> 2019-06-18  Jozef Lawrynowicz  
> 
>   * gcc/c-family/c-common.c (c_common_nodes_and_builtins): Define
>   alternate "__intN__" name for "__intN" types.
>   * gcc/c/c-parser.c (c_parse_init): Create keyword for "__intN__" type.
>   * gcc/cp/lex.c (init_reswords): Likewise.
>   * gcc/config/msp430/msp430.h: Use __int20__ for SIZE_TYPE and
>   PTRDIFF_TYPE.
>   * gcc/cp/cp-tree.h (cp_decl_specifier_seq): New bitfield "int_n_alt".
>   * gcc/c/c-decl.c (declspecs_add_type): Don't pedwarn about "__intN" ISO
>   C incompatibility if alternate "__intN__" form is used.
>   * gcc/cp/decl.c (grokdeclarator): Likewise.
>   * gcc/cp/parser.c (cp_parser_simple_type_specifier): Set
>   decl_specs->int_n_alt if "__intN__" form is used.
>   * gcc/gimple-ssa-sprintf.c (build_intmax_type_nodes): Accept "__intN__"
>   format of "__intN" types for UINTMAX_TYPE.
>   * gcc/brig/brig-lang.c (brig_build_c_type_nodes): Accept "__intN__"
>   format of "__intN" types for SIZE_TYPE.
>   * gcc/lto/lto-lang.c (lto_build_c_type_nodes): Likewise.
>   * gcc/stor-layout.c (initialize_sizetypes): Accept "__intN__"
>   format of "__intN" types for SIZETYPE.
>   * gcc/tree.c (build_common_tree_nodes): Accept "__intN__"
>   format of "__intN" types for SIZE_TYPE and PTRDIFF_TYPE.
>   * gcc/doc/invoke.texi: Document that __intN__ disables pedantic
>   warnings.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-18  Jozef Lawrynowicz  
> 
>   * gcc.target/msp430/mlarge-pedwarns.c: New test.
> ---
>  gcc/brig/brig-lang.c  |  6 --
>  gcc/c-family/c-common.c   |  6 ++
>  gcc/c/c-decl.c|  6 +-
>  gcc/c/c-parser.c  |  5 +
>  gcc/config/msp430/msp430.h|  6 --
>  gcc/cp/cp-tree.h  |  3 +++
>  gcc/cp/decl.c |  6 +-
>  gcc/cp/lex.c  |  5 +
>  gcc/cp/parser.c   |  6 ++
>  gcc/doc/invoke.texi   |  6 --
>  gcc/gimple-ssa-sprintf.c  |  6 --
>  gcc/lto/lto-lang.c|  6 --
>  gcc/stor-layout.c |  6 --
>  gcc/testsuite/gcc.target/msp430/mlarge-pedwarns.c | 11 +++
>  gcc/tree.c| 13 +
>  15 files changed, 79 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/msp430/mlarge-pedwarns.c
> 
> diff --git a/gcc/brig/brig-lang.c b/gcc/brig/brig-lang.c
> index 91c7cfa35da..be853ccbc02 100644
> --- a/gcc/brig/brig-lang.c
> +++ b/gcc/brig/brig-lang.c
> @@ -864,10 +864,12 @@ brig_build_c_type_nodes (void)
>for (i = 0; i 

[PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-24 Thread Martin Sebor

The strlen enhancement committed in r263018 to handle multi-character
assignments extended the handle_char_store() function to handle such
stores via MEM_REFs.  Prior to that the function only dealt with
single-char stores.  The enhancement neglected to constrain a case
in the function that assumed the function's previous constraint.
As a result, when the original optimization takes place with
a multi-character store, the function computes the wrong string
length.

The attached patch adds the missing constraint.

Martin
PR tree-optimization/90989 - incorrect strlen result after second strcpy into
the same destination

gcc/ChangeLog:

	* tree-ssa-strlen.c (handle_char_store): Constrain a single character
	optimization to just single character stores.

gcc/testsuite/ChangeLog:

	* gcc.dg/strlenopt-26.c: Exit with test result status.
	* gcc.dg/strlenopt-67.c: New test.

Index: gcc/testsuite/gcc.dg/strlenopt-26.c
===
--- gcc/testsuite/gcc.dg/strlenopt-26.c	(revision 272618)
+++ gcc/testsuite/gcc.dg/strlenopt-26.c	(working copy)
@@ -17,8 +17,7 @@ main (void)
 {
   char p[] = "foobar";
   const char *volatile q = "xyzzy";
-  fn1 (p, q);
-  return 0;
+  return fn1 (p, q);
 }
 
 /* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
Index: gcc/testsuite/gcc.dg/strlenopt-67.c
===
--- gcc/testsuite/gcc.dg/strlenopt-67.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-67.c	(working copy)
@@ -0,0 +1,52 @@
+/* PR tree-optimization/90989 - incorrrect strlen result after second strcpy
+   into the same destination.
+   { dg-do compile }
+   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
+
+// #include "strlenopt.h"
+
+char a[4];
+
+int f4 (void)
+{
+  char b[4];
+  __builtin_strcpy (b, "12");
+
+  int i = __builtin_strcmp (a, b);
+
+  __builtin_strcpy (b, "123");
+  if (__builtin_strlen (b) != 3)
+__builtin_abort ();
+
+  return i;
+}
+
+int f6 (void)
+{
+  char b[6];
+  __builtin_strcpy (b, "1234");
+
+  int i = __builtin_strcmp (a, b);
+
+  __builtin_strcpy (b, "12345");
+  if (__builtin_strlen (b) != 5)
+__builtin_abort ();
+
+  return i;
+}
+
+int f8 (void)
+{
+  char b[8];
+  __builtin_strcpy (b, "1234");
+
+  int i = __builtin_strcmp (a, b);
+
+  __builtin_strcpy (b, "1234567");
+  if (__builtin_strlen (b) != 7)
+__builtin_abort ();
+
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-times "abort|strlen" 0 "optimized" } } */
Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 272618)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -3462,34 +3462,38 @@ handle_char_store (gimple_stmt_iterator *gsi)
 	  return false;
 	}
 	}
-  /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
-	 and if we aren't storing '\0', we know that the length of the
-	 string and any other zero terminated string in memory remains
-	 the same.  In that case we move to the next gimple statement and
-	 return to signal the caller that it shouldn't invalidate anything.
 
-	 This is benefical for cases like:
+  if (cmp > 0
+	  && storing_nonzero_p
+	  && TREE_CODE (TREE_TYPE (rhs)) == INTEGER_TYPE)
+	{
+	  /* Handle a single non-nul character store.
+	 If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
+	 and if we aren't storing '\0', we know that the length of the
+	 string and any other zero terminated string in memory remains
+	 the same.  In that case we move to the next gimple statement and
+	 return to signal the caller that it shouldn't invalidate anything.
 
-	 char p[20];
-	 void foo (char *q)
-	 {
-	   strcpy (p, "foobar");
-	   size_t len = strlen (p);// This can be optimized into 6
-	   size_t len2 = strlen (q);// This has to be computed
-	   p[0] = 'X';
-	   size_t len3 = strlen (p);// This can be optimized into 6
-	   size_t len4 = strlen (q);// This can be optimized into len2
-	   bar (len, len2, len3, len4);
-}
-	*/
-  else if (storing_nonzero_p && cmp > 0)
-	{
+	 This is benefical for cases like:
+
+	 char p[20];
+	 void foo (char *q)
+	 {
+	   strcpy (p, "foobar");
+	   size_t len = strlen (p); // can be folded to 6
+	   size_t len2 = strlen (q);// has to be computed
+	   p[0] = 'X';
+	   size_t len3 = strlen (p);// can be folded to 6
+	   size_t len4 = strlen (q);// can be folded to len2
+	   bar (len, len2, len3, len4);
+	   } */
 	  gsi_next (gsi);
 	  return false;
 	}
-  else if (storing_all_zeros_p
-	   || storing_nonzero_p
-	   || (offset != 0 && cmp > 0))
+
+  if (storing_all_zeros_p
+	  || storing_nonzero_p
+	  || (offset != 0 && cmp > 0))
 	{
 	  /* When STORING_NONZERO_P, we know that the string will start
 	 with at least OFFSET + 1 nonzero characters.  If storing


Re: [PATCH] Automatics in equivalence statements

2019-06-24 Thread Jeff Law
On 6/24/19 2:19 AM, Bernhard Reutner-Fischer wrote:
> On Fri, 21 Jun 2019 07:10:11 -0700
> Steve Kargl  wrote:
> 
>> On Fri, Jun 21, 2019 at 02:31:51PM +0100, Mark Eggleston wrote:
>>> Currently variables with the AUTOMATIC attribute can not appear in an 
>>> EQUIVALENCE statement. However its counterpart, STATIC, can be used in 
>>> an EQUIVALENCE statement.
>>>
>>> Where there is a clear conflict in the attributes of variables in an 
>>> EQUIVALENCE statement an error message will be issued as is currently 
>>> the case.
>>>
>>> If there is no conflict, e.g. a variable with an AUTOMATIC attribute and 
>>> variable(s) without attributes, all variables in the EQUIVALENCE will 
>>> become AUTOMATIC.
>>>
>>> Note: most of this patch was written by Jeff Law 
>>>
>>> Please review.
>>>
>>> ChangeLogs:
>>>
>>> gcc/fortran
>>>
>>>      Jeff Law  
>>>      Mark Eggleston  
>>>
>>>      * gfortran.h: Add check_conflict declaration.  
>>
>> This is wrong.  By convention a routine that is not static
>> has the gfc_ prefix.
>>
> Furthermore doesn't this export indicate that you're committing a
> layering violation somehow?
Possibly.  I'm the original author, but my experience in our fortran
front-end is minimal.  I fully expected this patch to need some tweaking.

We certainly don't want to recreate all the checking that's done in
check_conflict.  We just need to defer it to a later point --
find_equivalence seemed like a good point since we've got the full
equivalence list handy and can accumulate the attributes across the
entire list, then check for conflicts.

If there's a concrete place where you think we should be doing this, I'm
all ears.


> 
>>      * symbol.c (check_conflict): Remove automatic in equivalence conflict
>>      check.
>>      * symbol.c (save_symbol): Add check for in equivalence to stop the
>>      the save attribute being added.
>>      * trans-common.c (build_equiv_decl): Add is_auto parameter and
>>      add !is_auto to condition where TREE_STATIC (decl) is set.
>>      * trans-common.c (build_equiv_decl): Add local variable is_auto,
>>      set it true if an atomatic attribute is encountered in the variable
> 
> atomatic? I read atomic but you mean automatic.
Yes.

> 
>>      list.  Call build_equiv_decl with is_auto as an additional parameter.
>>      * trans-common.c (accumulate_equivalence_attributes): New subroutine.
>>      * trans-common.c (find_equivalence): New local variable dummy_symbol,
>>      accumulate equivalence attributes from each symbol then check for
>>      conflicts.
> 
> I'm just curious why you don't use gfc_copy_attr for most of 
> accumulate_equivalence_attributes?
> thanks,
Simply didn't know about it.  It could probably significantly simplify
the accumulation of attributes step.

Jeff




Re: [PATCH] Enable use of #pragma omp simd reduction(inscan,...) even for GCC10+ in PSTL

2019-06-24 Thread Thomas Rodgers


Ok for trunk.

> Can you push it into upstream PSTL?

Yes.

Thanks,
Tom.

Jakub Jelinek writes:

> Hi!
>
> Now that GCC supports inclusive/exclusive scans (like ICC 19.0 so far in
> simd constructs only), we can enable it in PSTL as well.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> with
> #include 
> #include 
>
> auto
> foo (std::vector &ca, std::vector &co)
> {
>   return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(), 
> co.begin());
> }
>
> auto
> bar (std::vector &ca, std::vector &co)
> {
>   return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(), 
> co.begin(), 0);
> }
> and verifying with -O2 -fopenmp-simd it is vectorized.  Ok for trunk?
> Can you push it into upstream PSTL?
>
> 2019-06-21  Jakub Jelinek  
>
>   * include/pstl/pstl_config.h (_PSTL_PRAGMA_SIMD_SCAN,
>   _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN, _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN):
>   Define to OpenMP 5.0 pragmas even for GCC 10.0+.
>   (_PSTL_UDS_PRESENT): Define to 1 for GCC 10.0+.
>
> --- libstdc++-v3/include/pstl/pstl_config.h.jj2019-06-10 
> 18:18:01.551191212 +0200
> +++ libstdc++-v3/include/pstl/pstl_config.h   2019-06-20 17:03:31.466367344 
> +0200
> @@ -70,7 +70,7 @@
>  #define _PSTL_PRAGMA_FORCEINLINE
>  #endif
>  
> -#if (__INTEL_COMPILER >= 1900)
> +#if (__INTEL_COMPILER >= 1900) || (_PSTL_GCC_VERSION >= 10)
>  #define _PSTL_PRAGMA_SIMD_SCAN(PRM) _PSTL_PRAGMA(omp simd 
> reduction(inscan, PRM))
>  #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) _PSTL_PRAGMA(omp scan 
> inclusive(PRM))
>  #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) _PSTL_PRAGMA(omp scan 
> exclusive(PRM))
> @@ -100,7 +100,11 @@
>  #define _PSTL_UDR_PRESENT 0
>  #endif
>  
> -#define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && 
> __INTEL_COMPILER_BUILD_DATE >= 20180626)
> +#if ((__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 
> || _PSTL_GCC_VERSION >= 10)
> +#define _PSTL_UDS_PRESENT 1
> +#else
> +#define _PSTL_UDS_PRESENT 0
> +#endif
>  
>  #if _PSTL_EARLYEXIT_PRESENT
>  #define _PSTL_PRAGMA_SIMD_EARLYEXIT _PSTL_PRAGMA(omp simd early_exit)
>
>   Jakub



Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-24 Thread Andrea Corallo

David Malcolm writes:

> On Mon, 2019-06-24 at 16:37 +, Andrea Corallo wrote:
>> David Malcolm writes:
>>
>> > On Mon, 2019-06-24 at 15:30 +, Andrea Corallo wrote:
>> > > Hi all,
>> > > second version for this patch.
>> > > Given the suggestion for the bit-field one I've tried to improve
>> > > also
>> > > here the error message.
>> >
>> > Thanks.
>> >
>> > > I've added a simple testcase as requested, here I'm trying to do
>> > > *void=int+int.
>> > > This without checking would normally crash verifying gimple.
>> >
>> > Thanks.  FWIW, I think the testcase can be simplified slightly, in
>> > that
>> > all that's needed is a bogus call to gcc_jit_context_new_binary_op,
>> > so
>> > I don't think the testcase needs the calls to:
>> >   gcc_jit_context_new_function,
>> >   gcc_jit_function_new_block, and
>> >   gcc_jit_block_end_with_return,
>> > it just needs the types and the gcc_jit_context_new_binary_op call.
>>
>> Hi Dave,
>> thanks for your feedback.
>> I've tried that but the reproducer is then incomplete with no call to
>> gcc_jit_context_new_binary_op so I would keep it like it is if you
>> are
>> ok with that.
>
> Sorry, I think I was unclear.
>
> What I meant is that I think you can remove the calls I mentioned, but
> keep the call to gcc_jit_context_new_binary_op, moving it to be a "top-
> level" call within create_code (discarding the result).  That ought to
> be enough to trigger the error within the gcc_jit_context.
>
> Does that make more sense?

Hi,
sorry, yes it absolutely does.
What I meant is that in the test I did without these calls, the produced
reproducer
test-error-gcc_jit_context_new_binary_op-bad-res-type.c.exe.reproducer.c
was missing the call to gcc_jit_context_new_binary_op.
At first I thought that was due to the removal of these other calls, but
I've just realized the obvious fact that we do not record the call at all if
we catch an error while recording... and that's the reason the reproducer
lacks the call itself.
By the way, we could probably make this clearer in the
gcc_jit_context_dump_reproducer_to_file documentation.

>> > > More complex cases can cause crashes, e.g. having
>> > > structures as the result type etc...
>> > >
>> > > Tested with make check-jit
>> > > OK for trunk?
>> >
>> > Looks good as-is, or you may prefer to simplify the testcase.
>> >
>> > Thanks for the patch.
>> >
>> > BTW, I don't see you listed in the MAINTAINERS file; are you able
>> > to
>> > commit patches yourself?
>> >
>> > Dave
>>
>> Sorry, I realize my "OK for trunk?" was quite misleading.
>> I'm not a maintainer and until now I have no write access, so I can't
>> apply patches myself.
>
> I believe ARM has a corporate copyright-assignment in place with the
> FSF for GCC contributions.

Correct, it is thanks to that that I was able to contribute in the past.

> I can commit the patch myself; alternatively, do you want to get commit
> access?
>
> Dave

I think it would be great if I could get commit access, and certainly
easier for everybody in the future.

Thanks for the feedback on both patches, I'll be able to update
them most likely tomorrow.

Bests
  Andrea


[PATCH v6][C][ADA] use function descriptors instead of trampolines in C

2019-06-24 Thread Uecker, Martin


Hi,

here is a new version of this patch. It makes "-fno-trampolines"
work for C, which then makes it possible to use nested functions
without an executable stack. The only change in this version is in
the documentation.

Maybe it could be reconsidered at this stage?


Bootstrapped and regression tested on x86.

Martin


gcc/
* common.opt (flag_trampolines): Change default.
* calls.c (prepare_call_address): Remove check for
flag_trampolines.  Decision is now made in FEs.
* defaults.h (FUNCTION_ALIGNMENT): Add test for flag_trampolines.
* tree-nested.c (convert_tramp_reference_op): Likewise.
* toplev.c (process_options): Add warning for -fno-trampolines on
unsupported targets.
* doc/invoke.texi (-fno-trampolines): Document support for C.
gcc/ada/
* gcc-interface/trans.c (Attribute_to_gnu): Add check for
flag_trampolines.
gcc/c/
* c-typeck.c (function_to_pointer_conversion): If using descriptors
instead of trampolines, amend function address with
FUNC_ADDR_BY_DESCRIPTOR and calls with ALL_EXPR_BY_DESCRIPTOR.
gcc/testsuite/
* gcc.dg/trampoline-2.c: New test.
* lib/target-supports.exp
(check_effective_target_notrampolines): New.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d2dd391b39c..568e203bdcc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2019-06-24  Martin Uecker  
+
+   * common.opt (flag_trampolines): Change default.
+   * calls.c (prepare_call_address): Remove check for
+   flag_trampolines.  Decision is now made in FEs.
+   * defaults.h (FUNCTION_ALIGNMENT): Add test for flag_trampolines.
+   * tree-nested.c (convert_tramp_reference_op): Likewise.
+   * toplev.c (process_options): Add warning for -fno-trampolines on
+   unsupported targets.
+   * doc/invoke.texi (-fno-trampolines): Document support for C.
+   * doc/sourcebuild.texi (target attributes): Document new
+   "notrampolines" effective target keyword.
+
 2019-06-22  Jeff Law  
 
    * config/avr/avr.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
diff --git a/gcc/ada/ChangeLog b/gcc/ada/ChangeLog
index 1b6aa2fd11f..5b8d0ef44c2 100644
--- a/gcc/ada/ChangeLog
+++ b/gcc/ada/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-24  Martin Uecker  
+
+   * gcc-interface/trans.c (Attribute_to_gnu): Add check for
+   flag_trampolines.
+
 2019-06-18  Arnaud Charlet  
 
 PR ada/80590
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index e2d2ddae3fe..cd244808954 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -2267,7 +2267,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree 
*gnu_result_type_p, int attribute)
      if ((attribute == Attr_Access
       || attribute == Attr_Unrestricted_Access)
      && targetm.calls.custom_function_descriptors > 0
-     && Can_Use_Internal_Rep (Etype (gnat_node)))
+     && Can_Use_Internal_Rep (Etype (gnat_node))
+  && flag_trampolines != 1)
    FUNC_ADDR_BY_DESCRIPTOR (gnu_expr) = 1;
 
      /* Otherwise, we need to check that we are not violating the
@@ -5106,7 +5107,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, 
tree gnu_target,
   /* If the access type doesn't require foreign-compatible representation,
     be prepared for descriptors.  */
   if (targetm.calls.custom_function_descriptors > 0
-     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node)
+     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node
+  && flag_trampolines != 1)
    by_descriptor = true;
 }
   else if (Nkind (Name (gnat_node)) == N_Attribute_Reference)
diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 103634eff73..b724ed7d99c 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,9 @@
+2019-06-24  Martin Uecker  
+
+   * c-typeck.c (function_to_pointer_conversion): If using descriptors
+   instead of trampolines, amend function address with
+   FUNC_ADDR_BY_DESCRIPTOR and calls with ALL_EXPR_BY_DESCRIPTOR.
+
 2019-06-10  Jakub Jelinek  
 
    * c-parser.c (c_parser_pragma): Reject PRAGMA_OMP_SCAN.
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 6abfd101f30..2216518e024 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -1914,7 +1914,14 @@ function_to_pointer_conversion (location_t loc, tree exp)
   if (TREE_NO_WARNING (orig_exp))
 TREE_NO_WARNING (exp) = 1;
 
-  return build_unary_op (loc, ADDR_EXPR, exp, false);
+  tree r = build_unary_op (loc, ADDR_EXPR, exp, false);
+
+  if (TREE_CODE(r) == ADDR_EXPR
+  && targetm.calls.custom_function_descriptors > 0
+  && flag_trampolines == 0)
+ FUNC_ADDR_BY_DESCRIPTOR (r) = 1;
+
+  return r;
 }
 
 /* Mark EXP as read, not just set, for set but not used -Wunused
@@ -3136,6 +3143,12 

Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-24 Thread David Malcolm
On Mon, 2019-06-24 at 15:26 +, Andrea Corallo wrote:
> Hi all,
> second version here of the gcc_jit_context_new_bitfield patch
> addressing
> review comments.
> 
> Checked with make check-jit runs clean.
> 
> Bests
> 
> Andrea
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
> * docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
> * jit-common.h (namespace recording): Add class bitfield.
> * jit-playback.c:
> (DECL_C_BIT_FIELD, SET_DECL_C_BIT_FIELD): Add macros.
> (playback::context::new_bitfield): New method.
> (playback::compound_type::set_fields): Add bitfield support.
> (playback::lvalue::mark_addressable): Was jit_mark_addressable; make
> this a method of lvalue and return a bool to communicate success.
> (playback::lvalue::get_address): Check for jit_mark_addressable
> return
> value.
> * jit-playback.h (new_bitfield): New method.
> (class bitfield): New class.
> (class lvalue): Add jit_mark_addressable method.
> * jit-recording.c (recording::context::new_bitfield): New method.
> (recording::bitfield::replay_into): New method.
> (recording::bitfield::write_to_dump): Likewise.
> (recording::bitfield::make_debug_string): Likewise.
> (recording::bitfield::write_reproducer): Likewise.
> * jit-recording.h (class context): Add new_bitfield method.
> (class field): Make it derivable by class bitfield.
> (class bitfield): Add new class.
> * libgccjit++.h (class context): Add new_bitfield method.
> * libgccjit.c (struct gcc_jit_bitfield): New structure.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.h
> (LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield) New macro.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.map (LIBGCCJIT_ABI_12) New ABI tag.
> 
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
> * jit.dg/test-accessing-bitfield.c: New testcase.
> * jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-type.c:
> Likewise.
> * jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c:
> Likewise.
> * jit.dg/test-error-gcc_jit_lvalue_get_address-bitfield.c:
> Likewise.

Thanks for the updated patch.

[...]

> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index b74495c..a6a9e2d 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -47,6 +47,12 @@ along with GCC; see the file COPYING3.  If not see
>  #include "jit-builtins.h"
>  #include "jit-tempdir.h"
>  
> +/* Compare with gcc/c-family/c-common.h
> +   This is redifined here to avoid depending from the C frontend.  */
> +#define DECL_C_BIT_FIELD(NODE) \
> +  (DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE)) == 1)
> +#define SET_DECL_C_BIT_FIELD(NODE) \
> +  (DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE)) = 1)

Can you rename these (and their users) from
  *_C_BIT_FIELD
to
  *_JIT_BIT_FIELD
(and update the comment please).

Nit: "redifined" -> "redefined".
 
[...]

> diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h 
> b/gcc/testsuite/jit.dg/all-non-failing-tests.h
> index 9a10418..f7af8e9 100644
> --- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
> +++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
> @@ -8,6 +8,13 @@
> hooks provided by each test case.  */
>  #define COMBINED_TEST
>  
> +/* test-accessing-bitfield.c */
> +#define create_code create_code_accessing_bitfield
> +#define verify_code verify_code_accessing_bitfield
> +#include "test-accessing-bitfield.c"
> +#undef create_code
> +#undef verify_code
> +
>  /* test-accessing-struct.c */
>  #define create_code create_code_accessing_struct
>  #define verify_code verify_code_accessing_struct

You should also add an entry containing
   create_code_accessing_bitfield
   verify_code_accessing_bitfield
to the "testcases" array at the bottom of the file, so that the new
tests are also run as part of test-combination.c and test-threads.c;
hopefully there are no conflicts with existing tests in the suite (e.g.
names of generated functions within the gcc_jit_context).

[...]

> diff --git 
> a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c
>  
> b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c
> new file mode 100644
> index 000..6cb151b
> --- /dev/null
> +++ 
> b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c
> @@ -0,0 +1,40 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +#include "libgccjit.h"
> +
> +#include "harness.h"
> +
> +/* Try to declare a bit-field with invalid width.  */
> +
> +void
> +create_code (gcc_jit_context *ctxt, void *user_data)
> +{
> +  gcc_jit_type *short_type =
> +gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_SHORT);
> +  gcc_jit_field *i =
> +gcc_jit_context_new_bitfield (ctxt,
> +   NULL,
> +   short_type,
> +   3,
> +   "i");
> +  gcc_jit_field *j =
> +g

Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-24 Thread David Malcolm
On Mon, 2019-06-24 at 16:37 +, Andrea Corallo wrote:
> David Malcolm writes:
> 
> > On Mon, 2019-06-24 at 15:30 +, Andrea Corallo wrote:
> > > Hi all,
> > > second version for this patch.
> > > Given the suggestion for the bit-field one I've tried to improve
> > > also
> > > here the error message.
> > 
> > Thanks.
> > 
> > > I've added a simple testcase as requested, here I'm trying to do
> > > *void=int+int.
> > > This without checking would normally crash verifying gimple.
> > 
> > Thanks.  FWIW, I think the testcase can be simplified slightly, in
> > that
> > all that's needed is a bogus call to gcc_jit_context_new_binary_op,
> > so
> > I don't think the testcase needs the calls to:
> >   gcc_jit_context_new_function,
> >   gcc_jit_function_new_block, and
> >   gcc_jit_block_end_with_return,
> > it just needs the types and the gcc_jit_context_new_binary_op call.
> 
> Hi Dave,
> thanks for your feedback.
> I've tried that but the reproducer is then incomplete with no call to
> gcc_jit_context_new_binary_op so I would keep it like it is if you
> are
> ok with that.

Sorry, I think I was unclear.

What I meant is that I think you can remove the calls I mentioned, but
keep the call to gcc_jit_context_new_binary_op, moving it to be a "top-
level" call within create_code (discarding the result).  That ought to
be enough to trigger the error within the gcc_jit_context.

Does that make more sense?

> > > More complex cases can cause crashes, e.g. having
> > > structures as the result type etc...
> > > 
> > > Tested with make check-jit
> > > OK for trunk?
> > 
> > Looks good as-is, or you may prefer to simplify the testcase.
> > 
> > Thanks for the patch.
> > 
> > BTW, I don't see you listed in the MAINTAINERS file; are you able
> > to
> > commit patches yourself?
> > 
> > Dave
> 
> Sorry, I realize my "OK for trunk?" was quite misleading.
> I'm not a maintainer and until now I have no write access, so I can't
> apply patches myself.

I believe ARM has a corporate copyright-assignment in place with the
FSF for GCC contributions.

I can commit the patch myself; alternatively, do you want to get commit
access?

Dave



Re: Start implementing -frounding-math

2019-06-24 Thread Marc Glisse

On Mon, 24 Jun 2019, Szabolcs Nagy wrote:


On 22/06/2019 23:21, Marc Glisse wrote:

We should care about the C standard, and do whatever makes sense for C++ 
without expecting the C++ standard to tell us exactly what that is. We
can check what Visual Studio and Intel do, but we don't have to follow them.

-frounding-math is supposed to be equivalent to "#pragma stdc fenv_access on" 
covering the whole program.


i think there are 4 settings that make sense:
(i think function level granularity is ok for
this, iso c has block scope granularity, gcc
has translation unit level granularity.)

(1) except flags + only caller observes it.
i.e. exception flags raised during the execution
of the function matter, but only the caller
observes the flags by checking them.

(2) rounding mode + only caller changes it.
i.e. rounding mode may not be the default during
the execution of the function, but only the
caller may change the rounding mode.

(3) except flags + anything may observe/unset it.
i.e. exception flags raised during the execution
of the function matter, and any call or inline
asm may observe or unset them (unless the
compiler can prove otherwise).

(4) rounding mode + anything may change it.
i.e. rounding mode may not be the default or
change during the execution of a function,
and any call or inline asm may change it.

i think -frounding-math implements (2) fairly reliably,


I hadn't thought of it that way, but it is true that this is fairly well 
handled. I could possibly use this in some places in CGAL, using a wrapper 
so I can specify noinline/noipa at the call site. I'll have to experiment.


In particular it means that if I use -frounding-math to enable (4), there 
are valid uses where it will cause a speed regression :-(



and #pragma stdc fenv_access on requires (3) and (4).

-ftrapping-math was never clear, but it should
probably do (1) or (5) := (3)+"exceptions may trap".

so iso c has 2 levels: fenv access on/off, where
"on" means that essentially everything has to be
compiled with (3) and (4) (even functions that
don't do anything with fenv). this is not very
practical: most extern calls don't modify the fenv
so fp operations can be reordered around them,
(1) and (2) are more relaxed about this, however
that model needs fp barriers around the few calls
that actually do fenv access.

to me (1) + (2) + builtins for fp barriers seems
more useful than iso c (3) + (4), but iso c is
worth implementing too, since that's the standard.
so ideally there would be multiple flags/function
attributes and builtin barriers to make fenv access
usable in practice. (however not many things care
about fenv access so i don't know if that amount
of work is justifiable).


That makes sense. If we got (4), the interest for (2) would depend a lot 
on the speed difference. If the difference is small enough, then having 
only (4) might suffice. But at least separating rounding from exception 
flags seems good.


Depending on how we change things, it could be nice to add to the 
description of -frounding-math the precision you gave above (only the 
caller may change it).



For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile time, and
const double third = 1. / 3.;
which will try to evaluate the rhs as constexpr, but where the program is still 
valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding direction. 
For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions and I wouldn't mind a compile error.

iso c specifies rules for const expressions:
http://port70.net/~nsz/c/c11/n1570.html#F.8.4

static/thread storage duration is evaluated with
default rounding mode and no exceptions are signaled.

other initialization is evaluated at runtime.
(i.e. rounding-mode dependent result and
exception flags are observable).


Thanks for the reference.

--
Marc Glisse


[Darwin, PPC, testsuite, committed] Skip tests for unimplemented functionality.

2019-06-24 Thread Iain Sandoe
The -mno-speculate-indirect-jumps functionality is not implemented for
Darwin and, given that it's deprecated, is unlikely to be. So skip the tests
for it that fail on Darwin.

tested on powerpc-darwin9,
applied to mainline
thanks
Iain.

2019-06-24  Iain Sandoe  

* gcc.target/powerpc/safe-indirect-jump-1.c: Skip for Darwin.
* gcc.target/powerpc/safe-indirect-jump-7.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c 
b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
index 16ccfe4..b9ad8c1 100644
--- a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "not implemented for Darwin" { powerpc*-*-darwin* } } */
 /* { dg-additional-options "-mno-speculate-indirect-jumps" } */
 /* { dg-warning "'-mno-speculate-indirect-jumps' is deprecated" "" { target 
*-*-* } 0 } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-7.c 
b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-7.c
index e7d81d4..a316e66 100644
--- a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-7.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "not implemented for Darwin" { powerpc*-*-darwin* } } */
 /* { dg-additional-options "-mno-speculate-indirect-jumps" } */
 /* { dg-warning "'-mno-speculate-indirect-jumps' is deprecated" "" { target 
*-*-* } 0 } */
 



Re: std::vector code cleanup fixes optimizations

2019-06-24 Thread François Dumont

Hi

    Any feedback regarding this patch?

Thanks

On 5/14/19 7:46 AM, François Dumont wrote:

Hi

    This is the patch on vector to:

- Optimize sizeof in Versioned namespace mode. We could go one step 
further by removing _M_p from _M_finish and just transforming it into an 
offset, but that is a bit more invasive for the code.


- Implement the swap optimization already done on main std::vector 
template class.


- Fix the move constructor so that it is noexcept regardless of the 
allocator move constructor's noexcept qualification.


- Optimize move constructor with allocator when allocator type is 
always equal.


- Use shortcuts in C++11 by skipping the _M_XXX_dispatch methods. 
Those are now defined only in pre-C++11 mode; I can't see any ABI 
issue in doing so.


    * include/bits/stl_bvector.h
    [_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as
    _Bit_type*.
    (_Bvector_impl_data(const _Bvector_impl_data&)): Default.
    (_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter.
    (_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default.
(_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter.
    (_Bvector_impl_data::_M_reset()): Likewise.
    (_Bvector_impl_data::_M_swap_data): New.
    (_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement
explicitly.
    (_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, 
_Bvector_impl&&)): New.
    (_Bvector_base::_Bvector_base(_Bvector_base&&, const 
allocator_type&)):

    New, use latter.
    (vector::vector(vector&&, const allocator_type&, true_type)): New, 
use

    latter.
    (vector::vector(vector&&, const allocator_type&, false_type)): New.
    (vector::vector(vector&&, const allocator_type&)): Use latters.
    (vector::vector(const vector&, const allocator_type&)): Adapt.
    [__cplusplus >= 201103](vector::vector(_InputIt, _InputIt,
    const allocator_type&)): Use _M_initialize_range.
    (vector::operator[](size_type)): Use iterator operator[].
    (vector::operator[](size_type) const): Use const_iterator operator[].
    (vector::swap(vector&)): Adapt.
    (vector::_M_initialize(size_type)): Add assertions on allocators.
    Use _M_swap_data.
    [__cplusplus >= 201103](vector::insert(const_iterator, _InputIt,
    _InputIt)): Use _M_insert_range.
    [__cplusplus >= 201103](vector::_M_initialize_dispatch): Remove.
    [__cplusplus >= 201103](vector::_M_insert_dispatch): Remove.
    * testsuite/23_containers/vector/bool/allocator/swap.cc: Adapt.
    * 
testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc:

    Add check.

Tested under Linux x86_64, normal and debug modes.

Ok to commit?

François





[Darwin, PPC, testsuite, committed] Fix spec-barr-1.c for Darwin.

2019-06-24 Thread Iain Sandoe
We just needed to adjust the regex to accept Darwin's register names.

tested on powerpc-darwin9, powerpc-linux-gnu,
applied to mainline
thanks
Iain

2019-06-24  Iain Sandoe  

* gcc.target/powerpc/spec-barr-1.c: Adjust scan assembler regex
to recognise Darwin's register names.

diff --git a/gcc/testsuite/gcc.target/powerpc/spec-barr-1.c 
b/gcc/testsuite/gcc.target/powerpc/spec-barr-1.c
index a22bf58..d837515 100644
--- a/gcc/testsuite/gcc.target/powerpc/spec-barr-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/spec-barr-1.c
@@ -6,4 +6,4 @@ void foo ()
   __builtin_ppc_speculation_barrier ();
 }
 
-/* { dg-final { scan-assembler "ori 31,31,0" } } */
+/* { dg-final { scan-assembler {ori\s+r?31,r?31,r?0} } } */



[Darwin, testsuite, committed] Fix isysroot-1.c for Darwin versions with fixincludes for stdio.h.

2019-06-24 Thread Iain Sandoe
For the test to succeed there needs to be some header that is to be found in
the 'expected' place i.e. /usr/include/.  It's important that it is 
not the
name of a header for which fixincludes have been applied, since such headers
will be found in the gcc include-fixed dir and, in general, reference additional
headers.  The dummy sysroot will prevent the additional headers from being
found, resulting in a failed test.  The fix is to use a header name that isn’t
expected to be present in a real sysroot.

tested on x86_64-darwin18, x86_64-darwin16, applied to mainline
thanks
Iain

2019-06-24  Iain Sandoe  

* gcc.dg/cpp/isysroot-1.c (main): Use <example.h> as the test header.
* gcc.dg/cpp/usr/include/stdio.h: Rename...
* gcc.dg/cpp/usr/include/example.h: ... to this.

diff --git a/gcc/testsuite/gcc.dg/cpp/isysroot-1.c 
b/gcc/testsuite/gcc.dg/cpp/isysroot-1.c
index 7263ce4..4c54f9e 100644
--- a/gcc/testsuite/gcc.dg/cpp/isysroot-1.c
+++ b/gcc/testsuite/gcc.dg/cpp/isysroot-1.c
@@ -1,10 +1,17 @@
 /* { dg-options "-isysroot ${srcdir}/gcc.dg/cpp" } */
 /* { dg-do compile  { target *-*-darwin* } } */
 
-#include <stdio.h>
+/* For the test to succeed there needs to be some header that is to be found
+   in the 'expected' place i.e. /usr/include/.  It's important that
+   it is not the name of a header for which fixincludes have been applied,
+   since such headers will be found in the gcc include-fixed dir and, in
+   general, reference additional headers.  The dummy sysroot will prevent the
+   additional headers from being found, resulting in a failed test.  So use
+   a header name we don't expect to see. */
+#include <example.h>
 int main()
 {
-  /* Special stdio.h supplies function foo.  */
+  /* Special example.h supplies function foo.  */
   void (*x)(void) = foo;
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/cpp/usr/include/example.h 
b/gcc/testsuite/gcc.dg/cpp/usr/include/example.h
new file mode 100644
index 000..c674e89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/usr/include/example.h
@@ -0,0 +1,4 @@
+/* Used by gcc.dg/cpp/isysroot-1.c to test isysroot.  */
+void foo()
+{
+}
diff --git a/gcc/testsuite/gcc.dg/cpp/usr/include/stdio.h 
b/gcc/testsuite/gcc.dg/cpp/usr/include/stdio.h
deleted file mode 100644
index c674e89..000
--- a/gcc/testsuite/gcc.dg/cpp/usr/include/stdio.h
+++ /dev/null
@@ -1,4 +0,0 @@
-/* Used by gcc.dg/cpp/isysroot-1.c to test isysroot.  */
-void foo()
-{
-}



Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 3:31 PM Martin Liška  wrote:
>
> On 6/24/19 2:44 PM, Richard Biener wrote:
> > On Mon, Jun 24, 2019 at 2:12 PM Martin Liška  wrote:
> >>
> >> On 6/24/19 2:02 PM, Richard Biener wrote:
> >>> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:
> 
>  On 6/21/19 2:57 PM, Jan Hubicka wrote:
> > This looks like a good step (and please stream it in a host-independent
> > way). I suppose all these issues can be done one-by-one.
> 
>  So there's a working patch for that. However one will see following 
>  errors
>  when using an older compiler or older LTO bytecode:
> 
>  $ gcc main9.o -flto
>  lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
>  version -25480.4493 instead of the expected 9.0
> 
>  $ gcc main.o
>  lto1: internal compiler error: compressed stream: data error
> >>>
> >>> This is because of your change to bitfields or because with the old
> >>> scheme the header with the
> >>> version is compressed (is it?).
> >>
> >> Because currently also the header is compressed.
> >
> > That was it, yeah :/  Stupid decisions in the past.
> >
> > I guess we have to bite the bullet and do this kind of incompatible
> > change, accepting
> > the odd error message above.
> >
> >>> I'd simply avoid any layout changes
> >>> in the version check range.
> >>
> >> Well, then we have to find out how to distinguish between compression 
> >> algorithms.
> >>
> >>>
>  To be honest, I would prefer the new .gnu.lto_.meta section.
>  Richi, why is that so ugly?
> >>>
> >>> Because it's a change in the wrong direction and doesn't solve the
> >>> issue we already
> >>> have (cannot determine if a section is compressed or not).
> >>
> >> That's not true, the .gnu.lto_.meta section will be always uncompressed 
> >> and we can
> >> also backport changes to older compiler that can read it and print a 
> >> proper error
> >> message about LTO bytecode version mismatch.
> >
> > We can always backport changes, yes, but I don't see why we have to.
>
> I'm fine with the backward compatibility break. But we should also consider 
> lto-plugin.c
> that is parsing following 2 sections:
>
> 91  #define LTO_SECTION_PREFIX  ".gnu.lto_.symtab"
> 92  #define LTO_SECTION_PREFIX_LEN  (sizeof (LTO_SECTION_PREFIX) - 1)
> 93  #define OFFLOAD_SECTION ".gnu.offload_lto_.opts"
> 94  #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1)

Yeah, I know.  And BFD and gold hard-coded those __gnu_lto_{v1,slim} symbols...

> >
> >>> ELF section overhead
> >>> is quite big if you have lots of small functions.
> >>
> >> My patch actually shrinks space, as I'm suggesting adding _one_ extra
> >> ELF section and removing the section header from all other LTO
> >> sections. That will save space for all function sections.
> >
> > But we want the header there to at least say if the section is
> > compressed or not.
> > The fact that we have so many ELF sections means we have the redundant 
> > version
> > info everywhere.
> >
> > We should have a single .gnu.lto_ section (and also get rid of those
> > __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for
> > existence of a symbol is more expensive compared to existence
> > of a section).
>
> I like the removal of the 2 aforementioned sections. To be honest, I would
> recommend adding a new .gnu.lto_.meta section.

Why .meta?  Why not just .gnu.lto_?

> We can use it instead of __gnu_lto_v1 and we can
> have a flag there instead of __gnu_lto_slim. As a second step, I'm willing to 
> concatenate all
>
>   LTO_section_function_body,
>   LTO_section_static_initializer
>
> sections into a single one. That will require an index that will have to be 
> created. I can discuss
> that with Honza as he suggested using something smarter than function names.

I think the index belongs to symtab?

Let's do it properly if we want to change it.  Removing
__gnu_lto_v1/slim is going to be
the most intrusive change btw. and is orthogonal to the section changes.

Richard.

>
> Thoughts?
> Martin
>
> >
> > Richard.
> >
> >> Martin
> >>
> >>>
> >>> Richard.
> >>>
> 
>  Martin
> >>
>
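The metadata section being proposed would let even an old compiler reject mismatched bytecode with a sensible message rather than the garbage version numbers shown above. A minimal sketch of what such an always-uncompressed header could look like (the magic string, field layout, and names are invented for illustration and are not GCC's actual on-disk format):

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical layout for the proposed uncompressed metadata section:
// a magic string, a major/minor version, and a "slim" flag.  The magic,
// field layout and names are invented -- this is not GCC's real format.
struct lto_meta { std::uint16_t ver_major, ver_minor; bool slim; };

std::vector<std::uint8_t> pack_meta(const lto_meta &m)
{
    std::vector<std::uint8_t> out(9);
    std::memcpy(out.data(), "GLTO", 4);                     // magic, never compressed
    out[4] = m.ver_major & 0xff; out[5] = m.ver_major >> 8; // little-endian u16
    out[6] = m.ver_minor & 0xff; out[7] = m.ver_minor >> 8;
    out[8] = m.slim ? 1 : 0;
    return out;
}

lto_meta unpack_meta(const std::vector<std::uint8_t> &b)
{
    // Because the header is never compressed, even a reader that knows
    // nothing else about the bytecode can produce a clean diagnostic.
    if (b.size() < 9 || std::memcmp(b.data(), "GLTO", 4) != 0)
        throw std::runtime_error("not an LTO metadata section");
    lto_meta m;
    m.ver_major = b[4] | (std::uint16_t(b[5]) << 8);
    m.ver_minor = b[6] | (std::uint16_t(b[7]) << 8);
    m.slim = b[8] != 0;
    return m;
}
```

The point of the design is only that the version check happens before any decompression, avoiding the "compressed stream: data error" failure mode quoted above.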


Go patch committed: Open code string equality comparisons

2019-06-24 Thread Ian Lance Taylor
This Go patch by Cherry Zhang open codes string equality with builtin
memcmp. This allows further optimizations in the backend.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272620)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-338e4baf88a4ae676205dff601dbef2d31b19d2d
+89b442a0100286ee569b8d2562ce1b2ea602f7e7
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 272620)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -6226,10 +6226,27 @@ Binary_expression::do_flatten(Gogo* gogo
   bool is_idiv_op = ((this->op_ == OPERATOR_DIV &&
   left_type->integer_type() != NULL)
  || this->op_ == OPERATOR_MOD);
+  bool is_string_op = (left_type->is_string_type()
+   && this->right_->type()->is_string_type());
+
+  if (is_string_op)
+{
+  // Mark string([]byte) operands to reuse the backing store.
+  // String comparison does not keep the reference, so it is safe.
+  Type_conversion_expression* lce =
+this->left_->conversion_expression();
+  if (lce != NULL && lce->expr()->type()->is_slice_type())
+lce->set_no_copy(true);
+  Type_conversion_expression* rce =
+this->right_->conversion_expression();
+  if (rce != NULL && rce->expr()->type()->is_slice_type())
+rce->set_no_copy(true);
+}
 
   if (is_shift_op
   || (is_idiv_op
- && (gogo->check_divide_by_zero() || gogo->check_divide_overflow(
+ && (gogo->check_divide_by_zero() || gogo->check_divide_overflow()))
+  || is_string_op)
 {
   if (!this->left_->is_variable() && !this->left_->is_constant())
 {
@@ -7217,19 +7234,42 @@ Expression::comparison(Translate_context
 
   if (left_type->is_string_type() && right_type->is_string_type())
 {
-  // Mark string([]byte) operands to reuse the backing store.
-  // String comparison does not keep the reference, so it is safe.
-  Type_conversion_expression* lce = left->conversion_expression();
-  if (lce != NULL && lce->expr()->type()->is_slice_type())
-lce->set_no_copy(true);
-  Type_conversion_expression* rce = right->conversion_expression();
-  if (rce != NULL && rce->expr()->type()->is_slice_type())
-rce->set_no_copy(true);
+  go_assert(left->is_variable() || left->is_constant());
+  go_assert(right->is_variable() || right->is_constant());
 
   if (op == OPERATOR_EQEQ || op == OPERATOR_NOTEQ)
{
- left = Runtime::make_call(Runtime::EQSTRING, location, 2,
-   left, right);
+  // (l.len == r.len
+  //  ? (l.ptr == r.ptr ? true : memcmp(l.ptr, r.ptr, r.len) == 0)
+  //  : false)
+  Expression* llen = Expression::make_string_info(left,
+  STRING_INFO_LENGTH,
+  location);
+  Expression* rlen = Expression::make_string_info(right,
+  STRING_INFO_LENGTH,
+  location);
+  Expression* leneq = Expression::make_binary(OPERATOR_EQEQ, llen, rlen,
+  location);
+  Expression* lptr = Expression::make_string_info(left->copy(),
+  STRING_INFO_DATA,
+  location);
+  Expression* rptr = Expression::make_string_info(right->copy(),
+  STRING_INFO_DATA,
+  location);
+  Expression* ptreq = Expression::make_binary(OPERATOR_EQEQ, lptr, rptr,
+  location);
+  Expression* btrue = Expression::make_boolean(true, location);
+  Expression* call = Runtime::make_call(Runtime::MEMCMP, location, 3,
+lptr->copy(), rptr->copy(),
+rlen->copy());
+  Type* int32_type = Type::lookup_integer_type("int32");
+  Expression* zero = Expression::make_integer_ul(0, int32_type, location);
+  Expression* cmp = Expression::make_binary(OPERATOR_EQEQ, call, zero,
+location);
+  Expression* cond = Expression::make_conditional(ptreq, btrue, cmp,
+ 
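The conditional the patch builds (quoted in the comment above) can be sketched as a plain function; the string layout here is illustrative, not the Go frontend's actual representation:

```cpp
#include <cstddef>
#include <cstring>

// Sketch of the comparison the patch open-codes for s1 == s2:
//   (l.len == r.len ? (l.ptr == r.ptr ? true : memcmp(...) == 0) : false)
// The struct below is an invented stand-in for a Go string header.
struct gostring { const char *ptr; std::size_t len; };

static bool string_eq(const gostring &l, const gostring &r)
{
    if (l.len != r.len)
        return false;               // lengths differ: unequal
    if (l.ptr == r.ptr)
        return true;                // same backing store: equal
    return std::memcmp(l.ptr, r.ptr, r.len) == 0;
}
```

Exposing the length test and the memcmp call directly, instead of hiding them behind a runtime call, is what lets the backend fold or inline the comparison.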

Re: Start implementing -frounding-math

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 4:57 PM Marc Glisse  wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
>  -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
>  on" covering the whole program.
> 
>  For constant expressions, I see a difference between
>  constexpr double third = 1. / 3.;
>  which really needs to be done at compile time, and
>  const double third = 1. / 3.;
>  which will try to evaluate the rhs as constexpr, but where the program is
>  still valid if that fails. The second one clearly should refuse to be
>  evaluated at compile time if we are specifying a dynamic rounding
>  direction. For the first one, I am not sure. I guess you should only 
>  write
>  that in "fenv_access off" regions and I wouldn't mind a compile error.
> 
>  Note that C2x adds a pragma fenv_round that specifies a rounding 
>  direction
>  for a region of code, which seems relevant for constant expressions. That
>  pragma looks hard, but maybe some pieces would be nice to add.
> >>>
> >>> Hmm.  My thinking was along the line that at the start of main() the
> >>> C abstract machine might specify the initial rounding mode (and exception
> >>> state) is implementation defined and all constant expressions are 
> >>> evaluated
> >>> whilst being in this state.  So we can define that to round-to-nearest and
> >>> simply fold all constants in contexts we are allowed to evaluate at
> >>> compile-time as we see them?
> >>
> >> There are way too many such contexts. In C++, any initializer is
> >> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> >> __builtin_constant_p), and I do want
> >> double d = 1. / 3;
> >> to depend on the dynamic rounding direction. I'd rather err on the other
> >> extreme and only fold when we are forced to, say
> >> constexpr double d = 1. / 3;
> >> or even reject it because it is inexact, if pragmas put us in a region
> >> with dynamic rounding.
> >
> > OK, fair enough.  I just hoped that global
> >
> > double x = 1.0/3.0;
> >
> > do not become runtime initializers with -frounding-math ...
>
> Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round,
> which I guess could affect this (the C draft isn't very explicit), the
> program doesn't have many chances to set a rounding mode before
> initializing globals. It could do so in the initializer of another
> variable, but relying on the order of initialization this way seems bad.
> Maybe in this case it would make sense to assume the default rounding
> mode...
>
> In practice, I would only set -frounding-math on a per function basis
> (possibly using pragma fenv_access), so the optimization of what happens
> to globals doesn't seem so important.
>
> >> Side remark, I am sad that Intel added rounded versions for scalars and
> >> 512 bit vectors but not for intermediate sizes, while I am most
> >> interested in 128 bits. Masking most of the 512 bits still causes the
> >> dreaded clock slow-down.
> >
> > Ick.  I thought this was vector-length agnostic...
>
> I think all of the new stuff in AVX512 is, except rounding...
>
> Also, the rounded functions have exceptions disabled, which may make
> them hard to use with fenv_access.
>
> >>> I guess builtins need the same treatment for -ftrapping-math as they
> >>> do for -frounding-math.  I think you already mentioned the default
> >>> of this flag doesn't make much sense (well, the flag isn't fully
> >>> honored/implemented).
> >>
> >> PR 54192
> >> (coincidentally, it caused a missed vectorization in
> >> https://stackoverflow.com/a/56681744/1918193 last week)
> >
> > I commented there.  Lets just make -frounding-math == FENV_ACCESS ON
> > and keep -ftrapping-math as whether FP exceptions raise traps.
>
> One issue is that the C pragmas do not let me convey that I am interested
> in dynamic rounding but not exception flags. It is possible to optimize
> quite a bit more with just rounding. In particular, the functions are pure
> (at some point we will have to teach the compiler the difference between
> the FP environment and general memory, but I'd rather wait).
>
> > Yeah.  Auto-vectorizing would also need adjustment of course (also
> > costing like estimate_num_insns or others).
>
> Anything that is only about optimizing the code in -frounding-math
> functions can wait, that's the good point of implementing a new feature.

Sure - the only thing we may want to avoid is designing us into a corner
we cannot easily escape from.  Whenever I thought about -frounding-math
and friends (and not doing asm()-like hacks ;)) I thought we need to make
the data dependence on the FP environment explicit.  So I'd have done

{ FP result, new FP ENV state } = FENV_PLUS (op1, op2, old FP ENV state);

with the usual caveat of representing multiple return values.  Our standard
way via a projection riding on top of _Complex types works as long as you
use scalars and matching types, a more general projection fa
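The behaviour Marc wants `double d = 1. / 3;` to have, i.e. depending on the dynamic rounding direction, can be observed directly through C99/C++11 fenv. This is a sketch: the volatile operands merely stop the compiler from constant-folding the division, which is exactly what -frounding-math would have to guarantee in general.

```cpp
#include <cfenv>

// Evaluate 1.0/3.0 under a dynamically selected rounding mode, restoring
// round-to-nearest afterwards.  The volatile operands keep the compiler
// from folding the division at compile time, so the hardware rounding
// mode is actually honoured.
static double div_third(int mode)
{
    volatile double one = 1.0, three = 3.0;
    std::fesetround(mode);
    double r = one / three;
    std::fesetround(FE_TONEAREST);
    return r;
}
```

Without -frounding-math the compiler is in principle still free to schedule the division across the fesetround calls; in practice the volatile accesses keep it in place, which is the fragile idiom the explicit FENV data dependence above is meant to replace.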

Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jan Hubicka
Hi,
> 
> I thought I remembered someone's recent-ish work to treat specially
> types containing a char array, but I'm not finding it now.
> 
> > For dynamically allocated memory as well as for stack space after stack
> > slot sharing done in cfgexpand I see this is necessary since we do not
> > preserve any information about placement new.
> 
> Yes, untyped memory is different, but I'd think that memory allocated
> through (non-placement) new should also be considered typed.

I will try to return to this once the code is cleaned up. It would
be quite interesting to make this better defined.
> >
> > I think this is what Richard refers to the code generating clobber
> > statements that is only leaking as-base types to the middle-end visible
> > part of IL and the code in call.c copying base structures.
> 
> Right.  Is there a better way we could express copying/clobbering only
> part of the object without involving the as-base types?

I think currently two variants were discussed:
 1) use the trick Richard proposed for class a containing a fake
subclass a_as_base that has reduced size (i.e. is the AS_BASE type
we have now) and adjust all component_refs accordingly, introducing
view_convert_exprs for outer decls.

Then clobbers and copies in call.c could use the as_base type.
 2) expose IS_FAKE_BASE_TYPE to middle-end and teach TBAA machinery
about the fact that these are actually same types for everything
it cares about.

We have similar hacks for Fortran commons already, but of course
it is not the prettiest (I outlined a plan in a previous mail).

I would personally lean towards 2, since it will keep all component_refs
in their current natural form, and because I know how to implement it :)
I guess it is Richi's and your call to pick whichever variant is better...
> 
> > So at this time basically every C++ type can inter-operate with non-C++.
> > I was thinking of relaxing this somewhat but wanted to see if the C++
> > standard says something here. Things that may be sensible include:
> >  1) perhaps non-POD types, especially those with vptr pointers, do
> > not need to be inter-operable.
> 
> PODs were intended to be the C-compatible subset, yes.
> 
> >  2) anonymous namespace types
> >  3) types in namespace
> 
> As long as these types don't have explicit language linkage (e.g.
> extern "C"), sure.

Great, I will add those to my TODOs.
Do we have any way to tell language linkage from middle-end?

Honza
> 
> Jason


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jason Merrill
On Mon, Jun 24, 2019 at 1:23 PM Richard Biener  wrote:
> On Mon, 24 Jun 2019, Jason Merrill wrote:
>
> > On Mon, Jun 24, 2019 at 12:40 PM Jason Merrill  wrote:
> > > On Mon, Jun 24, 2019 at 11:57 AM Jan Hubicka  wrote:
> > > >
> > > > > > > As
> > > > > > >
> > > > > > >   class a var;
> > > > > > >   class b:a {} *bptr;
> > > > > > >
> > > > > > >   var.foo;
> > > > > > >
> > > > > > > Expanding this as var.as_base_a.foo would make access path oracle 
> > > > > > > to
> > > > > > > disambiguate it from bptr->as_base_b->as_base_a.foo which is 
> > > > > > > wrong with
> > > > > > > gimple memory model where we allow placement new replacing var by
> > > > > > > instance of b.
> > > > >
> > > > > Why do we allow that?  I would expect that to only be allowed if a is
> > > > > something like aligned_storage, i.e. a thin wrapper around a char/byte
> > > > > buffer.
> > > >
> > > > I think because Richard defined the gimple memory model this way after a fair
> > > > amount of frustration from placement news, stack slot sharing issues
> > > > and non-conforming codebases :)
> > > >
> > > > I think for normal user variables this is overly conservative.
> > > > At the moment TBAA is a bit of a mess. Once it is cleaned up, we could
> > > > see if restricting this more pays back and then we would need to
> > > > find way to pass the info to middle-end (as it does not
> > > > know difference between aligned_storage and other stuff).
> > >
> > > I thought I remembered someone's recent-ish work to treat specially
> > > types containing a char array, but I'm not finding it now.
> >
> > Specifically, aligned_storage has a first member which is a char
> > array, and a char array can alias anything, so we can put anything in
> > a char array, and a class can alias its first member, so transitively
> > we can put anything in such a class.
>
> You refer to TYPE_TYPELESS_STORAGE I guess.

Yes, thanks.

Jason
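The placement-new scenario under discussion can be made concrete with a small, contrived sketch. Raw aligned storage is used here so the example stays well-defined C++; the debate above is about whether the middle end must assume the same replacement can happen to an ordinary declared variable:

```cpp
#include <new>

// Base class and a derived class that reuses the base's storage layout.
struct A { int foo; };
struct B : A { };

// Typeless backing storage, the analogue of memory where placement new
// may legitimately change the dynamic type of the object living there.
alignas(B) static unsigned char storage[sizeof(B)];

int replace_with_derived()
{
    A *var = new (storage) A{1};    // object of the base class
    int before = var->foo;
    var->~A();
    B *bptr = new (storage) B;      // placement new: now an instance of B
    bptr->foo = 42;
    // An access-path oracle that assumed var.foo and the foo reached
    // through bptr's base subobject cannot alias would miscompile this.
    return before + static_cast<A *>(bptr)->foo;
}
```

The conservative gimple memory model accepts this aliasing everywhere; the thread above is about whether declared variables (as opposed to untyped storage like this) could be exempted.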


Re: Odd behaviour of indirect_ref_may_alias_decl_p in vn oracle

2019-06-24 Thread Richard Biener
On Mon, 24 Jun 2019, Jan Hubicka wrote:

> Hi,
> as discussed on IRC today, after all this patch should be correct.  I
> have re-tested it with x86_64-linux in the following variant which also
> moves the load of ptrtype1, which is unnecessarily early.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Richard.

>   * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Also give up
>   TBAA path when base2_alias_set is 0.
> 
> Index: tree-ssa-alias.c
> ===
> --- tree-ssa-alias.c  (revision 272614)
> +++ tree-ssa-alias.c  (working copy)
> @@ -1458,10 +1466,8 @@ indirect_ref_may_alias_decl_p (tree ref1
>if (!flag_strict_aliasing || !tbaa_p)
>  return true;
>  
> -  ptrtype1 = TREE_TYPE (TREE_OPERAND (base1, 1));
> -
>/* If the alias set for a pointer access is zero all bets are off.  */
> -  if (base1_alias_set == 0)
> +  if (base1_alias_set == 0 || base2_alias_set == 0)
>  return true;
>  
>/* When we are trying to disambiguate an access with a pointer dereference
> @@ -1479,6 +1485,9 @@ indirect_ref_may_alias_decl_p (tree ref1
>if (base1_alias_set != base2_alias_set
>&& !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
>  return false;
> +
> +  ptrtype1 = TREE_TYPE (TREE_OPERAND (base1, 1));
> +
>/* If the size of the access relevant for TBAA through the pointer
>   is bigger than the size of the decl we can't possibly access the
>   decl via that pointer.  */


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Richard Biener
On Mon, 24 Jun 2019, Jason Merrill wrote:

> On Mon, Jun 24, 2019 at 12:40 PM Jason Merrill  wrote:
> > On Mon, Jun 24, 2019 at 11:57 AM Jan Hubicka  wrote:
> > >
> > > > > > As
> > > > > >
> > > > > >   class a var;
> > > > > >   class b:a {} *bptr;
> > > > > >
> > > > > >   var.foo;
> > > > > >
> > > > > > Expanding this as var.as_base_a.foo would make access path oracle to
> > > > > > disambiguate it from bptr->as_base_b->as_base_a.foo which is wrong 
> > > > > > with
> > > > > > gimple memory model where we allow placement new replacing var by
> > > > > > instance of b.
> > > >
> > > > Why do we allow that?  I would expect that to only be allowed if a is
> > > > something like aligned_storage, i.e. a thin wrapper around a char/byte
> > > > buffer.
> > >
> > > I think because Richard defined the gimple memory model this way after a fair
> > > amount of frustration from placement news, stack slot sharing issues
> > > and non-conforming codebases :)
> > >
> > > I think for normal user variables this is overly conservative.
> > > At the moment TBAA is a bit of a mess. Once it is cleaned up, we could
> > > see if restricting this more pays back and then we would need to
> > > find way to pass the info to middle-end (as it does not
> > > know difference between aligned_storage and other stuff).
> >
> > I thought I remembered someone's recent-ish work to treat specially
> > types containing a char array, but I'm not finding it now.
> 
> Specifically, aligned_storage has a first member which is a char
> array, and a char array can alias anything, so we can put anything in
> a char array, and a class can alias its first member, so transitively
> we can put anything in such a class.

You refer to TYPE_TYPELESS_STORAGE I guess.

Richard.


[PATCH] Fix PR90930, alias oracle parts

2019-06-24 Thread Richard Biener


When improving oracle limits for PR90316 I missed one line when
cut&pasting from the adjustment in get_continuation_for_phi.

Bootstrapped and tested on x86_64-unknown-linux-gnu applied to
trunk and branch.

Richard.

2019-06-24  Richard Biener  

PR tree-optimization/90930
PR tree-optimization/90316
* tree-ssa-alias.c (walk_non_aliased_vuses): Add missing
decrement of limit.

Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 272283)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -3008,6 +3008,7 @@ walk_non_aliased_vuses (ao_ref *ref, tre
  res = NULL;
  break;
}
+ --limit;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
{
  if (!translate)


Re: Odd behaviour of indirect_ref_may_alias_decl_p in vn oracle

2019-06-24 Thread Jan Hubicka
Hi,
as discussed on IRC today, after all this patch should be correct.  I
have re-tested it with x86_64-linux in the following variant which also
moves the load of ptrtype1, which is unnecessarily early.

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Also give up
TBAA path when base2_alias_set is 0.

Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c(revision 272614)
+++ tree-ssa-alias.c(working copy)
@@ -1458,10 +1466,8 @@ indirect_ref_may_alias_decl_p (tree ref1
   if (!flag_strict_aliasing || !tbaa_p)
 return true;
 
-  ptrtype1 = TREE_TYPE (TREE_OPERAND (base1, 1));
-
   /* If the alias set for a pointer access is zero all bets are off.  */
-  if (base1_alias_set == 0)
+  if (base1_alias_set == 0 || base2_alias_set == 0)
 return true;
 
   /* When we are trying to disambiguate an access with a pointer dereference
@@ -1479,6 +1485,9 @@ indirect_ref_may_alias_decl_p (tree ref1
   if (base1_alias_set != base2_alias_set
   && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
 return false;
+
+  ptrtype1 = TREE_TYPE (TREE_OPERAND (base1, 1));
+
   /* If the size of the access relevant for TBAA through the pointer
  is bigger than the size of the decl we can't possibly access the
  decl via that pointer.  */


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jason Merrill
On Mon, Jun 24, 2019 at 12:40 PM Jason Merrill  wrote:
> On Mon, Jun 24, 2019 at 11:57 AM Jan Hubicka  wrote:
> >
> > > > > As
> > > > >
> > > > >   class a var;
> > > > >   class b:a {} *bptr;
> > > > >
> > > > >   var.foo;
> > > > >
> > > > > Expanding this as var.as_base_a.foo would make access path oracle to
> > > > > disambiguate it from bptr->as_base_b->as_base_a.foo which is wrong 
> > > > > with
> > > > > gimple memory model where we allow placement new replacing var by
> > > > > instance of b.
> > >
> > > Why do we allow that?  I would expect that to only be allowed if a is
> > > something like aligned_storage, i.e. a thin wrapper around a char/byte
> > > buffer.
> >
> > I think because Richard defined the gimple memory model this way after a fair
> > amount of frustration from placement news, stack slot sharing issues
> > and non-conforming codebases :)
> >
> > I think for normal user variables this is overly conservative.
> > At the moment TBAA is a bit of a mess. Once it is cleaned up, we could
> > see if restricting this more pays back and then we would need to
> > find way to pass the info to middle-end (as it does not
> > know difference between aligned_storage and other stuff).
>
> I thought I remembered someone's recent-ish work to treat specially
> types containing a char array, but I'm not finding it now.

Specifically, aligned_storage has a first member which is a char
array, and a char array can alias anything, so we can put anything in
a char array, and a class can alias its first member, so transitively
we can put anything in such a class.

Jason


Go patch committed: Use builtin memcmp directly

2019-06-24 Thread Ian Lance Taylor
This Go patch by Cherry Zhang changes the Go frontend to call builtin
memcmp directly, instead of going through a C function __go_memcmp.
This allows more optimizations in the compiler backend.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272608)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1232eef628227ef855c5fa6d94b31778b2e74a85
+338e4baf88a4ae676205dff601dbef2d31b19d2d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 272608)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -6199,7 +6199,8 @@ Binary_expression::lower_compare_to_memc
   TYPE_INFO_SIZE);
 
   Expression* call = Runtime::make_call(Runtime::MEMCMP, loc, 3, a1, a2, len);
-  Expression* zero = Expression::make_integer_ul(0, NULL, loc);
+  Type* int32_type = Type::lookup_integer_type("int32");
+  Expression* zero = Expression::make_integer_ul(0, int32_type, loc);
   return Expression::make_binary(this->op_, call, zero, loc);
 }
 
Index: gcc/go/gofrontend/runtime.def
===
--- gcc/go/gofrontend/runtime.def   (revision 272608)
+++ gcc/go/gofrontend/runtime.def   (working copy)
@@ -29,7 +29,7 @@
 // result types.
 
 // The standard C memcmp function, used for struct comparisons.
-DEF_GO_RUNTIME(MEMCMP, "__go_memcmp", P3(POINTER, POINTER, UINTPTR), R1(INT))
+DEF_GO_RUNTIME(MEMCMP, "__builtin_memcmp", P3(POINTER, POINTER, UINTPTR), R1(INT32))
 
 // Decode a non-ASCII rune from a string.
 DEF_GO_RUNTIME(DECODERUNE, "runtime.decoderune", P2(STRING, INT),
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 272608)
+++ libgo/Makefile.am   (working copy)
@@ -459,7 +459,6 @@ runtime_files = \
runtime/go-fieldtrack.c \
runtime/go-matherr.c \
runtime/go-memclr.c \
-   runtime/go-memcmp.c \
runtime/go-memequal.c \
runtime/go-nanotime.c \
runtime/go-now.c \
Index: libgo/Makefile.in
===
--- libgo/Makefile.in   (revision 272608)
+++ libgo/Makefile.in   (working copy)
@@ -244,8 +244,8 @@ am__objects_3 = runtime/aeshash.lo runti
runtime/go-cgo.lo runtime/go-construct-map.lo \
runtime/go-ffi.lo runtime/go-fieldtrack.lo \
runtime/go-matherr.lo runtime/go-memclr.lo \
-   runtime/go-memcmp.lo runtime/go-memequal.lo \
-   runtime/go-nanotime.lo runtime/go-now.lo runtime/go-nosys.lo \
+   runtime/go-memequal.lo runtime/go-nanotime.lo \
+   runtime/go-now.lo runtime/go-nosys.lo \
runtime/go-reflect-call.lo runtime/go-runtime-error.lo \
runtime/go-setenv.lo runtime/go-signal.lo \
runtime/go-unsafe-pointer.lo runtime/go-unsetenv.lo \
@@ -892,7 +892,6 @@ runtime_files = \
runtime/go-fieldtrack.c \
runtime/go-matherr.c \
runtime/go-memclr.c \
-   runtime/go-memcmp.c \
runtime/go-memequal.c \
runtime/go-nanotime.c \
runtime/go-now.c \
@@ -1343,8 +1342,6 @@ runtime/go-matherr.lo: runtime/$(am__dir
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-memclr.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
-runtime/go-memcmp.lo: runtime/$(am__dirstamp) \
-   runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-memequal.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-nanotime.lo: runtime/$(am__dirstamp) \
@@ -1436,7 +1433,6 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-fieldtrack.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-matherr.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-memclr.Plo@am__quote@
-@AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-memcmp.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-memequal.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nanotime.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nosys.Plo@am__quote@
Index: libgo/runtime/go-memcmp.c
===
--- libgo/runtime/go-memcmp.c   (revision 272608)
+++ libgo/runtime/go-memcmp.c   (nonexistent)
@@ -1,13 +0,0 @@
-/* go-memcmp.c -- the go memory comparison function.
-
-   Copyright 2012 The Go Authors. All rights reserved.
-   Use of this source code is governed by a BSD-style
-   license that can be found in the LICENSE file.  */
-
-#include "runtime.h"
-
-intgo
-__go_

Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jason Merrill
On Mon, Jun 24, 2019 at 11:57 AM Jan Hubicka  wrote:
>
> Hi,
> thanks for comitting the patch!
> > > > As
> > > >
> > > >   class a var;
> > > >   class b:a {} *bptr;
> > > >
> > > >   var.foo;
> > > >
> > > > Expanding this as var.as_base_a.foo would make access path oracle to
> > > > disambiguate it from bptr->as_base_b->as_base_a.foo which is wrong with
> > > > gimple memory model where we allow placement new replacing var by
> > > > instance of b.
> >
> > Why do we allow that?  I would expect that to only be allowed if a is
> > something like aligned_storage, i.e. a thin wrapper around a char/byte
> > buffer.
>
> I think because Richard defined the gimple memory model this way after a fair
> amount of frustration from placement news, stack slot sharing issues
> and non-conforming codebases :)
>
> I think for normal user variables this is overly conservative.
> At the moment TBAA is a bit of a mess. Once it is cleaned up, we could
> see if restricting this more pays back and then we would need to
> find way to pass the info to middle-end (as it does not
> know difference between aligned_storage and other stuff).

I thought I remembered someone's recent-ish work to treat specially
types containing a char array, but I'm not finding it now.

> For dynamically allocated memory as well as for stack space after stack
> slot sharing done in cfgexpand I see this is necessary since we do not
> preserve any information about placement new.

Yes, untyped memory is different, but I'd think that memory allocated
through (non-placement) new should also be considered typed.

> Note that the devirtualization machinery is a bit more aggressive than the
> TBAA model I am currently aiming for (for example, assuming that user
> variables of a given type are not replaced by placement new), but I think
> here we are relatively safe because we do so only for non-POD types, where
> construction/destruction ought to be paired.

Agreed.

> > > Ick.  IIRC the as-base types were necessary only for
> > > copying and clobber operations that may not touch the possibly
> > > re-used tail-padding.
> >
> > And temporarily during layout, yes.  This is all closely related to PR 
> > 22488.
>
> I think this is what Richard refers to the code generating clobber
> statements that is only leaking as-base types to the middle-end visible
> part of IL and the code in call.c copying base structures.

Right.  Is there a better way we could express copying/clobbering only
part of the object without involving the as-base types?

> > > Btw, I still wonder what the ODR says in the face of language
> > > inter-operation and what this means here?  For C++ I suppose PODs
> > > are not ODR?
> >
> > The ODR applies to PODs just like other classes.  But the ODR says
> > nothing about language interoperation, that's all
> > implementation-defined.
>
> My patchset considers all C++ types inter-operating with non-C++ types.
> So first we load all types and do the following:
>
> During streaming I populate ODR type hash with ODR types and canonical
> type hash with types not originating from C++.
>
> Once all types are in memory I do the following:
>  1) For every structure/union with linkage
> - see if there is a structurally equivalent non-C++ type in the
>   canonical type hash (where structural equivalence is defined in a
>   very generous way, ignoring pointer types, type tags and field names,
>   so interoperability with Fortran is safe)
>
>   if there is no matching type and no detected ODR violation,
>   mark the type to be handled by ODR name in step 2
>  2) for every structure/union originating from C++ compute the canonical
> type by canonical type hash query. If in 1) we decided that a given
> ODR type is unique, the canonical type hash compares types by name
> rather than by structure.
> I do not handle enums, since those conflict with the integer type that is
> declared in every translation unit.
>
> So at this time basically every C++ type can inter-operate with non-C++.
> I was thinking of relaxing this somewhat but wanted to see if the C++
> standard says something here. Things that may be sensible include:
>  1) perhaps non-POD types, especially those with vptr pointers, do
> not need to be inter-operable.

PODs were intended to be the C-compatible subset, yes.

>  2) anonymous namespace types
>  3) types in namespace

As long as these types don't have explicit language linkage (e.g.
extern "C"), sure.

Jason
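Honza's two-step merging scheme can be sketched with a toy model. All names and representations below are invented for illustration; GCC's real canonical-type hashing is far more involved:

```cpp
#include <string>
#include <vector>

// A "type" is just a name plus an ordered list of field kinds.
enum class kind { INT, FLOAT, POINTER };

struct rec { std::string name; std::vector<kind> fields; };

// Structural equivalence is deliberately generous: pointer fields match
// regardless of pointee type, and field names are ignored, so e.g. a
// Fortran record and a C struct with the same shape still match.
static bool structurally_equal(const rec &a, const rec &b)
{
    return a.fields == b.fields;
}

// Step 1: does this C++ (ODR) type structurally match any non-C++ type?
// If not, step 2 may compare it by ODR name instead of by structure.
static bool can_merge_by_name_only(const rec &cxx,
                                   const std::vector<rec> &non_cxx)
{
    for (const rec &r : non_cxx)
        if (structurally_equal(cxx, r))
            return false;   // must share a canonical type structurally
    return true;            // unique: handled by ODR name in step 2
}
```

The interesting property is the asymmetry: a structural match with any non-C++ type forces conservative merging, and only types that are provably unique fall back to the cheaper, more precise name-based canonicalization.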


Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-24 Thread Andrea Corallo

David Malcolm writes:

> On Mon, 2019-06-24 at 15:30 +, Andrea Corallo wrote:
>> Hi all,
>> second version for this patch.
>> Given the suggestion for the bit-field one I've tried to improve also
>> here the error message.
>
> Thanks.
>
>> I've added a simple testcase as requested, here I'm trying to do
>> *void=int+int.
>> This without checking would normally crash verifying gimple.
>
> Thanks.  FWIW, I think the testcase can be simplified slightly, in that
> all that's needed is a bogus call to gcc_jit_context_new_binary_op, so
> I don't think the testcase needs the calls to:
>   gcc_jit_context_new_function,
>   gcc_jit_function_new_block, and
>   gcc_jit_block_end_with_return,
> it just needs the types and the gcc_jit_context_new_binary_op call.

Hi Dave,
thanks for your feedback.
I've tried that, but the reproducer is then incomplete, with no call to
gcc_jit_context_new_binary_op, so I would keep it as it is if you are
OK with that.

>> More complex cases can cause crashes, e.g. when the result type is a
>> structure, etc...
>>
>> Tested with make check-jit
>> OK for trunk?
>
> Looks good as-is, or you may prefer to simplify the testcase.
>
> Thanks for the patch.
>
> BTW, I don't see you listed in the MAINTAINERS file; are you able to
> commit patches yourself?
>
> Dave

Sorry, I realize my "OK for trunk?" was quite misleading.
I'm not a maintainer and so far I have no write access, so I can't
apply patches myself.

Bests
  Andrea

>> Bests
>>   Andrea
>>
>> 2019-06-09  Andrea Corallo  andrea.cora...@arm.com
>>
>> * libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to
>> be a
>> numeric type.
>>
>>
>> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
>>
>> * jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c:
>> New testcase.


[PR 90939] Remove outdated assert in ipcp_bits_lattice::meet_with

2019-06-24 Thread Martin Jambor
Hi,

in August 2016 Prathamesh implemented inter-procedural propagation of
known non-zero bits on integers.  In August that same year he then also
added the ability to track it for pointers, replacing the separate
alignment tracking.

However, we still have an assert in ipcp_bits_lattice::meet_with from
the first commit that checks that any binary operation coming from an
arithmetic jump function is performed on integers.  Martin discovered
that when you compile chromium and the stars are aligned correctly, you
can end up evaluating a pointer expression MAX_EXPR (param, 0) there,
which trips the assert.

Unless Prathamesh can remember a reason why the assert is important, I
believe the correct thing is just to remove it.  In the case of this
MAX_EXPR, it will end up being evaluated in bit_value_binop, which cannot
handle it, and so we will end up with a BOTTOM lattice in the end.  In
the general case, bit_value_binop operates on widest_ints and so should
have no problem dealing with pointers.

I'm bootstrapping and testing the following patch to satisfy the rules
but since it only removes an assert and adds a testcase that I checked,
I do not expect any problems.  Is it OK for trunk, GCC 9 and 8?

Thanks,

Martin




2019-06-24  Martin Jambor  

PR ipa/90939
* ipa-cp.c (ipcp_bits_lattice::meet_with): Remove assert.

testsuite/
* g++.dg/lto/pr90939_[01].C: New test.
---
 gcc/ipa-cp.c |  1 -
 gcc/testsuite/g++.dg/lto/pr90939_0.C | 64 
 gcc/testsuite/g++.dg/lto/pr90939_1.C | 45 +++
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr90939_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/pr90939_1.C

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d3a88756a91..69c00a9c5a5 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1085,7 +1085,6 @@ ipcp_bits_lattice::meet_with (ipcp_bits_lattice& other, 
unsigned precision,
   if (TREE_CODE_CLASS (code) == tcc_binary)
 {
   tree type = TREE_TYPE (operand);
-  gcc_assert (INTEGRAL_TYPE_P (type));
   widest_int o_value, o_mask;
   get_value_and_mask (operand, &o_value, &o_mask);
 
diff --git a/gcc/testsuite/g++.dg/lto/pr90939_0.C 
b/gcc/testsuite/g++.dg/lto/pr90939_0.C
new file mode 100644
index 000..8987c348015
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr90939_0.C
@@ -0,0 +1,64 @@
+// PR ipa/90939
+// { dg-lto-do link }
+// { dg-lto-options { { -flto -O3 } } }
+
+
+typedef char uint8_t;
+template  class A {
+public:
+  A(T *);
+};
+template  const Derived &To(Base &p1) {
+  return static_cast(p1);
+}
+class H;
+template  const H *To(Base *p1) {
+  return p1 ? &To(*p1) : nullptr;
+}
+enum TextDirection : uint8_t;
+enum WritingMode : unsigned;
+class B {
+public:
+  WritingMode m_fn1();
+};
+class C {
+public:
+  int &m_fn2();
+};
+class D { double d;};
+class H : public D {};
+class F {
+public:
+  F(C, A, B *, WritingMode, TextDirection);
+};
+
+class G {
+public:
+  C NGLayoutAlgorithm_node;
+  B NGLayoutAlgorithm_space;
+  TextDirection NGLayoutAlgorithm_direction;
+  H NGLayoutAlgorithm_break_token;
+  G(A p1) __attribute__((noinline))
+: break_token_(&NGLayoutAlgorithm_break_token),
+container_builder_(NGLayoutAlgorithm_node, p1, 
&NGLayoutAlgorithm_space,
+   NGLayoutAlgorithm_space.m_fn1(),
+   NGLayoutAlgorithm_direction) {}
+  G(C p1, const H *) : G(&p1.m_fn2()) {}
+  A break_token_;
+  F container_builder_;
+};
+
+class I : G {
+public:
+  I(const D *) __attribute__((noinline));
+};
+C a;
+I::I(const D *p1) : G(a, To(p1)) {}
+
+D gd[10];
+
+int main (int argc, char *argv[])
+{
+  I i(&(gd[argc%2]));
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/lto/pr90939_1.C 
b/gcc/testsuite/g++.dg/lto/pr90939_1.C
new file mode 100644
index 000..9add89494d7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr90939_1.C
@@ -0,0 +1,45 @@
+typedef char uint8_t;
+template  class A {
+public:
+  A(T *);
+};
+
+enum TextDirection : uint8_t;
+enum WritingMode : unsigned;
+class B {
+public:
+  WritingMode m_fn1();
+};
+class C {
+public:
+  int &m_fn2();
+};
+
+class F {
+public:
+  F(C, A, B *, WritingMode, TextDirection);
+};
+class D { double d;};
+class H : public D {};
+
+
+
+template  A::A(T*) {}
+
+template class A;
+template class A;
+
+WritingMode __attribute__((noipa))
+B::m_fn1()
+{
+  return (WritingMode) 0;
+}
+
+int gi;
+int & __attribute__((noipa))
+C::m_fn2 ()
+{
+  return gi;
+}
+
+__attribute__((noipa)) F::F(C, A, B *, WritingMode, TextDirection) 
{}
-- 
2.21.0



Re: [PATCH][Arm] Remove constraint strings from define_expand constructs in the back end

2019-06-24 Thread Kyrill Tkachov

Hi Dennis,

On 6/24/19 4:13 PM, Dennis Zhang wrote:

Hi,

A number of Arm define_expand patterns have specified constraints for
their operands. But the constraint strings are ignored at expand time
and are therefore redundant/useless. We now avoid specifying constraints
in new define_expands, but we should clean up the existing define_expand
definitions.

For example, the constraint "=r" is removed in the following case:
(define_expand "reload_inhi"
     [(parallel [(match_operand:HI 0 "s_register_operand" "=r")
Operands with an empty constraint string ("") in define_expand are
removed as well.


The patch is tested with the build configuration of
--target=arm-linux-gnueabi and it passes gcc/testsuite.


Thank you for the patch.

Unfortunately I've hit an ICE building an arm-none-eabi target with your 
patch.


This appears to be due to:

@@ -6767,9 +6767,9 @@
 ;; temporary if the address isn't offsettable -- push_reload doesn't seem
 ;; to take any notice of the "o" constraints on reload_memory_operand 
operand.

 (define_expand "reload_outhi"
-  [(parallel [(match_operand:HI 0 "arm_reload_memory_operand" "=o")
-      (match_operand:HI 1 "s_register_operand"    "r")
-      (match_operand:DI 2 "s_register_operand" "=&l")])]
+  [(parallel [(match_operand:HI 0 "arm_reload_memory_operand")
+      (match_operand:HI 1 "s_register_operand")
+      (match_operand:DI 2 "s_register_operand")])]
   "TARGET_EITHER"
   "if (TARGET_ARM)
  arm_reload_out_hi (operands);
@@ -6780,9 +6780,9 @@
 )

 (define_expand "reload_inhi"
-  [(parallel [(match_operand:HI 0 "s_register_operand" "=r")
-      (match_operand:HI 1 "arm_reload_memory_operand" "o")
-      (match_operand:DI 2 "s_register_operand" "=&r")])]
+  [(parallel [(match_operand:HI 0 "s_register_operand")
+      (match_operand:HI 1 "arm_reload_memory_operand")
+      (match_operand:DI 2 "s_register_operand")])]
   "TARGET_EITHER"
   "
   if (TARGET_ARM)


the reload_in and reload_out patterns are somewhat special:

https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names

where the constraints seem to matter.

We should migrate these patterns to the recommended secondary_reload 
hook, but that would be a separate patch.


For now, please try removing changes to these patterns and making sure 
that the build succeeds.


Thanks,

Kyrill



Re: [PATCH] i386: Separate costs of RTL expressions from costs of moves

2019-06-24 Thread H.J. Lu
On Mon, Jun 24, 2019 at 6:37 AM Richard Biener  wrote:
>
> On Thu, 20 Jun 2019, Jan Hubicka wrote:
>
> > > > Currently, costs of moves are also used for costs of RTL expressions.   
> > > > This
> > > > patch:
> > > >
> > > > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html
> > > >
> > > > includes:
> > > >
> > > > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > > > b/gcc/config/i386/x86-tune-costs.h
> > > > index e943d13..8409a5f 100644
> > > > --- a/gcc/config/i386/x86-tune-costs.h
> > > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > > @@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
> > > >{4, 4, 4}, /* cost of loading integer registers
> > > >  in QImode, HImode and SImode.
> > > >  Relative to reg-reg move (2).  */
> > > > -  {6, 6, 6}, /* cost of storing integer registers */
> > > > +  {6, 6, 3}, /* cost of storing integer registers */
> > > >2, /* cost of reg,reg fld/fst */
> > > >{6, 6, 8}, /* cost of loading fp registers
> > > >  in SFmode, DFmode and XFmode */
> >
> > Well, it seems that the patch was fixing things in the wrong spot - the
> > tables are intended to be mostly latency based.  I think we ought to
> > document divergences from these, including benchmarks where the change
> > helped.  Otherwise it is very hard to figure out why the entry does not
> > match reality.
> > > >
> > > > It lowered the cost for SImode store and made it cheaper than 
> > > > SSE<->integer
> > > > register move.  It caused a regression:
> > > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
> > > >
> > > > Since the cost for SImode store is also used to compute scalar_store
> > > > in ix86_builtin_vectorization_cost, it changed loop costs in
> > > >
> > > > void
> > > > foo (long p2, long *diag, long d, long i)
> > > > {
> > > >   long k;
> > > >   k = p2 < 3 ? p2 + p2 : p2 + 3;
> > > >   while (i < k)
> > > > diag[i++] = d;
> > > > }
> > > >
> > > > As the result, the loop is unrolled 4 times with -O3 -march=skylake,
> > > > instead of 3.
> > > >
> > > > My patch separates costs of moves from costs of RTL expressions.  We 
> > > > have
> > > > a follow up patch which restores the cost for SImode store back to 6 
> > > > and leave
> > > > the cost of scalar_store unchanged.  It keeps loop unrolling unchanged 
> > > > and
> > > > improves powf performance in glibc by 30%.  We are collecting SPEC CPU 
> > > > 2017
> > > > data now.
> >
> > I have seen the problem with scalar_store with AMD tuning as well.
> > It seems to make the SLP vectorizer happy about the idea of turning a
> > sequence of, say, integer stores into code which moves all the values into
> > an AVX register and then does one vector store.
> >
> > The cost model basically compares the cost of N scalar stores to 1 vector
> > store + vector construction.  Vector construction is then N*sse_op + addss.
> >
> > With testcase:
> >
> > short array[8];
> > test (short a,short b,short c,short d,short e,short f,short g,short h)
> > {
> >   array[0]=a;
> >   array[1]=b;
> >   array[2]=c;
> >   array[3]=d;
> >   array[4]=e;
> >   array[5]=f;
> >   array[6]=g;
> >   array[7]=h;
> > }
> > int iarray[8];
> > test2 (int a,int b,int c,int d,int e,int f,int g,int h)
> > {
> >   iarray[0]=a;
> >   iarray[1]=b;
> >   iarray[2]=c;
> >   iarray[3]=d;
> >   iarray[4]=e;
> >   iarray[5]=f;
> >   iarray[6]=g;
> >   iarray[7]=h;
> > }
> >
> > I get the following codegen:
> >
> >
> > test:
> > vmovd   %edi, %xmm0
> > vmovd   %edx, %xmm2
> > vmovd   %r8d, %xmm1
> > vmovd   8(%rsp), %xmm3
> > vpinsrw $1, 16(%rsp), %xmm3, %xmm3
> > vpinsrw $1, %esi, %xmm0, %xmm0
> > vpinsrw $1, %ecx, %xmm2, %xmm2
> > vpinsrw $1, %r9d, %xmm1, %xmm1
> > vpunpckldq  %xmm2, %xmm0, %xmm0
> > vpunpckldq  %xmm3, %xmm1, %xmm1
> > vpunpcklqdq %xmm1, %xmm0, %xmm0
> > vmovaps %xmm0, array(%rip)
> > ret
> >
> > test2:
> > vmovd   %r8d, %xmm5
> > vmovd   %edx, %xmm6
> > vmovd   %edi, %xmm7
> > vpinsrd $1, %r9d, %xmm5, %xmm1
> > vpinsrd $1, %ecx, %xmm6, %xmm3
> > vpinsrd $1, %esi, %xmm7, %xmm0
> > vpunpcklqdq %xmm3, %xmm0, %xmm0
> > vmovd   16(%rbp), %xmm4
> > vpinsrd $1, 24(%rbp), %xmm4, %xmm2
> > vpunpcklqdq %xmm2, %xmm1, %xmm1
> > vinserti128 $0x1, %xmm1, %ymm0, %ymm0
> > vmovdqu %ymm0, iarray(%rip)
> > vzeroupper
> >   ret
> >
> > which is about 20% slower on my skylake notebook than the
> > non-SLP-vectorized variant.
> >
> > I wonder if the vec_construct costs should be made more realistic.
> > It is computed as:
> >
> >   case vec_construct:
> > {
> >   /* N element inserts into SSE vectors.  */
> >   int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> >   /* One vinserti128 for combining two SSE vectors for AVX256.  */
> >   if (GET_MODE_BITSIZE (mode) == 256)
> > 

Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-24 Thread Prathamesh Kulkarni
On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
> >if (!def_set)
> >  return false;
> >
> > +  if (reg_prop_only
> > +  && !REG_P (SET_SRC (def_set))
> > +  && !REG_P (SET_DEST (def_set)))
> > +return false;
>
> This should be:
>
>   if (reg_prop_only
>   && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set
> return false;
>
> so that we return false if either operand isn't a register.
Oops, sorry about that :-(
>
> > +
> > +  /* Allow propagations into a loop only for reg-to-reg copies, since
> > + replacing one register by another shouldn't increase the cost.  */
> > +
> > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> > +  && !REG_P (SET_SRC (def_set))
> > +  && !REG_P (SET_DEST (def_set)))
> > +return false;
>
> Same here.
>
> OK with that change, thanks.
Thanks for the review, will make the changes and commit the patch
after re-testing.

Thanks,
Prathamesh
>
> Richard


Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Mon, Jun 24, 2019 at 04:28:34PM +0200, Jakub Jelinek wrote:
>> On Sun, Jun 23, 2019 at 02:51:06PM +0100, Richard Sandiford wrote:
>> > What do you think?  Is it worth pursuing this further?
>> 
>> Wouldn't it be more useful to just force all automatic variables to be
>> used at the end of their corresponding scope?

This is what patch 2 does FWIW.

>> That is IMHO the main issue
>> with -Og debugging, VTA is a best effort, if we can express a variable with
>> some expression, nice, but if there is no expression nor memory nor register
>> that holds the value, we are out of luck.  Could be some magic stmt like
>> gimple_clobber or ifn or something similar, which would make sure that at
>> least until expansion to RTL we force those vars to be live in either a
>> register or memory.
>> I'm afraid having different modes, one in which debug stmts can't and one
>> where they can affect code generation might be a maintenance nightmare.

Very few places need to check for the mode explicitly.  Most code just
uses a new test instead of (rather than on top of) is_gimple_debug/
NONDEBUG_INSN_P.

The vast majority of the existing tests for is_gimple_debug aren't
interested in the debug contents of a debug stmt.  They're just
interested in whether the stmt is allowed to affect codegen.  With this
series that becomes a new predicate in its own right.  We then only use
is_gimple_debug if we want to process the contents of a debug insn in a
particular way, just like we only test for gimple calls or assignments
when we're interested in their contents.

Old habits die hard of course, so there'd definitely be a period while
people instinctively use is_gimple_debug/NONDEBUG_INSN_P instead of
the new test.

> This is pretty much exactly what USE in RTL is?  Maybe use a similar name
> in Gimple?

Yeah, I wondered about doing it that way, but one problem with USE is
that it isn't clear why the USE is there or what the ordering constraints
on it are.  Here we want the same rules as for normal debug stmts
(as modified by patch 1).

Also, if the final value ends up being constant, it's fine to propagate
that value into the debug stmt that marks the end of the scope.
I'm not sure it's valid to propagate constants into a use.

But the rtl side of patch 2 (using debug insns instead of USEs)
is only a small change.  The main part of it is on the gimple side.
Unless we prevent all user values from becoming gimple registers
and being rewritten into SSA, we'd still need something like the gimple
side of patch 2 to ensure that there's sufficient tracking of the variable.

Thanks,
Richard


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jan Hubicka
Hi,
thanks for committing the patch!
> > > As
> > >
> > >   class a var;
> > >   class b:a {} *bptr;
> > >
> > >   var.foo;
> > >
> > > Expanding this as var.as_base_a.foo would make the access path oracle
> > > disambiguate it from bptr->as_base_b->as_base_a.foo, which is wrong with
> > > the gimple memory model where we allow placement new replacing var by an
> > > instance of b.
> 
> Why do we allow that?  I would expect that to only be allowed if a is
> something like aligned_storage, i.e. a thin wrapper around a char/byte
> buffer.

I think because Richard defined the gimple memory model this way after a
fair amount of frustration with placement new, stack slot sharing issues 
and non-conforming codebases :)

I think for normal user variables this is overly conservative.
At the moment TBAA is a bit of a mess.  Once it is cleaned up, we could
see if restricting this more pays off, and then we would need to
find a way to pass the info to the middle-end (as it does not
know the difference between aligned_storage and other stuff).

For dynamically allocated memory, as well as for stack space after the
stack slot sharing done in cfgexpand, I see this is necessary since we
do not preserve any information about placement new.

Note that the devirtualization machinery is a bit more aggressive than the
TBAA model I am currently aiming for (for example, assuming that user
variables of a given type are not replaced by placement new), but I think
here we are relatively safe because we do so only for non-POD types where
construction/destruction ought to be paired.
> 
> > Ick.  IIRC the as-base types were necessary only for
> > copying and clobber operations that may not touch the possibly
> > re-used tail-padding.
> 
> And temporarily during layout, yes.  This is all closely related to PR 22488.

I think this is what Richard refers to: the code generating clobber
statements is what leaks as-base types into the middle-end-visible
part of the IL, together with the code in call.c copying base structures.
> 
> > Btw, I still wonder what the ODR says in the face of language
> > inter-operation and what this means here?  For C++ I suppose PODs
> > are not ODR?
> 
> The ODR applies to PODs just like other classes.  But the ODR says
> nothing about language interoperation, that's all
> implementation-defined.

My patchset considers all C++ types as inter-operating with non-C++ types.
So first we load all types and do the following:

During streaming I populate the ODR type hash with ODR types and the
canonical type hash with types not originating from C++.

Once all types are in memory I do the following:
 1) For every structure/union with linkage
- see if there is a structurally equivalent non-C++ type in the
  canonical type hash (where structural equivalence is defined in a
  very generous way, ignoring pointer types, type tags and field
  names, so interoperability with Fortran is safe)

  if there is no matching type and no detected ODR violation, mark
  the type to be handled by ODR name in step 2
 2) for every structure/union originating from C++, compute the canonical
type by a canonical type hash query.  If in 1) we decided that a given
ODR type is unique, the canonical type hash compares types by name
rather than by structure.
I do not handle enums, since those would conflict with an integer type
that is declared in every translation unit.

So at this time basically every C++ type can inter-operate with non-C++.
I was thinking of relaxing this somewhat, but wanted to see if the C++
standard says something here. Things that may be sensible include:
 1) perhaps non-POD types, especially those with vptr pointers, do
not need to be inter-operable.
 2) anonymous namespace types
 3) types in namespace
Honza
> 
> Jason


Re: Start implementing -frounding-math

2019-06-24 Thread Szabolcs Nagy
On 22/06/2019 23:21, Marc Glisse wrote:
> We should care about the C standard, and do whatever makes sense for C++ 
> without expecting the C++ standard to tell us exactly what that is. We
> can check what visual studio and intel do, but we don't have to follow them.
> 
> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access on" 
> covering the whole program.

i think there are 4 settings that make sense:
(i think function level granularity is ok for
this, iso c has block scope granularity, gcc
has translation unit level granularity.)

(1) except flags + only caller observes it.
 i.e. exception flags raised during the execution
 of the function matter, but only the caller
 observes the flags by checking them.

(2) rounding mode + only caller changes it.
 i.e. rounding mode may not be the default during
 the execution of the function, but only the
 caller may change the rounding mode.

(3) except flags + anything may observe/unset it.
 i.e. exception flags raised during the execution
 of the function matter, and any call or inline
 asm may observe or unset them (unless the
 compiler can prove otherwise).

(4) rounding mode + anything may change it.
 i.e. rounding mode may not be the default or
 change during the execution of a function,
 and any call or inline asm may change it.

i think -frounding-math implements (2) fairly reliably,
and #pragma stdc fenv_access on requires (3) and (4).

-ftrapping-math was never clear, but it should
probably do (1) or (5) := (3)+"exceptions may trap".

so iso c has 2 levels: fenv access on/off, where
"on" means that essentially everything has to be
compiled with (3) and (4) (even functions that
don't do anything with fenv). this is not very
practical: most extern calls don't modify the fenv
so fp operations can be reordered around them,
(1) and (2) are more relaxed about this, however
that model needs fp barriers around the few calls
that actually do fenv access.

to me (1) + (2) + builtins for fp barriers seems
more useful than iso c (3) + (4), but iso c is
worth implementing too, since that's the standard.
so ideally there would be multiple flags/function
attributes and builtin barriers to make fenv access
usable in practice. (however not many things care
about fenv access so i don't know if that amount
of work is justifiable).

> For constant expressions, I see a difference between
> constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
> const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is 
> still valid if that fails. The second one clearly should refuse to be
> evaluated at compile time if we are specifying a dynamic rounding direction. 
> For the first one, I am not sure. I guess you should only write
> that in "fenv_access off" regions and I wouldn't mind a compile error.

iso c specifies rules for constant expressions:
http://port70.net/~nsz/c/c11/n1570.html#F.8.4

initializers for objects with static/thread storage
duration are evaluated with the default rounding mode
and no exceptions are signaled.

other initialization is evaluated at runtime
(i.e. the rounding-mode dependent result and
exception flags are observable).



Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-24 Thread David Malcolm
On Mon, 2019-06-24 at 15:30 +, Andrea Corallo wrote:
> Hi all,
> second version for this patch.
> Given the suggestion for the bit-field one, I've tried to improve the
> error message here as well.

Thanks.

> I've added a simple testcase as requested; here I'm trying to do
> *void=int+int.
> Without the check this would normally crash verifying gimple.

Thanks.  FWIW, I think the testcase can be simplified slightly, in that
all that's needed is a bogus call to gcc_jit_context_new_binary_op, so
I don't think the testcase needs the calls to:
  gcc_jit_context_new_function,
  gcc_jit_function_new_block, and
  gcc_jit_block_end_with_return,
it just needs the types and the gcc_jit_context_new_binary_op call.

> More complex cases can cause crashes, e.g. when the result type is a
> structure, etc...
> 
> Tested with make check-jit
> OK for trunk?

Looks good as-is, or you may prefer to simplify the testcase.

Thanks for the patch.

BTW, I don't see you listed in the MAINTAINERS file; are you able to
commit patches yourself?

Dave

> Bests
>   Andrea
> 
> 2019-06-09  Andrea Corallo  andrea.cora...@arm.com
> 
> * libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to
> be a
> numeric type.
> 
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c:
> New testcase.


[PATCH][AArch64] Remove constraint strings from define_expand constructs in the back end

2019-06-24 Thread Dennis Zhang
Hi,

A number of AArch64 define_expand patterns have specified constraints 
for their operands. But the constraint strings are ignored at expand 
time and are therefore redundant/useless. We now avoid specifying 
constraints in new define_expands, but we should clean up the existing 
define_expand definitions.

For example, the constraint "=w" is removed in the following case:
(define_expand "sqrt2"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
Operands with an empty constraint string ("") in define_expand are removed as well.

The patch is tested with the build configuration of 
--target=aarch64-none-linux-gnu, and it passes gcc/testsuite.

Thanks
Dennis

gcc/ChangeLog:

2019-06-21  Dennis Zhang  

* config/aarch64/aarch64-simd.md: Remove redundant constraints
from define_expand.
* config/aarch64/aarch64-sve.md: Likewise.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/atomics.md: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index df8bf1d9778..837242c7e56 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; .
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand" "")
-	(match_operand:VALL_F16 1 "general_operand" ""))]
+  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
+	(match_operand:VALL_F16 1 "general_operand"))]
   "TARGET_SIMD"
   "
   /* Force the operand into a register if it is not an
@@ -39,8 +39,8 @@
 )
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL 0 "nonimmediate_operand" "")
-(match_operand:VALL 1 "general_operand" ""))]
+  [(set (match_operand:VALL 0 "nonimmediate_operand")
+(match_operand:VALL 1 "general_operand"))]
   "TARGET_SIMD"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -652,8 +652,8 @@
   [(set_attr "type" "neon_fp_rsqrts_")])
 
 (define_expand "rsqrt2"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+  [(set (match_operand:VALLF 0 "register_operand")
+	(unspec:VALLF [(match_operand:VALLF 1 "register_operand")]
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
@@ -1025,9 +1025,9 @@
 )
 
 (define_expand "ashl3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
@@ -1072,9 +1072,9 @@
 )
 
 (define_expand "lshr3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
@@ -1119,9 +1119,9 @@
 )
 
 (define_expand "ashr3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (mode) * BITS_PER_UNIT;
@@ -1166,9 +1166,9 @@
 )
 
 (define_expand "vashl3"
- [(match_operand:VDQ_I 0 "register_operand" "")
-  (match_operand:VDQ_I 1 "register_operand" "")
-  (match_operand:VDQ_I 2 "register_operand" "")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   emit_insn (gen_aarch64_simd_reg_sshl (operands[0], operands[1],
@@ -1180,9 +1180,9 @@
 ;; Negating individual lanes most certainly offsets the
 ;; gain from vectorization.
 (define_expand "vashr3"
- [(match_operand:VDQ_BHSI 0 "register_operand" "")
-  (match_operand:VDQ_BHSI 1 "register_operand" "")
-  (match_operand:VDQ_BHSI 2 "register_operand" "")]
+ [(match_operand:VDQ_BHSI 0 "register_operand")
+  (match_operand:VDQ_BHSI 1 "register_operand")
+  (match_operand:VDQ_BHSI 2 "register_operand")]
  "TARGET_SIMD"
 {
   rtx neg = gen_reg_rtx (mode);
@@ -1194,9 +1194,9 @@
 
 ;; DI vector shift
 (define_expand "aarch64_ashr_simddi"
-  [(match_operand:DI 0 "register_operand" "=w")
-   (match_operand:DI 1 "register_operand" "w")
-   (match_operand:SI 2 "aarch64_shift_imm64_di" "")]
+  [(match_operand:DI 0 "register_operand")
+   (match_operand:DI 1 "register_operand")
+   (match_operand:SI 2 "aarch64_shift_imm64_di")]
   "TARGET_SIMD"
   {
 /* An arithmetic shift right by 64 fills the result with copies of the sign
@@ -1210,9 +1210,9 @@
 )
 
 (define_expand "vlshr3"
- [(match_operand:

Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-24 Thread Andrea Corallo
Hi all,
second version for this patch.
Given the suggestion for the bit-field one, I've tried to improve the
error message here as well.
I've added a simple testcase as requested; here I'm trying to do
*void=int+int.
Without the check this would normally crash verifying gimple.
More complex cases can cause crashes, e.g. when the result type is a
structure, etc...

Tested with make check-jit
OK for trunk?

Bests
  Andrea

2019-06-09  Andrea Corallo  andrea.cora...@arm.com

* libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to be a
numeric type.


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c:
New testcase.
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index e4f17f8..3507d0b 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -1345,6 +1345,12 @@ gcc_jit_context_new_binary_op (gcc_jit_context *ctxt,
 a->get_type ()->get_debug_string (),
 b->get_debug_string (),
 b->get_type ()->get_debug_string ());
+  RETURN_NULL_IF_FAIL_PRINTF4 (
+result_type->is_numeric (), ctxt, loc,
+"gcc_jit_binary_op %i with operands a: %s b: %s "
+"has non numeric result_type: %s",
+op, a->get_debug_string (), b->get_debug_string (),
+result_type->get_debug_string ());
 
   return (gcc_jit_rvalue *)ctxt->new_binary_op (loc, op, result_type, a, b);
 }
diff --git a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c
new file mode 100644
index 000..1addc67
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c
@@ -0,0 +1,52 @@
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+/* Try to create a binary operator with invalid result type.  */
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  gcc_jit_type *int_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *void_ptr_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID_PTR);
+
+  gcc_jit_function *func =
+gcc_jit_context_new_function (ctxt, NULL,
+  GCC_JIT_FUNCTION_EXPORTED,
+  void_ptr_type,
+  "foo_func",
+  0, NULL, 0);
+  gcc_jit_block *block = gcc_jit_function_new_block (func, NULL);
+  gcc_jit_block_end_with_return (
+block,
+NULL,
+gcc_jit_context_new_binary_op (
+  ctxt,
+  NULL,
+  GCC_JIT_BINARY_OP_MINUS,
+  void_ptr_type,
+  gcc_jit_context_new_rvalue_from_int (ctxt,
+	   int_type,
+	   1),
+  gcc_jit_context_new_rvalue_from_int (ctxt,
+	   int_type,
+	   2)));
+
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{
+  CHECK_VALUE (result, NULL);
+
+  /* Verify that the correct error message was emitted.	 */
+  CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt),
+		  "gcc_jit_context_new_binary_op: gcc_jit_binary_op 1 with"
+		  " operands a: (int)1 b: (int)2 has non numeric "
+		  "result_type: void *");
+}


Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-24 Thread Andrea Corallo
Hi all,
second version here of the gcc_jit_context_new_bitfield patch addressing
review comments.

Checked with make check-jit; runs clean.

Bests

Andrea

2019-06-20  Andrea Corallo andrea.cora...@arm.com

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
* docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
* jit-common.h (namespace recording): Add class bitfield.
* jit-playback.c:
(DECL_C_BIT_FIELD, SET_DECL_C_BIT_FIELD): Add macros.
(playback::context::new_bitfield): New method.
(playback::compound_type::set_fields): Add bitfield support.
(playback::lvalue::mark_addressable): Was jit_mark_addressable; make this
a method of lvalue and return a bool to communicate success.
(playback::lvalue::get_address): Check the mark_addressable return
value.
* jit-playback.h (new_bitfield): New method.
(class bitfield): New class.
(class lvalue): Add jit_mark_addressable method.
* jit-recording.c (recording::context::new_bitfield): New method.
(recording::bitfield::replay_into): New method.
(recording::bitfield::write_to_dump): Likewise.
(recording::bitfield::make_debug_string): Likewise.
(recording::bitfield::write_reproducer): Likewise.
* jit-recording.h (class context): Add new_bitfield method.
(class field): Make it derivable by class bitfield.
(class bitfield): Add new class.
* libgccjit++.h (class context): Add new_bitfield method.
* libgccjit.c (struct gcc_jit_bitfield): New structure.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield) New macro.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.map (LIBGCCJIT_ABI_12) New ABI tag.


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
* jit.dg/test-accessing-bitfield.c: New testcase.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-type.c:
Likewise.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c:
Likewise.
* jit.dg/test-error-gcc_jit_lvalue_get_address-bitfield.c:
Likewise.
diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index abefa56..da64920 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -177,3 +177,8 @@ entrypoints:
 
 ``LIBGCCJIT_ABI_11`` covers the addition of
 :func:`gcc_jit_context_add_driver_option`
+
+``LIBGCCJIT_ABI_12``
+
+``LIBGCCJIT_ABI_12`` covers the addition of
+:func:`gcc_jit_context_new_bitfield`
diff --git a/gcc/jit/docs/topics/types.rst b/gcc/jit/docs/topics/types.rst
index 1d2dcd4..37d9d01 100644
--- a/gcc/jit/docs/topics/types.rst
+++ b/gcc/jit/docs/topics/types.rst
@@ -247,6 +247,30 @@ You can model C `struct` types by creating :c:type:`gcc_jit_struct *` and
underlying string, so it is valid to pass in a pointer to an on-stack
buffer.
 
+.. function:: gcc_jit_field *\
+  gcc_jit_context_new_bitfield (gcc_jit_context *ctxt,\
+gcc_jit_location *loc,\
+gcc_jit_type *type,\
+int width,\
+const char *name)
+
+   Construct a new bit field, with the given type width and name.
+
+   The parameter ``name`` must be non-NULL.  The call takes a copy of the
+   underlying string, so it is valid to pass in a pointer to an on-stack
+   buffer.
+
+   The parameter ``type`` must be an integer type.
+
+   The parameter ``width`` must be a positive integer that does not exceed the
+   size of ``type``.
+
+   This API entrypoint was added in :ref:`LIBGCCJIT_ABI_12`; you can test
+   for its presence using
+   .. code-block:: c
+
+  #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield
+
 .. function:: gcc_jit_object *\
   gcc_jit_field_as_object (gcc_jit_field *field)
 
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index 1d96cc3..e747d96 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -119,6 +119,7 @@ namespace recording {
 	class union_;
   class vector_type;
 class field;
+  class bitfield;
 class fields;
 class function;
 class block;
diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
index bc4de9c..d4b148e 100644
--- a/gcc/jit/jit-playback.h
+++ b/gcc/jit/jit-playback.h
@@ -75,6 +75,12 @@ public:
 	 type *type,
 	 const char *name);
 
+  field *
+  new_bitfield (location *loc,
+		type *type,
+		int width,
+		const char *name);
+
   compound_type *
   new_compound_type (location *loc,
 		 const char *name,
@@ -426,6 +432,8 @@ private:
   tree m_inner;
 };
 
+class bitfield : public field {};
+
 class function : public wrapper
 {
 public:
@@ -614,6 +622,8 @@ public:
   rvalue *
   get_address (location *loc);
 
+private:
+  bool mark_addressable (location *loc);
 };
 
 class param : public lvalue
@@ -703,4 +713,3 @@ extern playback::context *active_playback_ctxt;
 } // namesp

Re: [PATCH] some more -Wformat-diag cleanup

2019-06-24 Thread Jeff Law
On 6/23/19 6:00 PM, Martin Sebor wrote:
> The attached patch cleans up a number of outstanding -Wformat-diag
> instances.  I plan to commit it tomorrow.
> 
> With it applied, an x86_64-linux bootstrap shows just 79 unique
> instances of the warning originating in 17 files.  49 of those are
> in the Go front-end that Ian is already dealing with.  I will work
> on the rest.
> 
> Martin
> 
> gcc-wformat-diag-cleanup.diff
> 
> gcc/ada/ChangeLog:
> 
>   * gcc-interface/utils.c (handle_nonnull_attribute): Quote attribute
>   name.
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.c (build_binary_op): Hyphenate floating-point.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/Wfloat-equal-1.c: Adjust text of expected diagnostic.
>   * gcc.dg/misc-column.c: Ditto.
> 
> gcc/ChangeLog:
> 
>   * tree-pretty-print.h: Remove unnecessary punctuation characters
>   from a diagnostic.
>   * tree-ssa.c (release_defs_bitset): Correct preprocessor conditional.
OK
jeff


[PATCH][Arm] Remove constraint strings from define_expand constructs in the back end

2019-06-24 Thread Dennis Zhang
Hi,

A number of Arm define_expand patterns have specified constraints for 
their operands. But the constraint strings are ignored at expand time 
and are therefore redundant/useless. We now avoid specifying constraints 
in new define_expands, but we should clean up the existing define_expand 
definitions.

For example, the constraint "=r" is removed in the following case:
(define_expand "reload_inhi"
     [(parallel [(match_operand:HI 0 "s_register_operand" "=r")
Empty constraint strings ("") in define_expand patterns are removed as well.

The patch is tested with the build configuration of 
--target=arm-linux-gnueabi and it passes gcc/testsuite.

Thanks,
Dennis

gcc/ChangeLog:

2019-06-21  Dennis Zhang  

       * config/arm/arm-fixed.md: Remove redundant constraints from
       define_expand.
       * config/arm/arm.md: Likewise.
       * config/arm/iwmmxt.md: Likewise.
       * config/arm/neon.md: Likewise.
       * config/arm/sync.md: Likewise.
       * config/arm/thumb1.md: Likewise.
       * config/arm/vec-common.md: Likewise.

diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
index 6534ed41488..fcab40d13f6 100644
--- a/gcc/config/arm/arm-fixed.md
+++ b/gcc/config/arm/arm-fixed.md
@@ -98,9 +98,9 @@
 ; Note: none of these do any rounding.
 
 (define_expand "mulqq3"
-  [(set (match_operand:QQ 0 "s_register_operand" "")
-	(mult:QQ (match_operand:QQ 1 "s_register_operand" "")
-		 (match_operand:QQ 2 "s_register_operand" "")))]
+  [(set (match_operand:QQ 0 "s_register_operand")
+	(mult:QQ (match_operand:QQ 1 "s_register_operand")
+		 (match_operand:QQ 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp1 = gen_reg_rtx (HImode);
@@ -116,9 +116,9 @@
 })
 
 (define_expand "mulhq3"
-  [(set (match_operand:HQ 0 "s_register_operand" "")
-	(mult:HQ (match_operand:HQ 1 "s_register_operand" "")
-		 (match_operand:HQ 2 "s_register_operand" "")))]
+  [(set (match_operand:HQ 0 "s_register_operand")
+	(mult:HQ (match_operand:HQ 1 "s_register_operand")
+		 (match_operand:HQ 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -134,9 +134,9 @@
 })
 
 (define_expand "mulsq3"
-  [(set (match_operand:SQ 0 "s_register_operand" "")
-	(mult:SQ (match_operand:SQ 1 "s_register_operand" "")
-		 (match_operand:SQ 2 "s_register_operand" "")))]
+  [(set (match_operand:SQ 0 "s_register_operand")
+	(mult:SQ (match_operand:SQ 1 "s_register_operand")
+		 (match_operand:SQ 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -156,9 +156,9 @@
 ;; Accumulator multiplies.
 
 (define_expand "mulsa3"
-  [(set (match_operand:SA 0 "s_register_operand" "")
-	(mult:SA (match_operand:SA 1 "s_register_operand" "")
-		 (match_operand:SA 2 "s_register_operand" "")))]
+  [(set (match_operand:SA 0 "s_register_operand")
+	(mult:SA (match_operand:SA 1 "s_register_operand")
+		 (match_operand:SA 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -175,9 +175,9 @@
 })
 
 (define_expand "mulusa3"
-  [(set (match_operand:USA 0 "s_register_operand" "")
-	(mult:USA (match_operand:USA 1 "s_register_operand" "")
-		  (match_operand:USA 2 "s_register_operand" "")))]
+  [(set (match_operand:USA 0 "s_register_operand")
+	(mult:USA (match_operand:USA 1 "s_register_operand")
+		  (match_operand:USA 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -317,9 +317,9 @@
 		  (const_int 32)))])
 
 (define_expand "mulha3"
-  [(set (match_operand:HA 0 "s_register_operand" "")
-	(mult:HA (match_operand:HA 1 "s_register_operand" "")
-		 (match_operand:HA 2 "s_register_operand" "")))]
+  [(set (match_operand:HA 0 "s_register_operand")
+	(mult:HA (match_operand:HA 1 "s_register_operand")
+		 (match_operand:HA 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -333,9 +333,9 @@
 })
 
 (define_expand "muluha3"
-  [(set (match_operand:UHA 0 "s_register_operand" "")
-	(mult:UHA (match_operand:UHA 1 "s_register_operand" "")
-		  (match_operand:UHA 2 "s_register_operand" "")))]
+  [(set (match_operand:UHA 0 "s_register_operand")
+	(mult:UHA (match_operand:UHA 1 "s_register_operand")
+		  (match_operand:UHA 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY"
 {
   rtx tmp1 = gen_reg_rtx (SImode);
@@ -353,9 +353,9 @@
 })
 
 (define_expand "ssmulha3"
-  [(set (match_operand:HA 0 "s_register_operand" "")
-	(ss_mult:HA (match_operand:HA 1 "s_register_operand" "")
-		(match_operand:HA 2 "s_register_operand" "")))]
+  [(set (match_operand:HA 0 "s_register_operand")
+	(ss_mult:HA (match_operand:HA 1 "s_register_operand")
+		(match_operand:HA 2 "s_register_operand")))]
   "TARGET_32BIT && TARGET_DSP_MULTIPLY && arm_arch6"
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -373,9 +373,9 @@
 })
 
 (define_expand "usmuluha3"
-  [(set (match_operand:UHA 0 "s_register_operand" "")
-	(us_mult:UHA (match_operand:UHA 1 "s_regi

Re: Start implementing -frounding-math

2019-06-24 Thread Marc Glisse

On Mon, 24 Jun 2019, Richard Biener wrote:


-frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
on" covering the whole program.

For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile time, and
const double third = 1. / 3.;
which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions and I wouldn't mind a compile error.

Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.


Hmm.  My thinking was along the lines that at the start of main() the
C abstract machine might specify the initial rounding mode (and exception
state) as implementation-defined, and all constant expressions are evaluated
while in this state.  So we can define that to round-to-nearest and
simply fold all constants in contexts we are allowed to evaluate at
compile time as we see them?


There are way too many such contexts. In C++, any initializer is
constexpr-evaluated if possible (PR 85746 shows that this is bad for
__builtin_constant_p), and I do want
double d = 1. / 3;
to depend on the dynamic rounding direction. I'd rather err on the other
extreme and only fold when we are forced to, say
constexpr double d = 1. / 3;
or even reject it because it is inexact, if pragmas put us in a region
with dynamic rounding.


OK, fair enough.  I just hoped that global

double x = 1.0/3.0;

do not become runtime initializers with -frounding-math ...


Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round, 
which I guess could affect this (the C draft isn't very explicit), the 
program doesn't have many chances to set a rounding mode before 
initializing globals. It could do so in the initializer of another 
variable, but relying on the order of initialization this way seems bad. 
Maybe in this case it would make sense to assume the default rounding 
mode...


In practice, I would only set -frounding-math on a per function basis
(possibly using pragma fenv_access), so the optimization of what happens
to globals doesn't seem so important.


Side remark, I am sad that Intel added rounded versions for scalars and
512 bit vectors but not for intermediate sizes, while I am most
interested in 128 bits. Masking most of the 512 bits still causes the
dreaded clock slow-down.


Ick.  I thought this was vector-length agnostic...


I think all of the new stuff in AVX512 is, except rounding...

Also, the rounded functions have exceptions disabled, which may make
them hard to use with fenv_access.


I guess builtins need the same treatment for -ftrapping-math as they
do for -frounding-math.  I think you already mentioned the default
of this flag doesn't make much sense (well, the flag isn't fully
honored/implemented).


PR 54192
(coincidentally, it caused a missed vectorization in
https://stackoverflow.com/a/56681744/1918193 last week)


I commented there.  Lets just make -frounding-math == FENV_ACCESS ON
and keep -ftrapping-math as whether FP exceptions raise traps.


One issue is that the C pragmas do not let me convey that I am interested 
in dynamic rounding but not exception flags. It is possible to optimize 
quite a bit more with just rounding. In particular, the functions are pure 
(at some point we will have to teach the compiler the difference between 
the FP environment and general memory, but I'd rather wait).



Yeah.  Auto-vectorizing would also need adjustment of course (also
costing like estimate_num_insns or others).


Anything that is only about optimizing the code in -frounding-math
functions can wait, that's the good point of implementing a new feature.

--
Marc Glisse


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jason Merrill
On Mon, Jun 24, 2019 at 7:28 AM Richard Biener  wrote:
> On Mon, 24 Jun 2019, Jan Hubicka wrote:
>
> > > > This simple (untested) patch doesn't avoid creating the unnecessary
> > > > as-base types, but it should avoid using them in a way that causes
> > > > them to be streamed, and should let them be discarded by GC.
> > > > Thoughts?
> > >
> > > Looks better than Honzas patch fixing a single place.
> >
> > I wonder if we can go ahead with Jason's patch to handle the common
> > case.
>
> I hope so - Jason?

Committed.

> > > I've spent some thoughts on this and I wonder whether we can
> > > re-implement classtype-as-base with fake inheritance (which would
> > > also solve the TBAA alias set issue in a natural way).  That is,
> > > we'd lay out structs as-base and make instances of it use a
> > >
> > > class as-instance { as-base b; X pad1; Y pad2; };
> > >
> > > with either explicit padding fields or with implicit ones
> > > (I didn't check how we trick stor-layout to not pad the as-base
> > > type to its natural alignment...).
> > >
> > > I realize that this impacts all code building component-refs ontop
> > > of as-instance typed objects so this might rule out this approach
> > > completely - but maybe that's reasonably well abstracted into common
> > > code so only few places need adjustments.
> >
> > Modulo the empty virtual bases, which I have no understanding of, I
> > suppose this should work.
> >
> > One issue is that we will need to introduce view_convert_exprs at some
> > times.
> >
> > As
> >
> >   class a var;
> >   class b:a {} *bptr;
> >
> >   var.foo;
> >
> > Expanding this as var.as_base_a.foo would make the access-path oracle
> > disambiguate it from bptr->as_base_b->as_base_a.foo, which is wrong in
> > the gimple memory model, where we allow placement new to replace var by
> > an instance of b.

Why do we allow that?  I would expect that to only be allowed if a is
something like aligned_storage, i.e. a thin wrapper around a char/byte
buffer.

> Ick.  IIRC the as-base types were necessary only for
> copying and clobber operations that may not touch the possibly
> re-used tail-padding.

And temporarily during layout, yes.  This is all closely related to PR 22488.

> Btw, I still wonder what the ODR says in the face of language
> inter-operation and what this means here?  For C++ I suppose PODs
> are not ODR?

The ODR applies to PODs just like other classes.  But the ODR says
nothing about language interoperation, that's all
implementation-defined.

Jason


Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Segher Boessenkool
On Mon, Jun 24, 2019 at 04:28:34PM +0200, Jakub Jelinek wrote:
> On Sun, Jun 23, 2019 at 02:51:06PM +0100, Richard Sandiford wrote:
> > What do you think?  Is it worth pursuing this further?
> 
> Wouldn't it be more useful to just force all automatic variables to be
> used at the end of their corresponding scope?  That is IMHO the main issue
> with -Og debugging, VTA is a best effort, if we can express a variable with
> some expression, nice, but if there is no expression nor memory nor register
> that holds the value, we are out of luck.  Could be some magic stmt like
> gimple_clobber or ifn or something similar, which would make sure that at
> least until expansion to RTL we force those vars to be live in either a
> register or memory.
> I'm afraid having different modes, one in which debug stmts can't and one
> where they can affect code generation might be a maintenance nightmare.

This is pretty much exactly what USE in RTL is?  Maybe use a similar name
in Gimple?


Segher


Re: [PATCH 5/5] Use ira_setup_alts for conflict detection

2019-06-24 Thread Vladimir Makarov



On 2019-06-21 9:43 a.m., Richard Sandiford wrote:

make_early_clobber_and_input_conflicts records allocno conflicts
between inputs and earlyclobber outputs.  It (rightly) avoids
doing this for inputs that are explicitly allowed to match the
output due to matching constraints.

The problem is that whether this matching is allowed varies
between alternatives.  At the moment the code avoids adding
a clobber if *any* enabled alternative allows the match,
even if some other operand makes that alternative impossible.

The specific instance of this for SVE is that some alternatives
allow matched earlyclobbers when a third operand X is constant zero.
We should avoid adding conflicts when X really is constant zero,
but should ignore the match if X is nonzero or nonconstant.

ira_setup_alts can already filter these alternatives out for us,
so all we need to do is use it in process_bb_node_lives.  The
preferred_alternatives variable is only used for this earlyclobber
detection, so no other check should be affected.

With the previous patch to check the reject weight in ira_setup_alts,
this has the effect of ignoring expensive alternatives if we have
other valid alternatives with zero cost.  It seems reasonable to base
the heuristic on only the alternatives that we'd actually like to use,
but if this ends up being too aggressive, we could instead make the new
reject behaviour conditional and only use it for add_insn_allocno_copies.


This patch definitely improves the heuristics.

The only missed part is a comment for preferred_alternatives

/* The value of get_preferred_alternatives for the current instruction,
   supplemental to recog_data.  */
static alternative_mask preferred_alternatives;

The comment becomes misleading after the patch.

With changing the comment, the patch is ok for me.

Richard, thank you for the patches improving the RA.


2019-06-21  Richard Sandiford  

gcc/
* ira-lives.c (process_bb_node_lives): Use ira_setup_alts.

Index: gcc/ira-lives.c
===
--- gcc/ira-lives.c 2019-05-29 10:49:39.512701998 +0100
+++ gcc/ira-lives.c 2019-06-21 14:34:19.071605402 +0100
@@ -1236,9 +1236,7 @@ process_bb_node_lives (ira_loop_tree_nod
  }
  }
  
-	  extract_insn (insn);

- preferred_alternatives = get_preferred_alternatives (insn);
- preprocess_constraints (insn);
+ preferred_alternatives = ira_setup_alts (insn);
  process_single_reg_class_operands (false, freq);
  
  	  if (call_p)


Re: [PATCH 4/5] Allow earlyclobbers in ira_get_dup_out_num

2019-06-24 Thread Vladimir Makarov



On 2019-06-21 9:42 a.m., Richard Sandiford wrote:

ira_get_dup_out_num punted on operands that are matched to
earlyclobber outputs:

/* It is better ignore an alternative with early clobber.  */
else if (*str == '&')
  goto fail;

But I'm not sure why this is the right thing to do.  At this stage
we've established that *all* alternatives of interest require the
input to match the output, so

(a) the earlyclobber can only affect other operands and
(b) not tying the registers is bound to introduce a move

The code was part of the initial commit and so isn't obviously
related to a specific testcase.  Also, I can imagine LRA makes
a much better job of this situation than reload did.  (Certainly
SVE uses matched earlyclobbers extensively and I haven't seen any
problems.)
I don't remember the reason for this either, as the code is more than 10
years old.  I can only speculate that this approach was influenced by the
old RA (global.c).  I remember that it processed earlyclobber registers
too, but maybe in a different way.

In case this turns out to regress something important: the main
case that matters for SVE is the one in which all alternatives
are earlyclobber.


OK to commit.

2019-06-21  Richard Sandiford  

gcc/
* ira.c (ira_get_dup_out_num): Don't punt for earlyclobbers.
Use recog_data to test for an output operand.

Index: gcc/ira.c
===
--- gcc/ira.c   2019-06-21 14:34:13.239653892 +0100
+++ gcc/ira.c   2019-06-21 14:34:15.947631377 +0100
@@ -1999,26 +1999,8 @@ ira_get_dup_out_num (int op_num, alterna
}
if (original == -1)
goto fail;
-  dup = -1;
-  for (ignore_p = false, str = recog_data.constraints[original - '0'];
-  *str != 0;
-  str++)
-   if (ignore_p)
- {
-   if (*str == ',')
- ignore_p = false;
- }
-   else if (*str == '#')
- ignore_p = true;
-   else if (! ignore_p)
- {
-   if (*str == '=')
- dup = original - '0';
-   /* It is better ignore an alternative with early clobber.  */
-   else if (*str == '&')
- goto fail;
- }
-  if (dup >= 0)
+  dup = original - '0';
+  if (recog_data.operand_type[dup] == OP_OUT)
return dup;
  fail:
if (use_commut_op_p)


Re: [PATCH 3/5] Make ira_get_dup_out_num handle more cases

2019-06-24 Thread Vladimir Makarov



On 2019-06-21 9:42 a.m., Richard Sandiford wrote:

SVE has a prefix instruction (MOVPRFX) that acts as a move but is
designed to be easily fusible with the following instruction.  The SVE
port therefore has lots of patterns with constraints of the form:

   A: operand 0: =w,?w
  ...
  operand n:  0, w

where the first alternative is a single instruction and the second
alternative uses MOVPRFX.

Ideally we want operand n to be allocated to the same register as
operand 0 in this case.

add_insn_allocno_copies is the main IRA routine that deals with tied
operands.  It is (rightly) very conservative, and only handles cases in
which we're confident about saving a full move.  So for a pattern like:

   B: operand 0: =w,w
  ...
  operand n:  0,w

we don't (and shouldn't) assume that tying operands 0 and n would
save the cost of a move.

But in A, the second alternative has a ? marker, which makes it more
expensive than the first alternative by a full reload.  So I think for
copy elision we should ignore the untied operand n in the second
alternative of A.

One approach would be to add '*' markers to each pattern and make
ira_get_dup_out_num honour them.  But I think the rule applies on
first principles, so marking with '*' shouldn't be necessary.
I think the direct approach is better.  The modifiers were designed for the
old algorithms.  Now I think their treatment prevents, and will continue to
prevent, significant RA algorithm modifications.  I found that when I tried
to significantly change the algorithm for calculating register class costs
and register class preferences.

This patch instead makes ira_get_dup_out_num ignore expensive
alternatives if there are other alternatives that match exactly.
The cheapest way of doing that seemed to be to take expensive
alternatives out of consideration in ira_setup_alts, which provides
a bitmask of alternatives and has all the information available.
add_insn_allocno_copies is the only current user of ira_setup_alts,
so no other code should be affected.


2019-06-21  Richard Sandiford  

gcc/
* ira.c (ira_setup_alts): If any valid alternatives have zero cost,
exclude any others that are disparaged or that are bound to need
a reload or spill.
(ira_get_dup_out_num): Expand comment.

The patch is ok for me.

Index: gcc/ira.c
===
--- gcc/ira.c   2019-06-21 14:34:09.455685354 +0100
+++ gcc/ira.c   2019-06-21 14:34:13.239653892 +0100
@@ -1787,7 +1787,9 @@ setup_prohibited_mode_move_regs (void)
  /* Extract INSN and return the set of alternatives that we should consider.
 This excludes any alternatives whose constraints are obviously impossible
 to meet (e.g. because the constraint requires a constant and the operand
-   is nonconstant).  */
+   is nonconstant).  It also excludes alternatives that are bound to need
+   a spill or reload, as long as we have other alternatives that match
+   exactly.  */
  alternative_mask
  ira_setup_alts (rtx_insn *insn)
  {
@@ -1800,6 +1802,7 @@ ira_setup_alts (rtx_insn *insn)
preprocess_constraints (insn);
alternative_mask preferred = get_preferred_alternatives (insn);
alternative_mask alts = 0;
+  alternative_mask exact_alts = 0;
/* Check that the hard reg set is enough for holding all
   alternatives.  It is hard to imagine the situation when the
   assertion is wrong.  */
@@ -1816,20 +1819,24 @@ ira_setup_alts (rtx_insn *insn)
  {
for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
{
- if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
+ if (!TEST_BIT (preferred, nalt) || TEST_BIT (exact_alts, nalt))
continue;
  
  	  const operand_alternative *op_alt

= &recog_op_alt[nalt * recog_data.n_operands];
+ int this_reject = 0;
  for (nop = 0; nop < recog_data.n_operands; nop++)
{
  int c, len;
  
+	  this_reject += op_alt[nop].reject;

+
  rtx op = recog_data.operand[nop];
  p = op_alt[nop].constraint;
  if (*p == 0 || *p == ',')
continue;
-   
+
+ bool win_p = false;
  do
switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
  {
@@ -1847,7 +1854,14 @@ ira_setup_alts (rtx_insn *insn)
  
  		  case '0':  case '1':  case '2':  case '3':  case '4':

  case '5':  case '6':  case '7':  case '8':  case '9':
-   goto op_success;
+   {
+ rtx other = recog_data.operand[c - '0'];
+ if (MEM_P (other)
+ ? rtx_equal_p (other, op)
+ : REG_P (op) || SUBREG_P (op))
+   goto op_success;
+ win_p = true;
+   }
break;

  case 'g':
@@ -1861,7 +1875,11 @@ ira_setup_alts (rtx_insn *ins

Re: [PATCH 2/5] Simplify ira_setup_alts

2019-06-24 Thread Vladimir Makarov



On 2019-06-21 9:40 a.m., Richard Sandiford wrote:

ira_setup_alts has its own code to calculate the start of the
constraint string for each operand/alternative combination,
but preprocess_constraints now provides that information in (almost)
constant time for non-asm instructions.  Using it here should speed
up the common case at the cost of potentially slowing down the handling
of asm statements.


The documentation says that '%' should be the very first constraint 
character.  But I think there is a possibility that somebody can forget 
this and put a blank before '%', and the effect of this would be very hard 
to find, since the correct code would still be generated, although it 
might be slower.  That was my thinking in processing the whole constraint 
string.


It is hard for me to say what the probability of this is.  I guess 
it is tiny.  So the patch is ok for me.



The real reason for doing this is that a later patch wants to use
more of the operand_alternative information.

2019-06-21  Richard Sandiford  

gcc/
* ira.c (ira_setup_alts): Use preprocess_constraints to get the
constraint string for each operand/alternative combo.  Only handle
'%' at the start of constraint strings, and look for it outside
the main loop.

Index: gcc/ira.c
===
--- gcc/ira.c   2019-06-21 14:34:05.887715020 +0100
+++ gcc/ira.c   2019-06-21 14:34:09.455685354 +0100
@@ -1791,60 +1791,42 @@ setup_prohibited_mode_move_regs (void)
  alternative_mask
  ira_setup_alts (rtx_insn *insn)
  {
-  /* MAP nalt * nop -> start of constraints for given operand and
- alternative.  */
-  static vec insn_constraints;
int nop, nalt;
bool curr_swapped;
const char *p;
int commutative = -1;
  
extract_insn (insn);

+  preprocess_constraints (insn);
alternative_mask preferred = get_preferred_alternatives (insn);
alternative_mask alts = 0;
-  insn_constraints.release ();
-  insn_constraints.safe_grow_cleared (recog_data.n_operands
- * recog_data.n_alternatives + 1);
/* Check that the hard reg set is enough for holding all
   alternatives.  It is hard to imagine the situation when the
   assertion is wrong.  */
ira_assert (recog_data.n_alternatives
  <= (int) MAX (sizeof (HARD_REG_ELT_TYPE) * CHAR_BIT,
FIRST_PSEUDO_REGISTER));
+  for (nop = 0; nop < recog_data.n_operands; nop++)
+if (recog_data.constraints[nop][0] == '%')
+  {
+   commutative = nop;
+   break;
+  }
for (curr_swapped = false;; curr_swapped = true)
  {
-  /* Calculate some data common for all alternatives to speed up the
-function.  */
-  for (nop = 0; nop < recog_data.n_operands; nop++)
-   {
- for (nalt = 0, p = recog_data.constraints[nop];
-  nalt < recog_data.n_alternatives;
-  nalt++)
-   {
- insn_constraints[nop * recog_data.n_alternatives + nalt] = p;
- while (*p && *p != ',')
-   {
- /* We only support one commutative marker, the first
-one.  We already set commutative above.  */
- if (*p == '%' && commutative < 0)
-   commutative = nop;
- p++;
-   }
- if (*p)
-   p++;
-   }
-   }
for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
{
  if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
continue;
  
+	  const operand_alternative *op_alt
+	    = &recog_op_alt[nalt * recog_data.n_operands];
  for (nop = 0; nop < recog_data.n_operands; nop++)
{
  int c, len;
  
  	  rtx op = recog_data.operand[nop];

- p = insn_constraints[nop * recog_data.n_alternatives + nalt];
+ p = op_alt[nop].constraint;
  if (*p == 0 || *p == ',')
continue;
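For illustration, the hoisted '%' scan in the patch can be sketched standalone; the string-per-operand representation and the helper name below are hypothetical stand-ins for `recog_data.constraints`, and the sketch assumes (as the patch does) that a commutative marker is only honoured at the very start of an operand's constraint string:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for recog_data.constraints: one constraint string
// per operand, alternatives separated by ','.  Since '%' is only supported
// at the start of the string, the scan no longer needs to walk every
// alternative inside the swap loop, mirroring the hoisted loop in the patch.
static int
find_commutative (const std::vector<std::string> &constraints)
{
  for (size_t nop = 0; nop < constraints.size (); nop++)
    if (!constraints[nop].empty () && constraints[nop][0] == '%')
      return (int) nop;  // first (and only supported) commutative pair
  return -1;             // no commutative operand
}
```

A marker anywhere other than position 0 is deliberately ignored, which matches the patch's stated restriction to one leading '%'.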



Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Jakub Jelinek
On Sun, Jun 23, 2019 at 02:51:06PM +0100, Richard Sandiford wrote:
> What do you think?  Is it worth pursuing this further?

Wouldn't it be more useful to just force all automatic variables to be
used at the end of their corresponding scope?  That is IMHO the main issue
with -Og debugging, VTA is a best effort, if we can express a variable with
some expression, nice, but if there is no expression nor memory nor register
that holds the value, we are out of luck.  Could be some magic stmt like
gimple_clobber or ifn or something similar, which would make sure that at
least until expansion to RTL we force those vars to be live in either a
register or memory.
I'm afraid having different modes, one in which debug stmts can't and one
where they can affect code generation, might be a maintenance nightmare.

Jakub


Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Iain Sandoe


> On 24 Jun 2019, at 14:31, Martin Liška  wrote:
> 
> On 6/24/19 2:44 PM, Richard Biener wrote:
>> On Mon, Jun 24, 2019 at 2:12 PM Martin Liška  wrote:
>>> 
>>> On 6/24/19 2:02 PM, Richard Biener wrote:
 On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:
> 
> On 6/21/19 2:57 PM, Jan Hubicka wrote:
>> This looks like good step (and please stream it in host independent
>> way). I suppose all these issues can be done one-by-one.
> 
> So there's a working patch for that. However one will see following errors
> when using an older compiler or older LTO bytecode:
> 
> $ gcc main9.o -flto
> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
> version -25480.4493 instead of the expected 9.0
> 
> $ gcc main.o
> lto1: internal compiler error: compressed stream: data error
 
 This is because of your change to bitfields or because with the old
 scheme the header with the
 version is compressed (is it?).
>>> 
>>> Because currently also the header is compressed.
>> 
>> That was it, yeah :/  Stupid decisions in the past.
>> 
>> I guess we have to bite the bullet and do this kind of incompatible
>> change, accepting
>> the odd error message above.
>> 
 I'd simply avoid any layout changes
 in the version check range.
>>> 
>>> Well, then we have to find out how to distinguish between compression 
>>> algorithms.
>>> 
 
> To be honest, I would prefer the new .gnu.lto_.meta section.
> Richi why is that so ugly?
 
 Because it's a change in the wrong direction and doesn't solve the
 issue we already
 have (cannot determine if a section is compressed or not).
>>> 
>>> That's not true, the .gnu.lto_.meta section will be always uncompressed and 
>>> we can
>>> also backport changes to older compiler that can read it and print a proper 
>>> error
>>> message about LTO bytecode version mismatch.
>> 
>> We can always backport changes, yes, but I don't see why we have to.
> 
> I'm fine with the backward compatibility break. But we should also consider 
> lto-plugin.c
> that is parsing following 2 sections:
> 
>91  #define LTO_SECTION_PREFIX  ".gnu.lto_.symtab"
>92  #define LTO_SECTION_PREFIX_LEN  (sizeof (LTO_SECTION_PREFIX) - 1)
>93  #define OFFLOAD_SECTION ".gnu.offload_lto_.opts"
>94  #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1)
> 
>> 
 ELF section overhead
 is quite big if you have lots of small functions.
>>> 
>>> My patch is actually shrinking space as I'm suggesting to add _one_ extra 
>>> ELF section
>>> and remove the section header from all other LTO sections. That will save 
>>> space
>>> for all function sections.
>> 
>> But we want the header there to at least say if the section is
>> compressed or not.
>> The fact that we have so many ELF section means we have the redundant version
>> info everywhere.
>> 
>> We should have a single .gnu.lto_ section (and also get rid of those
>> __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for
>> existence of a symbol is more expensive compared to existence
>> of a section).
> 
> I like removal of the 2 aforementioned sections. To be honest I would 
> recommend to
> add a new .gnu.lto_.meta section. We can use it instead of __gnu_lto_v1 and 
> we can
> have a flag there instead of __gnu_lto_slim. As a second step, I'm willing to 
> concatenate all
> 
>  LTO_section_function_body,
>  LTO_section_static_initializer
> 
> sections into a single one. That will require an index that will have to be 
> created. I can discuss
> that with Honza as he suggested using something smarter than function names.

I already implemented a scheme (using three sections: INDEX, NAMES, PAYLOAD)
for Mach-O, since it doesn't have an unlimited section count.  It works, and is
hardly rocket science ;)

If one were to import the tabular portion of that at the start of a section,
and then the variable portion as a trailer, it could all be a single section.

iain

> Martin
> 
>> 
>> Richard.
>> 
>>> Martin
>>> 
 
 Richard.
 
> 
> Martin



Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi!
>
> What does -O1g do with OPT_LEVELS_1_PLUS_NOT_DEBUG, is it enabled or
> not there?  Maybe that name needs to change, with your patches?  It is
> currently documented as
>
> /* -O1 (and not -Og) optimizations.  */

Yeah, comment should change to be:

/* -O1 and -O1g (but not -Og) optimizations.  */

Richard


Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-24 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
>if (!def_set)
>  return false;
>  
> +  if (reg_prop_only
> +  && !REG_P (SET_SRC (def_set))
> +  && !REG_P (SET_DEST (def_set)))
> +return false;

This should be:

  if (reg_prop_only
      && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
    return false;

so that we return false if either operand isn't a register.
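The difference between the posted condition and the requested one can be seen in a minimal truth-table sketch; `src_reg` and `dst_reg` are hypothetical stand-ins for the two `REG_P` tests:

```cpp
#include <cassert>

// Posted condition: reg_prop_only && !src_reg && !dst_reg.
// This only rejects the propagation when *both* operands are
// non-registers, so a reg->mem copy slips through.
static bool
rejects_posted (bool reg_prop_only, bool src_reg, bool dst_reg)
{
  return reg_prop_only && !src_reg && !dst_reg;
}

// Requested condition: reject when *either* operand is not a register.
static bool
rejects_fixed (bool reg_prop_only, bool src_reg, bool dst_reg)
{
  return reg_prop_only && (!src_reg || !dst_reg);
}
```

With a register source but memory destination, the posted form returns false (no rejection) while the fixed form correctly rejects.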

> +
> +  /* Allow propagations into a loop only for reg-to-reg copies, since
> + replacing one register by another shouldn't increase the cost.  */
> +
> +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> +  && !REG_P (SET_SRC (def_set))
> +  && !REG_P (SET_DEST (def_set)))
> +return false;

Same here.

OK with that change, thanks.

Richard


Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Segher Boessenkool
Hi!

What does -O1g do with OPT_LEVELS_1_PLUS_NOT_DEBUG, is it enabled or
not there?  Maybe that name needs to change, with your patches?  It is
currently documented as

/* -O1 (and not -Og) optimizations.  */


Segher


Re: RFA: Synchronize top level files with binutils

2019-06-24 Thread Richard Earnshaw (lists)
On 20/06/2019 15:10, Nick Clifton wrote:
> Hi Richard,
> 
>   Please may I apply this patch to the gcc-9, gcc-8 and gcc-7 branches ?
> 
>   I have tested it on all three branches and found no problems.
> 
> Cheers
>   Nick
> 
> 2019-06-07  Nick Clifton  
> 
>   Import these changes from the binutils/gdb repository:
> 
>   2019-05-28  Nick Alcock  
> 
>   * Makefile.def (dependencies): configure-libctf depends on all-bfd
>   and all its deps.
>   * Makefile.in: Regenerated.
> 
>   2019-05-28  Nick Alcock  
> 
>   * Makefile.def (host_modules): Add libctf.
>   * Makefile.def (dependencies): Likewise.
>   libctf depends on zlib, libiberty, and bfd.
>   * Makefile.in: Regenerated.
>   * configure.ac (host_libs): Add libctf.
>   * configure: Regenerated.
> 
> 
> Synchronize top level files with binutils.patch
> 
> 2019-06-07  Nick Clifton  
> 
>   Import these changes from the binutils/gdb repository:
> 
>   2019-05-28  Nick Alcock  
> 
>   * Makefile.def (dependencies): configure-libctf depends on all-bfd
>   and all its deps.
>   * Makefile.in: Regenerated.
> 
>   2019-05-28  Nick Alcock  
> 
>   * Makefile.def (host_modules): Add libctf.
>   * Makefile.def (dependencies): Likewise.
>   libctf depends on zlib, libiberty, and bfd.
>   * Makefile.in: Regenerated.
>   * configure.ac (host_libs): Add libctf.
>   * configure: Regenerated.

FWIW, this looks good to me, but you probably need an RM to sign it off.

R.

> 
> Index: Makefile.def
> ===
> --- Makefile.def  (revision 272111)
> +++ Makefile.def  (working copy)
> @@ -4,7 +4,7 @@
>  // Makefile.in is generated from Makefile.tpl by 'autogen Makefile.def'.
>  // This file was originally written by Nathanael Nerode.
>  //
> -//   Copyright 2002-2013 Free Software Foundation
> +//   Copyright 2002-2019 Free Software Foundation
>  //
>  // This file is free software; you can redistribute it and/or modify
>  // it under the terms of the GNU General Public License as published by
> @@ -128,6 +128,8 @@
>   extra_make_flags='@extra_linker_plugin_flags@'; };
>  host_modules= { module= libcc1; extra_configure_flags=--enable-shared; };
>  host_modules= { module= gotools; };
> +host_modules= { module= libctf; no_install=true; no_check=true;
> + bootstrap=true; };
>  
>  target_modules = { module= libstdc++-v3;
>  bootstrap=true;
> @@ -428,6 +430,7 @@
>  dependencies = { module=all-binutils; on=all-build-bison; };
>  dependencies = { module=all-binutils; on=all-intl; };
>  dependencies = { module=all-binutils; on=all-gas; };
> +dependencies = { module=all-binutils; on=all-libctf; };
>  
>  // We put install-opcodes before install-binutils because the installed
>  // binutils might be on PATH, and they might need the shared opcodes
> @@ -518,6 +521,14 @@
>  dependencies = { module=all-fastjar; on=all-zlib; };
>  dependencies = { module=all-fastjar; on=all-build-texinfo; };
>  dependencies = { module=all-fastjar; on=all-libiberty; };
> +dependencies = { module=all-libctf; on=all-libiberty; hard=true; };
> +dependencies = { module=all-libctf; on=all-bfd; };
> +dependencies = { module=all-libctf; on=all-zlib; };
> +// So that checking for ELF support in BFD from libctf configure is possible.
> +dependencies = { module=configure-libctf; on=all-bfd; };
> +dependencies = { module=configure-libctf; on=all-intl; };
> +dependencies = { module=configure-libctf; on=all-zlib; };
> +dependencies = { module=configure-libctf; on=all-libiconv; };
>  
>  // Warning, these are not well tested.
>  dependencies = { module=all-bison; on=all-intl; };
> Index: configure.ac
> ===
> --- configure.ac  (revision 272111)
> +++ configure.ac  (working copy)
> @@ -131,7 +131,7 @@
>  
>  # these libraries are used by various programs built for the host environment
>  #f
> -host_libs="intl libiberty opcodes bfd readline tcl tk itcl libgui zlib 
> libbacktrace libcpp libdecnumber gmp mpfr mpc isl libelf libiconv"
> +host_libs="intl libiberty opcodes bfd readline tcl tk itcl libgui zlib 
> libbacktrace libcpp libdecnumber gmp mpfr mpc isl libelf libiconv libctf"
>  
>  # these tools are built for the host environment
>  # Note, the powerpc-eabi build depends on sim occurring before gdb in order 
> to
> 
> 
> 



Re: [PATCH][RFC] Sanitize equals and hash functions in hash-tables.

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 3:51 PM Martin Liška  wrote:
>
> On 6/24/19 2:29 PM, Richard Biener wrote:
> > On Mon, Jun 24, 2019 at 1:08 AM Ian Lance Taylor  wrote:
> >>
> >> On Fri, Jun 7, 2019 at 5:04 AM Martin Liška  wrote:
> >>>
> >>> On 6/7/19 10:57 AM, Richard Biener wrote:
>  On Mon, Jun 3, 2019 at 3:35 PM Martin Liška  wrote:
> >
> > On 6/1/19 12:06 AM, Jeff Law wrote:
> >> On 5/22/19 3:13 AM, Martin Liška wrote:
> >>> On 5/21/19 1:51 PM, Richard Biener wrote:
>  On Tue, May 21, 2019 at 1:02 PM Martin Liška  wrote:
> >
> > On 5/21/19 11:38 AM, Richard Biener wrote:
> >> On Tue, May 21, 2019 at 12:07 AM Jeff Law  wrote:
> >>>
> >>> On 5/13/19 1:41 AM, Martin Liška wrote:
>  On 11/8/18 9:56 AM, Martin Liška wrote:
> > On 11/7/18 11:23 PM, Jeff Law wrote:
> >> On 10/30/18 6:28 AM, Martin Liška wrote:
> >>> On 10/30/18 11:03 AM, Jakub Jelinek wrote:
>  On Mon, Oct 29, 2018 at 04:14:21PM +0100, Martin Liška wrote:
> > +hashtab_chk_error ()
> > +{
> > +  fprintf (stderr, "hash table checking failed: "
> > +   "equal operator returns true for a pair "
> > +   "of values with a different hash value");
>  BTW, either use internal_error here, or at least if using 
>  fprintf
>  terminate with \n, in your recent mail I saw:
>  ...different hash valueduring RTL pass: vartrack
>  ^^
> >>> Sure, fixed in attached patch.
> >>>
> >>> Martin
> >>>
> > +  gcc_unreachable ();
> > +}
>    Jakub
> 
> >>> 0001-Sanitize-equals-and-hash-functions-in-hash-tables.patch
> >>>
> >>> From 0d9c979c845580a98767b83c099053d36eb49bb9 Mon Sep 17 
> >>> 00:00:00 2001
> >>> From: marxin 
> >>> Date: Mon, 29 Oct 2018 09:38:21 +0100
> >>> Subject: [PATCH] Sanitize equals and hash functions in 
> >>> hash-tables.
> >>>
> >>> ---
> >>>  gcc/hash-table.h | 40 
> >>> +++-
> >>>  1 file changed, 39 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> >>> index bd83345c7b8..694eedfc4be 100644
> >>> --- a/gcc/hash-table.h
> >>> +++ b/gcc/hash-table.h
> >>> @@ -503,6 +503,7 @@ private:
> >>>
> >>>value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) 
> >>> const;
> >>>value_type *find_empty_slot_for_expand (hashval_t);
> >>> +  void verify (const compare_type &comparable, hashval_t 
> >>> hash);
> >>>bool too_empty_p (unsigned int);
> >>>void expand ();
> >>>static bool is_deleted (value_type &v)
> >>> @@ -882,8 +883,12 @@ hash_table
> >>>if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> >>>  expand ();
> >>>
> >>> -  m_searches++;
> >>> +#if ENABLE_EXTRA_CHECKING
> >>> +if (insert == INSERT)
> >>> +  verify (comparable, hash);
> >>> +#endif
> >>>
> >>> +  m_searches++;
> >>>value_type *first_deleted_slot = NULL;
> >>>hashval_t index = hash_table_mod1 (hash, 
> >>> m_size_prime_index);
> >>>hashval_t hash2 = hash_table_mod2 (hash, 
> >>> m_size_prime_index);
> >>> @@ -930,6 +935,39 @@ hash_table
> >>>return &m_entries[index];
> >>>  }
> >>>
> >>> +#if ENABLE_EXTRA_CHECKING
> >>> +
> >>> +/* Report a hash table checking error.  */
> >>> +
> >>> +ATTRIBUTE_NORETURN ATTRIBUTE_COLD
> >>> +static void
> >>> +hashtab_chk_error ()
> >>> +{
> >>> +  fprintf (stderr, "hash table checking failed: "
> >>> + "equal operator returns true for a pair "
> >>> + "of values with a different hash value\n");
> >>> +  gcc_unreachable ();
> >>> +}
> >> I think an internal_error here is probably still better than a 
> >> simple
> >> fprintf, even if the fprintf is terminated with a \n :-)
> > Fully agree with that, but I see a lot of build errors when 
> > using internal_error.
> >
> >> The question then becomes can we bootstrap with this stuff 
> >> enabled and
> >> if n

Re: Start implementing -frounding-math

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 3:47 PM Marc Glisse  wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> > On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse  wrote:
> >>
> >> On Sat, 22 Jun 2019, Richard Biener wrote:
> >>
> >>> On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse  
> >>> wrote:
>  Hello,
> 
>  as discussed in the PR, this seems like a simple enough approach to
>  handle
>  FENV functionality safely, while keeping it possible to implement
>  optimizations in the future.
> 
>  Some key missing things:
>  - handle C, not just C++ (I don't care, but some people probably do)
> >>>
> >>> As you tackle C++, what does the standard say to constexpr contexts and
> >>> FENV? That is, what's the FP environment at compiler - time (I suppose
> >>> FENV modifying functions are not constexpr declared).
> >>
> >> The C++ standard doesn't care much about fenv:
> >>
> >> [Note: This document does not require an implementation to support the
> >> FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
> >> is supported. As a consequence, it is implementation- defined whether
> >> these functions can be used to test floating-point status flags, set
> >> floating-point control modes, or run under non-default mode settings. If
> >> the pragma is used to enable control over the floating-point environment,
> >> this document does not specify the effect on floating-point evaluation in
> >> constant expressions. — end note]
> >
> > Oh, I see.
> >
> >> We should care about the C standard, and do whatever makes sense for C++
> >> without expecting the C++ standard to tell us exactly what that is. We can
> >> check what visual studio and intel do, but we don't have to follow them.
> >
> > This makes it somewhat odd to implement this for C++ first and not C, but 
> > hey ;)
>
> Well, I maintain a part of CGAL, a C++ library, that uses interval
> arithmetic and thus relies on a non-default rounding direction. I am
> trying to prepare this dog food so I can eat it myself...

;)

> >> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> >> on" covering the whole program.
> >>
> >> For constant expressions, I see a difference between
> >> constexpr double third = 1. / 3.;
> >> which really needs to be done at compile time, and
> >> const double third = 1. / 3.;
> >> which will try to evaluate the rhs as constexpr, but where the program is
> >> still valid if that fails. The second one clearly should refuse to be
> >> evaluated at compile time if we are specifying a dynamic rounding
> >> direction. For the first one, I am not sure. I guess you should only write
> >> that in "fenv_access off" regions and I wouldn't mind a compile error.
> >>
> >> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> >> for a region of code, which seems relevant for constant expressions. That
> >> pragma looks hard, but maybe some pieces would be nice to add.
> >
> > Hmm.  My thinking was along the line that at the start of main() the
> > C abstract machine might specify the initial rounding mode (and exception
> > state) is implementation defined and all constant expressions are evaluated
> > whilst being in this state.  So we can define that to round-to-nearest and
> > simply fold all constants in contexts we are allowed to evaluate at
> > compile-time as we see them?
>
> There are way too many such contexts. In C++, any initializer is
> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> __builtin_constant_p), and I do want
> double d = 1. / 3;
> to depend on the dynamic rounding direction. I'd rather err on the other
> extreme and only fold when we are forced to, say
> constexpr double d = 1. / 3;
> or even reject it because it is inexact, if pragmas put us in a region
> with dynamic rounding.

OK, fair enough.  I just hoped that global

double x = 1.0/3.0;

do not become runtime initializers with -frounding-math ...

> > I guess fenv_round aims at using a pragma to change the rounding mode?
>
> Yes. You can specify either a fixed rounding mode, or "dynamic". In the
> first case, it overrides the dynamic rounding mode.
>
>  - handle vectors (for complex, I don't know what it means)
> 
>  Then flag_trapping_math should also enable this path, meaning that we
>  should stop making it the default, or performance will suffer.
> >>>
> >>> Do we need N variants of the functions to really encode FP options into
> >>> the IL and thus allow inlining of say different signed-zero flag
> >>> functions?
> >>
> >> Not sure what you are suggesting. I am essentially creating a new
> >> tree_code (well, an internal function) for an addition-like function that
> >> actually reads/writes memory, so it should be orthogonal to inlining, and
> >> only the front-end should care about -frounding-math. I didn't think about
> >> the interaction with signed-zero. Ah, you mean
> >> IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc?
> >
> > Yea
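The dynamic-rounding behaviour under discussion can be illustrated with a small sketch. This assumes a target and libm where fesetround() affects a runtime FP division (e.g. x86 with glibc); the volatile operands are there to force the divisions to happen at run time, since without -frounding-math (or working FENV_ACCESS support, which is exactly what this thread is about adding) the compiler would otherwise constant-fold them:

```cpp
#include <cassert>
#include <cfenv>
#include <utility>

// Compute 1/3 twice under different dynamic rounding modes.  Because 1/3
// is not representable as a double, rounding toward -inf and toward +inf
// must produce two distinct adjacent doubles that straddle the exact value.
static std::pair<double, double>
thirds_down_up ()
{
  volatile double one = 1.0, three = 3.0;  // block compile-time folding

  std::fesetround (FE_DOWNWARD);
  double down = one / three;

  std::fesetround (FE_UPWARD);
  double up = one / three;

  std::fesetround (FE_TONEAREST);          // restore the default mode
  return { down, up };
}
```

This is the behaviour `double d = 1. / 3;` would need to preserve if the initializer is not forced to be a constant expression.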

Re: [PATCH][RFC] Sanitize equals and hash functions in hash-tables.

2019-06-24 Thread Martin Liška
On 6/24/19 2:29 PM, Richard Biener wrote:
> On Mon, Jun 24, 2019 at 1:08 AM Ian Lance Taylor  wrote:
>>
>> On Fri, Jun 7, 2019 at 5:04 AM Martin Liška  wrote:
>>>
>>> On 6/7/19 10:57 AM, Richard Biener wrote:
 On Mon, Jun 3, 2019 at 3:35 PM Martin Liška  wrote:
>
> On 6/1/19 12:06 AM, Jeff Law wrote:
>> On 5/22/19 3:13 AM, Martin Liška wrote:
>>> On 5/21/19 1:51 PM, Richard Biener wrote:
 On Tue, May 21, 2019 at 1:02 PM Martin Liška  wrote:
>
> On 5/21/19 11:38 AM, Richard Biener wrote:
>> On Tue, May 21, 2019 at 12:07 AM Jeff Law  wrote:
>>>
>>> On 5/13/19 1:41 AM, Martin Liška wrote:
 On 11/8/18 9:56 AM, Martin Liška wrote:
> On 11/7/18 11:23 PM, Jeff Law wrote:
>> On 10/30/18 6:28 AM, Martin Liška wrote:
>>> On 10/30/18 11:03 AM, Jakub Jelinek wrote:
 On Mon, Oct 29, 2018 at 04:14:21PM +0100, Martin Liška wrote:
> +hashtab_chk_error ()
> +{
> +  fprintf (stderr, "hash table checking failed: "
> +   "equal operator returns true for a pair "
> +   "of values with a different hash value");
 BTW, either use internal_error here, or at least if using 
 fprintf
 terminate with \n, in your recent mail I saw:
 ...different hash valueduring RTL pass: vartrack
 ^^
>>> Sure, fixed in attached patch.
>>>
>>> Martin
>>>
> +  gcc_unreachable ();
> +}
   Jakub

>>> 0001-Sanitize-equals-and-hash-functions-in-hash-tables.patch
>>>
>>> From 0d9c979c845580a98767b83c099053d36eb49bb9 Mon Sep 17 
>>> 00:00:00 2001
>>> From: marxin 
>>> Date: Mon, 29 Oct 2018 09:38:21 +0100
>>> Subject: [PATCH] Sanitize equals and hash functions in 
>>> hash-tables.
>>>
>>> ---
>>>  gcc/hash-table.h | 40 +++-
>>>  1 file changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
>>> index bd83345c7b8..694eedfc4be 100644
>>> --- a/gcc/hash-table.h
>>> +++ b/gcc/hash-table.h
>>> @@ -503,6 +503,7 @@ private:
>>>
>>>value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) const;
>>>value_type *find_empty_slot_for_expand (hashval_t);
>>> +  void verify (const compare_type &comparable, hashval_t hash);
>>>bool too_empty_p (unsigned int);
>>>void expand ();
>>>static bool is_deleted (value_type &v)
>>> @@ -882,8 +883,12 @@ hash_table
>>>if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
>>>  expand ();
>>>
>>> -  m_searches++;
>>> +#if ENABLE_EXTRA_CHECKING
>>> +if (insert == INSERT)
>>> +  verify (comparable, hash);
>>> +#endif
>>>
>>> +  m_searches++;
>>>value_type *first_deleted_slot = NULL;
>>>hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
>>>hashval_t hash2 = hash_table_mod2 (hash, m_size_prime_index);
>>> @@ -930,6 +935,39 @@ hash_table
>>>return &m_entries[index];
>>>  }
>>>
>>> +#if ENABLE_EXTRA_CHECKING
>>> +
>>> +/* Report a hash table checking error.  */
>>> +
>>> +ATTRIBUTE_NORETURN ATTRIBUTE_COLD
>>> +static void
>>> +hashtab_chk_error ()
>>> +{
>>> +  fprintf (stderr, "hash table checking failed: "
>>> + "equal operator returns true for a pair "
>>> + "of values with a different hash value\n");
>>> +  gcc_unreachable ();
>>> +}
>> I think an internal_error here is probably still better than a 
>> simple
>> fprintf, even if the fprintf is terminated with a \n :-)
> Fully agree with that, but I see a lot of build errors when using 
> internal_error.
>
>> The question then becomes can we bootstrap with this stuff 
>> enabled and
>> if not, are we likely to soon?  It'd be a shame to put it into
>> EXTRA_CHECKING, but then not be able to really use EXTRA_CHECKING
>> because we've got too many bugs to fix.
> Unfortunately it's blocked with these 2 PRs:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87845
> https://gcc.gnu.org/bugzill
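The invariant the proposed verify() hook enforces (values that compare equal must have identical hash values) can be sketched standalone. The traits type and helper below are hypothetical, built to exhibit the exact bug the check catches: equal() ignores case, but hash() does not, so two "equal" values can land in different buckets and lookups silently miss:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical hash_table-style descriptor with an equal/hash mismatch.
struct ci_traits
{
  static unsigned
  hash (const std::string &s)
  {
    unsigned h = 0;
    for (unsigned char c : s)
      h = h * 31 + c;  // case-sensitive: "Foo" and "foo" hash apart
    return h;
  }

  static bool
  equal (const std::string &a, const std::string &b)
  {
    if (a.size () != b.size ())
      return false;
    for (size_t i = 0; i < a.size (); i++)
      if (std::tolower ((unsigned char) a[i])
	  != std::tolower ((unsigned char) b[i]))
	return false;  // case-insensitive comparison
    return true;
  }
};

// Mimics hash_table::verify: every stored entry that compares equal to
// the candidate must also share the candidate's hash value.
static bool
descriptor_consistent (const std::vector<std::string> &entries,
		       const std::string &candidate)
{
  unsigned h = ci_traits::hash (candidate);
  for (const std::string &e : entries)
    if (ci_traits::equal (e, candidate) && ci_traits::hash (e) != h)
      return false;  // equal but different hash: the checking error
  return true;
}
```

A table holding "Foo" fails the check when probed with "foo", which is the situation the patch's fprintf/internal_error reports.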

Re: Start implementing -frounding-math

2019-06-24 Thread Marc Glisse

On Mon, 24 Jun 2019, Richard Biener wrote:


On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse  wrote:


On Sat, 22 Jun 2019, Richard Biener wrote:


On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse  wrote:

Hello,

as discussed in the PR, this seems like a simple enough approach to
handle
FENV functionality safely, while keeping it possible to implement
optimizations in the future.

Some key missing things:
- handle C, not just C++ (I don't care, but some people probably do)


As you tackle C++, what does the standard say to constexpr contexts and
FENV? That is, what's the FP environment at compiler - time (I suppose
FENV modifying functions are not constexpr declared).


The C++ standard doesn't care much about fenv:

[Note: This document does not require an implementation to support the
FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
is supported. As a consequence, it is implementation- defined whether
these functions can be used to test floating-point status flags, set
floating-point control modes, or run under non-default mode settings. If
the pragma is used to enable control over the floating-point environment,
this document does not specify the effect on floating-point evaluation in
constant expressions. — end note]


Oh, I see.


We should care about the C standard, and do whatever makes sense for C++
without expecting the C++ standard to tell us exactly what that is. We can
check what visual studio and intel do, but we don't have to follow them.


This makes it somewhat odd to implement this for C++ first and not C, but hey ;)


Well, I maintain a part of CGAL, a C++ library, that uses interval 
arithmetic and thus relies on a non-default rounding direction. I am 
trying to prepare this dog food so I can eat it myself...



-frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
on" covering the whole program.

For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile time, and
const double third = 1. / 3.;
which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions and I wouldn't mind a compile error.

Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.


Hmm.  My thinking was along the line that at the start of main() the
C abstract machine might specify the initial rounding mode (and exception
state) is implementation defined and all constant expressions are evaluated
whilst being in this state.  So we can define that to round-to-nearest and
simply fold all constants in contexts we are allowed to evaluate at
compile-time as we see them?


There are way too many such contexts. In C++, any initializer is 
constexpr-evaluated if possible (PR 85746 shows that this is bad for 
__builtin_constant_p), and I do want

double d = 1. / 3;
to depend on the dynamic rounding direction. I'd rather err on the other 
extreme and only fold when we are forced to, say

constexpr double d = 1. / 3;
or even reject it because it is inexact, if pragmas put us in a region 
with dynamic rounding.



I guess fenv_round aims at using a pragma to change the rounding mode?


Yes. You can specify either a fixed rounding mode, or "dynamic". In the 
first case, it overrides the dynamic rounding mode.



- handle vectors (for complex, I don't know what it means)

Then flag_trapping_math should also enable this path, meaning that we
should stop making it the default, or performance will suffer.


Do we need N variants of the functions to really encode FP options into
the IL and thus allow inlining of say different signed-zero flag
functions?


Not sure what you are suggesting. I am essentially creating a new
tree_code (well, an internal function) for an addition-like function that
actually reads/writes memory, so it should be orthogonal to inlining, and
only the front-end should care about -frounding-math. I didn't think about
the interaction with signed-zero. Ah, you mean
IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc?


Yeah.  Basically the goal is to have the IL fully defined on its own, without
having its semantic depend on flag_*.


The ones I am starting
from are supposed to be safe-for-everything. As refinement, I was thinking
in 2 directions:
* add a third constant argument, where we can specify extra info
* add a variant for the case where the function is pure (because I expect
that's easier on the compiler than "pure if (arg3 & 8) != 0")
I am not sure more variants are needed.


For optimization having a ADD_ROUND_TO_ZERO (or the extra params
specifying an explicit rounding mode) might be intere

Re: [PATCH] Automatics in equivalence statements

2019-06-24 Thread Mark Eggleston



On 24/06/2019 09:19, Bernhard Reutner-Fischer wrote:

On Fri, 21 Jun 2019 07:10:11 -0700
Steve Kargl  wrote:


On Fri, Jun 21, 2019 at 02:31:51PM +0100, Mark Eggleston wrote:

Currently variables with the AUTOMATIC attribute can not appear in an
EQUIVALENCE statement. However its counterpart, STATIC, can be used in
an EQUIVALENCE statement.

Where there is a clear conflict in the attributes of variables in an
EQUIVALENCE statement an error message will be issued as is currently
the case.

If there is no conflict e.g. a variable with a AUTOMATIC attribute and a
variable(s) without attributes all variables in the EQUIVALENCE will
become AUTOMATIC.

Note: most of this patch was written by Jeff Law 

Please review.

ChangeLogs:

gcc/fortran

      Jeff Law  
      Mark Eggleston  

      * gfortran.h: Add check_conflict declaration.

This is wrong.  By convention a routine that is not static
has the gfc_ prefix.


Furthermore doesn't this export indicate that you're committing a
layering violation somehow?

Don't know what this means.



      * symbol.c (check_conflict): Remove automatic in equivalence conflict
      check.
      * symbol.c (save_symbol): Add check for in equivalence to stop the
      the save attribute being added.
      * trans-common.c (build_equiv_decl): Add is_auto parameter and
      add !is_auto to condition where TREE_STATIC (decl) is set.
      * trans-common.c (build_equiv_decl): Add local variable is_auto,
      set it true if an atomatic attribute is encountered in the variable

atomatic? I read atomic but you mean automatic.


      list.  Call build_equiv_decl with is_auto as an additional parameter.
      flag_dec_format_defaults is enabled.
      * trans-common.c (accumulate_equivalence_attributes) : New subroutine.
      * trans-common.c (find_equivalence) : New local variable dummy_symbol,
      accumulated equivalence attributes from each symbol then check for
      conflicts.

I'm just curious why you don't gfc_copy_attr for the most part of 
accumulate_equivalence_attributes?
thanks,


I didn't write the original of this patch, I made a minor change and 
wrote the test cases. The main body of the work was done by Jeff Law  
. I'll have a look at gfc_copy_attr to see if better 
code can be used.


I have inherited the responsibility of getting this patch upstreamed, 
any help in achieving this will be appreciated.


Mark

--
https://www.codethink.co.uk/privacy.html



Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Richard Biener
On Mon, 24 Jun 2019, Jan Hubicka wrote:

> > Hi,
> > here is patch that adds TYPE_ODR_P to determine type that comply C++
> > ODR rules (i.e. ODR types themselves or structures/unions derived
> > from them).
> > I have decided to use STRING_FLAG, which has meaning only for integers
> > and arrays, which forced me to add type checks in places where
> > we check STRING_FLAG on other types.
> > 
> > The patch also let me verify that all types we consider to have
> > linkage actually are created by the C++ FE, which turned out not to be
> > the case for Ada; I fixed that in needs_assembler_name_p.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > * ipa-utils.h (type_with_linkage_p): Verify that type is
> > CXX_ODR_P.
> > (odr_type_p): Remove extra return.
> > * lto-streamer-out.c (hash_tree): Hash TYPE_CXX_ODR_P;
> > hash STRING_FLAG only for arrays and integers.
> > * tree-streamer-in.c (unpack_ts_type_common_value_fields):
> > Update analogously.
> > * tree-streamer-out.c (pack_ts_type_common_value_fields):
> > Likewise.
> > * print-tree.c (print_node): Print cxx-odr-p
> > and string-flag.
> > * tree.c (need_assembler_name_p): Also check that type
> > is CXX_ODR_TYPE_P
> > (verify_type_variant): Update verification of STRING_FLAG;
> > also check CXX_ODR_P.
> > * tree.h (ARRAY_OR_INTEGER_TYPE_CHECK): New macro.
> > (TYPE_STRING_FLAG): Use it.
> > (TYPE_CXX_ODR_P): New macro.
> > 
> > * lto-common.c (compare_tree_sccs_1): Compare CXX_ODR_P;
> > compare STRING_FLAG only for arrays and integers.
> > 
> > * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
> > type is array or integer prior checking string flag.
> > * gcc-interface/gigi.h (gnat_signed_type_for,
> > maybe_character_value): Likewise.
> > 
> > * c-common.c (braced_lists_to_strings): Check that
> > type is array or integer prior checking string flag.
> > 
> > * lex.c (cxx_make_type): Set TYPE_CXX_ODR_P.
> > 
> > * dwarf2out.c (gen_array_type_die): First check that type
> > is an array and then test string flag.
> > 
> > * trans-expr.c (gfc_conv_substring): Check that
> > type is array or integer prior checking string flag.
> > (gfc_conv_string_parameter): Likewise.
> > * trans-openmp.c (gfc_omp_scalar_p): Likewise.
> > * trans.c (gfc_build_array_ref): Likewise.
> 
> Hi,
> I would like to ping the patch - if it makes sense updating the original
> ODR patch should be easy.

Yes, this patch is OK if you amend the string_flag declaration in
tree-core.h with a comment explaining the uses on the two different
type classes.

Btw, I still wonder what the ODR says in the face of language
inter-operation and what this means here?  For C++ I suppose PODs
are not ODR?

Thanks,
Richard.


Re: [PATCH] i386: Separate costs of RTL expressions from costs of moves

2019-06-24 Thread Richard Biener
On Thu, 20 Jun 2019, Jan Hubicka wrote:

> > > Currently, costs of moves are also used for costs of RTL expressions.   
> > > This
> > > patch:
> > >
> > > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html
> > >
> > > includes:
> > >
> > > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > > b/gcc/config/i386/x86-tune-costs.h
> > > index e943d13..8409a5f 100644
> > > --- a/gcc/config/i386/x86-tune-costs.h
> > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > @@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
> > >{4, 4, 4}, /* cost of loading integer registers
> > >  in QImode, HImode and SImode.
> > >  Relative to reg-reg move (2).  */
> > > -  {6, 6, 6}, /* cost of storing integer registers */
> > > +  {6, 6, 3}, /* cost of storing integer registers */
> > >2, /* cost of reg,reg fld/fst */
> > >{6, 6, 8}, /* cost of loading fp registers
> > >  in SFmode, DFmode and XFmode */
> 
> Well, it seems that the patch was fixing things on wrong spot - the
> tables are intended to be mostly latency based. I think we ought to
> document divergences from these including benchmarks where the change
> helped. Otherwise it is very hard to figure out why the entry does not
> match the reality.
> > >
> > > It lowered the cost for SImode store and made it cheaper than 
> > > SSE<->integer
> > > register move.  It caused a regression:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
> > >
> > > Since the cost for SImode store is also used to compute scalar_store
> > > in ix86_builtin_vectorization_cost, it changed loop costs in
> > >
> > > void
> > > foo (long p2, long *diag, long d, long i)
> > > {
> > >   long k;
> > >   k = p2 < 3 ? p2 + p2 : p2 + 3;
> > >   while (i < k)
> > > diag[i++] = d;
> > > }
> > >
> > > As the result, the loop is unrolled 4 times with -O3 -march=skylake,
> > > instead of 3.
> > >
> > > My patch separates costs of moves from costs of RTL expressions.  We have
> > > a follow up patch which restores the cost for SImode store back to 6 and 
> > > leave
> > > the cost of scalar_store unchanged.  It keeps loop unrolling unchanged and
> > > improves powf performance in glibc by 30%.  We are collecting SPEC CPU 
> > > 2017
> > > data now.
> 
> I have seen the problem with scalar_store with AMD tuning as well.
> It seems to make the SLP vectorizer happy about the idea of turning a
> sequence of, say, integer stores into code which moves all the values
> into an AVX register and then does one vector store.
> 
> The cost model basically compares the cost of N scalar stores to 1
> vector store + vector construction, where vector construction is
> N*sse_op + addss.
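That comparison can be sketched standalone (the cost constants below are invented placeholders, not actual x86 tuning-table values; `slp_profitable` is a hypothetical name, not a GCC function):

```cpp
#include <cassert>

// Rough, hypothetical model of the SLP store-group cost comparison
// described above: N scalar stores vs. one vector store plus a
// vec_construct modeled as N element inserts (sse_op each) and one
// addss-cost combine, mirroring the ix86 vec_construct code quoted below.
struct costs { int scalar_store; int vector_store; int sse_op; int addss; };

bool slp_profitable (int n, const costs &c)
{
  int scalar_cost = n * c.scalar_store;                    // N scalar stores
  int vector_cost = c.vector_store + n * c.sse_op + c.addss; // vec_construct
  return vector_cost < scalar_cost;
}
```

If vec_construct is underpriced relative to reality, this comparison flips toward vectorizing store groups that are in fact slower, which is what the testcases below exhibit.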
> 
> With testcase:
> 
> short array[8];
> test (short a,short b,short c,short d,short e,short f,short g,short h)
> { 
>   array[0]=a;
>   array[1]=b;
>   array[2]=c;
>   array[3]=d;
>   array[4]=e;
>   array[5]=f;
>   array[6]=g;
>   array[7]=h;
> }
> int iarray[8];
> test2 (int a,int b,int c,int d,int e,int f,int g,int h)
> { 
>   iarray[0]=a;
>   iarray[1]=b;
>   iarray[2]=c;
>   iarray[3]=d;
>   iarray[4]=e;
>   iarray[5]=f;
>   iarray[6]=g;
>   iarray[7]=h;
> }
> 
> I get the following codegen:
> 
> 
> test:
> vmovd   %edi, %xmm0
> vmovd   %edx, %xmm2
> vmovd   %r8d, %xmm1
> vmovd   8(%rsp), %xmm3
> vpinsrw $1, 16(%rsp), %xmm3, %xmm3
> vpinsrw $1, %esi, %xmm0, %xmm0
> vpinsrw $1, %ecx, %xmm2, %xmm2
> vpinsrw $1, %r9d, %xmm1, %xmm1
> vpunpckldq  %xmm2, %xmm0, %xmm0
> vpunpckldq  %xmm3, %xmm1, %xmm1
> vpunpcklqdq %xmm1, %xmm0, %xmm0
> vmovaps %xmm0, array(%rip)
> ret
> 
> test2:
> vmovd   %r8d, %xmm5
> vmovd   %edx, %xmm6
> vmovd   %edi, %xmm7
> vpinsrd $1, %r9d, %xmm5, %xmm1
> vpinsrd $1, %ecx, %xmm6, %xmm3
> vpinsrd $1, %esi, %xmm7, %xmm0
> vpunpcklqdq %xmm3, %xmm0, %xmm0
> vmovd   16(%rbp), %xmm4
> vpinsrd $1, 24(%rbp), %xmm4, %xmm2
> vpunpcklqdq %xmm2, %xmm1, %xmm1
> vinserti128 $0x1, %xmm1, %ymm0, %ymm0
> vmovdqu %ymm0, iarray(%rip)
> vzeroupper
>   ret
> 
> which is about 20% slower on my skylake notebook than the
> non-SLP-vectorized variant.
> 
> I wonder if the vec_construct costs should be made more realistic.
> It is computed as:
> 
>   case vec_construct:
> {
>   /* N element inserts into SSE vectors.  */
>   int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
>   /* One vinserti128 for combining two SSE vectors for AVX256.  */
>   if (GET_MODE_BITSIZE (mode) == 256)
> cost += ix86_vec_cost (mode, ix86_cost->addss);
>   /* One vinserti64x4 and two vinserti128 for combining SSE
>  and AVX256 vectors to AVX512.  */
>   else if (GET_MODE_BITSIZE (mode) == 512)
> cost += 3 * ix86_vec_cost (mode, ix86_cost->addss);
>   return cost;
> 
> So it expects 8 simple SSE o

Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Martin Liška
On 6/24/19 2:44 PM, Richard Biener wrote:
> On Mon, Jun 24, 2019 at 2:12 PM Martin Liška  wrote:
>>
>> On 6/24/19 2:02 PM, Richard Biener wrote:
>>> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:

 On 6/21/19 2:57 PM, Jan Hubicka wrote:
> This looks like good step (and please stream it in host independent
> way). I suppose all these issues can be done one-by-one.

 So there's a working patch for that. However one will see the following errors
 when using an older compiler or older LTO bytecode:

 $ gcc main9.o -flto
 lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
 version -25480.4493 instead of the expected 9.0

 $ gcc main.o
 lto1: internal compiler error: compressed stream: data error
>>>
>>> This is because of your change to bitfields or because with the old
>>> scheme the header with the
>>> version is compressed (is it?).
>>
>> Because currently also the header is compressed.
> 
> That was it, yeah :/  Stupid decisions in the past.
> 
> I guess we have to bite the bullet and do this kind of incompatible
> change, accepting
> the odd error message above.
> 
>>> I'd simply avoid any layout changes
>>> in the version check range.
>>
>> Well, then we have to find out how to distinguish between compression 
>> algorithms.
>>
>>>
 To be honest, I would prefer the new .gnu.lto_.meta section.
 Richi why is that so ugly?
>>>
>>> Because it's a change in the wrong direction and doesn't solve the
>>> issue we already
>>> have (cannot determine if a section is compressed or not).
>>
>> That's not true, the .gnu.lto_.meta section will be always uncompressed and 
>> we can
>> also backport changes to older compiler that can read it and print a proper 
>> error
>> message about LTO bytecode version mismatch.
> 
> We can always backport changes, yes, but I don't see why we have to.

I'm fine with the backward compatibility break. But we should also consider
lto-plugin.c, which parses the following 2 sections:

91  #define LTO_SECTION_PREFIX  ".gnu.lto_.symtab"
92  #define LTO_SECTION_PREFIX_LEN  (sizeof (LTO_SECTION_PREFIX) - 1)
93  #define OFFLOAD_SECTION ".gnu.offload_lto_.opts"
94  #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1)

> 
>>> ELF section overhead
>>> is quite big if you have lots of small functions.
>>
>> My patch is actually shrinking space as I'm suggesting to add _one_ extra 
>> ELF section
>> and remove the section header from all other LTO sections. That will save 
>> space
>> for all function sections.
> 
> But we want the header there to at least say if the section is
> compressed or not.
> The fact that we have so many ELF section means we have the redundant version
> info everywhere.
> 
> We should have a single .gnu.lto_ section (and also get rid of those
> __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for
> existence of a symbol is more expensive compared to existence
> of a section).

I like removal of the 2 aforementioned sections. To be honest, I would
recommend adding a new .gnu.lto_.meta section. We can use it instead of
__gnu_lto_v1, and we can have a flag there instead of __gnu_lto_slim.
As a second step, I'm willing to concatenate all

  LTO_section_function_body,
  LTO_section_static_initializer

sections into a single one. That will require an index that will have to be
created. I can discuss that with Honza, as he suggested using something
smarter than function names.

Thoughts?
Martin

> 
> Richard.
> 
>> Martin
>>
>>>
>>> Richard.
>>>

 Martin
>>



Re: value_range and irange unification

2019-06-24 Thread Richard Biener
On Fri, Jun 21, 2019 at 1:41 PM Aldy Hernandez  wrote:
>
> Hi Richard.  Hi folks.
>
> In order to unify the APIs for value_range and irange, we'd like to make
> some minor changes to value_range.  We believe most of these changes
> could go in now, and would prefer so, to get broader testing and
> minimize the plethora of changes we drag around on our branch.
>
> First, introduce a type for VR_VARYING and VR_UNDEFINED.
> 
>
> irange utilizes 0 or more sub-ranges to represent a range, and VARYING
> is simply one subrange [MIN, MAX].  value_range represents this with
> VR_VARYING, and since there is no type associated with it, we cannot
> calculate the lower and upper bounds for the range.  There is also a
> lack of canonicalness in value range in that VR_VARYING and [MIN, MAX]
> are two different representations of the same value.
>
> We tried to adjust irange to not associate a type with the empty range
> [] (representing undefined), but found we were unable to perform all
> operations properly.  In particular, we cannot invert an empty range.
> i.e. invert ( [] ) should produce [MIN, MAX].  Again, we need to have a
> type associated with this empty range.
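The point about invert () can be made concrete with a toy single-subrange model (hypothetical, not the actual irange/value_range API): without type bounds attached to the empty range, there is nothing to invert it into.

```cpp
#include <cassert>
#include <cstdint>

// Toy model.  'type_min'/'type_max' stand in for the bounds derived from
// the range's associated type; they are exactly what an empty range must
// carry for invert () to be computable.
struct toy_range {
  bool empty;                  // the undefined range []
  int64_t type_min, type_max;  // bounds of the associated type
  int64_t lo, hi;              // meaningful only when !empty

  toy_range invert () const
  {
    if (empty)                 // invert ( [] ) == [TYPE_MIN, TYPE_MAX]
      return { false, type_min, type_max, type_min, type_max };
    if (lo == type_min && hi == type_max)  // invert (varying) == []
      return { true, type_min, type_max, 0, 0 };
    // General (anti-range) case deliberately omitted from this sketch.
    assert (false && "general case not modeled here");
    return *this;
  }
};
```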
>
> We'd like to tweak value_range so that set_varying() and set_undefined()
> both take a type, and then always set the min/max fields based on that
> type.  This takes no additional memory in the structure, and is
> virtually transparent to all the existing uses of value_range.
>
> This allows:
>1)  invert to be implemented properly for both VARYING and UNDEFINED
> by simply changing one to the other.
>2)  the type() method to always work without any special casing by
> simply returning TREE_TYPE(min)
>3)  the new incoming bounds() routines to work trivially for these
> cases as well (lbound/ubound, num_pairs(), etc).
>
> This functionality is provided in the first attached patch.
>
> Note, the current implementation sets min/max to TREE_TYPE, not to
> TYPE_MIN/MAX_VALUE.  We can fix this if preferred.

How does this work with

value_range *
vr_values::get_value_range (const_tree var)
{
  static const value_range vr_const_varying (VR_VARYING, NULL, NULL);
...
  /* If we query the range for a new SSA name return an unmodifiable VARYING.
 We should get here at most from the substitute-and-fold stage which
 will never try to change values.  */
  if (ver >= num_vr_values)
return CONST_CAST (value_range *, &vr_const_varying);

?

> Second, enforce canonicalization at value_range build time.
> ---
>
> As discussed above, value_range has multiple representations for the
> same range.  For instance, ~[0,0] is the same as [1,MAX] in unsigned and
> [MIN, MAX] is really varying, among others.  We found it quite difficult
> to make things work, with multiple representations for a given range.
> Canonicalizing at build time solves this, as well as removing explicit
> set_and_canonicalize() calls throughout.  Furthermore, it avoids some
> special casing in VRP.
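A sketch of the kind of build-time canonicalization meant here, on a toy 8-bit unsigned model (illustration only, not the tree-vrp.c implementation):

```cpp
#include <cassert>
#include <cstdint>

enum kind { K_RANGE, K_ANTI_RANGE, K_VARYING };
struct u8_range { kind k; uint8_t lo, hi; };

// Fold the redundant representations into one canonical form:
// ~[0, h] == [h+1, 255], ~[l, 255] == [0, l-1], and [0, 255] == VARYING.
u8_range canonicalize (kind k, uint8_t lo, uint8_t hi)
{
  if (k == K_ANTI_RANGE && lo == 0 && hi < 255)
    return { K_RANGE, (uint8_t) (hi + 1), 255 };
  if (k == K_ANTI_RANGE && hi == 255 && lo > 0)
    return { K_RANGE, 0, (uint8_t) (lo - 1) };
  if (k == K_RANGE && lo == 0 && hi == 255)
    return { K_VARYING, 0, 255 };
  return { k, lo, hi };
}
```

Doing this once, when the range is built, means every consumer sees a single representation per value set.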
>
> Along with canonicalizing, we also enforce the existing value_range API
> more strongly.  For instance, we don't allow setting equivalences for
> either VR_VARYING or VR_UNDEFINED.
>
> This functionality is provided in the second patch.

Fair enough.  I didn't look at the patch yet; sending separate mails would have
been preferred - or are the patches not independent of each other?  Note that
canonicalization performs quite some work, so a shortcut
set () that just checks the input is already canonicalized would be nice?

I wonder you still have anti-ranges since you can handle > 1 subranges
in ranger?

> Third, irange on value_range implementation.
> -
>
> The third attached patch shows how we use the above two to implement
> irange using value_ranges.  value_range would be a drop-in replacement
> for irange, by just doing the following in range.h:
>
> +// Enable this to implement irange piggybacking on value_range.
> +#define IRANGE_WITH_VALUE_RANGE 1
> +
> +#if IRANGE_WITH_VALUE_RANGE
> +#include "tree-vrp.h"
> +typedef value_range_base irange;
> +typedef value_range_storage irange_storage;
> +#define IRANGE_PLAIN VR_RANGE
> +#define IRANGE_INVERSE VR_ANTI_RANGE
> +#else
> ...
>
> The additions to the value_range API would be mostly the following (full
> details in the third attached patch):
>
> +  value_range_base (tree, tree);
> +  value_range_base (value_range_kind,
> +   tree type, const wide_int &, const wide_int &);
> +  value_range_base (tree type, const wide_int &, const wide_int &);
> +  value_range_base (tree type, const value_range_storage *);
> +  value_range_base (tree type);
>
> void set (value_range_kind, tree, tree);
> void set (tree);
> @@ -77,7 +86,25 @@ public:
> bool singleton_p (tree *result = NULL) const;
> void dump (FILE *) const;
>
> +  /* Support machinery for irange.  */
>

[PATCH] Fix missing else keyword seen with clang-static-analyzer:

2019-06-24 Thread Martin Liška

Hi.

The patch fixes the following clang-static-analyzer warnings:
/home/marxin/Programming/gcc/gcc/bb-reorder.c:1031:2: warning: Value stored to 'is_better_edge' is never read
is_better_edge = true;
^
/home/marxin/Programming/gcc/gcc/bb-reorder.c:1034:2: warning: Value stored to 'is_better_edge' is never read
is_better_edge = false;
^~

It seems to me a missing else branch.
Honza?

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-24  Martin Liska  

* bb-reorder.c (connect_better_edge_p): Add missing else
statement in the middle of if-else statements.

---
 gcc/bb-reorder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index c21d204627e..0ac39140c6c 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -1032,7 +1032,7 @@ connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
   else if (e->count () < cur_best_edge->count ())
 	/* The edge has lower probability than the temporary best edge.  */
 	is_better_edge = false;
-  if (e->probability > cur_best_edge->probability)
+  else if (e->probability > cur_best_edge->probability)
 	/* The edge has higher probability than the temporary best edge.  */
 	is_better_edge = true;
   else if (e->probability < cur_best_edge->probability)



[PATCH] Do not call strlen with NULL argument in libgcov.

2019-06-24 Thread Martin Liška
Hi.

The patch fixes an issue reported by clang-static-analyzer.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'm going to install the patch tomorrow if there are no objections.

Thanks,
Martin

libgcc/ChangeLog:

2019-06-24  Martin Liska  

* libgcov-driver-system.c (replace_filename_variables): Do not
call strlen with NULL argument.
---
 libgcc/libgcov-driver-system.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index b5f3e89ebdc..39dc62749d5 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -186,13 +186,14 @@ replace_filename_variables (char *filename)
 	  /* Concat beginning of the path, replacement and
 	 ending of the path.  */
 	  unsigned end = length - (p - filename);
-	  unsigned repl_length = strlen (replacement);
+	  unsigned repl_length = replacement != NULL ? strlen (replacement) : 0;
 
 	  char *buffer = (char *)xmalloc (start + end + repl_length + 1);
 	  char *buffer_ptr = buffer;
 	  buffer_ptr = (char *)memcpy (buffer_ptr, filename, start);
 	  buffer_ptr += start;
-	  buffer_ptr = (char *)memcpy (buffer_ptr, replacement, repl_length);
+	  if (replacement != NULL)
+	buffer_ptr = (char *)memcpy (buffer_ptr, replacement, repl_length);
 	  buffer_ptr += repl_length;
 	  buffer_ptr = (char *)memcpy (buffer_ptr, p, end);
 	  buffer_ptr += end;



[PATCH] Fix PR90972

2019-06-24 Thread Richard Biener


The following fixes the vectorizer to properly deal with STRING_CSTs
now appearing more often.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-06-24  Richard Biener  

PR tree-optimization/90972
* tree-vect-stmts.c (vect_init_vector): Handle CONSTANT_CLASS_P
in common code, dealing with STRING_CST properly.

* gcc.dg/torture/pr90972.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 272545)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -1481,20 +1481,19 @@ vect_init_vector (stmt_vec_info stmt_inf
  val = new_temp;
}
}
- else if (CONSTANT_CLASS_P (val))
-   val = fold_convert (TREE_TYPE (type), val);
  else
{
- new_temp = make_ssa_name (TREE_TYPE (type));
+ gimple_seq stmts = NULL;
  if (! INTEGRAL_TYPE_P (TREE_TYPE (val)))
-   init_stmt = gimple_build_assign (new_temp,
-fold_build1 (VIEW_CONVERT_EXPR,
- TREE_TYPE (type),
- val));
+   val = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+   TREE_TYPE (type), val);
  else
-   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
- vect_init_vector_1 (stmt_info, init_stmt, gsi);
- val = new_temp;
+   /* ???  Condition vectorization expects us to do
+  promotion of invariant/external defs.  */
+   val = gimple_convert (&stmts, TREE_TYPE (type), val);
+ for (gimple_stmt_iterator gsi2 = gsi_start (stmts);
+  !gsi_end_p (gsi2); gsi_next (&gsi2))
+   vect_init_vector_1 (stmt_info, gsi_stmt (gsi2), gsi);
}
}
   val = build_vector_from_val (type, val);
Index: gcc/testsuite/gcc.dg/torture/pr90972.c
===
--- gcc/testsuite/gcc.dg/torture/pr90972.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr90972.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mcpu=power8" { target powerpc*-*-* } } */
+
+long f;
+void a();
+void *g()
+{
+  char h[] = {}, j[] = {}, k[] = {}, l[] = {}, m[] = {}, n[] = {}, o[] = {},
+   q[] = {}, r[] = {};
+  static const char i[] = {6, 0};
+  const char *nops[] = {h, i, j, k, l, m, n, o, q, r};
+  long s = 2;
+  void *fill = a;
+  char *p = fill;
+  while (f) {
+  void *b = p;
+  const void *c = nops[1];
+  long d = s, e = __builtin_object_size(b, 0);
+  __builtin___memcpy_chk(b, c, d, e);
+  p += s;
+  f -= s;
+  }
+  return fill;
+}


Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Richard Sandiford
Richard Biener  writes:
> On Sun, Jun 23, 2019 at 3:51 PM Richard Sandiford
>  wrote:
>> To get an idea of the runtime cost, I tried compiling tree-into-ssa.ii
>> at -O2 -g with various --enable-checking=yes builds of cc1plus:
>>
>> time taken
>> cc1plus compiled with -O0: 100.00%   (baseline)
>> cc1plus compiled with old -Og:  30.94%
>
> is this -Og -g or just -Og?  I suppose all numbers are with -g enabled?

Yeah, all with -g enabled.  But these were the options used to build
cc1plus, whereas the timings are for running it.  So -g shouldn't make
a difference.

>> cc1plus compiled with new -Og:  31.82%
>> cc1plus compiled with -O1g: 28.22%
>> cc1plus compiled with -O1:  26.72%
>> cc1plus compiled with -O2:  25.15%
>>
>> So there is a noticeable but small performance cost to the new mode.
>>
>> To get an idea of the compile-time impact, I tried compiling
>> tree-into-ssa.ii at various optimisation levels, all using the
>> same --enable-checking=release bootstrap build:
>>
>>   time taken
>> tree-into-ssa.ii with -O0 -g: 100.0%  (baseline)
>> tree-into-ssa.ii with old -Og -g: 180.6%
>> tree-into-ssa.ii with new -Og -g: 198.2%
>> tree-into-ssa.ii with -O1g -g:237.1%
>> tree-into-ssa.ii with -O1 -g: 211.8%
>> tree-into-ssa.ii with -O2 -g: 331.5%
>>
>> So there's definitely a bit of a compile-time hit.  I haven't yet looked
>> at how easy it would be to fix.
>>
>> What do you think?  Is it worth pursuing this further?
>>
>> Of course, even if we do do this, it's still important that the debug
>> info for things like -O2 -g is as good as it can be.  I just think some
>> of the open bugs against -Og fundamentally can't be fixed properly while
>> -Og remains a cut-down version of -O1.
>
> Thanks for doing this experiment.  I'm not sure extra complication
> is welcome (but didn't look into the patch(es) yet...).

OK.  The complication is in having (to think about) a new pair of
conditions that might be relevant: tangible/shadow instead of
nondebug/debug.  But almost all the changes involve replacing the
latter with the former rather than adding new checks.  Hopefully once
the whole sourcebase has been converted, the check would become second
nature.  (But patch 1 doesn't convert the whole sourcebase.)

> My original motivation for -Og -g was to provide -O0 -g compile-time
> at -O1 runtime with better debuggability than -O0 -g (mainly because
> that doesn't enable var-tracking).  Of course that failed completely ;)

We fall short on the compile-time side, but being able to get a reasonable
debugging experience with code that runs 3 times faster than -O0 is still
a pretty nice feature. :-)

> I somewhat like the idea that -ONg forces debug info generation
> (but not necessarily output without -g) and thus we can take debug-stmts
> into account.  I suppose you're simply keying on optimize_debug here.

I wondered about doing it that way, but in the end added a new internal
variable instead (flag_tangible_debug, but not backed by a real -f option).
optimize_debug mostly disables optimisations, and whether we want to do
that for -ONg felt like a separate question from whether we should take
debug stmts into account.

Thanks,
Richard


Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 2:12 PM Martin Liška  wrote:
>
> On 6/24/19 2:02 PM, Richard Biener wrote:
> > On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:
> >>
> >> On 6/21/19 2:57 PM, Jan Hubicka wrote:
> >>> This looks like good step (and please stream it in host independent
> >>> way). I suppose all these issues can be done one-by-one.
> >>
> >> So there's a working patch for that. However one will see the following errors
> >> when using an older compiler or older LTO bytecode:
> >>
> >> $ gcc main9.o -flto
> >> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
> >> version -25480.4493 instead of the expected 9.0
> >>
> >> $ gcc main.o
> >> lto1: internal compiler error: compressed stream: data error
> >
> > This is because of your change to bitfields or because with the old
> > scheme the header with the
> > version is compressed (is it?).
>
> Because currently also the header is compressed.

That was it, yeah :/  Stupid decisions in the past.

I guess we have to bite the bullet and do this kind of incompatible
change, accepting
the odd error message above.

> > I'd simply avoid any layout changes
> > in the version check range.
>
> Well, then we have to find out how to distinguish between compression 
> algorithms.
>
> >
> >> To be honest, I would prefer the new .gnu.lto_.meta section.
> >> Richi why is that so ugly?
> >
> > Because it's a change in the wrong direction and doesn't solve the
> > issue we already
> > have (cannot determine if a section is compressed or not).
>
> That's not true, the .gnu.lto_.meta section will be always uncompressed and 
> we can
> also backport changes to older compiler that can read it and print a proper 
> error
> message about LTO bytecode version mismatch.

We can always backport changes, yes, but I don't see why we have to.

> > ELF section overhead
> > is quite big if you have lots of small functions.
>
> My patch is actually shrinking space as I'm suggesting to add _one_ extra ELF 
> section
> and remove the section header from all other LTO sections. That will save 
> space
> for all function sections.

But we want the header there to at least say if the section is
compressed or not.
The fact that we have so many ELF section means we have the redundant version
info everywhere.

We should have a single .gnu.lto_ section (and also get rid of those
__gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for
existence of a symbol is more expensive compared to existence
of a section).

Richard.

> Martin
>
> >
> > Richard.
> >
> >>
> >> Martin
>


Re: [PATCH][RFC] Sanitize equals and hash functions in hash-tables.

2019-06-24 Thread Richard Biener
On Mon, Jun 24, 2019 at 1:08 AM Ian Lance Taylor  wrote:
>
> On Fri, Jun 7, 2019 at 5:04 AM Martin Liška  wrote:
> >
> > On 6/7/19 10:57 AM, Richard Biener wrote:
> > > On Mon, Jun 3, 2019 at 3:35 PM Martin Liška  wrote:
> > >>
> > >> On 6/1/19 12:06 AM, Jeff Law wrote:
> > >>> On 5/22/19 3:13 AM, Martin Liška wrote:
> >  On 5/21/19 1:51 PM, Richard Biener wrote:
> > > On Tue, May 21, 2019 at 1:02 PM Martin Liška  wrote:
> > >>
> > >> On 5/21/19 11:38 AM, Richard Biener wrote:
> > >>> On Tue, May 21, 2019 at 12:07 AM Jeff Law  wrote:
> > 
> >  On 5/13/19 1:41 AM, Martin Liška wrote:
> > > On 11/8/18 9:56 AM, Martin Liška wrote:
> > >> On 11/7/18 11:23 PM, Jeff Law wrote:
> > >>> On 10/30/18 6:28 AM, Martin Liška wrote:
> >  On 10/30/18 11:03 AM, Jakub Jelinek wrote:
> > > On Mon, Oct 29, 2018 at 04:14:21PM +0100, Martin Liška wrote:
> > >> +hashtab_chk_error ()
> > >> +{
> > >> +  fprintf (stderr, "hash table checking failed: "
> > >> +   "equal operator returns true for a pair "
> > >> +   "of values with a different hash value");
> > > BTW, either use internal_error here, or at least if using 
> > > fprintf
> > > terminate with \n, in your recent mail I saw:
> > > ...different hash valueduring RTL pass: vartrack
> > > ^^
> >  Sure, fixed in attached patch.
> > 
> >  Martin
> > 
> > >> +  gcc_unreachable ();
> > >> +}
> > >   Jakub
> > >
> >  0001-Sanitize-equals-and-hash-functions-in-hash-tables.patch
> > 
> >  From 0d9c979c845580a98767b83c099053d36eb49bb9 Mon Sep 17 
> >  00:00:00 2001
> >  From: marxin 
> >  Date: Mon, 29 Oct 2018 09:38:21 +0100
> >  Subject: [PATCH] Sanitize equals and hash functions in 
> >  hash-tables.
> > 
> >  ---
> >   gcc/hash-table.h | 40 +++-
> >   1 file changed, 39 insertions(+), 1 deletion(-)
> > 
> >  diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> >  index bd83345c7b8..694eedfc4be 100644
> >  --- a/gcc/hash-table.h
> >  +++ b/gcc/hash-table.h
> >  @@ -503,6 +503,7 @@ private:
> > 
> > value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) 
> >  const;
> > value_type *find_empty_slot_for_expand (hashval_t);
> >  +  void verify (const compare_type &comparable, hashval_t 
> >  hash);
> > bool too_empty_p (unsigned int);
> > void expand ();
> > static bool is_deleted (value_type &v)
> >  @@ -882,8 +883,12 @@ hash_table
> > if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> >   expand ();
> > 
> >  -  m_searches++;
> >  +#if ENABLE_EXTRA_CHECKING
> >  +if (insert == INSERT)
> >  +  verify (comparable, hash);
> >  +#endif
> > 
> >  +  m_searches++;
> > value_type *first_deleted_slot = NULL;
> > hashval_t index = hash_table_mod1 (hash, 
> >  m_size_prime_index);
> > hashval_t hash2 = hash_table_mod2 (hash, 
> >  m_size_prime_index);
> >  @@ -930,6 +935,39 @@ hash_table
> > return &m_entries[index];
> >   }
> > 
> >  +#if ENABLE_EXTRA_CHECKING
> >  +
> >  +/* Report a hash table checking error.  */
> >  +
> >  +ATTRIBUTE_NORETURN ATTRIBUTE_COLD
> >  +static void
> >  +hashtab_chk_error ()
> >  +{
> >  +  fprintf (stderr, "hash table checking failed: "
> >  + "equal operator returns true for a pair "
> >  + "of values with a different hash value\n");
> >  +  gcc_unreachable ();
> >  +}
> > >>> I think an internal_error here is probably still better than a 
> > >>> simple
> > >>> fprintf, even if the fprintf is terminated with a \n :-)
> > >> Fully agree with that, but I see a lot of build errors when 
> > >> using internal_error.
> > >>
> > >>> The question then becomes can we bootstrap with this stuff 
> > >>> enabled and
> > >>> if not, are we likely to soon?  It'd be a shame to put it into
> > >>> EXTRA_CHECKING, but then not be able to really use 
> > >>> EXTRA_CHECKING
> > >>> because we've got too many bugs to fix.
> > >> Unfort
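For reference, the invariant the proposed verify () hook enforces - equal (a, b) implies hash (a) == hash (b) - is easy to violate. A standalone sketch (not GCC code, all names hypothetical) of a broken traits pair of the kind this checker would catch:

```cpp
#include <cassert>
#include <cctype>

// Case-insensitive equality combined with a case-SENSITIVE hash: two
// strings can compare equal yet produce different hash values, which is
// exactly the bug class the verify () check reports.
static unsigned buggy_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h;
}

static bool ci_equal (const char *a, const char *b)
{
  for (; *a && *b; ++a, ++b)
    if (std::tolower ((unsigned char) *a) != std::tolower ((unsigned char) *b))
      return false;
  return *a == *b;
}

// The property checked for a candidate pair: equality must imply
// identical hash values.
static bool invariant_ok (const char *a, const char *b)
{
  return !ci_equal (a, b) || buggy_hash (a) == buggy_hash (b);
}
```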

Re: [PATCH 0/3] RFC: Let debug stmts influence codegen at -Og

2019-06-24 Thread Richard Biener
On Sun, Jun 23, 2019 at 3:51 PM Richard Sandiford
 wrote:
>
> -Og is documented as:
>
>   @option{-Og} should be the optimization
>   level of choice for the standard edit-compile-debug cycle, offering
>   a reasonable level of optimization while maintaining fast compilation
>   and a good debugging experience.  It is a better choice than @option{-O0}
>   for producing debuggable code because some compiler passes
>   that collect debug information are disabled at @option{-O0}.
>
> One of the things hampering that is that, as for the "normal" -O* flags,
> the code produced by -Og -g must be the same as the code produced
> without any debug IL at all.  There are many cases in which that makes
> it impossible to stop useful values from being optimised out of the
> debug info, either because the value simply isn't available at runtime
> at the point that the debugger needs it, or because of limitations in
> the debug representation.  (E.g. pointers to external data are dropped
> from debug info because the relocations wouldn't be honoured.)
>
> I think it would be better to flip things around so that the debug IL
> is always present when optimising at -Og, and then allow the debug IL
> to influence codegen at -Og.  This still honours the -fcompare-debug
> principle, and the compile speed of -Og without -g doesn't seem very
> important.

Just to note it's also a common user misconception that -Og enables
debug-info ...

> This series therefore adds a mode in which debug stmts and debug insns
> are present even without -g and are explicitly allowed to affect codegen.
> In particular, when this mode is active:
>
> - uses in debug binds become first-class uses, acting like uses in
>   executable code
>
> - the use of DEBUG_EXPR_DECLs is banned.  If we want to refer to
>   a temporary value in debug binds, we need to calculate the value
>   with executable code instead
>
> This needs a new term to distinguish stmts/insns that affect codegen
> from those that don't.  I couldn't think of one that I was really
> happy with, but possibilities included:
>
> tangible/shadow
> manifest/hidden
> foreground/background
> reactive/inert
> active/dormant   (but "active insn" already means something else)
> peppy/sullen
>
> The series uses tangible/shadow.  There's a new global flag_tangible_debug
> that controls whether debug insns are "tangible" insns (for the new mode)
> or "shadow" insns (for normal optimisation).  -Og enables the new mode
> while the other optimisation levels leave it off.  (Despite the name,
> the new variable is just an internal flag, there's no -ftangible-debug
> option.)
>
> The first patch adds the infrastructure but doesn't improve the debug
> experience much on its own.
>
> As an example of one thing we can do with the new mode, the second patch
> ensures that the gimple IL has debug info for each is_gimple_reg variable
> throughout the variable's lifetime.  This fixes a couple of the PRs in
> the -Og meta-bug and from spot-testing seems to ensure that far fewer
> values are optimised out.
>
> Also, the new mode is mostly orthogonal to the optimisation level
> (although it would in effect disable optimisations like loop
> vectorisation, until we have a way of representing debug info for
> vectorised loops).  The third patch therefore adds an -O1g option
> that optimises more heavily than -Og but provides a better debug
> experience than -O1.
>
> I think -O2g would make sense too, and would be a viable option
> for people who want to deploy relatively heavily optimised binaries
> without compromising the debug experience too much.
>
> Other possible follow-ons for the new mode include:
>
> - Make sure that tangible debug stmts never read memory or take
>   an address.  (This is so that addressability and vops depend
>   only on non-debug insns.)
>
> - Fall back on expanding real code if expand_debug_expr fails.
>
> - Force debug insns to be simple enough for dwarf2out (e.g. no external
>   or TLS symbols).  This could be done by having a validation step for
>   debug insns, like we already do for normal insns.
>
> - Prevent the removal of dead stores if it would lead to wrong debug info.
>   (Maybe under control of an option?)
>
> To get an idea of the runtime cost, I tried compiling tree-into-ssa.ii
> at -O2 -g with various --enable-checking=yes builds of cc1plus:
>
> time taken
> cc1plus compiled with -O0: 100.00%   (baseline)
> cc1plus compiled with old -Og:  30.94%

is this -Og -g or just -Og?  I suppose all numbers are with -g enabled?

> cc1plus compiled with new -Og:  31.82%
> cc1plus compiled with -O1g: 28.22%
> cc1plus compiled with -O1:  26.72%
> cc1plus compiled with -O2:  25.15%
>
> So there is a noticeable but small performance cost to the new mode.
>
> To get an idea of the compile-time impact, I tried compiling
> tree-into-ssa.ii at various optimisation levels, all using the
> same --enable-checking=release bootstrap b

Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Martin Liška
On 6/24/19 2:02 PM, Richard Biener wrote:
> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:
>>
>> On 6/21/19 2:57 PM, Jan Hubicka wrote:
>>> This looks like a good step (and please stream it in a host-independent
>>> way). I suppose all these issues can be done one-by-one.
>>
>> So there's a working patch for that.  However, one will see the following
>> errors when using an older compiler or older LTO bytecode:
>>
>> $ gcc main9.o -flto
>> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
>> version -25480.4493 instead of the expected 9.0
>>
>> $ gcc main.o
>> lto1: internal compiler error: compressed stream: data error
> 
> This is because of your change to bitfields or because with the old
> scheme the header with the version is compressed (is it?).

Because currently also the header is compressed.

> I'd simply avoid any layout changes
> in the version check range.

Well, then we have to find out how to distinguish between compression
algorithms.

> 
>> To be honest, I would prefer the new .gnu.lto_.meta section.
>> Richi why is that so ugly?
> 
> Because it's a change in the wrong direction and doesn't solve the
> issue we already
> have (cannot determine if a section is compressed or not).

That's not true: the .gnu.lto_.meta section will always be uncompressed, and
we can also backport changes to older compilers so that they can read it and
print a proper error message about an LTO bytecode version mismatch.

> ELF section overhead
> is quite big if you have lots of small functions.

My patch actually shrinks space: I'm suggesting adding _one_ extra ELF
section and removing the section header from all other LTO sections.  That
will save space for all function sections.

Martin

> 
> Richard.
> 
>>
>> Martin



Re: [PATCH] Define midpoint and lerp functions for C++20 (P0811R3)

2019-06-24 Thread Jonathan Wakely

On 12/03/19 23:04 +, Jonathan Wakely wrote:

On 12/03/19 22:49 +, Joseph Myers wrote:

On Tue, 5 Mar 2019, Jonathan Wakely wrote:


The midpoint and lerp functions for floating point types come straight
from the P0811R3 proposal, with no attempt at optimization.


I don't know whether P0811R3 states different requirements from the public
P0811R2, but the implementation of midpoint using isnormal does *not*
satisfy "at most one inexact operation occurs" and is *not* correctly
rounded, contrary to the claims made in P0811R2.


I did wonder how the implementation in the paper was meant to meet the
stated requirements, but I didn't wonder too hard.


Consider e.g. midpoint(DBL_MIN + DBL_TRUE_MIN, DBL_MIN + DBL_TRUE_MIN).
The value DBL_MIN + DBL_TRUE_MIN is normal, but dividing it by 2 is
inexact (and so that midpoint implementation would produce DBL_MIN as
result, so failing to satisfy midpoint(x, x) == x).

Replacing isnormal(x) by something like isgreaterequal(fabs(x), MIN*2)
would avoid those inexact divisions, but there would still be spurious
overflows in non-default rounding modes for e.g. midpoint(DBL_MAX,
DBL_TRUE_MIN) in FE_UPWARD mode, so failing "No overflow occurs" if that's
meant to apply in all rounding modes.


Thanks for this review, and the useful cases to test. Ed is working on
adding some more tests, so maybe he can also look at improving the
code :-)


I've committed r272616 to make this case work. This is the proposal
author's most recent suggestion for the implementation.

Tested x86_64-linux, committed to trunk.


commit e693fb3d93cfe938d700512e8bfe70e0a5c0dd8a
Author: redi 
Date:   Mon Jun 24 12:09:51 2019 +

Fix std::midpoint for denormal values

* include/std/numeric (midpoint(T, T)): Change implementation for
floating-point types to avoid incorrect rounding of denormals.
* testsuite/26_numerics/midpoint/floating.cc: Add check for correct
rounding with denormals.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error line numbers.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272616 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index fc2242f3de6..af684469769 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -69,9 +69,8 @@
  * @defgroup numerics Numerics
  *
  * Components for performing numeric operations. Includes support for
- * for complex number types, random number generation, numeric
- * (n-at-a-time) arrays, generalized numeric algorithms, and special
- * math functions.
+ * complex number types, random number generation, numeric (n-at-a-time)
+ * arrays, generalized numeric algorithms, and mathematical special functions.
  */
 
 #if __cplusplus >= 201402L
@@ -156,11 +155,22 @@ namespace __detail
 
 #endif // C++17
 
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#endif // C++14
+
 #if __cplusplus > 201703L
+#include 
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
   // midpoint
 # define __cpp_lib_interpolate 201902L
 
-template
+  template
 constexpr
 enable_if_t<__and_v, is_same, _Tp>,
 			__not_>>,
@@ -182,11 +192,17 @@ template
 	}
 	  return __a + __k * _Tp(_Up(__M - __m) / 2);
 	}
-  else
+  else // is_floating
 	{
-	  return __builtin_isnormal(__a) && __builtin_isnormal(__b)
-	? __a / 2 + __b / 2
-	: (__a + __b) / 2;
+	  constexpr _Tp __lo = numeric_limits<_Tp>::min() * 2;
+	  constexpr _Tp __hi = numeric_limits<_Tp>::max() / 2;
+	  if (std::abs(__a) <= __hi && std::abs(__b) <= __hi) [[likely]]
+	return (__a + __b) / 2; // always correctly rounded
+	  if (std::abs(__a) < __lo) // not safe to halve __a
+	return __a + __b/2;
+	  if (std::abs(__b) < __lo) // not safe to halve __b
+	return __a/2 + __b;
+	  return __a/2 + __b/2;	// otherwise correctly rounded
 	}
 }
 
@@ -197,12 +213,10 @@ template
 {
   return __a  + (__b - __a) / 2;
 }
-#endif // C++20
-
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
-#endif // C++14
+#endif // C++20
 
 #if __cplusplus > 201402L
 #include 
diff --git a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
index 87a74988fa4..05e21431313 100644
--- a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
@@ -46,9 +46,9 @@ test01()
   std::gcd(0.1, 0.1);   // { dg-error "from here" }
 }
 
+// { dg-error "integers" "" { target *-*-* } 133 }
 // { dg-error "integers" "" { target *-*-* } 134 }
-// { dg-error "integers" "" { target *-*-* } 135 }
-// { dg-error "not bools" "" { target *-*-* } 136 }
-// { dg-error "not bools" "" { target *-*-* } 138 }
+// { dg-error "not bools" "" { target *-*-* } 135 }
+// { dg-error "not bools" "" { target *-*-* } 137 }
 // { dg-prune-

Re: [PATCH] Add .gnu.lto_.meta section.

2019-06-24 Thread Richard Biener
On Fri, Jun 21, 2019 at 4:01 PM Martin Liška  wrote:
>
> On 6/21/19 2:57 PM, Jan Hubicka wrote:
> > This looks like a good step (and please stream it in a host-independent
> > way). I suppose all these issues can be done one-by-one.
>
> So there's a working patch for that.  However, one will see the following
> errors when using an older compiler or older LTO bytecode:
>
> $ gcc main9.o -flto
> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO 
> version -25480.4493 instead of the expected 9.0
>
> $ gcc main.o
> lto1: internal compiler error: compressed stream: data error

This is because of your change to bitfields or because with the old scheme
the header with the version is compressed (is it?).  I'd simply avoid any
layout changes in the version check range.

> To be honest, I would prefer the new .gnu.lto_.meta section.
> Richi why is that so ugly?

Because it's a change in the wrong direction and doesn't solve the
issue we already
have (cannot determine if a section is compressed or not).  ELF section overhead
is quite big if you have lots of small functions.

Richard.

>
> Martin


Re: Start implementing -frounding-math

2019-06-24 Thread Richard Biener
On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse  wrote:
>
> On Sat, 22 Jun 2019, Richard Biener wrote:
>
> > On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse  
> > wrote:
> >> Hello,
> >>
> >> as discussed in the PR, this seems like a simple enough approach to
> >> handle
> >> FENV functionality safely, while keeping it possible to implement
> >> optimizations in the future.
> >>
> >> Some key missing things:
> >> - handle C, not just C++ (I don't care, but some people probably do)
> >
> > As you tackle C++, what does the standard say about constexpr contexts and
> > FENV? That is, what's the FP environment at compile time (I suppose
> > FENV-modifying functions are not declared constexpr).
>
> The C++ standard doesn't care much about fenv:
>
> [Note: This document does not require an implementation to support the
> FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
> is supported. As a consequence, it is implementation- defined whether
> these functions can be used to test floating-point status flags, set
> floating-point control modes, or run under non-default mode settings. If
> the pragma is used to enable control over the floating-point environment,
> this document does not specify the effect on floating-point evaluation in
> constant expressions. — end note]

Oh, I see.

> We should care about the C standard, and do whatever makes sense for C++
> without expecting the C++ standard to tell us exactly what that is. We can
> check what visual studio and intel do, but we don't have to follow them.

This makes it somewhat odd to implement this for C++ first and not C, but hey ;)

> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> on" covering the whole program.
>
> For constant expressions, I see a difference between
> constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
> const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. The second one clearly should refuse to be
> evaluated at compile time if we are specifying a dynamic rounding
> direction. For the first one, I am not sure. I guess you should only write
> that in "fenv_access off" regions and I wouldn't mind a compile error.
>
> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> for a region of code, which seems relevant for constant expressions. That
> pragma looks hard, but maybe some pieces would be nice to add.

Hmm.  My thinking was along the line that at the start of main() the
C abstract machine might specify the initial rounding mode (and exception
state) is implementation defined and all constant expressions are evaluated
whilst being in this state.  So we can define that to round-to-nearest and
simply fold all constants in contexts we are allowed to evaluate at
compile-time as we see them?

I guess fenv_round aims at using a pragma to change the rounding mode?

> >> - handle vectors (for complex, I don't know what it means)
> >>
> >> Then flag_trapping_math should also enable this path, meaning that we
> >> should stop making it the default, or performance will suffer.
> >
> > Do we need N variants of the functions to really encode FP options into
> > the IL and thus allow inlining of say different signed-zero flag
> > functions?
>
> Not sure what you are suggesting. I am essentially creating a new
> tree_code (well, an internal function) for an addition-like function that
> actually reads/writes memory, so it should be orthogonal to inlining, and
> only the front-end should care about -frounding-math. I didn't think about
> the interaction with signed-zero. Ah, you mean
> IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc?

Yeah.  Basically the goal is to have the IL fully defined on its own, without
having its semantic depend on flag_*.

> The ones I am starting
> from are supposed to be safe-for-everything. As refinement, I was thinking
> in 2 directions:
> * add a third constant argument, where we can specify extra info
> * add a variant for the case where the function is pure (because I expect
> that's easier on the compiler than "pure if (arg3 & 8) != 0")
> I am not sure more variants are needed.

For optimization having a ADD_ROUND_TO_ZERO (or the extra params
specifying an explicit rounding mode) might be interesting since on x86
there are now instructions with rounding mode control bits.

> Also, while rounding clearly applies to an operation, signed-zero kind of
> seems to apply to a variable, and in an operation, I don't really know if
> it means that I can pretend that an argument of -0. is +0. (I can return
> +inf for 1/-0.) or if it means I can return 0. when the operation should
> return -0.. Probably both... If we have just -fsigned-zeros but no
> rounding or trapping, the penalty of using an IFN would be bad. But indeed
> inlining functions with different -f(no-)signed-zeros forces to use
> -fsigned-zeros for the whole merged function i

Re: [committed] Add OpenMP 5 exclusive scan support for simd constructs

2019-06-24 Thread Christophe Lyon
On Fri, 21 Jun 2019 at 08:57, Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds exclusive scan support for simd, it is similar to
> the inclusive scan, just we need to swap the input and scan phases and
> use slightly different pattern at the start of the scan phase, so that it
> computes what we need.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
>
> 2019-06-21  Jakub Jelinek  
>
> * omp-low.c (lower_rec_simd_input_clauses): Add rvar2 argument,
> create another "omp scan inscan exclusive" array if
> !ctx->scan_inclusive.
> (lower_rec_input_clauses): Handle exclusive scan inscan reductions.
> (lower_omp_scan): Likewise.
> * tree-vectorizer.h (struct _stmt_vec_info): Use 3-bit instead of
> 2-bit bitfield for simd_lane_access_p member.
> * tree-vect-data-refs.c (vect_analyze_data_refs): Also handle
> aux == (void *)-4 as simd lane access.
> * tree-vect-stmts.c (check_scan_store): Handle exclusive scan.  Update
> comment with permutations to show the canonical permutation order.
> (vectorizable_scan_store): Handle exclusive scan.
> (vectorizable_store): Call vectorizable_scan_store even for
> STMT_VINFO_SIMD_LANE_ACCESS_P > 3.
>
> * gcc.dg/vect/vect-simd-12.c: New test.
> * gcc.dg/vect/vect-simd-13.c: New test.
> * gcc.dg/vect/vect-simd-14.c: New test.
> * gcc.dg/vect/vect-simd-15.c: New test.
> * gcc.target/i386/sse2-vect-simd-12.c: New test.
> * gcc.target/i386/sse2-vect-simd-13.c: New test.
> * gcc.target/i386/sse2-vect-simd-14.c: New test.
> * gcc.target/i386/sse2-vect-simd-15.c: New test.
> * gcc.target/i386/avx2-vect-simd-12.c: New test.
> * gcc.target/i386/avx2-vect-simd-13.c: New test.
> * gcc.target/i386/avx2-vect-simd-14.c: New test.
> * gcc.target/i386/avx2-vect-simd-15.c: New test.
> * gcc.target/i386/avx512f-vect-simd-12.c: New test.
> * gcc.target/i386/avx512f-vect-simd-13.c: New test.
> * gcc.target/i386/avx512f-vect-simd-14.c: New test.
> * gcc.target/i386/avx512bw-vect-simd-15.c: New test.
> * g++.dg/vect/simd-6.cc: New test.
> * g++.dg/vect/simd-7.cc: New test.
> * g++.dg/vect/simd-8.cc: New test.
> * g++.dg/vect/simd-9.cc: New test.
> * c-c++-common/gomp/scan-2.c: Don't expect any diagnostics.
>
> --- gcc/omp-low.c.jj2019-06-20 13:26:29.085150770 +0200
> +++ gcc/omp-low.c   2019-06-20 15:46:25.964253058 +0200
> @@ -3692,7 +3692,8 @@ struct omplow_simd_context {
>  static bool
>  lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
>   omplow_simd_context *sctx, tree &ivar,
> - tree &lvar, tree *rvar = NULL)
> + tree &lvar, tree *rvar = NULL,
> + tree *rvar2 = NULL)
>  {
>if (known_eq (sctx->max_vf, 0U))
>  {
> @@ -3767,6 +3768,25 @@ lower_rec_simd_input_clauses (tree new_v
>   *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
>   sctx->lastlane, NULL_TREE, NULL_TREE);
>   TREE_THIS_NOTRAP (*rvar) = 1;
> +
> + if (!ctx->scan_inclusive)
> +   {
> + /* And for exclusive scan yet another one, which will
> +hold the value during the scan phase.  */
> + tree savar = create_tmp_var_raw (atype);
> + if (TREE_ADDRESSABLE (new_var))
> +   TREE_ADDRESSABLE (savar) = 1;
> + DECL_ATTRIBUTES (savar)
> +   = tree_cons (get_identifier ("omp simd array"), NULL,
> +tree_cons (get_identifier ("omp simd inscan "
> +   "exclusive"), NULL,
> +   DECL_ATTRIBUTES (savar)));
> + gimple_add_tmp_var (savar);
> + ctx->cb.decl_map->put (iavar, savar);
> + *rvar2 = build4 (ARRAY_REF, TREE_TYPE (new_var), savar,
> +  sctx->idx, NULL_TREE, NULL_TREE);
> + TREE_THIS_NOTRAP (*rvar2) = 1;
> +   }
> }
>ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
>  NULL_TREE, NULL_TREE);
> @@ -5185,14 +5205,15 @@ lower_rec_input_clauses (tree clauses, g
>   new_vard = TREE_OPERAND (new_var, 0);
>   gcc_assert (DECL_P (new_vard));
> }
> - tree rvar = NULL_TREE, *rvarp = NULL;
> + tree rvar = NULL_TREE, *rvarp = NULL, rvar2 = NULL_TREE;
>   if (is_simd
>   && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>   && OMP_CLAUSE_REDUCTION_INSCAN (c))
> rvarp = &rvar;
>   if (is_simd
>   && lower_rec_simd_

Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jan Hubicka
> On Mon, 24 Jun 2019, Jan Hubicka wrote:
> 
> > > > This simple (untested) patch doesn't avoid creating the unnecessary
> > > > as-base types, but it should avoid using them in a way that causes
> > > > them to be streamed, and should let them be discarded by GC.
> > > > Thoughts?
> > > 
> > > Looks better than Honzas patch fixing a single place.
> > 
> > I wonder if we can go ahead with Jason's patch to handle the common
> > case.
> 
> I hope so - Jason?
> 
> > > 
> > > I've spent some thoughts on this and I wonder whether we can
> > > re-implement classtype-as-base with fake inheritance (which would
> > > also solve the TBAA alias set issue in a natural way).  That is,
> > > we'd lay out structs as-base and make instances of it use a
> > > 
> > > class as-instance { as-base b; X pad1; Y pad2; };
> > > 
> > > with either explicit padding fields or with implicit ones
> > > (I didn't check how we trick stor-layout to not pad the as-base
> > > type to its natural alignment...).
> > > 
> > > I realize that this impacts all code building component-refs ontop
> > > of as-instance typed objects so this might rule out this approach
> > > completely - but maybe that's reasonably well abstracted into common
> > > code so only few places need adjustments.
> > 
> > Modulo the empty virtual bases, which I have no understanding of, I
> > suppose this should work.
> > 
> > One issue is that we will need to introduce view_convert_exprs at some
> > times.
> > 
> > As
> > 
> >   class a var;
> >   class b:a {} *bptr;
> > 
> >   var.foo;
> > 
> > Expanding this as var.as_base_a.foo would make the access path oracle
> > disambiguate it from bptr->as_base_b->as_base_a.foo, which is wrong with
> > the gimple memory model, where we allow placement new replacing var by
> > an instance of b.
> 
> Ick.  IIRC the as-base types were necessary only for
> copying and clobber operations that may not touch the possibly
> re-used tail-padding.  The question is whether we cannot invent
> a better mechanism to do this -- IIRC we used memmove for the
> former case at some point (that's arguably worse, also for TBAA).
> There's WITH_SIZE_EXPR which we only handle rudimentary in the
> middle-end though.

Yep, only place where as-base type surface to middle-end are clobbers.
call.c does:

{
  /* We must only copy the non-tail padding parts.  */
  tree arg0, arg2, t;
  tree array_type, alias_set;

  arg2 = TYPE_SIZE_UNIT (as_base);
  arg0 = cp_build_addr_expr (to, complain);

  array_type = build_array_type (unsigned_char_type_node,
 build_index_type
   (size_binop (MINUS_EXPR,
 arg2, size_int (1))));
  alias_set = build_int_cst (build_pointer_type (type), 0);
  t = build2 (MODIFY_EXPR, void_type_node,
  build2 (MEM_REF, array_type, arg0, alias_set),
  build2 (MEM_REF, array_type, arg, alias_set));
  val = build2 (COMPOUND_EXPR, TREE_TYPE (to), t, to);
  TREE_NO_WARNING (val) = 1;
}

I noticed this earlier too: it disables the access path oracle
(TREE_TYPE of the MEM_REF is a char array, while the pointer type is the
original type) and thus is pretty bad, too.  At least it tests whether the
types are different, so this path is relatively rare.

We should solve both these places indeed.  I think call.c could also use
as-base type like clobber codegen if LTO TBAA was fixed on those.

I agree that having solution that does not complicate middle-end would
be nice, but if we want to get as-base types right the way they are now
we need to expose an IS_FAKE_BASE_TYPE flag to the middle-end (perhaps with
better name).  Then we can make canonical_type_used_p return false for
those and make get_alias_set and same_type_for_tbaa_p to use
TYPE_CONTEXT (type) in this case.
This ought to get things work quite smoothly.
Since we do not do component refs on those, we should not get into
problem with nonoverlapping_component_refs not considering the types the
same.

Honza


Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-24 Thread Prathamesh Kulkarni
On Mon, 24 Jun 2019 at 14:59, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/fwprop.c b/gcc/fwprop.c
> > index 45703fe5f01..93a1a10c9a6 100644
> > --- a/gcc/fwprop.c
> > +++ b/gcc/fwprop.c
> > @@ -547,6 +547,54 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
> > tem = simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
> >SUBREG_BYTE (x));
> >   }
> > +
> > +  else
> > + {
> > +   rtvec vec;
> > +   rtvec newvec;
> > +   const char *fmt = GET_RTX_FORMAT (code);
> > +   rtx op;
> > +
> > +   for (int i = 0; fmt[i]; i++)
> > + switch (fmt[i])
> > +   {
> > +   case 'E':
> > + vec = XVEC (x, i);
> > + newvec = vec;
> > + for (int j = 0; j < GET_NUM_ELEM (vec); j++)
> > +   {
> > + op = RTVEC_ELT (vec, j);
> > + valid_ops &= propagate_rtx_1 (&op, old_rtx, new_rtx, flags);
> > + if (op != RTVEC_ELT (vec, j))
> > +   {
> > + if (newvec == vec)
> > +   {
> > + newvec = shallow_copy_rtvec (vec);
> > + if (!tem)
> > +   tem = shallow_copy_rtx (x);
> > + XVEC (tem, i) = newvec;
> > +   }
> > + RTVEC_ELT (newvec, j) = op;
> > +   }
> > +   }
> > +   break;
>
> Misindented break: should be indented two columns further.
>
> > +
> > +   case 'e':
> > + if (XEXP (x, i))
> > +   {
> > + op = XEXP (x, i);
> > + valid_ops &= propagate_rtx_1 (&op, old_rtx, new_rtx, flags);
> > + if (op != XEXP (x, i))
> > +   {
> > + if (!tem)
> > +   tem = shallow_copy_rtx (x);
> > + XEXP (tem, i) = op;
> > +   }
> > +   }
> > +   break;
>
> Same here.
>
> > +   }
> > + }
> > +
> >break;
> >
> >  case RTX_OBJ:
> > @@ -1370,10 +1418,11 @@ forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_set)
> >
> >  /* Given a use USE of an insn, if it has a single reaching
> > definition, try to forward propagate it into that insn.
> > -   Return true if cfg cleanup will be needed.  */
> > +   Return true if cfg cleanup will be needed.
> > +   REG_PROP_ONLY is true if we should only propagate register copies.  */
> >
> >  static bool
> > -forward_propagate_into (df_ref use)
> > +forward_propagate_into (df_ref use, bool reg_prop_only = false)
> >  {
> >df_ref def;
> >rtx_insn *def_insn, *use_insn;
> > @@ -1394,10 +1443,6 @@ forward_propagate_into (df_ref use)
> >if (DF_REF_IS_ARTIFICIAL (def))
> >  return false;
> >
> > -  /* Do not propagate loop invariant definitions inside the loop.  */
> > -  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
> > -return false;
> > -
> >/* Check if the use is still present in the insn!  */
> >use_insn = DF_REF_INSN (use);
> >if (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
> > @@ -1415,6 +1460,16 @@ forward_propagate_into (df_ref use)
> >if (!def_set)
> >  return false;
> >
> > +  if (reg_prop_only && !REG_P (SET_SRC (def_set)))
> > +return false;
> > +
> > +  /* Allow propagating def inside loop only if source of def_set is
> > + reg, since replacing reg by source reg shouldn't increase cost.  */
>
> Maybe:
>
>   /* Allow propagations into a loop only for reg-to-reg copies, since
>  replacing one register by another shouldn't increase the cost.  */
>
> > +
> > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> > +  && !REG_P (SET_SRC (def_set)))
> > +return false;
>
> To be extra safe, I think we should check that the SET_DEST is a REG_P
> in both cases, to exclude REG-to-SUBREG copies.
Thanks for the suggestions.
Does the attached version look OK?

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 45703fe5f01..c5abebb7832 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -547,6 +547,54 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
 	  tem = simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
  SUBREG_BYTE (x));
 	}
+
+  else
+	{
+	  rtvec vec;
+	  rtvec newvec;
+	  const char *fmt = GET_RTX_FORMAT (code);
+	  rtx op;
+
+	  for (int i = 0; fmt[i]; i++)
+	switch (fmt[i])
+	  {
+	  case 'E':
+		vec = XVEC (x, i);
+		newvec = vec;
+		for (int j = 0; j < GET_NUM_ELEM (vec); j++)
+		  {
+		op = RTVEC_ELT (vec, j);
+		valid_ops &= propagate_rtx_1 (&op, old_rtx, new_rtx, flags);
+		if (op != RTVEC_ELT (vec, j))
+		  {
+			if (newvec == vec)
+			  {
+			newvec = shallow_copy_rtvec (vec);
+			if (!tem)

Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Richard Biener
On Mon, 24 Jun 2019, Jan Hubicka wrote:

> > > This simple (untested) patch doesn't avoid creating the unnecessary
> > > as-base types, but it should avoid using them in a way that causes
> > > them to be streamed, and should let them be discarded by GC.
> > > Thoughts?
> > 
> > Looks better than Honzas patch fixing a single place.
> 
> I wonder if we can go ahead with Jason's patch to handle the common
> case.

I hope so - Jason?

> > 
> > I've spent some thoughts on this and I wonder whether we can
> > re-implement classtype-as-base with fake inheritance (which would
> > also solve the TBAA alias set issue in a natural way).  That is,
> > we'd lay out structs as-base and make instances of it use a
> > 
> > class as-instance { as-base b; X pad1; Y pad2; };
> > 
> > with either explicit padding fields or with implicit ones
> > (I didn't check how we trick stor-layout to not pad the as-base
> > type to its natural alignment...).
> > 
> > I realize that this impacts all code building component-refs ontop
> > of as-instance typed objects so this might rule out this approach
> > completely - but maybe that's reasonably well abstracted into common
> > code so only few places need adjustments.
> 
> Modulo the empty virtual bases, which I have no understanding of, I
> suppose this should work.
> 
> One issue is that we will need to introduce view_convert_exprs at some
> times.
> 
> As
> 
>   class a var;
>   class b:a {} *bptr;
> 
>   var.foo;
> 
> Expanding this as var.as_base_a.foo would make the access path oracle
> disambiguate it from bptr->as_base_b->as_base_a.foo, which is wrong with
> the gimple memory model, where we allow placement new replacing var by
> an instance of b.

Ick.  IIRC the as-base types were necessary only for
copying and clobber operations that may not touch the possibly
re-used tail-padding.  The question is whether we cannot invent
a better mechanism to do this -- IIRC we used memmove for the
former case at some point (that's arguably worse, also for TBAA).
There's WITH_SIZE_EXPR which we only handle rudimentary in the
middle-end though.

> So the way to generate it would be to first view convert expr *bptr to
> as_base_b and continue from that. This would probably force us to not
> give up at view converts in the access path disambiguation :)

Yeah.  No good solution either :/

Richard.


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jan Hubicka
> > This simple (untested) patch doesn't avoid creating the unnecessary
> > as-base types, but it should avoid using them in a way that causes
> > them to be streamed, and should let them be discarded by GC.
> > Thoughts?
> 
> Looks better than Honzas patch fixing a single place.

I wonder if we can go ahead with Jason's patch to handle the common
case.
> 
> I've spent some thoughts on this and I wonder whether we can
> re-implement classtype-as-base with fake inheritance (which would
> also solve the TBAA alias set issue in a natural way).  That is,
> we'd lay out structs as-base and make instances of it use a
> 
> class as-instance { as-base b; X pad1; Y pad2; };
> 
> with either explicit padding fields or with implicit ones
> (I didn't check how we trick stor-layout to not pad the as-base
> type to its natural alignment...).
> 
> I realize that this impacts all code building component-refs ontop
> of as-instance typed objects so this might rule out this approach
> completely - but maybe that's reasonably well abstracted into common
> code so only few places need adjustments.

Modulo the empty virtual bases, which I have no understanding of, I
suppose this should work.

One issue is that we will need to introduce view_convert_exprs at some
points.

As

  class a var;
  class b:a {} *bptr;

  var.foo;

Expanding this as var.as_base_a.foo would make the access path oracle
disambiguate it from bptr->as_base_b->as_base_a.foo, which is wrong in
the GIMPLE memory model, where we allow placement new to replace var by
an instance of b.

So the way to generate it would be to first view-convert *bptr to
as_base_b and continue from that.  This would probably force us to not
give up at view converts in the access path disambiguation :)

Honza


Re: Use ODR for canonical types construction in LTO

2019-06-24 Thread Jan Hubicka
> Hi,
> here is a patch that adds TYPE_ODR_P to mark types that comply with
> C++ ODR rules (i.e. ODR types themselves or structures/unions derived
> from them).
> I decided to use STRING_FLAG, which has meaning only for integers
> and arrays, which forced me to add type checks in places where
> we check STRING_FLAG on other types.
> 
> The patch also let me verify that all types we consider to have
> linkage actually are created by the C++ FE, which turned out not to be
> the case for Ada, which I fixed in needs_assembler_name_p.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * ipa-utils.h (type_with_linkage_p): Verify that type is
>   CXX_ODR_P.
>   (odr_type_p): Remove extra return.
>   * lto-streamer-out.c (hash_tree): Hash TYPE_CXX_ODR_P;
>   hash STRING_FLAG only for arrays and integers.
>   * tree-streamer-in.c (unpack_ts_type_common_value_fields):
>   Update analogously.
>   * tree-streamer-out.c (pack_ts_type_common_value_fields):
>   Likewise.
>   * print-tree.c (print_node): Print cxx-odr-p
>   and string-flag.
>   * tree.c (need_assembler_name_p): Also check that type
>   is CXX_ODR_TYPE_P
>   (verify_type_variant): Update verification of STRING_FLAG;
>   also check CXX_ODR_P.
>   * tree.h (ARRAY_OR_INTEGER_TYPE_CHECK): New macro.
>   (TYPE_STRING_FLAG): Use it.
>   (TYPE_CXX_ODR_P): New macro.
> 
>   * lto-common.c (compare_tree_sccs_1): Compare CXX_ODR_P;
>   compare STRING_FLAG only for arrays and integers.
> 
>   * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
>   type is array or integer prior to checking string flag.
>   * gcc-interface/gigi.h (gnat_signed_type_for,
>   maybe_character_value): Likewise.
> 
>   * c-common.c (braced_lists_to_strings): Check that
>   type is array or integer prior to checking string flag.
> 
>   * lex.c (cxx_make_type): Set TYPE_CXX_ODR_P.
> 
>   * dwarf2out.c (gen_array_type_die): First check that type
>   is an array and then test string flag.
> 
>   * trans-expr.c (gfc_conv_substring): Check that
>   type is array or integer prior to checking string flag.
>   (gfc_conv_string_parameter): Likewise.
>   * trans-openmp.c (gfc_omp_scalar_p): Likewise.
>   * trans.c (gfc_build_array_ref): Likewise.

Hi,
I would like to ping the patch - if it makes sense updating the original
ODR patch should be easy.

Honza


[PATCH][MSP430] Implement alternate "__intN__" form of "__intN" type

2019-06-24 Thread Jozef Lawrynowicz
The MSP430 target in the large memory model uses the (non-ISO) __int20 type for
SIZE_TYPE and PTRDIFF_TYPE.
The preprocessor therefore expands a builtin such as __SIZE_TYPE__ to
"__int20 unsigned" in user code.
When compiling with the "-pedantic-errors" flag, the use of any of these
builtin macros results in an error of the form:

> tester.c:4:9: error: ISO C does not support '__int20' types [-Wpedantic]

Since -pedantic-errors is often passed as a default flag in the testsuite,
there are hundreds of false failures when testing with -mlarge, caused by this
ISO C error.

The attached patch implements a new builtin type, "__intN__". Apart from the
name of the type, it is identical and shares RIDs with the corresponding
"__intN".

This means the ISO C pedantic warnings can be disabled for __intN__ types,
but otherwise these types can be used in place of __intN without any other
changes to behaviour.

By replacing "__int20" with "__int20__" in the definition of SIZE_TYPE and
PTRDIFF_TYPE in msp430.h, the following builtin macros can now be used in a
program compiled with -pedantic-errors, without causing ISO C errors:
  __SIZE_TYPE__
  __INTPTR_TYPE__
  __UINTPTR_TYPE__
  __PTRDIFF_TYPE__

Successfully bootstrapped and regtested on x86_64-pc-linux-gnu.
Successfully regtested for msp430-elf. Additionally, this fixes many tests:
  332 FAIL->PASS
  52  UNTESTED->PASS
  29  FAIL->UNSUPPORTED (test previously failed to compile, now too big to link)

Ok for trunk?

There is a patch to Newlib's "_intsup.h" required to support __int20__ that I
will submit to that mailing list before applying this patch, if this patch is
accepted.
From 61dfff1b6b3fcaa9f31341ee47623100505bf2e8 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 12 Jun 2019 10:40:00 +0100
Subject: [PATCH] Implement alternate "__intN__" form of "__intN" type

gcc/ChangeLog:

2019-06-18  Jozef Lawrynowicz  

	* gcc/c-family/c-common.c (c_common_nodes_and_builtins): Define
	alternate "__intN__" name for "__intN" types.
	* gcc/c/c-parser.c (c_parse_init): Create keyword for "__intN__" type.
	* gcc/cp/lex.c (init_reswords): Likewise.
	* gcc/config/msp430/msp430.h: Use __int20__ for SIZE_TYPE and
	PTRDIFF_TYPE.
	* gcc/cp/cp-tree.h (cp_decl_specifier_seq): New bitfield "int_n_alt".
	* gcc/c/c-decl.c (declspecs_add_type): Don't pedwarn about "__intN" ISO
	C incompatibility if alternate "__intN__" form is used.
	* gcc/cp/decl.c (grokdeclarator): Likewise.
	* gcc/cp/parser.c (cp_parser_simple_type_specifier): Set
	decl_specs->int_n_alt if "__intN__" form is used.
	* gcc/gimple-ssa-sprintf.c (build_intmax_type_nodes): Accept "__intN__"
	format of "__intN" types for UINTMAX_TYPE.
	* gcc/brig/brig-lang.c (brig_build_c_type_nodes): Accept "__intN__"
	format of "__intN" types for SIZE_TYPE.
	* gcc/lto/lto-lang.c (lto_build_c_type_nodes): Likewise.
	* gcc/stor-layout.c (initialize_sizetypes): Accept "__intN__"
	format of "__intN" types for SIZETYPE.
	* gcc/tree.c (build_common_tree_nodes): Accept "__intN__"
	format of "__intN" types for SIZE_TYPE and PTRDIFF_TYPE.
	* gcc/doc/invoke.texi: Document that __intN__ disables pedantic
	warnings.

gcc/testsuite/ChangeLog:

2019-06-18  Jozef Lawrynowicz  

	* gcc.target/msp430/mlarge-pedwarns.c: New test.
---
 gcc/brig/brig-lang.c  |  6 --
 gcc/c-family/c-common.c   |  6 ++
 gcc/c/c-decl.c|  6 +-
 gcc/c/c-parser.c  |  5 +
 gcc/config/msp430/msp430.h|  6 --
 gcc/cp/cp-tree.h  |  3 +++
 gcc/cp/decl.c |  6 +-
 gcc/cp/lex.c  |  5 +
 gcc/cp/parser.c   |  6 ++
 gcc/doc/invoke.texi   |  6 --
 gcc/gimple-ssa-sprintf.c  |  6 --
 gcc/lto/lto-lang.c|  6 --
 gcc/stor-layout.c |  6 --
 gcc/testsuite/gcc.target/msp430/mlarge-pedwarns.c | 11 +++
 gcc/tree.c| 13 +
 15 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/mlarge-pedwarns.c

diff --git a/gcc/brig/brig-lang.c b/gcc/brig/brig-lang.c
index 91c7cfa35da..be853ccbc02 100644
--- a/gcc/brig/brig-lang.c
+++ b/gcc/brig/brig-lang.c
@@ -864,10 +864,12 @@ brig_build_c_type_nodes (void)
   for (i = 0; i < NUM_INT_N_ENTS; i++)
 	if (int_n_enabled_p[i])
 	  {
-	char name[50];
+	char name[25], altname[25];
 	sprintf (name, "__int%d unsigned", int_n_data[i].bitsize);
+	sprintf (altname, "__int%d__ unsigned", int_n_data[i].bitsize);
 
-	if (strcmp (name, SIZE_TYPE) == 0)
+	if (strcmp (name, SIZE_TYPE) == 0
+		|| strcmp (altname, SIZE_TYPE) == 0)
 	  {
 		intmax_type_node = int_n_trees[i].signed_type;
 		uintmax_type

Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-24 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> diff --git a/gcc/fwprop.c b/gcc/fwprop.c
> index 45703fe5f01..93a1a10c9a6 100644
> --- a/gcc/fwprop.c
> +++ b/gcc/fwprop.c
> @@ -547,6 +547,54 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
> tem = simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
>SUBREG_BYTE (x));
>   }
> +
> +  else
> + {
> +   rtvec vec;
> +   rtvec newvec;
> +   const char *fmt = GET_RTX_FORMAT (code);
> +   rtx op;
> +
> +   for (int i = 0; fmt[i]; i++)
> + switch (fmt[i])
> +   {
> +   case 'E':
> + vec = XVEC (x, i);
> + newvec = vec;
> + for (int j = 0; j < GET_NUM_ELEM (vec); j++)
> +   {
> + op = RTVEC_ELT (vec, j);
> + valid_ops &= propagate_rtx_1 (&op, old_rtx, new_rtx, flags);
> + if (op != RTVEC_ELT (vec, j))
> +   {
> + if (newvec == vec)
> +   {
> + newvec = shallow_copy_rtvec (vec);
> + if (!tem)
> +   tem = shallow_copy_rtx (x);
> + XVEC (tem, i) = newvec;
> +   }
> + RTVEC_ELT (newvec, j) = op;
> +   }
> +   }
> +   break;

Misindented break: should be indented two columns further.

> +
> +   case 'e':
> + if (XEXP (x, i))
> +   {
> + op = XEXP (x, i);
> + valid_ops &= propagate_rtx_1 (&op, old_rtx, new_rtx, flags);
> + if (op != XEXP (x, i))
> +   {
> + if (!tem)
> +   tem = shallow_copy_rtx (x);
> + XEXP (tem, i) = op;
> +   }
> +   }
> +   break;

Same here.

> +   }
> + }
> +
>break;
>  
>  case RTX_OBJ:
> @@ -1370,10 +1418,11 @@ forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_set)
>  
>  /* Given a use USE of an insn, if it has a single reaching
> definition, try to forward propagate it into that insn.
> -   Return true if cfg cleanup will be needed.  */
> +   Return true if cfg cleanup will be needed.
> +   REG_PROP_ONLY is true if we should only propagate register copies.  */
>  
>  static bool
> -forward_propagate_into (df_ref use)
> +forward_propagate_into (df_ref use, bool reg_prop_only = false)
>  {
>df_ref def;
>rtx_insn *def_insn, *use_insn;
> @@ -1394,10 +1443,6 @@ forward_propagate_into (df_ref use)
>if (DF_REF_IS_ARTIFICIAL (def))
>  return false;
>  
> -  /* Do not propagate loop invariant definitions inside the loop.  */
> -  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
> -return false;
> -
>/* Check if the use is still present in the insn!  */
>use_insn = DF_REF_INSN (use);
>if (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
> @@ -1415,6 +1460,16 @@ forward_propagate_into (df_ref use)
>if (!def_set)
>  return false;
>  
> +  if (reg_prop_only && !REG_P (SET_SRC (def_set)))
> +return false;
> +
> +  /* Allow propagating def inside loop only if source of def_set is
> + reg, since replacing reg by source reg shouldn't increase cost.  */

Maybe:

  /* Allow propagations into a loop only for reg-to-reg copies, since
 replacing one register by another shouldn't increase the cost.  */

> +
> +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> +  && !REG_P (SET_SRC (def_set)))
> +return false;

To be extra safe, I think we should check that the SET_DEST is a REG_P
in both cases, to exclude REG-to-SUBREG copies.

Thanks,
Richard


Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-24 Thread luoxhu




On 2019/6/24 10:34, luoxhu wrote:

Hi Honza,
Thanks very much for all the useful comments.
As a newbie to GCC, I'm not sure whether my questions are described
clearly enough.  Thanks for your patience in advance.  :)


On 2019/6/20 21:47, Jan Hubicka wrote:

Hi,
some comments on the ipa part of the patch
(and thanks for working on it - this was on my TODO list for years)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de82316d4b1..0d373a67d1b 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
  fprintf (dump_file, "Introduced new external node "
   "(%s) and turned into root of the clone tree.\n",
   node->dump_name ());
+  node->profile_id = first_clone->profile_id;
  }
    else if (dump_file)
  fprintf (dump_file, "Introduced new external node "


This is independent of the rest of changes.  Do you have example where
this matters? The inline clones are created in ipa-inline while
ipa-profile is run before it, so I can not think of such a scenario.
I see you also copy profile_id from function to clone.  I would like to
know why you needed that.

Also you mention that you hit some ICEs. If fixes are independent of
rest of your changes, send them separately.


I copy the profile_id for the cloned node because in LTO ltrans there
are no references or referrings info for the specialized/cloned node, so
it is difficult to track the node's reference in
cgraph_edge::speculative_call_info.  I use it mainly for debug purposes now.

Will remove it and split the patches in later version to include ICE fixes.



@@ -1110,6 +1110,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,

    int i;
    cgraph_edge *e2;
    cgraph_edge *e = this;
+  cgraph_node *referred_node;
    if (!e->indirect_unknown_callee)
  for (e2 = e->caller->indirect_calls;
@@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,

  && ((ref->stmt && ref->stmt == e->call_stmt)
  || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
    {
-    reference = ref;
-    break;
+    if (e2->indirect_info && e2->indirect_info->num_of_ics)
+  {
+    referred_node = dyn_cast <cgraph_node *> (ref->referred);
+    if (strstr (e->callee->name (), referred_node->name ()))
+  {
+    reference = ref;
+    break;
+  }
+  }
+    else
+  {
+    reference = ref;
+    break;
+  }
    }


This function is intended to return everything related to the
speculative call, so if you add multiple direct targets, I would expect
it to take an auto_vec of cgraph_nodes for direct and an auto_vec of
references.


So will the signature become
cgraph_edge::speculative_call_info (auto_vec *direct,
     cgraph_edge *&indirect,
     auto_vec *reference)?

It seems a lot of code is related to it; maybe it should be split into
another patch.  And will the order of direct and reference in each
auto_vec be strictly mapped, for iteration convenience?
The second question is: "this" is a direct edge that will be pushed to
the auto_vec "direct"; how can it get its next direct edge here?  From
e->caller->callees?


There maybe some misunderstanding here.  The direct should be one edge
only, but reference could be multiple.

For example: two indirect edge on one single statement x = p(3);
the first speculative edge is main -> one;
the second speculative edge 2 is main -> two.
direct->call_stmt is: x_10 = p_3 (3);

call code in ipa-inline-transform.c:
for (e = node->callees; e; e = next)
  {
 next = e->next_callee;
 e->redirect_call_stmt_to_callee ();
  }

redirect_call_stmt_to_callee will call
e->speculative_call_info(e, e2, ref).

When e is “main -> one” being redirected, the returned auto_vec of
references will have length 2.
So the map should be 1:N instead of N:N.  (One direct edge will find N
reference nodes, but only one of them is correct; we need to iterate to
find it out.)
e2 is the indirect call (e->caller->indirect_calls); it can only be set
to non-speculative once all indirect targets have been redirected by
"next = e->next_callee".  Otherwise, the next speculative edge couldn't
finish the redirect, as e2 is no longer speculative in the next round of
iteration.
As a result, maybe we still need similar logic to check the returned
reference length, and only set "e2->speculative = false;" when the
length is 1, which means all direct targets have been redirected.




    /* Speculative edge always consist of all three components - direct 
edge,

@@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
   in the functions inlined through it.  */
  }
    edge->count += e2->count;
-  edge->speculative = false;
+  if (edge->indirect_info && edge->indirect_info->num_of_ics)
+    {
+  edge->indirect_info->num_of_ics--;
+  if (edge->indirect_info->num_of_ics == 0)
+    edge->speculative = false;
+    }
+  else
+    edge->speculative = false;
    e2->speculative = 

Re: [C++ Patch] PR 90909 ("[10 Regression] call devirtualized to pure virtual")

2019-06-24 Thread Paolo Carlini

Hi,

On 23/06/19 19:45, Jason Merrill wrote:

On 6/23/19 7:53 AM, Paolo Carlini wrote:

... hi again ;)

The other day I was having a look at using declarations for this
issue and noticed that only a few lines below the de-virtualization
check we have to handle functions found by a using declaration, for
various reasons.  In particular, we know whether we found a function
fn where it has been declared or in a derived class.  Thus the idea: for
the purpose of making some progress, in particular all the cases in
c++/67184 & co, would it make sense for the time being to simply add
a check to the de-virtualization condition restricting it to
non-using declarations?  See the below (it also moves the conditional
a few lines below, only for clarity and consistency with the code
handling using declarations; no functional impact).  What do you think?


Hmm, perhaps we should check CLASSTYPE_FINAL in 
resolves_to_fixed_type_p rather than in build_over_call at all; then 
the code in build_new_method_call ought to set LOOKUP_NONVIRTUAL when 
appropriate.


I think your suggestion has to do with the initial implementation of
this optimization, as contributed by my friend Roberto Agostino: we had
the issue that it didn't handle user-defined operators at all, and
Vincenzo filed c++/53186.  Thus, upon your suggestion, we moved the code
to build_over_call, the current place:


    https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00246.html

where it caught both member functions and operators.  Now - before we
get to the details - if I move the CLASSTYPE_FINAL check to
resolves_to_fixed_type_p we regress exactly on c++/53186, that is
other/final2.C, because resolves_to_fixed_type_p is *never* called.  The
pending final4.C, also involving operators (I constructed it exactly
because I knew operators could be tricky), is also not fixed, but in
that case at least resolves_to_fixed_type_p *is* called, only too late
(I think; more details later, if you like).


All the other existing and pending testcases - involving member
functions - appear to be OK, even with a draft implementation of your
suggestion (I slapped an 'if (CLASS_TYPE_P (t) && CLASSTYPE_FINAL (t))
return true;' in the middle of resolves_to_fixed_type_p).


Thanks, Paolo.




Re: [PATCH] Automatics in equivalence statements

2019-06-24 Thread Bernhard Reutner-Fischer
On Fri, 21 Jun 2019 07:10:11 -0700
Steve Kargl  wrote:

> On Fri, Jun 21, 2019 at 02:31:51PM +0100, Mark Eggleston wrote:
> > Currently variables with the AUTOMATIC attribute can not appear in an 
> > EQUIVALENCE statement. However its counterpart, STATIC, can be used in 
> > an EQUIVALENCE statement.
> > 
> > Where there is a clear conflict in the attributes of variables in an 
> > EQUIVALENCE statement an error message will be issued as is currently 
> > the case.
> > 
> > If there is no conflict, e.g. a variable with an AUTOMATIC attribute and
> > variable(s) without attributes, all variables in the EQUIVALENCE will
> > become AUTOMATIC.
> > 
> > Note: most of this patch was written by Jeff Law 
> > 
> > Please review.
> > 
> > ChangeLogs:
> > 
> > gcc/fortran
> > 
> >      Jeff Law  
> >      Mark Eggleston  
> > 
> >      * gfortran.h: Add check_conflict declaration.  
> 
> This is wrong.  By convention a routine that is not static
> has the gfc_ prefix.
> 
Furthermore, doesn't this export indicate that you're committing a
layering violation somehow?

>      * symbol.c (check_conflict): Remove automatic in equivalence conflict
>      check.
>      * symbol.c (save_symbol): Add check for in equivalence to stop the
>      the save attribute being added.
>      * trans-common.c (build_equiv_decl): Add is_auto parameter and
>      add !is_auto to condition where TREE_STATIC (decl) is set.
>      * trans-common.c (build_equiv_decl): Add local variable is_auto,
>      set it true if an atomatic attribute is encountered in the variable

atomatic? I read atomic but you mean automatic.

>      list.  Call build_equiv_decl with is_auto as an additional parameter.
>      flag_dec_format_defaults is enabled.
>      * trans-common.c (accumulate_equivalence_attributes) : New subroutine.
>      * trans-common.c (find_equivalence) : New local variable dummy_symbol,
>      accumulated equivalence attributes from each symbol then check for
>      conflicts.

I'm just curious why you don't use gfc_copy_attr for the most part of
accumulate_equivalence_attributes?
thanks,


Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Segher Boessenkool
On Mon, Jun 24, 2019 at 03:49:35PM +0800, Kewen.Lin wrote:
> > It sounds like we can have a clean up for some others like 
> > TARGET_EXTSWSLI. :)
> 
> Sorry, maybe not, it's not similar to maddld for 32bit operations.

Hey, it currently is

(define_insn_and_split "ashdi3_extswsli"
  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
(ashift:DI
 (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
 (match_operand:DI 2 "u6bit_cint_operand" "n,n")))]

so you could just do

(define_insn_and_split "ashdi3_extswsli"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r")
(ashift:GPR
 (sign_extend:GPR (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
 (match_operand:GPR 2 "u6bit_cint_operand" "n,n")))]

and that will work, just generate insn patterns that will never match for SI.

But you can also do

(define_insn_and_split "ashl<mode>3_extswsli"
  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r")
(ashift:EXTSI
 (sign_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
 (match_operand:EXTSI 2 "u6bit_cint_operand" "n,n")))]

and that should work fine, without needing any explicit TARGET_POWERPC64.
But now you need to adjust direct callers of this pattern (which probably
do exist; it is a named pattern, i.e. one without a leading *, for a reason ;-) ).


Segher


Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Kewen.Lin
Hi Segher,

on 2019/6/24 4:02 PM, Segher Boessenkool wrote:
> Hi Kewen,
> 
> On Mon, Jun 24, 2019 at 03:43:26PM +0800, Kewen.Lin wrote:
>> on 2019/6/24 3:19 PM, Segher Boessenkool wrote:
>>> Newer ISAs require 64-bit to be implemented.  There are no optional
>>> 64-bit categories anymore.  Since this instruction is enabled for P9
>>> (ISA 3.0) only (that's the TARGET_MODULO), it's fine.
>>>
>>> What you are saying is quite true for older CPUs/ISAs though: there you
>>> have to make sure you are targetting a CPU that supports the 64-bit
>>> categories, before using any 64-bit insns.
>>>
>>> But those days are gone :-)
>>
>> Good to know that, thanks a lot for the information!  It's fine then.
>>
>> It sounds like we can have a clean up for some others like 
>> TARGET_EXTSWSLI. :)
> 
> Yes, but be careful there!  The insn patterns for this use DImode, which
> does not mean the same thing without -mpowerpc64 (it's a register pair
> then, not what you want).
> 
> And it doesn't make much sense to allow this for SImode as well (using
> GPR, perhaps), because the insn just is a shift left for SImode, and we
> already have shift left instructions.
> 
> So we might want to just directly say "TARGET_MODULO && TARGET_POWERPC64"
> in those patterns (TARGET_MODULO is a funny way of saying "p9 or later").
> 

Thanks for further clarification!  Yes, I agree with you.  I just noticed
that extswsli isn't like maddld and is not suitable for SImode.


Thanks,
Kewen

> 
> Segher
> 



Re: [PATCH 0/5] Tweak IRA handling of tying and earlyclobbers

2019-06-24 Thread Richard Sandiford
Eric Botcazou  writes:
>> Forgot to say that this list excludes targets for which there were
>> no changes in assembly length.  (Thought I'd better say that since
>> the list clearly doesn't have one entry per CPU directory.)
>> 
>> FWIW the full list was:
>> 
>> aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu amdgcn-amdhsa
>> arc-elf arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf
>> c6x-elf cr16-elf cris-elf csky-elf epiphany-elf fr30-elf
>> frv-linux-gnu ft32-elf h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu
>> i686-pc-linux-gnu i686-apple-darwin iq2000-elf lm32-elf m32c-elf
>> m32r-elf m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
>> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
>> nds32le-elf nios2-linux-gnu nvptx-none or1k-elf pdp11
>> powerpc64-linux-gnu powerpc64le-linux-gnu powerpc-ibm-aix7.0 pru-elf
>> riscv32-elf riscv64-elf rl78-elf rx-elf s390-linux-gnu
>> s390x-linux-gnu sh-linux-gnu sparc-linux-gnu sparc64-linux-gnu
>> sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf xstormy16-elf
>> v850-elf vax-netbsdelf visium-elf x86_64-darwin x86_64-linux-gnu
>> xtensa-elf
>
> Thanks for the note, I was about to ask what happened for SPARC. :-)

:-)

> Btw, where does this sparc-wrs-vxworks target come from?  It's quite
> obsolete so should be replaced with sparc-elf at this point.

It's based on an old list that also tried to include at least one
target for each supported OS (although it looks like there's no
longer a -vms target, hmm).  I think at the time I was making target
hook changes for which the OS made a difference.

Richard


Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Segher Boessenkool
Hi Kewen,

On Mon, Jun 24, 2019 at 03:43:26PM +0800, Kewen.Lin wrote:
> on 2019/6/24 3:19 PM, Segher Boessenkool wrote:
> > Newer ISAs require 64-bit to be implemented.  There are no optional
> > 64-bit categories anymore.  Since this instruction is enabled for P9
> > (ISA 3.0) only (that's the TARGET_MODULO), it's fine.
> > 
> > What you are saying is quite true for older CPUs/ISAs though: there you
> > have to make sure you are targetting a CPU that supports the 64-bit
> > categories, before using any 64-bit insns.
> > 
> > But those days are gone :-)
> 
> Good to know that, thanks a lot for the information!  It's fine then.
> 
> It sounds like we can have a clean up for some others like 
> TARGET_EXTSWSLI. :)

Yes, but be careful there!  The insn patterns for this use DImode, which
does not mean the same thing without -mpowerpc64 (it's a register pair
then, not what you want).

And it doesn't make much sense to allow this for SImode as well (using
GPR, perhaps), because the insn just is a shift left for SImode, and we
already have shift left instructions.

So we might want to just directly say "TARGET_MODULO && TARGET_POWERPC64"
in those patterns (TARGET_MODULO is a funny way of saying "p9 or later").


Segher


Re: [PATCH 0/5] Tweak IRA handling of tying and earlyclobbers

2019-06-24 Thread Eric Botcazou
> Forgot to say that this list excludes targets for which there were
> no changes in assembly length.  (Thought I'd better say that since
> the list clearly doesn't have one entry per CPU directory.)
> 
> FWIW the full list was:
> 
> aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu amdgcn-amdhsa
> arc-elf arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf
> c6x-elf cr16-elf cris-elf csky-elf epiphany-elf fr30-elf
> frv-linux-gnu ft32-elf h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu
> i686-pc-linux-gnu i686-apple-darwin iq2000-elf lm32-elf m32c-elf
> m32r-elf m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
> nds32le-elf nios2-linux-gnu nvptx-none or1k-elf pdp11
> powerpc64-linux-gnu powerpc64le-linux-gnu powerpc-ibm-aix7.0 pru-elf
> riscv32-elf riscv64-elf rl78-elf rx-elf s390-linux-gnu
> s390x-linux-gnu sh-linux-gnu sparc-linux-gnu sparc64-linux-gnu
> sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf xstormy16-elf
> v850-elf vax-netbsdelf visium-elf x86_64-darwin x86_64-linux-gnu
> xtensa-elf

Thanks for the note, I was about to ask what happened for SPARC. :-)

Btw, where does this sparc-wrs-vxworks target come from?  It's quite obsolete 
so should be replaced with sparc-elf at this point.

-- 
Eric Botcazou


Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Kewen.Lin


on 2019/6/24 3:43 PM, Kewen.Lin wrote:
> on 2019/6/24 3:19 PM, Segher Boessenkool wrote:
>> On Mon, Jun 24, 2019 at 02:45:09PM +0800, Kewen.Lin wrote:
>>> on 2019/6/24 2:00 PM, Li Jia He wrote:
 -#define TARGET_MADDLD (TARGET_MODULO && TARGET_POWERPC64)
 +#define TARGET_MADDLD TARGET_MODULO
>>>
>>> IMHO, I don't think this removal of TARGET_POWERPC64 is reasonable.
>>> As ISA V3.0, the description of this insn maddld is:
>>> GPR[RT].dword[0] ← Chop(result, 64)
>>>
>>> It assumes the GPR has dword, it's a 64-bit specific insn, right?
>>> Your change relaxes it to be adopted on 32-bit.
>>> Although it's fine for powerpc LE since it's always 64-bit, it will
>>> have problems for power9 32bit like AIX?
>>
>> Hi Kewen,
>>
>> Newer ISAs require 64-bit to be implemented.  There are no optional
>> 64-bit categories anymore.  Since this instruction is enabled for P9
>> (ISA 3.0) only (that's the TARGET_MODULO), it's fine.
>>
>> What you are saying is quite true for older CPUs/ISAs though: there you
>> have to make sure you are targetting a CPU that supports the 64-bit
>> categories, before using any 64-bit insns.
>>
>> But those days are gone :-)
>>
> 
> Hi Segher,
> 
> Good to know that, thanks a lot for the information!  It's fine then.
> 
> It sounds like we can have a clean up for some others like 
> TARGET_EXTSWSLI. :)
> 

Sorry, maybe not; it's not similar to maddld for 32-bit operations.


Thanks,
Kewen

> 
> Thanks,
> Kewen
> 
>>
>> Segher
>>
> 



Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Kewen.Lin
on 2019/6/24 3:19 PM, Segher Boessenkool wrote:
> On Mon, Jun 24, 2019 at 02:45:09PM +0800, Kewen.Lin wrote:
>> on 2019/6/24 2:00 PM, Li Jia He wrote:
>>> -#define TARGET_MADDLD  (TARGET_MODULO && TARGET_POWERPC64)
>>> +#define TARGET_MADDLD  TARGET_MODULO
>>
>> IMHO, I don't think this removal of TARGET_POWERPC64 is reasonable.
>> As ISA V3.0, the description of this insn maddld is:
>> GPR[RT].dword[0] ← Chop(result, 64)
>>
>> It assumes the GPR has dword, it's a 64-bit specific insn, right?
>> Your change relaxes it to be adopted on 32-bit.
>> Although it's fine for powerpc LE since it's always 64-bit, it will
>> have problems for power9 32bit like AIX?
> 
> Hi Kewen,
> 
> Newer ISAs require 64-bit to be implemented.  There are no optional
> 64-bit categories anymore.  Since this instruction is enabled for P9
> (ISA 3.0) only (that's the TARGET_MODULO), it's fine.
> 
> What you are saying is quite true for older CPUs/ISAs though: there you
> have to make sure you are targetting a CPU that supports the 64-bit
> categories, before using any 64-bit insns.
> 
> But those days are gone :-)
> 

Hi Segher,

Good to know that, thanks a lot for the information!  It's fine then.

It sounds like we can have a clean up for some others like 
TARGET_EXTSWSLI. :)


Thanks,
Kewen

> 
> Segher
> 



Re: [PATCH] [RS6000] Change maddld match_operand from DI to GPR

2019-06-24 Thread Segher Boessenkool
Hi Lijia,

On Mon, Jun 24, 2019 at 01:00:05AM -0500, Li Jia He wrote:
> From PowerPC ISA 3.0, the description of `maddld RT, RA, RB, RC` is as
> follows: 64-bit RA and RB are multiplied, then RC is sign-extended to
> 128 bits, and the two are added together.
> 
> We only apply it to 64-bit mode (DI) when implementing maddld.  However,
> if we can guarantee that the result of the maddld operation will be
> limited to 32-bit mode (SI), we can still apply it to 32-bit mode (SI).

Great :-)  Just some testcase comments:

> diff --git a/gcc/testsuite/gcc.target/powerpc/maddld-1.c b/gcc/testsuite/gcc.target/powerpc/maddld-1.c
> new file mode 100644
> index 000..06f5f5774d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/maddld-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

powerpc* is the default in gcc.target/powerpc, so you can leave it out:

/* { dg-do compile } */

(and dg-do compile is itself the default, but it is good documentation for the
target tests, since many of those are run tests).

> +/* { dg-require-effective-target powerpc_p9modulo_ok } */

You don't need this line; it only tests whether the assembler supports p9.

> +/* { dg-final { scan-assembler-times "maddld " 2 } } */
> +/* { dg-final { scan-assembler-not   "mulld "} } */
> +/* { dg-final { scan-assembler-not   "add "  } } */

You can write this more easily, and even a bit more exactly, using \m and \M:

/* { dg-final { scan-assembler-times {\mmaddld\M} 2 } } */
/* { dg-final { scan-assembler-not   {\mmul} } } */
/* { dg-final { scan-assembler-not   {\madd} } } */

This allows only the exact mnemonic "maddld", and disallows any mnemonic
starting with "mul" or "add".
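[Editorial aside: putting these suggestions together, the testcase might look
roughly like the sketch below.  The dg-options line is a guess, since the
original testcase's options are not quoted in this thread, and the function
bodies are illustrative.]

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mcpu=power9" } */

/* Two multiply-add patterns that should each become a single maddld.  */
long long
di_madd (long long a, long long b, long long c)
{
  return a * b + c;
}

unsigned int
si_madd (unsigned int a, unsigned int b, unsigned int c)
{
  return a * b + c;
}

/* { dg-final { scan-assembler-times {\mmaddld\M} 2 } } */
/* { dg-final { scan-assembler-not   {\mmul} } } */
/* { dg-final { scan-assembler-not   {\madd} } } */
```

The scan lines use Segher's \m and \M word-boundary form, so only the exact
mnemonic maddld is allowed, and any leftover mul* or add* mnemonic fails the
test.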

Okay for trunk, with the testcase improvements please.  Thanks!


Segher

