Re: [PATCH][PR target/83994] Fix stack-clash-protection code generation on x86

2018-01-23 Thread Uros Bizjak
On Wed, Jan 24, 2018 at 12:15 AM, Jeff Law  wrote:
>
> pr83994 is a code generation bug in the stack-clash support that affects
> openssl (we've turned on stack-clash-protection by default for the F28
> builds).
>
> The core problem is stack-clash (like stack-check) will emit a probing
> loop if the prologue allocates enough stack space.  When emitting a loop
> both implementations will need a scratch register.
>
> They use get_scratch_register_at_entry to find a suitable scratch
> register.  This routine assumes that callee registers saves are
> completed at the point where the scratch register is going to be used.
>
> In this particular testcase we select %ebx because ax,cx,dx are used for
> parameter passing.  That's fine.  The problem is %ebx hasn't been saved yet!
>
> -fstack-check has a bit of code in the frame setup/layout code which
> forces the prologue to use pushes rather than reg->mem moves for saving
> registers.  There's a gcc_assert in the prologue expander to catch any
> case where the registers aren't saved.
>
> -fstack-clash-protection doesn't have that same bit of magic in the
> frame setup/layout code and it bypasses the assertion due to a change I
> made back in Nov 2017 due to not being aware of this particular issue.
>
> This patch reverts the assertion bypass I added back in Nov 2017 and
> adds clarifying comments.  The patch also forces use of push to save
> integer registers for a stack-clash protected prologue if probes are
> going to be needed.
>
> Bootstrapped and regression tested on x86_64.
>
> While the bug is not marked as a regression, ISTM this needs to be fixed
> for gcc-8.
>
> OK for the trunk?
>
> Jeff
>
> * i386.c (get_probe_interval): Move to earlier point.
> (ix86_compute_frame_layout): If -fstack-clash-protection and
> the frame is larger than the probe interval, then use pushes
> to save registers rather than reg->mem moves.
> (ix86_expand_prologue): Remove conditional for int_registers_saved
> assertion.
>
> * gcc.target/i386/pr83994.c: New test.

OK with the fixed testcase (see below).

Thanks,
Uros.

>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 72d25ae..4cb55a8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -11497,6 +11497,18 @@ static void warn_once_call_ms2sysv_xlogues (const 
> char *feature)
>  }
>  }
>
> +/* Return the probing interval for -fstack-clash-protection.  */
> +
> +static HOST_WIDE_INT
> +get_probe_interval (void)
> +{
> +  if (flag_stack_clash_protection)
> +return (HOST_WIDE_INT_1U
> +   << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL));
> +  else
> +return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP);
> +}
> +
>  /* When using -fsplit-stack, the allocation routines set a field in
> the TCB to the bottom of the stack plus this much space, measured
> in bytes.  */
> @@ -11773,7 +11785,14 @@ ix86_compute_frame_layout (void)
>to_allocate = offset - frame->sse_reg_save_offset;
>
>if ((!to_allocate && frame->nregs <= 1)
> -  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)))
> +  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
> +  /* If stack clash probing needs a loop, then it needs a
> +scratch register.  But the returned register is only guaranteed
> +to be safe to use after register saves are complete.  So if
> +stack clash protections are enabled and the allocated frame is
> +larger than the probe interval, then use pushes to save
> +callee saved registers.  */
> +  || (flag_stack_clash_protection && to_allocate > get_probe_interval 
> ()))
>  frame->save_regs_using_mov = false;
>
>if (ix86_using_red_zone ()
> @@ -12567,18 +12586,6 @@ release_scratch_register_on_entry (struct 
> scratch_reg *sr)
>  }
>  }
>
> -/* Return the probing interval for -fstack-clash-protection.  */
> -
> -static HOST_WIDE_INT
> -get_probe_interval (void)
> -{
> -  if (flag_stack_clash_protection)
> -return (HOST_WIDE_INT_1U
> -   << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL));
> -  else
> -return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP);
> -}
> -
>  /* Emit code to adjust the stack pointer by SIZE bytes while probing it.
>
> This differs from the next routine in that it tries hard to prevent
> @@ -13727,12 +13734,11 @@ ix86_expand_prologue (void)
>&& (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
>   || flag_stack_clash_protection))
>  {
> -  /* This assert wants to verify that integer registers were saved
> -prior to probing.  This is necessary when probing may be implemented
> -as a function call (Windows).  It is not necessary for stack clash
> -protection probing.  */
> -  if (!flag_stack_clash_protection)
> -   gcc_assert (int_registers_saved);
> +  /* We expect the GP registers to be saved whe

[PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-23 Thread Michael Meissner
The loop_align.c test has been broken for some time, since I put in patches to
eliminate some debug hooks (-mno-upper-regs-{df,di,sf}) that we deemed to no
longer be needed.

As Segher and I were discussing over private IRC, the root cause of this bug is
the compiler no long generates the BDNZ instruction for a count down loop,
instead it decrements the index in a GPR and does a branch/comparison on it.
In doing so, it now unrolls the loop twice, and and the resulting loop is too
big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
isn't aligned to a 32 byte boundary.

While it is important to ultimately fix the code generation bug to once again
generate the BDNZ instruction, it may be more involved in fixing the bug.  So,
I decided to rewrite the test to be simpler, and the resulting code fits within
the 4-8 instruction window the target hook is looking for.

I have tested this on a little endian power8 system, and a big endian power8
system, doing both bootstrap builds and regression tests.  The only change to
the regression test is that loop_align.c now passes on little endian 64-bit and
big endian 64-bit (big endian 32-bit did not fail with the new changes).  Can I
install this on the trunk?  Back ports to GCC 7/6 are not needed, since the
original code works on those systems.

[gcc/testsuite]
2018-01-24  Michael Meissner  

PR target/81550
* gcc.target/powerpc/loop_align.c: Rewrite test so that the loop
remains small enough that it tests the alignment of loops
specified by the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/testsuite/gcc.target/powerpc/loop_align.c
===
--- gcc/testsuite/gcc.target/powerpc/loop_align.c   (revision 256992)
+++ gcc/testsuite/gcc.target/powerpc/loop_align.c   (working copy)
@@ -1,11 +1,16 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */
+/* { dg-options "-O2 -mcpu=power7 -falign-functions=16 -fno-reorder-blocks" } 
*/
 /* { dg-final { scan-assembler ".p2align 5,,31" } } */
 
-void f(double *a, double *b, double *c, int n) {
-  int i;
+/* This test must be crafted so that the loop is less than 8 insns (and more
+   than 4) in order for the TARGET_ASM_LOOP_ALIGN_MAX_SKIP target hook to fire
+   and align the loop properly.  Originally the loop used double, and various
+   optimizations caused the loop to double in size, and fail the test.  */
+
+void f(long *a, long *b, long *c, long n) {
+  long i;
   for (i=0; i < n; i++)
 a[i] = b[i] + c[i];
 }


Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers

2018-01-23 Thread Alexandre Oliva
On Dec 21, 2017, Jeff Law  wrote:

> On 12/11/2017 07:54 PM, Alexandre Oliva wrote:
>> +  ASM_GENERATE_INTERNAL_LABEL (label, "LVU", ied->view);
>> +  /* FIXME: this will resolve to a small number.  Could we
>> + possibly emit smaller data?  Ideally we'd emit a
>> + uleb128, but that would make the size of DIEs
>> + impossible for the compiler to compute, since it's
>> + the assembler that computes the value of the view
>> + label in this case.  Ideally, we'd have a single form
>> + encompassing both the address and the view, and
>> + indirecting them through a table might make things
>> + easier, but even that would be more wasteful,
>> + space-wise, than what we have now.  */
>> +  add_AT_lbl_id (die, DW_AT_GNU_entry_view, label);

> Do you have a sense of whether or not this really matters in practice?
> how much bigger do we get due when we enable LVU?

It's more of a matter of general design principle.  DWARF strives to be
compact, and outputting a very-likely small constant with most bits
known to be zero is kind of against the rules.

LVU proper doesn't run into this because the location view numbers are
emitted as uleb128 constants in location list sections, never in the
main debug section.  How much it really grows depends on the
representation: DWARF>5 loclists only get additional entries when at
least one view number in a range pair is nonzero, and I have a sense
that most view numbers in loclists are zero; for DWARF[2,5], we emit
list of pairs corresponding to the ranges in each entry in the actual
loclist, so we spend at least two bytes for both views, plus the pointer
to the locviewlist in an additional attribute.  Considering each range
amounts to a pair of pointers plus at least two bytes for the location
expression, and that each range gets a corresponding pair of views, and
assuming all views will fit in a single uleb128 byte (128+ views at the
same PC are very unlikely), the loclist section would grow by at most
20% with 32-bit pointers, and about half that much with 64-bit pointers.
In reality, it ought to be a lot less than that, since location
expressions's taking up a single byte (plus another byte representing
the size) are common but hardly a majority of the cases.

>> +  /* Sanity check the block tree.  This would catch a case in which
>> + BLOCK got removed from the tree reachable from the outermost
>> + lexical block, but got retained in markers.  It would still link
>> + back to its parents, but some ancestor would be missing a link
>> + down the path to the sub BLOCK.  If the block got removed, its
>> + BLOCK_NUMBER will not be a usable value.  */
>> +  gcc_checking_assert (block_within_block_p (block,
>> + DECL_INITIAL
>> + (current_function_decl),
>> + true));
>> +
>> +  gcc_assert (inlined_function_outer_scope_p (block));
>> +  gcc_assert (!BLOCK_DIE (block));
>> +
>> +  /* If we can't represent it, don't bother.  */
>> +  if (!(dwarf_version >= 3 || !dwarf_strict))
>> +return;
> Consider moving this check earlier.  I don't think it's a hard
> requirement, so if you put it after the asserts for a reason, then leave
> it has is.

The reason I put the check after the asserts was that I wanted to catch
messed up block trees even if we wouldn't emit the entry point after
all.  Now, considering the block tree messing up was figured out, I
guess moving the test earlier and saving the cycles if we're not
emitting the annotation makes sense.  Will do.

> Generally I think this is fine (it's much simpler than the dwarf2 bits
> of the LVU patch.

Yeah.  I wonder if there's anything I can do, now that I'm back from
vacations, to help get the LVU patch reviewed.  I suppose it might still
be eligible for GCC 8, considering it was posted even before stage3, let
alone stage4, and considering it only deals with debug info (mainly
location lists), but then...  unlike the SFN stuff, the additional
information it outputs will only make a difference once debug info
consumers learn to use it.  So maybe we could afford to leave it out, or
to bring it in but disabled.

Thoughts?  Does it make sense at this point for me to pester ^W entice
some review to have a look at it?

Thanks,

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PR81611] improve auto-inc

2018-01-23 Thread Alexandre Oliva
These two patches fix PR81611.

The first one improves forwprop so that we avoid adding SSA conflicting
by forwpropping the iv increment, which may cause both the incremented
and the original value to be live, even when the iv is copied between
the PHI node and the increment.  We already handled the case in which
there aren't any such copies.

Alas, this is not enough to address the problem on avr, even though it
fixes it on e.g. ppc.  The reason is that avr rejects var+offset
addresses, and this prevents the memory access in a post-inc code
sequence from being adjusted to an address that auto-inc-dec can
recognize.

The second patch adjusts auto-inc-dec to recognize a code sequence in
which the original, unincremented pseudo is used in an address after
it's incremented into another pseudo, and turn that into a post-inc
address, leaving the copying for subsequent passes to eliminate.

Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and
aarch64-linux-gnu.  Ok to install?


I'd appreciate suggestions on how to turn the submitted testcase into a
regression test; I suppose an avr-specific test that requires the
auto-inc transformation is a possibility, but that feels a bit too
limited, doesn't it?  Thoughts?  Thanks in advance,


[PR81611] accept copies in simple_iv_increment_p

If there are copies between the GIMPLE_PHI at the loop body and the
increment that reaches it (presumably through a back edge), still
regard it as a simple_iv_increment, so that we won't consider the
value in the back edge eligible for forwprop.  Doing so would risk
making the phi node and the incremented conflicting value live
within the loop, and the phi node to be preserved for propagated
uses after the loop.

for  gcc/ChangeLog

PR tree-optimization/81611
* tree-ssa-dom.c (simple_iv_increment_p): Skip intervening
copies.
---
 gcc/tree-ssa-dom.c |   21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 2b371667253a..3c0ff9458342 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies 
*const_and_copies)
 /* Returns true when STMT is a simple iv increment.  It detects the
following situation:
 
-   i_1 = phi (..., i_2)
-   i_2 = i_1 +/- ...  */
+   i_1 = phi (..., i_k)
+   [...]
+   i_j = i_{j-1}  for each j : 2 <= j <= k-1
+   [...]
+   i_k = i_{k-1} +/- ...  */
 
 bool
 simple_iv_increment_p (gimple *stmt)
@@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt)
 return false;
 
   phi = SSA_NAME_DEF_STMT (preinc);
-  if (gimple_code (phi) != GIMPLE_PHI)
-return false;
+  while (gimple_code (phi) != GIMPLE_PHI)
+{
+  /* Follow trivial copies, but not the DEF used in a back edge,
+so that we don't prevent coalescing.  */
+  if (gimple_code (phi) != GIMPLE_ASSIGN
+ || gimple_assign_lhs (phi) != preinc
+ || !gimple_assign_ssa_name_copy_p (phi))
+   return false;
+
+  preinc = gimple_assign_rhs1 (phi);
+  phi = SSA_NAME_DEF_STMT (preinc);
+}
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 if (gimple_phi_arg_def (phi, i) == lhs)


[PR81611] turn inc-and-use-of-dead-orig into auto-inc

When the addressing modes available on the machine don't allow offsets
in addresses, odds are that post-increments will be represented in
trees and RTL as:

  y <= x + 1
  ... *(x) ...
  x <= y

so deal with this form so as to create auto-inc addresses that we'd
otherwise miss.


for  gcc/ChangeLog

PR rtl-optimization/81611
* auto-inc-dec.c (attempt_change): Move dead note from
mem_insn if it's the next use of regno
(find_address): Take address use of reg holding
non-incremented value.
(merge_in_block): Attempt to use a mem insn that is the next
use of the original regno.
---
 gcc/auto-inc-dec.c |   46 --
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index d02fa9d081c7..4ffbcf56a456 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg)
 before the memory reference.  */
   gcc_assert (mov_insn);
   emit_insn_before (mov_insn, inc_insn.insn);
-  move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
+  regno = REGNO (inc_insn.reg0);
+  if (reg_next_use[regno] == mem_insn.insn)
+   move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0);
+  else
+   move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
 
   regno = REGNO (inc_insn.reg_res);
   reg_next_def[regno] = mov_insn;
@@ -842,7 +846,7 @@ find_address (rtx *address_of_x)
 
   if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res))
 {
-  /* Match with *reg0.  */
+  /* Match with *reg_res.  */
   mem_insn.mem_loc = address_of_x;
  

Re: [RFC][PATCH] PR preprocessor/83173: Additional check before decrementing highest_location

2018-01-23 Thread Mike Gulick


On 01/14/2018 12:16 PM, Mike Gulick wrote:
> On 01/12/2018 06:16 PM, David Malcolm wrote:
[snip]
>>
>> I was going to suggest renaming your new test to e.g.
>>   location-overflow-test-pr83173.c
>> so that it matches the glob in those comments, but given that you refer
>> to the testname in dg-final directives, please update the line-map.h
>> comments to refer to e.g. "all of testcases in gcc.dg/plugin that use
>> location_overflow_plugin.c.", or somesuch wording.
>>
> 
> If I'm going to modify location_overflow_plugin.c and reuse it for this PR, 
> then
> it would make sense to rename the test and its accompanying helper files to 
> your
> suggested naming as well.  The dg-final regexes will likely continue to work.
> 

I've attached an new version of the test patch that updates
location_overflow_plugin.c to use PLUGIN_PRAGMAS, and updates the test filenames
to be more consistent with the existing location-overflow-test-* tests.

[snip]

>>
>> If I'm reading your description in the PR right, this test case covers
>> the
>>   loc > LINE_MAP_MAX_LOCATION_WITH_COLS
>> case, but not the:
>>   loc <= LINE_MAP_MAX_LOCATION_WITH_COLS
>> case.
>>
>> Can the latter be done by re-using the testcase, but with a different
>> (or no) plugin argument?
>>
> 
> I haven't really poked too hard to try to find if there is any corruption of 
> the
> line map occurring when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS.  It is just a
> suspicion given the fact that this code is still decrementing
> line_table->highest_location in that case.  I would imagine this may result in
> corruption of the column number or range rather than the file name and line
> number.
>

I haven't been able to find any clear bugs in the gcc output when loc <=
LINE_MAP_MAX_LOCATION_WITH_COLS, but I'm not quite sure how this behavior, if
indeed incorrect, would manifest itself.  The original bug is only visible
(AFAIK) when running gcc with -E, as it results in incorrect file names and line
numbers in the preprocessed output.  If the file name and line number are
correct in this output (as they would be when loc <=
LINE_MAP_MAX_LOCATION_WITH_COLS), then everything will be fine when the .i file
is read back in.  I'm not sure if/how this bug can trigger any incorrect
behavior when not using -E (or -no-integrated-cpp).  Still, it does seem to me
that even when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS, _cpp_stack_include() is
doing the wrong thing by decrementing pfile->line_table->highest_location when
CPP_INCREMENT_LINE was not called from _cpp_lex_direct().  I will think about
this a little more, and would value any insight you can offer.  There is some
more information about the details of what goes wrong in these functions in the
original bug report.

Thanks,
Mike
From a0320fc6e4292a194a8680b26f4951a320fceaf2 Mon Sep 17 00:00:00 2001
From: Mike Gulick 
Date: Thu, 30 Nov 2017 18:35:48 -0500
Subject: [PATCH v2] PR preprocessor/83173: New test

2017-12-01  Mike Gulick  

	PR preprocessor/83173
	* gcc.dg/plugin/location-overflow-test-pr83173.c: New test.
	* gcc.dg/plugin/location-overflow-test-pr83173.h: Header for
	pr83173.c.
	* gcc.dg/plugin/location-overflow-test-pr83173-1.h: Header for
	pr83173.c.
	* gcc.dg/plugin/location-overflow-test-pr83173-2.h: Header for
	pr83173.c.
	* gcc.dg/plugin/location_overflow_plugin.c: Use PLUGIN_PRAGMAS
	instead of PLUGIN_START_UNIT.
	* gcc.dg/plugin/plugin.exp: Enable new test.
---
 .../plugin/location-overflow-test-pr83173-1.h   |  2 ++
 .../plugin/location-overflow-test-pr83173-2.h   |  2 ++
 .../gcc.dg/plugin/location-overflow-test-pr83173.c  | 21 +
 .../gcc.dg/plugin/location-overflow-test-pr83173.h  |  2 ++
 .../gcc.dg/plugin/location_overflow_plugin.c| 13 -
 gcc/testsuite/gcc.dg/plugin/plugin.exp  |  3 ++-
 6 files changed, 37 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.h

diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h
new file mode 100644
index 000..f47dc3643c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h
@@ -0,0 +1,2 @@
+#pragma once
+#define LOCATION_OVERFLOW_TEST_PR83173_1_H
diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h
new file mode 100644
index 000..bc23ed2a188
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h
@@ -0,0 +1,2 @@
+#pragma once
+#define LOCATION_OVERFLOW_TEST_PR83173_2_H
diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.c b/gcc/testsuite/gcc.d

Re: libgo patch committed: Update to Go1.10beta2 release

2018-01-23 Thread Ian Lance Taylor
On Tue, Jan 23, 2018 at 5:24 PM, Matthias Klose  wrote:
> On 17.01.2018 15:20, Ian Lance Taylor wrote:
>> This patch updates libgo to the Go1.10beta2 release.  The complete
>> patch is too large to include in this e-mail message, mainly due to
>> some test changes.  Bootstrapped and ran Go testsuite on
>> x86_64-pc-linux-gnu.  Committed to mainline.
>
> gotools now installs three more binaries into the internal gcclibdir. Are all 
> of
> these intended to be installed, and is the installation location correct?
>
> $ ls -l /usr/lib/gcc/x86_64-linux-gnu/8/
> total 2132
> -rwxr-xr-x 1 doko doko  412768 Jan 24 00:24 buildid
> -rwxr-xr-x 1 doko doko  391480 Jan 24 00:24 test2json
> -rwxr-xr-x 1 doko doko 1357656 Jan 24 00:24 vet

Yes, and yes.

They are separate helper executables invoked by the go tool installed
in $(bindir).

Ian


Re: libgo patch committed: Update to Go1.10beta2 release

2018-01-23 Thread Matthias Klose
On 17.01.2018 15:20, Ian Lance Taylor wrote:
> This patch updates libgo to the Go1.10beta2 release.  The complete
> patch is too large to include in this e-mail message, mainly due to
> some test changes.  Bootstrapped and ran Go testsuite on
> x86_64-pc-linux-gnu.  Committed to mainline.

gotools now installs three more binaries into the internal gcclibdir. Are all of
these intended to be installed, and is the installation location correct?

$ ls -l /usr/lib/gcc/x86_64-linux-gnu/8/
total 2132
-rwxr-xr-x 1 doko doko  412768 Jan 24 00:24 buildid
-rwxr-xr-x 1 doko doko  391480 Jan 24 00:24 test2json
-rwxr-xr-x 1 doko doko 1357656 Jan 24 00:24 vet

Matthias


[PATCH][PR target/83994] Fix stack-clash-protection code generation on x86

2018-01-23 Thread Jeff Law

pr83994 is a code generation bug in the stack-clash support that affects
openssl (we've turned on stack-clash-protection by default for the F28
builds).

The core problem is stack-clash (like stack-check) will emit a probing
loop if the prologue allocates enough stack space.  When emitting a loop
both implementations will need a scratch register.

They use get_scratch_register_at_entry to find a suitable scratch
register.  This routine assumes that callee registers saves are
completed at the point where the scratch register is going to be used.

In this particular testcase we select %ebx because ax,cx,dx are used for
parameter passing.  That's fine.  The problem is %ebx hasn't been saved yet!

-fstack-check has a bit of code in the frame setup/layout code which
forces the prologue to use pushes rather than reg->mem moves for saving
registers.  There's a gcc_assert in the prologue expander to catch any
case where the registers aren't saved.

-fstack-clash-protection doesn't have that same bit of magic in the
frame setup/layout code and it bypasses the assertion due to a change I
made back in Nov 2017 due to not being aware of this particular issue.

This patch reverts the assertion bypass I added back in Nov 2017 and
adds clarifying comments.  The patch also forces use of push to save
integer registers for a stack-clash protected prologue if probes are
going to be needed.

Bootstrapped and regression tested on x86_64.

While the bug is not marked as a regression, ISTM this needs to be fixed
for gcc-8.

OK for the trunk?

Jeff
* i386.c (get_probe_interval): Move to earlier point.
(ix86_compute_frame_layout): If -fstack-clash-protection and
the frame is larger than the probe interval, then use pushes
to save registers rather than reg->mem moves.
(ix86_expand_prologue): Remove conditional for int_registers_saved
assertion.

* gcc.target/i386/pr83994.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 72d25ae..4cb55a8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11497,6 +11497,18 @@ static void warn_once_call_ms2sysv_xlogues (const char 
*feature)
 }
 }
 
+/* Return the probing interval for -fstack-clash-protection.  */
+
+static HOST_WIDE_INT
+get_probe_interval (void)
+{
+  if (flag_stack_clash_protection)
+return (HOST_WIDE_INT_1U
+   << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL));
+  else
+return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP);
+}
+
 /* When using -fsplit-stack, the allocation routines set a field in
the TCB to the bottom of the stack plus this much space, measured
in bytes.  */
@@ -11773,7 +11785,14 @@ ix86_compute_frame_layout (void)
   to_allocate = offset - frame->sse_reg_save_offset;
 
   if ((!to_allocate && frame->nregs <= 1)
-  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)))
+  || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))
+  /* If stack clash probing needs a loop, then it needs a
+scratch register.  But the returned register is only guaranteed
+to be safe to use after register saves are complete.  So if
+stack clash protections are enabled and the allocated frame is
+larger than the probe interval, then use pushes to save
+callee saved registers.  */
+  || (flag_stack_clash_protection && to_allocate > get_probe_interval ()))
 frame->save_regs_using_mov = false;
 
   if (ix86_using_red_zone ()
@@ -12567,18 +12586,6 @@ release_scratch_register_on_entry (struct scratch_reg 
*sr)
 }
 }
 
-/* Return the probing interval for -fstack-clash-protection.  */
-
-static HOST_WIDE_INT
-get_probe_interval (void)
-{
-  if (flag_stack_clash_protection)
-return (HOST_WIDE_INT_1U
-   << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL));
-  else
-return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP);
-}
-
 /* Emit code to adjust the stack pointer by SIZE bytes while probing it.
 
This differs from the next routine in that it tries hard to prevent
@@ -13727,12 +13734,11 @@ ix86_expand_prologue (void)
   && (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
  || flag_stack_clash_protection))
 {
-  /* This assert wants to verify that integer registers were saved
-prior to probing.  This is necessary when probing may be implemented
-as a function call (Windows).  It is not necessary for stack clash
-protection probing.  */
-  if (!flag_stack_clash_protection)
-   gcc_assert (int_registers_saved);
+  /* We expect the GP registers to be saved when probes are used
+as the probing sequences might need a scratch register and
+the routine to allocate one assumes the integer registers
+have already been saved.  */
+  gcc_assert (int_registers_saved);
 
   if (flag_stack_clash_protection)
{
diff --git a/gcc/testsuite/gcc.ta

[PATCH] RISC-V: Add -mpreferred-stack-boundary option.

2018-01-23 Thread Jim Wilson
This adds a -mpreferred-stack-boundary option patterned after the one supported
by the x86 port.  This allows one to reduce the size of stack frames to reduce
memory usage.

This was tested with a rv32i/ilp32 target using -mpreferred-stack-boundary=3
to get 8-byte alignment instead of the default 16-byte alignment.  There were
no regressions other than gcc.dg/stack-usage-1.c which is expected to fail as
it checks for the default stack frame size.

Committed.

Jim

2018-01-23  Andrew Waterman  
gcc/
* config/riscv/riscv.c (riscv_stack_boundary): New.
(riscv_option_override): Set riscv_stack_boundary.  Handle
riscv_preferred_stack_boundary_arg.
* config/riscv/riscv.h (MIN_STACK_BOUNDARY, ABI_STACK_BOUNDARY): New.
(BIGGEST_ALIGNMENT): Set to STACK_BOUNDARY.
(STACK_BOUNDARY): Set to riscv_stack_boundary.
(RISCV_STACK_ALIGN): Use STACK_BOUNDARY.
* config/riscv/riscv.opt (mpreferred-stack-boundary): New.
* doc/invoke.tex (RISC-V Options): Add -mpreferred-stack-boundary.
---
 gcc/config/riscv/riscv.c   | 17 +
 gcc/config/riscv/riscv.h   | 17 -
 gcc/config/riscv/riscv.opt |  4 
 gcc/doc/invoke.texi| 11 +++
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 20660a4061a..4ef7a1774c4 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -222,6 +222,9 @@ struct riscv_cpu_info {
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Stack alignment to assume/maintain.  */
+unsigned riscv_stack_boundary;
+
 /* Which tuning parameters to use.  */
 static const struct riscv_tune_info *tune_info;
 
@@ -4111,6 +4114,20 @@ riscv_option_override (void)
   /* We do not yet support ILP32 on RV64.  */
   if (BITS_PER_WORD != POINTER_SIZE)
 error ("ABI requires -march=rv%d", POINTER_SIZE);
+
+  /* Validate -mpreferred-stack-boundary= value.  */
+  riscv_stack_boundary = ABI_STACK_BOUNDARY;
+  if (riscv_preferred_stack_boundary_arg)
+{
+  int min = ctz_hwi (MIN_STACK_BOUNDARY / 8);
+  int max = 8;
+
+  if (!IN_RANGE (riscv_preferred_stack_boundary_arg, min, max))
+   error ("-mpreferred-stack-boundary=%d must be between %d and %d",
+  riscv_preferred_stack_boundary_arg, min, max);
+
+  riscv_stack_boundary = 8 << riscv_preferred_stack_boundary_arg;
+}
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index cd37761b629..a002bff4480 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -123,8 +123,14 @@ along with GCC; see the file COPYING3.  If not see
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY (TARGET_RVC ? 16 : 32)
 
+/* The smallest supported stack boundary the calling convention supports.  */
+#define MIN_STACK_BOUNDARY (2 * BITS_PER_WORD)
+
+/* The ABI stack alignment.  */
+#define ABI_STACK_BOUNDARY 128
+
 /* There is no point aligning anything to a rounder boundary than this.  */
-#define BIGGEST_ALIGNMENT 128
+#define BIGGEST_ALIGNMENT STACK_BOUNDARY
 
 /* The user-level ISA permits unaligned accesses, but they are not required
of the privileged architecture.  */
@@ -472,8 +478,8 @@ enum reg_class
`crtl->outgoing_args_size'.  */
 #define OUTGOING_REG_PARM_STACK_SPACE(FNTYPE) 1
 
-#define STACK_BOUNDARY 128
-
+#define STACK_BOUNDARY riscv_stack_boundary
+
 /* Symbolic macros for the registers used to return integer and floating
point values.  */
 
@@ -528,8 +534,9 @@ typedef struct {
 
 #define EPILOGUE_USES(REGNO)   ((REGNO) == RETURN_ADDR_REGNUM)
 
-/* ABI requires 16-byte alignment, even on RV32. */
-#define RISCV_STACK_ALIGN(LOC) (((LOC) + 15) & -16)
+/* Align based on stack boundary, which might have been set by the user.  */
+#define RISCV_STACK_ALIGN(LOC) \
+  (((LOC) + ((STACK_BOUNDARY/8)-1)) & -(STACK_BOUNDARY/8))
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
the stack pointer does not matter.  The value is tested only in
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 50df851afdf..581a26bb5c1 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -33,6 +33,10 @@ mabi=
 Target Report RejectNegative Joined Enum(abi_type) Var(riscv_abi) 
Init(ABI_ILP32)
 Specify integer and floating-point calling convention.
 
+mpreferred-stack-boundary=
+Target RejectNegative Joined UInteger Var(riscv_preferred_stack_boundary_arg)
+Attempt to keep stack aligned to this power of 2.
+
 Enum
 Name(abi_type) Type(enum riscv_abi_type)
 Supported ABIs (for use with the -mabi= option):
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 891de733160..f066349c2ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -992,6 +992,7 @@ See RS/6000 and PowerPC Options.
 -mdiv  -mno-

Re: [PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers

2018-01-23 Thread Max Filippov
On Tue, Jan 23, 2018 at 1:07 PM, augustine.sterl...@gmail.com
 wrote:
> On Tue, Jan 23, 2018 at 9:55 AM, Max Filippov  wrote:
>> libgcc/
>> 2018-01-22  Max Filippov  
>>
>> * config/xtensa/ieee754-df.S (__addsf3, __subsf3, __mulsf3)
>> (__divsf3): Make NaN return value quiet.
>> * config/xtensa/ieee754-sf.S (__adddf3, __subdf3, __muldf3)
>> (__divdf3): Make NaN return value quiet.
>
> This is fine. Please apply.

Thanks, applied to trunk and backported to gcc-6 and gcc-7 branches.

-- Max


Re: [PATCH 4/6] [ARC] Rework delegitimate_address hook

2018-01-23 Thread Andrew Burgess
* Claudiu Zissulescu  [2017-11-02 13:30:33 
+0100]:

> From: claziss 
> 
> Delegitimize address is used to undo the obfuscating effect of PIC
> addresses, returning the address in a way which is understood by the
> compiler.
> 
> gcc/
> 2017-04-25  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_delegitimize_address_0): Refactored to
>   recognize new pic like addresses.
>   (arc_delegitimize_address): Clean up.
> 
> testsuite/
> 2017-08-31  Claudiu Zissulescu  
> 
>   * testsuite/gcc.target/arc/tdelegitimize_addr.c: New test.

Assuming this has passed all of the tests, then this change is fine
with me.

The commit message you propose above describes what delegitimize does
in general, but it doesn't really explain why _this_ change is
needed.  You remove a lot of code, it would be nice if the commit
message explained why we're able to drop all of this complexity.

Thanks,
Andrew



> ---
>  gcc/config/arc/arc.c  | 91 
> ++-
>  gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c | 23 ++
>  2 files changed, 62 insertions(+), 52 deletions(-)
>  create mode 100755 gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index e7194a2..07dd072 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -9506,68 +9506,55 @@ arc_legitimize_address (rtx orig_x, rtx oldx, 
> machine_mode mode)
>  }
>  
>  static rtx
> -arc_delegitimize_address_0 (rtx x)
> +arc_delegitimize_address_0 (rtx op)
>  {
> -  rtx u, gp, p;
> -
> -  if (GET_CODE (x) == CONST && GET_CODE (u = XEXP (x, 0)) == UNSPEC)
> +  switch (GET_CODE (op))
>  {
> -  if (XINT (u, 1) == ARC_UNSPEC_GOT
> -   || XINT (u, 1) == ARC_UNSPEC_GOTOFFPC)
> - return XVECEXP (u, 0, 0);
> +case CONST:
> +  return arc_delegitimize_address_0 (XEXP (op, 0));
> +
> +case UNSPEC:
> +  switch (XINT (op, 1))
> + {
> + case ARC_UNSPEC_GOT:
> + case ARC_UNSPEC_GOTOFFPC:
> +   return XVECEXP (op, 0, 0);
> + default:
> +   break;
> + }
> +  break;
> +
> +case PLUS:
> +  {
> + rtx t1 = arc_delegitimize_address_0 (XEXP (op, 0));
> + rtx t2 = XEXP (op, 1);
> +
> + if (t1 && t2)
> +   return gen_rtx_PLUS (GET_MODE (op), t1, t2);
> + break;
> +  }
> +
> +default:
> +  break;
>  }
> -  else if (GET_CODE (x) == CONST && GET_CODE (p = XEXP (x, 0)) == PLUS
> -&& GET_CODE (u = XEXP (p, 0)) == UNSPEC
> -&& (XINT (u, 1) == ARC_UNSPEC_GOT
> -|| XINT (u, 1) == ARC_UNSPEC_GOTOFFPC))
> -return gen_rtx_CONST
> - (GET_MODE (x),
> -  gen_rtx_PLUS (GET_MODE (p), XVECEXP (u, 0, 0), XEXP (p, 1)));
> -  else if (GET_CODE (x) == PLUS
> -&& ((REG_P (gp = XEXP (x, 0))
> - && REGNO (gp) == PIC_OFFSET_TABLE_REGNUM)
> -|| (GET_CODE (gp) == CONST
> -&& GET_CODE (u = XEXP (gp, 0)) == UNSPEC
> -&& XINT (u, 1) == ARC_UNSPEC_GOT
> -&& GET_CODE (XVECEXP (u, 0, 0)) == SYMBOL_REF
> -&& !strcmp (XSTR (XVECEXP (u, 0, 0), 0), "_DYNAMIC")))
> -&& GET_CODE (XEXP (x, 1)) == CONST
> -&& GET_CODE (u = XEXP (XEXP (x, 1), 0)) == UNSPEC
> -&& XINT (u, 1) == ARC_UNSPEC_GOTOFF)
> -return XVECEXP (u, 0, 0);
> -  else if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 0)) == PLUS
> -&& ((REG_P (gp = XEXP (XEXP (x, 0), 1))
> - && REGNO (gp) == PIC_OFFSET_TABLE_REGNUM)
> -|| (GET_CODE (gp) == CONST
> -&& GET_CODE (u = XEXP (gp, 0)) == UNSPEC
> -&& XINT (u, 1) == ARC_UNSPEC_GOT
> -&& GET_CODE (XVECEXP (u, 0, 0)) == SYMBOL_REF
> -&& !strcmp (XSTR (XVECEXP (u, 0, 0), 0), "_DYNAMIC")))
> -&& GET_CODE (XEXP (x, 1)) == CONST
> -&& GET_CODE (u = XEXP (XEXP (x, 1), 0)) == UNSPEC
> -&& XINT (u, 1) == ARC_UNSPEC_GOTOFF)
> -return gen_rtx_PLUS (GET_MODE (x), XEXP (XEXP (x, 0), 0),
> -  XVECEXP (u, 0, 0));
> -  else if (GET_CODE (x) == PLUS
> -&& (u = arc_delegitimize_address_0 (XEXP (x, 1
> -return gen_rtx_PLUS (GET_MODE (x), XEXP (x, 0), u);
>return NULL_RTX;
>  }
>  
>  static rtx
> -arc_delegitimize_address (rtx x)
> +arc_delegitimize_address (rtx orig_x)
>  {
> -  rtx orig_x = x = delegitimize_mem_from_attrs (x);
> -  if (GET_CODE (x) == MEM)
> +  rtx x = orig_x;
> +
> +  if (MEM_P (x))
>  x = XEXP (x, 0);
> +
>x = arc_delegitimize_address_0 (x);
> -  if (x)
> -{
> -  if (MEM_P (orig_x))
> - x = replace_equiv_address_nv (orig_x, x);
> -  return x;
> -}
> -  return orig_x;
> +  if (!x)
> +return orig_x;
> +
> +  if (MEM_P (orig_x))
> +x = replace_equiv_address_nv (orig_x, x);
> +  return x;
>  }
>  
>  /* Return a REG rtx for acc1.  N.B. the gcc-internal representation may
> diff --git a/gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c 

C++ PATCH for c++/83947, ICE with auto deduction

2018-01-23 Thread Jason Merrill
In this testcase, we were deducing a type for g from the function f
which has not yet been deduced itself.  Fixed by recognizing the case
of an undeduced initializer.

The change to undeduced_auto_decl is necessary because auto parameters
are a different story; they act more like normal template parameters.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 28ec0bede14adc98ac9cec6693f8a4c0ccf39d14
Author: Jason Merrill 
Date:   Tue Jan 23 13:47:13 2018 -0500

PR c++/83947 - ICE with auto declarations.

* pt.c (do_auto_deduction): Don't deduce from an auto decl.
* decl.c (undeduced_auto_decl): Limit to vars and fns.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index f6fab422d17..e6d9289abd2 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -16189,15 +16189,17 @@ fndecl_declared_return_type (tree fn)
   return TREE_TYPE (TREE_TYPE (fn));
 }
 
-/* Returns true iff DECL was declared with an auto type and it has
-   not yet been deduced to a real type.  */
+/* Returns true iff DECL is a variable or function declared with an auto type
+   that has not yet been deduced to a real type.  */
 
 bool
 undeduced_auto_decl (tree decl)
 {
   if (cxx_dialect < cxx11)
 return false;
-  return type_uses_auto (TREE_TYPE (decl));
+  return ((VAR_OR_FUNCTION_DECL_P (decl)
+  || TREE_CODE (decl) == TEMPLATE_DECL)
+ && type_uses_auto (TREE_TYPE (decl)));
 }
 
 /* Complain if DECL has an undeduced return type.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0296845a31b..d5d263c1c74 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -25891,6 +25891,10 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
from ahead of time isn't worth the trouble.  */
 return type;
 
+  /* Similarly, we can't deduce from another undeduced decl.  */
+  if (init && undeduced_auto_decl (init))
+return type;
+
   if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
 /* C++17 class template argument deduction.  */
 return do_class_deduction (type, tmpl, init, flags, complain);
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C 
b/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C
new file mode 100644
index 000..120a4dd9e7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C
@@ -0,0 +1,6 @@
+// PR c++/83947
+// { dg-do compile { target c++14 } }
+
+auto f ();
+template < int > auto g (f);   // { dg-error "before deduction" }
+auto h = g < 0 > ();


Re: [PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers

2018-01-23 Thread augustine.sterl...@gmail.com
On Tue, Jan 23, 2018 at 9:55 AM, Max Filippov  wrote:
> libgcc/
> 2018-01-22  Max Filippov  
>
> * config/xtensa/ieee754-df.S (__addsf3, __subsf3, __mulsf3)
> (__divsf3): Make NaN return value quiet.
> * config/xtensa/ieee754-sf.S (__adddf3, __subdf3, __muldf3)
> (__divdf3): Make NaN return value quiet.

This is fine. Please apply.


Re: [PATCH v2] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)

2018-01-23 Thread Jason Merrill
On Tue, Jan 23, 2018 at 1:06 PM, David Malcolm  wrote:
> On Tue, 2018-01-23 at 09:54 -0500, Jason Merrill wrote:
>> On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm 
>> wrote:
>> > PR c++/83974 reports an ICE within fold_for_warn when calling
>> > cxx_eval_constant_expression on a CAST_EXPR.
>> >
>> > This comes from a pointer-to-member-function.  The result of
>> > build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is
>> > a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a
>> > TREE_LIST containing the zero INTEGER_CST.
>> >
>> > After r256804, fold_for_warn within a template calls
>> > fold_non_dependent_expr.
>> >
>> > For this tree, is_nondependent_constant_expression returns true.
>> >
>> > potential_constant_expression_1 has these cases:
>> >
>> > case TREE_LIST:
>> >   {
>> > gcc_assert (TREE_PURPOSE (t) == NULL_TREE
>> > || DECL_P (TREE_PURPOSE (t)));
>> > if (!RECUR (TREE_VALUE (t), want_rval))
>> >   return false;
>> > if (TREE_CHAIN (t) == NULL_TREE)
>> >   return true;
>> > return RECUR (TREE_CHAIN (t), want_rval);
>> >   }
>> >
>> > and:
>> >
>> > case CAST_EXPR:
>> > case CONST_CAST_EXPR:
>> > case STATIC_CAST_EXPR:
>> > case REINTERPRET_CAST_EXPR:
>> > case IMPLICIT_CONV_EXPR:
>> >   if (cxx_dialect < cxx11
>> >   && !dependent_type_p (TREE_TYPE (t))
>> >   && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t)))
>> > /* In C++98, a conversion to non-integral type can't be
>> > part of a
>> >constant expression.  */
>> >   {
>> >   // reject it
>> >   }
>> > // accept it
>> >
>> > and thus returns true for the CAST_EXPR and TREE_LIST, and hence
>> > for the
>> > CONSTRUCTOR as a whole.
>> >
>> > However, cxx_eval_constant_expression does not support these tree
>> > codes,
>>
>> Because they are template-only codes, that
>> cxx_eval_constant_expression should never see.  They shouldn't
>> survive
>> the call to instantiate_non_dependent_expr_internal from
>> fold_non_dependent_expr.
>>
>> Jason
>
> Aha - thanks.
>
> instantiate_non_dependent_expr_internal calls tsubst_copy_and_build, and
> tsubst_copy_and_build is bailing out here:
>
> 18103   /* digest_init will do the wrong thing if we let it.  */
> 18104   if (type && TYPE_PTRMEMFUNC_P (type))
> 18105 RETURN (t);
>
> leaving the CONSTRUCTOR uncopied, and thus containing the CAST_EXPR and
> TREE_LIST, leading to the ICE within cxx_eval_constant_expression.

Wow, how has that not broken things before now?

The patch is OK.

Jason


Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")

2018-01-23 Thread Jason Merrill
OK, thanks.

On Tue, Jan 23, 2018 at 2:37 PM, Paolo Carlini  wrote:
> Hi,
>
> On 23/01/2018 18:38, Jason Merrill wrote:
>>
>> On 01/22/2018 05:13 PM, Paolo Carlini wrote:
>>>
>>> Hi again,
>>>
>>> On 22/01/2018 22:50, Paolo Carlini wrote:

 Ok. The below passes the C++ testsuite and I'm finishing testing it.
 Therefore, as you already hinted to, we can now say that what was *really*
 missing from potential_constant_expression_1 was the use of
 default_init_uninitialized_part, which does all the non-trivial work 
 besides
 the later !DECL_NONTRIVIALLY_INITIALIZED_P check.
 check_for_uninitialized_const_var also provides the informs, which were
 completely missing.
>>>
>>> Grrr. Testing the library revealed immediately the failure of
>>> 18_support/byte/ops.cc, because in constexpr_context_p == true, thus from
>>> potential_constant_expression_1, the case CP_TYPE_CONST_P triggers. I guess
>>> we really want to keep the existing constexpr_context_p == false cases
>>> separate. I'm therefore restarting testing with the below.
>>
>>
>>> +  && ((!constexpr_context_p
>>> +   && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)))
>>> +  || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P
>>> (decl)))
>>>&& !DECL_INITIAL (decl))
>>
>>
>> I think I'd replace the DECL_INITIAL check with
>> DECL_NONTRIVIALLY_INITIALIZED_P, which seems more precise.  So we ought to
>> be ok with the simpler
>>
>> && (constexpr_context_p
>> || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))
>> && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))
>
> Oh nice, thanks Jason.
>
> I'm therefore finishing regtesting the below, it's currently half way
> through the library testsuite. Ok if it passes?
>
> Thanks again,
> Paolo.
>
> 


Fix PR rtl-optimization/81443

2018-01-23 Thread Eric Botcazou
This is a combinatorial explosion in the combiner for the num_sign_bit_copies 
mechanism present on the 7 branch and mainline during the bootstrap on MIPS 
n32, i.e. a 64-bit WORD_REGISTER_OPERATIONS SIGN_EXTEND target.  While I'm 
still exploring various approaches to fixing it on the mainline, Richard B. 
agreed to have it fixed on the 7 branch by reverting the problematic bits from 
the PR rtl-optimization/59461 patch that introduced it.

Manually tested on mips64-unknown-linux-gnu, applied on the 7 branch.


2018-01-23  Eric Botcazou  

PR rtl-optimization/81443
* rtlanal.c (num_sign_bit_copies1) : Do not propagate results
from inner REGs to paradoxical SUBREGs.

-- 
Eric BotcazouIndex: rtlanal.c
===
--- rtlanal.c	(revision 256841)
+++ rtlanal.c	(working copy)
@@ -4976,7 +4976,7 @@ num_sign_bit_copies1 (const_rtx x, machi
   if (WORD_REGISTER_OPERATIONS
 	  && load_extend_op (inner_mode) == SIGN_EXTEND
 	  && paradoxical_subreg_p (x)
-	  && (MEM_P (SUBREG_REG (x)) || REG_P (SUBREG_REG (x
+	  && MEM_P (SUBREG_REG (x)))
 	return cached_num_sign_bit_copies (SUBREG_REG (x), mode,
 	   known_x, known_mode, known_ret);
   break;


[committed] Fix LTO ICE on OpenMP privatized member DECL_VALUE_EXPRs (PR sanitizer/83987)

2018-01-23 Thread Jakub Jelinek
Hi!

DECL_OMP_PRIVATIZED_MEMBER vars are artificial, DECL_IGNORED_P and
useful only during OpenMP lowering, their DECL_VALUE_EXPR isn't really
useful afterwards (just shouldn't be cleared because then we could try
to expand those vars if we don't optimize them as unused).

The sanitizer can add calls in them to UBSAN_VPTR etc. though, which
LTO streaming doesn't like.

The following patch just makes sure it doesn't see them.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed
to trunk.

2018-01-23  Jakub Jelinek  

PR sanitizer/83987
* tree.c (cp_free_lang_data): Change DECL_VALUE_EXPR of
DECL_OMP_PRIVATIZED_MEMBER vars to error_mark_node.

* g++.dg/ubsan/pr83987.C: New test.

--- gcc/cp/tree.c.jj2018-01-17 22:00:06.878228591 +0100
+++ gcc/cp/tree.c   2018-01-23 10:17:03.810795156 +0100
@@ -5274,6 +5274,16 @@ cp_free_lang_data (tree t)
 /* We do not need the leftover chaining of namespaces from the
binding level.  */
 DECL_CHAIN (t) = NULL_TREE;
+  /* Set DECL_VALUE_EXPRs of OpenMP privatized member artificial
+ decls to error_mark_node.  These are DECL_IGNORED_P and after
+ OpenMP lowering they aren't useful anymore.  Clearing DECL_VALUE_EXPR
+ doesn't work, as expansion could then consider them as something
+ to be expanded.  */
+  if (VAR_P (t)
+  && DECL_LANG_SPECIFIC (t)
+  && DECL_OMP_PRIVATIZED_MEMBER (t)
+  && DECL_IGNORED_P (t))
+SET_DECL_VALUE_EXPR (t, error_mark_node);
 }
 
 /* Stub for c-common.  Please keep in sync with c-decl.c.
--- gcc/testsuite/g++.dg/ubsan/pr83987.C.jj 2018-01-23 10:35:37.158192734 
+0100
+++ gcc/testsuite/g++.dg/ubsan/pr83987.C2018-01-23 10:37:04.819118976 
+0100
@@ -0,0 +1,15 @@
+// PR sanitizer/83987
+// { dg-do compile { target fopenmp } }
+// { dg-options "-fopenmp -fsanitize=vptr -O0" }
+
+struct A { int i; };
+struct B : virtual A { void foo (); };
+
+void
+B::foo ()
+{
+#pragma omp sections lastprivate (i)
+  {
+i = 0;
+  }
+}

Jakub


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2018 at 07:30:49PM +, Paul Richard Thomas wrote:
> Janne, Thanks.
> 
> Jakub, is this OK with you?

It is indeed quite late for such large ABI changes, some distributions are
about to start using the compiler by now.  How much was it tested (on which
targets)?  Has the debug info side of things been adjusted too (the
get_array_descr_info langhook)?

> >> I took the design choice choice to replace the dtype with a structure:
> >> typedef struct dtype_type
> >> {
> >>   size_t elem_len;
> >>   int version;
> >>   int rank;
> >>   int type;
> >>   int attribute;
> >> }
> >> dtype_type;

Isn't this too wasteful?  rank will be just 0-15, right?
What values can version have?  What type?  Do you need negative values for
any of those?
I think using unsigned char or unsigned short for some of those fields
should be sufficient, yeah, they don't necessarily need to be bitfields.

Jakub


Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")

2018-01-23 Thread Paolo Carlini

Hi,

On 23/01/2018 18:38, Jason Merrill wrote:

On 01/22/2018 05:13 PM, Paolo Carlini wrote:

Hi again,

On 22/01/2018 22:50, Paolo Carlini wrote:
Ok. The below passes the C++ testsuite and I'm finishing testing it. 
Therefore, as you already hinted to, we can now say that what was 
*really* missing from potential_constant_expression_1 was the use of 
default_init_uninitialized_part, which does all the non-trivial work 
besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. 
check_for_uninitialized_const_var also provides the informs, which 
were completely missing.
Grrr. Testing the library revealed immediately the failure of 
18_support/byte/ops.cc, because in constexpr_context_p == true, thus 
from potential_constant_expression_1, the case CP_TYPE_CONST_P 
triggers. I guess we really want to keep the existing 
constexpr_context_p == false cases separate. I'm therefore restarting 
testing with the below.



+  && ((!constexpr_context_p
+   && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)))
+  || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P 
(decl)))

   && !DECL_INITIAL (decl))


I think I'd replace the DECL_INITIAL check with 
DECL_NONTRIVIALLY_INITIALIZED_P, which seems more precise.  So we 
ought to be ok with the simpler


&& (constexpr_context_p
    || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))
&& !DECL_NONTRIVIALLY_INITIALIZED_P (decl))

Oh nice, thanks Jason.

I'm therefore finishing regtesting the below, it's currently half way 
through the library testsuite. Ok if it passes?


Thanks again,
Paolo.


Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 256991)
+++ cp/constexpr.c  (working copy)
@@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want
  "% in % context", tmp);
  return false;
}
- else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp))
-   {
- if (flags & tf_error)
-   error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized "
- "variable %qD in % context", tmp);
- return false;
-   }
+ else if (!check_for_uninitialized_const_var
+  (tmp, /*constexpr_context_p=*/true, flags))
+   return false;
}
   return RECUR (tmp, want_rval);
 
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 256991)
+++ cp/cp-tree.h(working copy)
@@ -6221,6 +6221,7 @@ extern tree finish_case_label 
(location_t, tree,
 extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t);
 extern bool check_array_designated_initializer  (constructor_elt *,
 unsigned HOST_WIDE_INT);
+extern bool check_for_uninitialized_const_var   (tree, bool, tsubst_flags_t);
 
 /* in decl2.c */
 extern void record_mangling(tree, bool);
Index: cp/decl.c
===
--- cp/decl.c   (revision 256991)
+++ cp/decl.c   (working copy)
@@ -72,7 +72,6 @@ static int check_static_variable_definition (tree,
 static void record_unknown_type (tree, const char *);
 static tree builtin_function_1 (tree, tree, bool);
 static int member_function_or_else (tree, tree, enum overload_flags);
-static void check_for_uninitialized_const_var (tree);
 static tree local_variable_p_walkfn (tree *, int *, void *);
 static const char *tag_name (enum tag_types);
 static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool);
@@ -5545,10 +5544,14 @@ maybe_commonize_var (tree decl)
 }
 }
 
-/* Issue an error message if DECL is an uninitialized const variable.  */
+/* Issue an error message if DECL is an uninitialized const variable.
+   CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr
+   context from potential_constant_expression.  Returns true if all is well,
+   false otherwise.  */
 
-static void
-check_for_uninitialized_const_var (tree decl)
+bool
+check_for_uninitialized_const_var (tree decl, bool constexpr_context_p,
+  tsubst_flags_t complain)
 {
   tree type = strip_array_types (TREE_TYPE (decl));
 
@@ -5557,26 +5560,38 @@ maybe_commonize_var (tree decl)
  7.1.6 */
   if (VAR_P (decl)
   && TREE_CODE (type) != REFERENCE_TYPE
-  && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))
-  && !DECL_INITIAL (decl))
+  && (constexpr_context_p
+ || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))
+  && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))
 {
   tree field = default_init_uninitialized_part (type);
   if (!field)
-   return;
+   return true;
 
-  if (CP_TYPE_CONST_P (type))
-   permerror (DECL_SOURCE_LOCATION (decl),
-  "uninitialized const %qD", decl);
-  e

Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Paul Richard Thomas
Janne, Thanks.

Jakub, is this OK with you?

Paul

On 23 January 2018 at 19:09, Janne Blomqvist  wrote:
> On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas
>  wrote:
>> This patch has been triggered by Thomas's recent message to the list.
>> Not only did I start work late relative to stage 3 but debugging took
>> somewhat longer than anticipated. Therefore, to get this committed
>> asap, we will have to beg the indulgence of the release managers and
>> prompt review and/or testing by fortran maintainers. (Dominique has
>> already undertaken to test -m32.)
>>
>> The patch delivers rank up to 15 for F2008, the descriptor information
>> needed to enact the F2018 C descriptor macros and an attribute field
>> to store such information as pointer/allocatable, contiguous etc..
>> Only the first has been enabled so far but it was necessary to submit
>> the array descriptor changes now to avoid any further ABI breakage in
>> 9.0.0.
>>
>> I took the design choice choice to replace the dtype with a structure:
>> typedef struct dtype_type
>> {
>>   size_t elem_len;
>>   int version;
>>   int rank;
>>   int type;
>>   int attribute;
>> }
>> dtype_type;
>>
>> This choice was intended to reduce the changes to a minimum, since in
>> most references to the dtype, one dtype is assigned to another.
>>
>> The F2018 interop defines the 'type and 'attribute fields to be signed
>> char types. I used this intially but found that using int was the best
>> way to silence the warnings about padding since it also allows for
>> more attribute information to be carried.
>>
>> Some parts of the patch (eg. in get_scalar_to_descriptor_type) look as
>> if latent bugs were uncovered by the change to the descriptor. If so,
>> the time spent debugging was well worthwhile.
>>
>> It should be noted that some of the intrinsics, which use switch/case
>> for the type/kind selection, limit the effective element size that
>> they handle to the maximum value of size_t, less 7 bits. A bit of
>> straightforward work there would fix this limitation and would allow
>> the GFC_DTYPE shifts and masks to be eliminated.
>>
>> Bootstraps and regtests on FC23/x86_64 - OK for trunk?
>
> In trans-types.c:
>
> structure dtype_type dtype;
>
> Should be "struct dtype_type dtype". (This is in a comment, so doesn't
> affect the code, but still).
>
> I have to say, the patch is much smaller and less invasive than I had
> feared for such a fundamental change. I guess you were right about
> making dtype it's own type.
>
> As for the DTYPE shifting and masking thing, now that I have read the
> patch, if I understood it correctly, that's an internal issue in
> libgfortran and has no effect on the descriptor.  That being said, the
> F2018 C descriptor has both the type and kind encoded into the type
> field, see table 18.4 in F2018 N2146. (In a way that's redundant,
> since the type and the elem_len fields suffice to figure out the
> type&kind combination). Anyway, is there a case for following suit,
> and if so, is it a too invasive change at this point?
>
> In any case, since we seem to agree that we have no big lingering
> issues that would require us to break the ABI again for GCC 9, IMHO
> this is Ok for trunk. You might want to get an explicit Ok from the
> release manager, though.
>
> --
> Janne Blomqvist



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] PR 83975 Set character length to 0 if unknown

2018-01-23 Thread Janne Blomqvist
On Tue, Jan 23, 2018 at 6:33 PM, Thomas Koenig  wrote:
> Hi Janne,
>
>> When associating a variable of type character, if the length of the
>> target isn't known, set it to zero rather than leaving it unset.  This
>> is not a complete fix for making associate of characters work
>> properly, but papers over an ICE in the middle-end. See PR 83344 for
>> more details.
>>
>> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>
>
> Does the code which is accepted then execute correctly?

Hmm, prompted by your question I did some investigating, and I present
to you evidence of GFortran being a quantum mechanical compiler
(spooky action at a distance!): Consider the slightly fleshed out
testcase from PR 83975:

program foo
  call s("foo")
contains
subroutine s(x)
  character(*) :: x
  print *, "X:", x, ":ENDX"
  print *, "len(x): ", len(x)
!  associate (y => x)
!print *, "Y:", y, ":ENDY"
!print *, "len(y): ", len(y)
!  end associate
end subroutine s
end program foo

With the associate stuff commented out, it's fairly bog-standard
stuff, and the output is the expected:

 X:foo:ENDX
 len(x):3

Now, incorporating the commented out associate block, and the result is:

 X::ENDX
 len(x):0
 Y::ENDY
 len(y):0

So the associate construct manages to somehow break the variable X in
the outer scope!



-- 
Janne Blomqvist


[Patch, fortran] PR83866 - [8 Regression] ICE in gfc_release_symbol, at fortran/symbol.c:3087

2018-01-23 Thread Paul Richard Thomas
Committed as 'obvious' in revision 256995.

Closing PR.

Paul

2018-23-01  Paul Thomas  

PR fortran/83866
* decl.c (gfc_match_derived_decl): If eos not matched, recover
and emit error about garbage after declaration.

2018-23-01  Paul Thomas  

PR fortran/83866
* gfortran.dg/pdt_29.f03 : New test.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c  (revision 256606)
--- gcc/fortran/decl.c  (working copy)
*** gfc_match_derived_decl (void)
*** 9856,9862 
gfc_error_recovery ();
m = gfc_match_eos ();
if (m != MATCH_YES)
!   return m;
sym->attr.pdt_template = 1;
  }
  
--- 9856,9865 
gfc_error_recovery ();
m = gfc_match_eos ();
if (m != MATCH_YES)
!   {
! gfc_error_recovery ();
! gfc_error_now ("Garbage after PARAMETERIZED TYPE declaration at %C");
!   }
sym->attr.pdt_template = 1;
  }
  
Index: gcc/testsuite/gfortran.dg/pdt_29.f03
===
*** gcc/testsuite/gfortran.dg/pdt_29.f03(nonexistent)
--- gcc/testsuite/gfortran.dg/pdt_29.f03(working copy)
***
*** 0 
--- 1,15 
+ ! { dg-do compile }
+ !
+ ! Test the fix for PR83866.f90
+ !
+ ! Contributed by G Steinmetz  
+ !
+ program p
+   type private
+   end type
+type t
+   class(t), pointer :: a
+end type
+type extends(t) :: t2 ! { dg-error "Garbage after | does not have a 
component" }
+end type
+ end


[Patch, fortran] PR83898 - [6/7/8 Regression] ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:7181

2018-01-23 Thread Paul Richard Thomas
Commit as 'obvious' in revision 256994.

I will attend to 6- and 7-branches in a little while.

Paul

2018-23-01  Paul Thomas  

PR fortran/83898
* trans-stmt.c (trans_associate_var): Do not set cst_array_ctor
for characters.

2018-23-01  Paul Thomas  

PR fortran/83898
* gfortran.dg/associate_33.f03 : New test.
Index: gcc/fortran/trans-stmt.c
===
*** gcc/fortran/trans-stmt.c(revision 256606)
--- gcc/fortran/trans-stmt.c(working copy)
*** trans_associate_var (gfc_symbol *sym, gf
*** 1579,1585 
  
desc = sym->backend_decl;
cst_array_ctor = e->expr_type == EXPR_ARRAY
! && gfc_constant_array_constructor_p (e->value.constructor);
  
/* If association is to an expression, evaluate it and create temporary.
 Otherwise, get descriptor of target for pointer assignment.  */
--- 1579,1586 
  
desc = sym->backend_decl;
cst_array_ctor = e->expr_type == EXPR_ARRAY
! && gfc_constant_array_constructor_p (e->value.constructor)
! && e->ts.type != BT_CHARACTER;
  
/* If association is to an expression, evaluate it and create temporary.
 Otherwise, get descriptor of target for pointer assignment.  */
Index: gcc/testsuite/gfortran.dg/associate_33.f03
===
*** gcc/testsuite/gfortran.dg/associate_33.f03  (nonexistent)
--- gcc/testsuite/gfortran.dg/associate_33.f03  (working copy)
***
*** 0 
--- 1,11 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR83898.f90
+ !
+ ! Contributed by G Steinmetz  
+ !
+ program p
+associate (x => ['1','2'])
+   if (any (x .ne. ['1','2'])) call abort
+end associate
+ end


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Janne Blomqvist
On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas
 wrote:
> This patch has been triggered by Thomas's recent message to the list.
> Not only did I start work late relative to stage 3 but debugging took
> somewhat longer than anticipated. Therefore, to get this committed
> asap, we will have to beg the indulgence of the release managers and
> prompt review and/or testing by fortran maintainers. (Dominique has
> already undertaken to test -m32.)
>
> The patch delivers rank up to 15 for F2008, the descriptor information
> needed to enact the F2018 C descriptor macros and an attribute field
> to store such information as pointer/allocatable, contiguous etc..
> Only the first has been enabled so far but it was necessary to submit
> the array descriptor changes now to avoid any further ABI breakage in
> 9.0.0.
>
> I took the design choice choice to replace the dtype with a structure:
> typedef struct dtype_type
> {
>   size_t elem_len;
>   int version;
>   int rank;
>   int type;
>   int attribute;
> }
> dtype_type;
>
> This choice was intended to reduce the changes to a minimum, since in
> most references to the dtype, one dtype is assigned to another.
>
> The F2018 interop defines the 'type and 'attribute fields to be signed
> char types. I used this intially but found that using int was the best
> way to silence the warnings about padding since it also allows for
> more attribute information to be carried.
>
> Some parts of the patch (eg. in get_scalar_to_descriptor_type) look as
> if latent bugs were uncovered by the change to the descriptor. If so,
> the time spent debugging was well worthwhile.
>
> It should be noted that some of the intrinsics, which use switch/case
> for the type/kind selection, limit the effective element size that
> they handle to the maximum value of size_t, less 7 bits. A bit of
> straightforward work there would fix this limitation and would allow
> the GFC_DTYPE shifts and masks to be eliminated.
>
> Bootstraps and regtests on FC23/x86_64 - OK for trunk?

In trans-types.c:

structure dtype_type dtype;

Should be "struct dtype_type dtype". (This is in a comment, so doesn't
affect the code, but still).

I have to say, the patch is much smaller and less invasive than I had
feared for such a fundamental change. I guess you were right about
making dtype it's own type.

As for the DTYPE shifting and masking thing, now that I have read the
patch, if I understood it correctly, that's an internal issue in
libgfortran and has no effect on the descriptor.  That being said, the
F2018 C descriptor has both the type and kind encoded into the type
field, see table 18.4 in F2018 N2146. (In a way that's redundant,
since the type and the elem_len fields suffice to figure out the
type&kind combination). Anyway, is there a case for following suit,
and if so, is it a too invasive change at this point?

In any case, since we seem to agree that we have no big lingering
issues that would require us to break the ABI again for GCC 9, IMHO
this is Ok for trunk. You might want to get an explicit Ok from the
release manager, though.

-- 
Janne Blomqvist


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Thomas Koenig

Hi Paul,


Could somebody please review the patch?


I'd say the patch is OK for trunk. I have also tested this on
a big-endian system, and things worked.

I will look at the resulting fallout of the GFC_DTYPE_TYPE_SIZE
stuff.

Thanks a lot for your work!

Regards

Thomas


Re: [C++ PATCH] Deprecate ARM-era for scopes

2018-01-23 Thread Nathan Sidwell

On 01/23/2018 01:27 PM, Mike Stump wrote:

On Jan 23, 2018, at 4:18 AM, Nathan Sidwell  wrote:


As discussed (https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01778.html) this 
patch deprecates the ARM-era for scope.


The code gives:

   if you use %<-fpermissive%> G++ will accept your code

I think we should no longer recommend a deprecated feature without at least 
stating it is deprecated in the hint?  Not too important, as if they actually 
use it, the compiler will spit out:

   this flexibility is deprecated and will be removed

if someone tried to use it, but thought I might mention it.


yeah, I hemmed and hawed about that, but given how bitrotted those 
diagnostics had become punted.


nathan

--
Nathan Sidwell


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Thomas Koenig

Hi Jerry,


Do you mean adding something like this:

diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index 4c643b7e17b..c86e0b45e1d 100644
--- a/libgfortran/libgfortran.h
+++ b/libgfortran/libgfortran.h
@@ -600,6 +600,7 @@ typedef struct st_parameter_common
GFC_INTEGER_4 line;
CHARACTER2 (iomsg);
GFC_INTEGER_4 *iostat;
+  void *reserved;
  }
  st_parameter_common;


Yes, this is what I had in mind.

Regards

Thomas


Re: [C++ PATCH] Deprecate ARM-era for scopes

2018-01-23 Thread Mike Stump
On Jan 23, 2018, at 4:18 AM, Nathan Sidwell  wrote:
> 
> As discussed (https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01778.html) this 
> patch deprecates the ARM-era for scope.

The code gives:

  if you use %<-fpermissive%> G++ will accept your code

I think we should no longer recommend a deprecated feature without at least 
stating it is deprecated in the hint?  Not too important, as if they actually 
use it, the compiler will spit out:

  this flexibility is deprecated and will be removed

if someone tried to use it, but thought I might mention it.

Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.

2018-01-23 Thread Mike Stump
On Jan 23, 2018, at 2:31 AM, Tamar Christina  wrote:
> 
> This patch makes dg-cmp-results.sh reject the use of log files in the 
> comparison.

No please.  We like to run that script on log files from time to time for 
various reasons.  I'd rather fix anything that prevents that from working as 
expected.

> Often when given a log file dg-cmp-results will give incomplete/wrong output 
> and
> using log instead of sum is one autocomplete tab away.

Add a make target if you can't type the right command line?

[PATCH v2] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)

2018-01-23 Thread David Malcolm
On Tue, 2018-01-23 at 09:54 -0500, Jason Merrill wrote:
> On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm 
> wrote:
> > PR c++/83974 reports an ICE within fold_for_warn when calling
> > cxx_eval_constant_expression on a CAST_EXPR.
> > 
> > This comes from a pointer-to-member-function.  The result of
> > build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is
> > a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a
> > TREE_LIST containing the zero INTEGER_CST.
> > 
> > After r256804, fold_for_warn within a template calls
> > fold_non_dependent_expr.
> > 
> > For this tree, is_nondependent_constant_expression returns true.
> > 
> > potential_constant_expression_1 has these cases:
> > 
> > case TREE_LIST:
> >   {
> > gcc_assert (TREE_PURPOSE (t) == NULL_TREE
> > || DECL_P (TREE_PURPOSE (t)));
> > if (!RECUR (TREE_VALUE (t), want_rval))
> >   return false;
> > if (TREE_CHAIN (t) == NULL_TREE)
> >   return true;
> > return RECUR (TREE_CHAIN (t), want_rval);
> >   }
> > 
> > and:
> > 
> > case CAST_EXPR:
> > case CONST_CAST_EXPR:
> > case STATIC_CAST_EXPR:
> > case REINTERPRET_CAST_EXPR:
> > case IMPLICIT_CONV_EXPR:
> >   if (cxx_dialect < cxx11
> >   && !dependent_type_p (TREE_TYPE (t))
> >   && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t)))
> > /* In C++98, a conversion to non-integral type can't be
> > part of a
> >constant expression.  */
> >   {
> >   // reject it
> >   }
> > // accept it
> > 
> > and thus returns true for the CAST_EXPR and TREE_LIST, and hence
> > for the
> > CONSTRUCTOR as a whole.
> > 
> > However, cxx_eval_constant_expression does not support these tree
> > codes,
> 
> Because they are template-only codes, that
> cxx_eval_constant_expression should never see.  They shouldn't
> survive
> the call to instantiate_non_dependent_expr_internal from
> fold_non_dependent_expr.
> 
> Jason

Aha - thanks.

instantiate_non_dependent_expr_internal calls tsubst_copy_and_build, and
tsubst_copy_and_build is bailing out here:

18103   /* digest_init will do the wrong thing if we let it.  */
18104   if (type && TYPE_PTRMEMFUNC_P (type))
18105 RETURN (t);

leaving the CONSTRUCTOR uncopied, and thus containing the CAST_EXPR and
TREE_LIST, leading to the ICE within cxx_eval_constant_expression.

The above code dates back to 33643032d70f56c4e00028da8185bcac4023e646 (r61409)
from 2003, which added tsubst_copy_and_build.

The call to digest_init in tsubst_copy_and_build's CONSTRUCTOR case was
removed in 1d8baa0efe4be51729c604adf7be9c36e786edff (r117832, 2006-10-17,
fixing PR c++/27270), which amongst other things made this change,
removing the use of digest_init:

  ce->value = RECUR (ce->value);
}

  -   r = build_constructor (NULL_TREE, n);
  -   TREE_HAS_CONSTRUCTOR (r) = TREE_HAS_CONSTRUCTOR (t);
  +   if (TREE_HAS_CONSTRUCTOR (t))
  + return finish_compound_literal (type, n);

  -   if (type)
  - {
  -   r = reshape_init (type, r);
  -   return digest_init (type, r);
  - }
  -   return r;
  +   return build_constructor (NULL_TREE, n);
 }

Given that, is the early bailout for TYPE_PTRMEMFUNC_P still needed?

On the assumption that it's outdated, this patch removes it.  With that
change, tsubst_copy_and_build successfully copies the CONSTRUCTOR, with
the CAST_EXPR and TREE_LIST eliminated, fixing the ICE in fold_for_warn.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/cp/ChangeLog:
PR c++/83974
* pt.c (tsubst_copy_and_build) : Remove early bailout
for pointer to member function types.

gcc/testsuite/ChangeLog:
PR c++/83974
* g++.dg/warn/pr83974.C: New test case.
---
 gcc/cp/pt.c |  4 
 gcc/testsuite/g++.dg/warn/pr83974.C | 11 +++
 2 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/pr83974.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0296845..0e48a13 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18100,10 +18100,6 @@ tsubst_copy_and_build (tree t,
if (type == error_mark_node)
  RETURN (error_mark_node);
 
-   /* digest_init will do the wrong thing if we let it.  */
-   if (type && TYPE_PTRMEMFUNC_P (type))
- RETURN (t);
-
/* We do not want to process the index of aggregate
   initializers as they are identifier nodes which will be
   looked up by digest_init.  */
diff --git a/gcc/testsuite/g++.dg/warn/pr83974.C 
b/gcc/testsuite/g++.dg/warn/pr83974.C
new file mode 100644
index 000..af12c2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/pr83974.C
@@ -0,0 +1,11 @@
+// { dg-options "-Wtautological-compare" }
+
+struct A {
+  typedef void (A::*B) ();
+  operator B ();
+};
+template 
+struct

[PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers

2018-01-23 Thread Max Filippov
libgcc/
2018-01-22  Max Filippov  

* config/xtensa/ieee754-df.S (__addsf3, __subsf3, __mulsf3)
(__divsf3): Make NaN return value quiet.
* config/xtensa/ieee754-sf.S (__adddf3, __subdf3, __muldf3)
(__divdf3): Make NaN return value quiet.
---
 libgcc/config/xtensa/ieee754-df.S | 54 ---
 libgcc/config/xtensa/ieee754-sf.S | 51 
 2 files changed, 74 insertions(+), 31 deletions(-)

diff --git a/libgcc/config/xtensa/ieee754-df.S 
b/libgcc/config/xtensa/ieee754-df.S
index 9aa55d1f74a4..2662a6600751 100644
--- a/libgcc/config/xtensa/ieee754-df.S
+++ b/libgcc/config/xtensa/ieee754-df.S
@@ -64,17 +64,26 @@ __adddf3_aux:
 
 .Ladd_xnan_or_inf:
/* If y is neither Infinity nor NaN, return x.  */
-   bnall   yh, a6, 1f
+   bnall   yh, a6, .Ladd_return_nan_or_inf
/* If x is a NaN, return it.  Otherwise, return y.  */
sllia7, xh, 12
or  a7, a7, xl
-   beqza7, .Ladd_ynan_or_inf
-1: leaf_return
+   bneza7, .Ladd_return_nan
 
 .Ladd_ynan_or_inf:
/* Return y.  */
mov xh, yh
mov xl, yl
+
+.Ladd_return_nan_or_inf:
+   sllia7, xh, 12
+   or  a7, a7, xl
+   bneza7, .Ladd_return_nan
+   leaf_return
+
+.Ladd_return_nan:
+   movia4, 0x8 /* make it a quiet NaN */
+   or  xh, xh, a4
leaf_return
 
 .Ladd_opposite_signs:
@@ -319,17 +328,24 @@ __subdf3_aux:
 
 .Lsub_xnan_or_inf:
/* If y is neither Infinity nor NaN, return x.  */
-   bnall   yh, a6, 1f
+   bnall   yh, a6, .Lsub_return_nan_or_inf
+
+.Lsub_return_nan:
/* Both x and y are either NaN or Inf, so the result is NaN.  */
movia4, 0x8 /* make it a quiet NaN */
or  xh, xh, a4
-1: leaf_return
+   leaf_return
 
 .Lsub_ynan_or_inf:
/* Negate y and return it.  */
sllia7, a6, 11
xor xh, yh, a7
mov xl, yl
+
+.Lsub_return_nan_or_inf:
+   sllia7, xh, 12
+   or  a7, a7, xl
+   bneza7, .Lsub_return_nan
leaf_return
 
 .Lsub_opposite_signs:
@@ -692,10 +708,7 @@ __muldf3_aux:
/* If y is zero, return NaN.  */
bnezyl, 1f
sllia8, yh, 1
-   bneza8, 1f
-   movia4, 0x8 /* make it a quiet NaN */
-   or  xh, xh, a4
-   j   .Lmul_done
+   beqza8, .Lmul_return_nan
 1:
/* If y is NaN, return y.  */
bnall   yh, a6, .Lmul_returnx
@@ -708,6 +721,9 @@ __muldf3_aux:
mov xl, yl
 
 .Lmul_returnx:
+   sllia8, xh, 12
+   or  a8, a8, xl
+   bneza8, .Lmul_return_nan
/* Set the sign bit and return.  */
extui   a7, a7, 31, 1
sllixh, xh, 1
@@ -720,8 +736,11 @@ __muldf3_aux:
bnezxl, .Lmul_returny
sllia8, xh, 1
bneza8, .Lmul_returny
-   movia7, 0x8 /* make it a quiet NaN */
-   or  xh, yh, a7
+   mov xh, yh
+
+.Lmul_return_nan:
+   movia4, 0x8 /* make it a quiet NaN */
+   or  xh, xh, a4
j   .Lmul_done
 
.align  4
@@ -1370,10 +1389,11 @@ __divdf3_aux:
sllia7, a7, 31
xor xh, xh, a7
/* If y is NaN or Inf, return NaN.  */
-   bnall   yh, a6, 1f
-   movia4, 0x8 /* make it a quiet NaN */
-   or  xh, xh, a4
-1: leaf_return
+   ballyh, a6, .Ldiv_return_nan
+   sllia8, xh, 12
+   or  a8, a8, xl
+   bneza8, .Ldiv_return_nan
+   leaf_return
 
 .Ldiv_ynan_or_inf:
/* If y is Infinity, return zero.  */
@@ -1383,6 +1403,10 @@ __divdf3_aux:
/* y is NaN; return it.  */
mov xh, yh
mov xl, yl
+
+.Ldiv_return_nan:
+   movia4, 0x8 /* make it a quiet NaN */
+   or  xh, xh, a4
leaf_return
 
 .Ldiv_highequal1:
diff --git a/libgcc/config/xtensa/ieee754-sf.S 
b/libgcc/config/xtensa/ieee754-sf.S
index 659f183f6ba8..d48b230a7588 100644
--- a/libgcc/config/xtensa/ieee754-sf.S
+++ b/libgcc/config/xtensa/ieee754-sf.S
@@ -64,15 +64,23 @@ __addsf3_aux:
 
 .Ladd_xnan_or_inf:
/* If y is neither Infinity nor NaN, return x.  */
-   bnall   a3, a6, 1f
+   bnall   a3, a6, .Ladd_return_nan_or_inf
/* If x is a NaN, return it.  Otherwise, return y.  */
sllia7, a2, 9
-   beqza7, .Ladd_ynan_or_inf
-1: leaf_return
+   bneza7, .Ladd_return_nan
 
 .Ladd_ynan_or_inf:
/* Return y.  */
mov a2, a3
+
+.Ladd_return_nan_or_inf:
+   sllia7, a2, 9
+   bneza7, .Ladd_return_nan
+   leaf_return
+
+.Ladd_return_nan:
+   movia6, 0x40/* make it a quiet NaN */
+   or  a2, a2, a6
leaf_return
 
 .Ladd_opposite_signs:
@@ -265,16 +273,22 @@ __subsf3_aux:
 
 .Lsub_xnan_or_inf:
/* If y is neither Infinity nor NaN, return x. 

Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")

2018-01-23 Thread Jason Merrill

On 01/22/2018 05:13 PM, Paolo Carlini wrote:

Hi again,

On 22/01/2018 22:50, Paolo Carlini wrote:
Ok. The below passes the C++ testsuite and I'm finishing testing it. 
Therefore, as you already hinted to, we can now say that what was 
*really* missing from potential_constant_expression_1 was the use of 
default_init_uninitialized_part, which does all the non-trivial work 
besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. 
check_for_uninitialized_const_var also provides the informs, which 
were completely missing.
Grrr. Testing the library revealed immediately the failure of 
18_support/byte/ops.cc, because in constexpr_context_p == true, thus 
from potential_constant_expression_1, the case CP_TYPE_CONST_P triggers. 
I guess we really want to keep the existing constexpr_context_p == false 
cases separate. I'm therefore restarting testing with the below.



+  && ((!constexpr_context_p
+  && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)))
+ || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl)))
   && !DECL_INITIAL (decl))


I think I'd replace the DECL_INITIAL check with 
DECL_NONTRIVIALLY_INITIALIZED_P, which seems more precise.  So we ought 
to be ok with the simpler


&& (constexpr_context_p
|| CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))
&& !DECL_NONTRIVIALLY_INITIALIZED_P (decl))

Jason


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-23 Thread Paul Richard Thomas
Could somebody please review the patch?

Thanks

Paul


On 23 January 2018 at 06:13, Dominique d'Humières
 wrote:
> Dear Paul,
>
> The test suite passed without new regression with both -m32 and -m64.
>
> Thanks for the work,
>
> Dominique



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] PR 83975 Set character length to 0 if unknown

2018-01-23 Thread Thomas Koenig

Hi Janne,


When associating a variable of type character, if the length of the
target isn't known, set it to zero rather than leaving it unset.  This
is not a complete fix for making associate of characters work
properly, but papers over an ICE in the middle-end. See PR 83344 for
more details.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?


Does the code which is accepted then execute correctly?

Personally, I would prefer an ICE over a silent wrong-code.

Regards

Thomas


Re: [PATCH] fix type typo in avx512fintrin.h

2018-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2018 at 04:46:02PM +0100, Matthias Kretz wrote:
> I just hit a compile error on AVX512 code. The fix is trivial enough that I 
> didn't bother writing a PR and just fixed it. Acceptable?
> 
> I hope this doesn't require the paperwork, though my employer is willing to 
> do 
> it anyway. :-)

CCing maintainers.

> 2018-01-23  Matthias Kretz  
> 
>   * config/i386/avx512fintrin.h: Fix signatures of _mm512_abs_ps and
>   _mm512_mask_abs_pd to use __m512d instead of __m512.

This should have been:
* config/i386/avx512fintrin.h (_mm512_abs_ps, _mm512_mask_abs_pd):
Change type of arguments with __m512 type to __m512d.
or so.

> diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
> index 71e36a5..de68675 100644
> --- a/gcc/config/i386/avx512fintrin.h
> +++ b/gcc/config/i386/avx512fintrin.h
> @@ -7612,7 +7612,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 
> __A)
>  
>  extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_abs_pd (__m512 __A)
> +_mm512_abs_pd (__m512d __A)
>  {
>return (__m512d) _mm512_and_epi64 ((__m512i) __A,
>  _mm512_set1_epi64 (0x7fffLL));
> @@ -7620,7 +7620,7 @@ _mm512_abs_pd (__m512 __A)
>  
>  extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512 __A)
> +_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
>  {
>return (__m512d)
> _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
> 
> -- 
> ──
>  Dr. Matthias Kretzhttps://kretzfamily.de
>  GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
>  SIMD easy and portable https://github.com/VcDevel/Vc
> ──

Jakub


[PATCH] fix type typo in avx512fintrin.h

2018-01-23 Thread Matthias Kretz
I just hit a compile error on AVX512 code. The fix is trivial enough that I 
didn't bother writing a PR and just fixed it. Acceptable?

I hope this doesn't require the paperwork, though my employer is willing to do 
it anyway. :-)

Cheers,
  Matthias

2018-01-23  Matthias Kretz  

  * config/i386/avx512fintrin.h: Fix signatures of _mm512_abs_ps and
  _mm512_mask_abs_pd to use __m512d instead of __m512.

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 71e36a5..de68675 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -7612,7 +7612,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 
__A)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_pd (__m512 __A)
+_mm512_abs_pd (__m512d __A)
 {
   return (__m512d) _mm512_and_epi64 ((__m512i) __A,
 _mm512_set1_epi64 (0x7fffLL));
@@ -7620,7 +7620,7 @@ _mm512_abs_pd (__m512 __A)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512 __A)
+_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
   return (__m512d)
_mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,

-- 
──
 Dr. Matthias Kretzhttps://kretzfamily.de
 GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
 SIMD easy and portable https://github.com/VcDevel/Vc
──

[aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor

2018-01-23 Thread Siddhesh Poyarekar
Hi,

Here's v2 of the patch to disable register offset addressing mode for
stores of 128-bit values on Falkor because they're very costly.
Differences from the last version:

 - Incorporated changes Jim made to his patch earlier that I missed,
   i.e. adding an extra tuning parameter called
   SLOW_REGOFFSET_QUADWORD_STORE instead of making it conditional on
   TUNE_FALKOR.

 - Added a new query type ADDR_QUERY_STR to indicate the queried
   address is used for a store.  This way I can use it for other
   scenarios where stores are significantly more expensive than loads,
   such as pre/post modify addressing modes.

 - Incorporated the constraint functionality into
   aarch64_legitimate_address_p and aarch64_classify_address.

I evaluated the suggestion of using costs to do this but it's not
possible with the current costs as they do not differentiate between
loads and stores.  If modifying all passes that use these costs to
identify loads vs stores is considered OK (ivopts seems to be the
biggest user) then I can volunteer to do that work for gcc9 and
evetually replace this.



On Falkor, because of an idiosyncracy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.

This patch improves fpspeed by 0.3% and intspeed by 0.22% in CPU2017,
with omnetpp_s (4.3%) and pop2_s (2.6%) being the biggest winners.

2018-xx-xx  Jim Wilson  
Kugan Vivenakandarajah  
Siddhesh Poyarekar  

gcc/
* gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type):
New member ADDR_QUERY_STR.
* gcc/config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_classify_address): Avoid register indexing for quad
mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set.
* gcc/config/aarch64/constraints.md (Uts): New constraint.
* gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64):
Use it.
* gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov):
Likewise.

gcc/testsuite/
* gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.
---
 gcc/config/aarch64/aarch64-protos.h |  4 
 gcc/config/aarch64/aarch64-simd.md  |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 
 gcc/config/aarch64/aarch64.c| 12 +++-
 gcc/config/aarch64/aarch64.md   |  6 +++---
 gcc/config/aarch64/constraints.md   |  7 +++
 gcc/testsuite/gcc.target/aarch64/pr82533.c  | 11 +++
 7 files changed, 42 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc8e28..5fedc85f283 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -120,6 +120,9 @@ enum aarch64_symbol_type
ADDR_QUERY_LDP_STP
   Query what is valid for a load/store pair.
 
+   ADDR_QUERY_STR
+  Query what is valid for a store.
+
ADDR_QUERY_ANY
   Query what is valid for at least one memory constraint, which may
   allow things that "m" doesn't.  For example, the SVE LDR and STR
@@ -128,6 +131,7 @@ enum aarch64_symbol_type
 enum aarch64_addr_query_type {
   ADDR_QUERY_M,
   ADDR_QUERY_LDP_STP,
+  ADDR_QUERY_STR,
   ADDR_QUERY_ANY
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3d1f6a01cb7..48d92702723 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -131,9 +131,9 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Uts,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
&& (register_operand (operands[0], mode)
|| aarch64_simd_reg_or_zero (operands[1], mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index ea9ead234cb..04baf5b6de6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", 
SLOW_UNALIGNED_LDPW)
are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+   

Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust

2018-01-23 Thread Christophe Lyon
On 23 January 2018 at 15:57, Kyrill  Tkachov
 wrote:
> Hi Christophe,
>
> On 23/01/18 13:09, Christophe Lyon wrote:
>>
>> Hi Kyrill,
>>
>> On 22 January 2018 at 11:48, Kyrill  Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> This test has needlessly restrictive requirements. It tries to force a
>>> soft-float target and tries to run.
>>> This makes it unsupportable for any non-soft-float variant.
>>> In fact, the test can be a run-time test for any target, and only the
>>> scan-assembler tests are specific to
>>> -mfloat-abi=soft. So this patch makes the test always runnable and makes
>>> the
>>> scan-assembler checks predicable
>>> on the the new arm_sotftfloat effective target check.
>>>
>> Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to
>> be unsupported).
>> my logs only show:
>> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
>
>
> :( it runs fine for me on a armeb-none-eabi --with-float=hard target with a
> Foundation Model.
> Unfortunately I don't have access to an armeb-non-linux-gnueabihf setup.
> Could you provide more detailed configuration options so that I can try to
> reproduce the
> exact -march -m[arm,thumb], -mfpu setup?
>

I'm using qemu in user-mode, and I see this error on the 3
configurations I test:
--with-cpu=cortex-a9
--with-mode=arm or thumb
--with-fpu=neon-fp16 or vfpv3-d16-fp16

HTH

> Thanks,
> Kyrill
>
>
>> on arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes)
>>
>> Christophe
>>
>>> Committing to trunk.
>>> Thanks,
>>> Kyrill
>>>
>>> 2018-01-22  Kyrylo Tkachov  
>>>
>>>  * doc/sourcebuild.texi (arm_softfloat): Document.
>>>
>>> 2018-01-22  Kyrylo Tkachov  
>>>
>>>  * lib/target-supports.exp (check_effective_target_arm_softfloat):
>>>  New procedure.
>>>  * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere.
>>>  Adjust scan-assembler checks for soft-float.
>
>


Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust

2018-01-23 Thread Kyrill Tkachov

Hi Christophe,

On 23/01/18 13:09, Christophe Lyon wrote:

Hi Kyrill,

On 22 January 2018 at 11:48, Kyrill  Tkachov
 wrote:

Hi all,

This test has needlessly restrictive requirements. It tries to force a
soft-float target and tries to run.
This makes it unsupportable for any non-soft-float variant.
In fact, the test can be a run-time test for any target, and only the
scan-assembler tests are specific to
-mfloat-abi=soft. So this patch makes the test always runnable and makes the
scan-assembler checks predicable
on the the new arm_sotftfloat effective target check.


Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to
be unsupported).
my logs only show:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped


:( it runs fine for me on a armeb-none-eabi --with-float=hard target with a 
Foundation Model.
Unfortunately I don't have access to an armeb-non-linux-gnueabihf setup.
Could you provide more detailed configuration options so that I can try to 
reproduce the
exact -march -m[arm,thumb], -mfpu setup?

Thanks,
Kyrill


on arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes)

Christophe


Committing to trunk.
Thanks,
Kyrill

2018-01-22  Kyrylo Tkachov  

 * doc/sourcebuild.texi (arm_softfloat): Document.

2018-01-22  Kyrylo Tkachov  

 * lib/target-supports.exp (check_effective_target_arm_softfloat):
 New procedure.
 * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere.
 Adjust scan-assembler checks for soft-float.




Re: [PATCH] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)

2018-01-23 Thread Jason Merrill
On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm  wrote:
> PR c++/83974 reports an ICE within fold_for_warn when calling
> cxx_eval_constant_expression on a CAST_EXPR.
>
> This comes from a pointer-to-member-function.  The result of
> build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is
> a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a
> TREE_LIST containing the zero INTEGER_CST.
>
> After r256804, fold_for_warn within a template calls
> fold_non_dependent_expr.
>
> For this tree, is_nondependent_constant_expression returns true.
>
> potential_constant_expression_1 has these cases:
>
> case TREE_LIST:
>   {
> gcc_assert (TREE_PURPOSE (t) == NULL_TREE
> || DECL_P (TREE_PURPOSE (t)));
> if (!RECUR (TREE_VALUE (t), want_rval))
>   return false;
> if (TREE_CHAIN (t) == NULL_TREE)
>   return true;
> return RECUR (TREE_CHAIN (t), want_rval);
>   }
>
> and:
>
> case CAST_EXPR:
> case CONST_CAST_EXPR:
> case STATIC_CAST_EXPR:
> case REINTERPRET_CAST_EXPR:
> case IMPLICIT_CONV_EXPR:
>   if (cxx_dialect < cxx11
>   && !dependent_type_p (TREE_TYPE (t))
>   && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t)))
> /* In C++98, a conversion to non-integral type can't be part of a
>constant expression.  */
>   {
>   // reject it
>   }
> // accept it
>
> and thus returns true for the CAST_EXPR and TREE_LIST, and hence for the
> CONSTRUCTOR as a whole.
>
> However, cxx_eval_constant_expression does not support these tree codes,

Because they are template-only codes, that
cxx_eval_constant_expression should never see.  They shouldn't survive
the call to instantiate_non_dependent_expr_internal from
fold_non_dependent_expr.

Jason


[PATCH][AArch64] Fix gcc.target/aarch64/subs_compare_[12].c

2018-01-23 Thread Kyrill Tkachov

Hi all,

This patch fixes the testsuite failures gcc.target/aarch64/subs_compare_1.c and 
subs_compare_2.c
The tests check that we combine a sequence like:
sub w2, w0, w1
cmp w0, w1

into
subsw2, w0, w1

This is done by a couple of peepholes in aarch64.md.

Unfortunately due to scheduling and other optimisations the SUB and CMP
can come in a different order:
cmp w0, w1
sub w0, w0, w1

And the existing peepholes cannot catch that and we fail to combine the two.
This patch adds a peephole that matches the CMP as the first insn and the SUB 
as the second
and outputs a SUBS.  This is almost equivalent to the existing peephole that 
matches SUB first and CMP second
except that it doesn't have the restriction that the output register of the SUB 
has to not be one of the input registers.
Remember "sub w0, w0, w1 ; cmp w0, w1" is *not* equivalent to: "subs  w0, w0, 
w1"
but "cmp w0, w1 ; sub w0, w0, w1" is.

So this is what this patch does. It adds a peephole for the case above and one 
for the SUB-immediate variant
(because the SUB-immediate is represented as PLUS-of-negated-immediate and thus 
has different RTL structure).

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-01-23  Kyrylo Tkachov  

* config/aarch64/aarch64.md: Add peepholes for CMP + SUB -> SUBS
and CMP + SUB-immediate -> SUBS.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a6ecb391309494087416913c11f339cf10357977..49095f8f3d995907903c11b68cae25a919204a76 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2430,6 +2430,26 @@ (define_peephole2
   }
 )
 
+;; Same as the above peephole but with the compare and minus in
+;; swapped order.  The restriction on overlap between operand 0
+;; and operands 1 and 2 doesn't apply here.
+(define_peephole2
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (match_operand:GPI 1 "aarch64_reg_or_zero")
+	  (match_operand:GPI 2 "aarch64_reg_or_zero")))
+   (set (match_operand:GPI 0 "register_operand")
+	(minus:GPI (match_dup 1)
+		   (match_dup 2)))]
+  ""
+  [(const_int 0)]
+  {
+emit_insn (gen_sub3_compare1 (operands[0], operands[1],
+	 operands[2]));
+DONE;
+  }
+)
+
 (define_peephole2
   [(set (match_operand:GPI 0 "register_operand")
 	(plus:GPI (match_operand:GPI 1 "register_operand")
@@ -2448,6 +2468,26 @@ (define_peephole2
   }
 )
 
+;; Same as the above peephole but with the compare and minus in
+;; swapped order.  The restriction on overlap between operand 0
+;; and operands 1 doesn't apply here.
+(define_peephole2
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	  (match_operand:GPI 1 "register_operand")
+	  (match_operand:GPI 3 "const_int_operand")))
+   (set (match_operand:GPI 0 "register_operand")
+	(plus:GPI (match_dup 1)
+		  (match_operand:GPI 2 "aarch64_sub_immediate")))]
+  "INTVAL (operands[3]) == -INTVAL (operands[2])"
+  [(const_int 0)]
+  {
+emit_insn (gen_sub3_compare1_imm (operands[0], operands[1],
+	 operands[2], operands[3]));
+DONE;
+  }
+)
+
 (define_insn "*sub__"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(minus:GPI (match_operand:GPI 3 "register_operand" "r")


Re: [PATCH, wwwdocs] Update GCC 8 release notes for NDS32 port

2018-01-23 Thread Gerald Pfeifer
On Tue, 23 Jan 2018, Chung-Ju Wu wrote:
> In previous patch we add new options for NDS32 port:
>   https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01397.html
> 
> So I think I need to update GCC 8 release notes as well.

Yes, please. :-)

> +New command-line options -mext-perf -mext-perf2 -mext-string

Can you write this as

  "...-mext-perf, -mext-perf2, and
  -mext-string..."

please?

Approved with that change.

Gerald


[PATCH] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)

2018-01-23 Thread David Malcolm
PR c++/83974 reports an ICE within fold_for_warn when calling
cxx_eval_constant_expression on a CAST_EXPR.

This comes from a pointer-to-member-function.  The result of
build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is
a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a
TREE_LIST containing the zero INTEGER_CST.

After r256804, fold_for_warn within a template calls
fold_non_dependent_expr.

For this tree, is_nondependent_constant_expression returns true.

potential_constant_expression_1 has these cases:

case TREE_LIST:
  {
gcc_assert (TREE_PURPOSE (t) == NULL_TREE
|| DECL_P (TREE_PURPOSE (t)));
if (!RECUR (TREE_VALUE (t), want_rval))
  return false;
if (TREE_CHAIN (t) == NULL_TREE)
  return true;
return RECUR (TREE_CHAIN (t), want_rval);
  }

and:

case CAST_EXPR:
case CONST_CAST_EXPR:
case STATIC_CAST_EXPR:
case REINTERPRET_CAST_EXPR:
case IMPLICIT_CONV_EXPR:
  if (cxx_dialect < cxx11
  && !dependent_type_p (TREE_TYPE (t))
  && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t)))
/* In C++98, a conversion to non-integral type can't be part of a
   constant expression.  */
  {
  // reject it
  }
// accept it

and thus returns true for the CAST_EXPR and TREE_LIST, and hence for the
CONSTRUCTOR as a whole.

However, cxx_eval_constant_expression does not support these tree codes,
ICEing in the default case with:

  internal_error ("unexpected expression %qE of kind %s", t,

It seems that, given potential_constant_expression_1 can return true for
these cases, then cxx_eval_constant_expression ought to handle them, which
this patch implements, fixing the ICE.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/cp/ChangeLog:
PR c++/83974
* constexpr.c (cxx_eval_constant_expression): Handle case
TREE_LIST.  Handle CAST_EXPR, CONST_CAST_EXPR, STATIC_CAST_EXPR,
REINTERPRET_CAST_EXPR and IMPLICIT_CONV_EXPR via
cxx_eval_unary_expression.

gcc/testsuite/ChangeLog:
PR c++/83974
* g++.dg/warn/pr83974.C: New test case.
---
 gcc/cp/constexpr.c  | 27 +++
 gcc/testsuite/g++.dg/warn/pr83974.C | 11 +++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/pr83974.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ca7f369..a593132 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4348,6 +4348,24 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
   }
   break;
 
+case TREE_LIST:
+  {
+   tree value, chain;
+   gcc_assert (TREE_PURPOSE (t) == NULL_TREE
+   || DECL_P (TREE_PURPOSE (t)));
+   value = cxx_eval_constant_expression (ctx, TREE_VALUE (t), lval,
+ non_constant_p, overflow_p,
+ jump_target);
+   if (TREE_CHAIN (t))
+ chain = cxx_eval_constant_expression (ctx, TREE_CHAIN (t), lval,
+   non_constant_p, overflow_p,
+   jump_target);
+   else
+ chain = NULL_TREE;
+   r = tree_cons (TREE_PURPOSE (t), value, chain);
+  }
+  break;
+
 case POINTER_PLUS_EXPR:
 case POINTER_DIFF_EXPR:
 case PLUS_EXPR:
@@ -4594,6 +4612,15 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
   return cxx_eval_statement_list (&new_ctx, t,
  non_constant_p, overflow_p, jump_target);
 
+case CAST_EXPR:
+case CONST_CAST_EXPR:
+case STATIC_CAST_EXPR:
+case REINTERPRET_CAST_EXPR:
+case IMPLICIT_CONV_EXPR:
+  r = cxx_eval_unary_expression (ctx, t, lval,
+non_constant_p, overflow_p);
+  break;
+
 case BIND_EXPR:
   return cxx_eval_constant_expression (ctx, BIND_EXPR_BODY (t),
   lval,
diff --git a/gcc/testsuite/g++.dg/warn/pr83974.C 
b/gcc/testsuite/g++.dg/warn/pr83974.C
new file mode 100644
index 000..af12c2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/pr83974.C
@@ -0,0 +1,11 @@
+// { dg-options "-Wtautological-compare" }
+
+struct A {
+  typedef void (A::*B) ();
+  operator B ();
+};
+template 
+struct C {
+  void foo () { d == 0; }
+  A d;
+};
-- 
1.8.5.3



[committed] Add testcases for PR c++/82882 and PR c++/83978

2018-01-23 Thread Jakub Jelinek
Hi!

These 2 PRs were fixed with r256964 and stay fixed even with r256965.
So I've just committed the testcase to the trunk and am going to close them
as fixed.

2018-01-23  Jakub Jelinek  

PR c++/82882
PR c++/83978
* g++.dg/cpp0x/pr82882.C: New test.
* g++.dg/cpp0x/pr83978.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/pr82882.C.jj 2018-01-23 15:01:19.781297360 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/pr82882.C2018-01-23 15:00:41.739297731 
+0100
@@ -0,0 +1,15 @@
+// PR c++/82882
+// { dg-do compile { target c++11 } }
+
+template 
+void
+foo ()
+{
+  auto v = [] { enum { E, F }; };
+}
+
+void
+bar ()
+{
+  foo ();
+}
--- gcc/testsuite/g++.dg/cpp0x/pr83978.C.jj 2018-01-23 15:01:30.086297256 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/pr83978.C2018-01-23 15:00:54.721297609 
+0100
@@ -0,0 +1,6 @@
+// PR c++/83978
+// { dg-do compile { target c++11 } }
+
+template 
+struct A { A () { auto a = [] { enum E { F }; }; } };
+A<0> *p = new A<0> ();

Jakub


[PR c++/839888] Baselink tsubst ICE

2018-01-23 Thread Nathan Sidwell
I added an assert when recently fixing baselink substitution, but the 
assert is incorrect as this testcase shows.


Fixing thusly.

nathan
--
Nathan Sidwell
2018-01-23  Nathan Sidwell  

	PR c++/83988
	* pt.c (tsubst_baselink): Remove optype assert.
	* ptree.c (cxx_print_xnode):  Print BASELINK_OPTYPE.

	PR c++/83988
	* g++.dg/template/pr83988.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 256981)
+++ cp/pt.c	(working copy)
@@ -14447,11 +14447,8 @@ tsubst_baselink (tree baselink, tree obj
 	fns = BASELINK_FUNCTIONS (baselink);
 }
   else
-{
-  gcc_assert (optype == BASELINK_OPTYPE (baselink));
-  /* We're going to overwrite pieces below, make a duplicate.  */
-  baselink = copy_node (baselink);
-}
+/* We're going to overwrite pieces below, make a duplicate.  */
+baselink = copy_node (baselink);
 
   /* If lookup found a single function, mark it as used at this point.
  (If lookup found multiple functions the one selected later by
Index: cp/ptree.c
===
--- cp/ptree.c	(revision 256981)
+++ cp/ptree.c	(working copy)
@@ -215,6 +215,7 @@ cxx_print_xnode (FILE *file, tree node,
   print_node (file, "binfo", BASELINK_BINFO (node), indent + 4);
   print_node (file, "access_binfo", BASELINK_ACCESS_BINFO (node),
 		  indent + 4);
+  print_node (file, "optype", BASELINK_OPTYPE (node), indent + 4);
   break;
 case OVERLOAD:
   print_node (file, "function", OVL_FUNCTION (node), indent+4);
Index: testsuite/g++.dg/template/pr83988.C
===
--- testsuite/g++.dg/template/pr83988.C	(revision 0)
+++ testsuite/g++.dg/template/pr83988.C	(working copy)
@@ -0,0 +1,16 @@
+// PR 83988 ICE
+
+template struct optional {};
+struct get_from_json {
+  template
+  operator optional() const {return optional ();}
+  template
+  optional maybe() const
+  {
+return this->operator optional();
+  }
+};
+void test()
+{
+  get_from_json().maybe();
+}


Re: [PATCH] Handle trailing arrays in ODR warning (PR lto/81440).

2018-01-23 Thread Jan Hubicka
> Hi.
> 
> This removes false positive warning when having trailing array at the end of 
> a struct.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/lto/ChangeLog:
> 
> 2018-01-23  Martin Liska  
> 
>   PR lto/81440
>   * lto-symtab.c (lto_symtab_merge): Handle and do not warn about
>   trailing arrays at the end of a struct.

OK, thanks!
Honza


Re: [PATCH 4/5] Remove predictors that are unrealiable.

2018-01-23 Thread Jan Hubicka
> On 01/23/2018 11:02 AM, Jan Hubicka wrote:
> >> On 01/19/2018 12:57 PM, Martin Liška wrote:
> >>> Yes, there's a huge difference in between CPU 2006 and 2017. Former has 
> >>> 63% w/ dominant edges,
> >>> and later one only 11%. It's caused by these 2 benchmarks with a high 
> >>> coverage:
> >>>
> >>
> >> Hi.
> >>
> >> I'm sending details about the 2 edges that influence the statistics 
> >> significantly:
> >>
> >>> 500.perlbench_r: regexec.c.065i.profile:
> >>>   negative return heuristics of edge 1368->1370: 2.0%  exec 2477714850 
> >>> hit 2429863555 (98.1%)
> >>
> >> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp)
> >> -: 3485:{
> >> 2477714850: 3486:CHECKPOINT lastcp;
> >> 2477714850: 3487:REGEXP *const rx = reginfo->prog;
> >> 2477714850: 3488:regexp *const prog = ReANY(rx);
> >> 2477714850: 3489:SSize_t result;
> >> [snip]
> >> 2477714850: 8046:assert(!result ||  locinput - reginfo->strbeg >= 0);
> >> 2477714850: 8047:return result ?  locinput - reginfo->strbeg : -1;
> >> -:  8048:}
> >>
> >> As seen it return -1 if a regex is not found, which is in case of 
> >> perlbench very likely branch.
> >>
> >>>
> >>> and 523.xalancbmk_r:
> >>> build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile:  
> >>> negative return heuristics of edge 3->4: 2.0%  exec 1221735072 hit 
> >>> 1221522453 (100.0%)
> >>
> >> 1221735072:   74:int NameDatatypeValidator::compare(const XMLCh* const 
> >> lValue
> >> -:   75:   , const XMLCh* const 
> >> rValue
> >> -:   76:   ,   MemoryManager*  
> >>const)
> >> -:   77:{
> >> 1221735072:   78:return ( XMLString::equals(lValue, rValue)? 0 : -1);
> >> -:   79:}
> >> -:   80:
> >>
> >> IP profile dump file:
> >>
> >> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator 
> >> * const this, const XMLCh * const lValue, const XMLCh * const rValue, 
> >> struct MemoryManager * const D.17157)
> >> {
> >>   bool _1;
> >>   int iftmp.0_2;
> >>
> >>[count: 1221735072]:
> >>   # DEBUG BEGIN_STMT
> >>   _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D));
> >>   if (_1 != 0)
> >> goto ; [0.02%]
> >>   else
> >> goto ; [99.98%]
> >>
> >>[count: 1221522453]:
> >>
> >>[count: 1221735072]:
> >>   # iftmp.0_2 = PHI <0(2), -1(3)>
> >>   return iftmp.0_2;
> >>
> >> }
> >>
> >> Likewise, XML values are more commonly non-equal.
> >> Ok, so may I mark also negative return with PROB_EVEN to track it?
> > 
> > Well, if we start disabling predictors just because we can find benchmark 
> > where they do not
> > perform as expected, we will need to disable them all. It is nature of 
> > heuristics to make
> > mistakes, just they should do something useful for common code.
> > It is not goal to make heuristics that makes no mistake.
> > Heuristics is useful either if it have high precision even if it cover few
> > branches (say noreturn), or if it have large coverage and still some 
> > hitrate (say opcode,
> > call or return based heuristics).
> > 
> > To make things bit more esoteric, in practice this is also bit weighted by 
> > fact
> > how much damage bad prediction does.  For example, call heuristics which 
> > predict
> > non-call path to be taken is likely going to do little damage. Even if call
> > path is common it is heavy and hard to optimize by presence of call and 
> > thus we
> > don't loose that much in common scenarios.
> > 
> > In the second case:
> > 00073 // 
> > ---
> > 00074 // Compare methods
> > 00075 // 
> > ---
> > 00076 int NameDatatypeValidator::compare(const XMLCh* const lValue
> > 00077, const XMLCh* const rValue
> > 00078,   MemoryManager* const)
> > 00079 {
> > 00080 return ( XMLString::equals(lValue, rValue)? 0 : -1);
> > 00081 }
> > 
> > I would say it is odd coding style.  I do not see why it is not declared
> > bool and not returning true/false. 
> > 
> > First case seems bit non-standard too as it looks like usual cache lookup 
> > just
> > the cache frequently fails.
> > 
> > I would be in favor of keeping the prediction with non-50% hitrate thus 
> > unless
> > we run into some performance problems.
> > If there are only two benchmarks out of say 20 we tried (in spec2000 and 
> > 2k17)
> > it seems fine to me.
> > 
> > There is issue with the way we collect our data.  Using arithmetic mean 
> > across
> > spec is optimizing for overall runtime of all spec benchmarks combined
> > together.  We more want to look for safer values that do well for average
> > benchmarks independently how many branches they do.  We could consider 
> > making
> > the statistics script also collect geometric averages across di

Re: [C++ PATCH] Fix a structured binding ICE (PR c++/83958)

2018-01-23 Thread Nathan Sidwell

On 01/22/2018 06:06 PM, Jakub Jelinek wrote:

Hi!

I've recently added the complete_type call in case the structured binding
is a reference to a type that needs to be instantiated.
This testcase shows a problem where it can't be instantiated and we ICE
because we don't expect an incomplete type.  If decl isn't a reference,
cp_finish_decl for it should handle everything.
I've considered using complete_type_or_else, but we aren't checking that
decl's type is complete, but rather that the type it refers to is complete,
and furthermore not sure if it is very nice to print 
in the diagnostics.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-22  Jakub Jelinek  

PR c++/83958
* decl.c (cp_finish_decomp): Diagnose if reference structure binding
refers to incomplete type.

* g++.dg/cpp1z/decomp35.C: New test.



ok, thanks


--
Nathan Sidwell


[PATCH] PR 78534 Reinstate better string copy algorithm

2018-01-23 Thread Janne Blomqvist
As part of the change to larger character lengths, the string copy
algorithm was temporarily pessimized to get around some spurious
-Wstringop-overflow warnings.  Having tried a number of variations of
this algorithm I have managed to get it down to one spurious warning,
only with -O1 optimization, in the testsuite.  This patch reinstates
the optimized variant and modifies this one testcase to ignore the
warning.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

gcc/fortran/ChangeLog:

2018-01-23  Janne Blomqvist  

PR 78534
* trans-expr.c (fill_with_spaces): Use memset instead of
generating loop.
(gfc_trans_string_copy): Improve opportunity to use builtins with
constant lengths.

gcc/testsuite/ChangeLog:

2018-01-23  Janne Blomqvist  

PR 78534
* gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune
-Wstringop-overflow warnings due to spurious warning with -O1.
* gfortran.dg/char_cast_1.f90: Update dump scan pattern.
* gfortran.dg/transfer_intrinsic_1.f90: Likewise.
---
 gcc/fortran/trans-expr.c   | 52 --
 .../allocate_deferred_char_scalar_1.f03|  2 +
 gcc/testsuite/gfortran.dg/char_cast_1.f90  |  6 +--
 gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 |  2 +-
 4 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index e90036f..4da6cee 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size)
   tree i, el, exit_label, cond, tmp;
 
   /* For a simple char type, we can call memset().  */
-  /* TODO: This code does work and is potentially more efficient, but
- causes spurious -Wstringop-overflow warnings.
   if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0)
 return build_call_expr_loc (input_location,
builtin_decl_explicit (BUILT_IN_MEMSET),
@@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size)
build_int_cst (gfc_get_int_type (gfc_c_int_kind),
   lang_hooks.to_target_charset (' ')),
fold_convert (size_type_node, size));
-  */
 
   /* Otherwise, we use a loop:
for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type))
@@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
dlength, tree dest,
 
   /* The string copy algorithm below generates code like
 
- if (dlen > 0) {
- memmove (dest, src, min(dlen, slen));
- if (slen < dlen)
- memset(&dest[slen], ' ', dlen - slen);
- }
+ if (destlen > 0)
+   {
+ if (srclen < destlen)
+   {
+ memmove (dest, src, srclen);
+ // Pad with spaces.
+ memset (&dest[srclen], ' ', destlen - srclen);
+   }
+ else
+   {
+ // Truncate if too long.
+ memmove (dest, src, destlen);
+   }
+   }
   */
 
   /* Do nothing if the destination length is zero.  */
@@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
dlength, tree dest,
   else
 src = gfc_build_addr_expr (pvoid_type_node, src);
 
-  /* First do the memmove. */
-  tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen,
- slen);
-  tmp2 = build_call_expr_loc (input_location,
- builtin_decl_explicit (BUILT_IN_MEMMOVE),
- 3, dest, src,
- fold_convert (size_type_node, tmp2));
-  stmtblock_t tmpblock2;
-  gfc_init_block (&tmpblock2);
-  gfc_add_expr_to_block (&tmpblock2, tmp2);
-
-  /* If the destination is longer, fill the end with spaces.  */
+  /* Truncate string if source is too long.  */
   cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, slen,
   dlen);
 
+  /* Copy and pad with spaces.  */
+  tmp3 = build_call_expr_loc (input_location,
+ builtin_decl_explicit (BUILT_IN_MEMMOVE),
+ 3, dest, src,
+ fold_convert (size_type_node, slen));
+
   /* Wstringop-overflow appears at -O3 even though this warning is not
  explicitly available in fortran nor can it be switched off. If the
  source length is a constant, its negative appears as a very large
@@ -6584,14 +6585,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree 
dlength, tree dest,
   tmp4 = fill_with_spaces (tmp4, chartype, tmp);
 
   gfc_init_block (&tempblock);
+  gfc_add_expr_to_block (&tempblock, tmp3);
   gfc_add_expr_to_block (&tempblock, tmp4);
   tmp3 = gfc_finish_block (&tempblock);
 
+  /* The truncated memmove if the slen >= dlen.  */
+  tmp2 = build_call_expr_loc (input_location,
+ builtin_decl_explicit (BUILT_IN_MEMMOVE)

[PATCH] Fix PR82819

2018-01-23 Thread Richard Biener

This fixes a GRAPHITE code-generation issue by eliding ISL AST plus
for large power-of-two values that don't affect the result.
I intentionally didn't extend this to other values with the same
property as I'd like to see testcases.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-01-23  Richard Biener  

PR tree-optimization/82819
* graphite-isl-ast-to-gimple.c (binary_op_to_tree): Avoid
code generating pluses that are no-ops in the target precision.

* gcc.dg/graphite/pr82819.c: New testcase.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 256977)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -326,7 +326,8 @@ binary_op_to_tree (tree type, __isl_take
   /* From our constraint generation we may get modulo operations that
  we cannot represent explicitely but that are no-ops for TYPE.
  Elide those.  */
-  if (expr_type == isl_ast_op_pdiv_r
+  if ((expr_type == isl_ast_op_pdiv_r
+   || expr_type == isl_ast_op_add)
   && isl_ast_expr_get_type (arg_expr) == isl_ast_expr_int
   && (wi::exact_log2 (widest_int_from_isl_expr_int (arg_expr))
  >= TYPE_PRECISION (type)))
Index: gcc/testsuite/gcc.dg/graphite/pr82819.c
===
--- gcc/testsuite/gcc.dg/graphite/pr82819.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr82819.c (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+short int *ts;
+
+void
+c2 (unsigned long long int s4, int ns)
+{
+  short int *b2 = (short int *)&ns;
+
+  while (ns != 0)
+{
+  int xn;
+
+  for (xn = 0; xn < 3; ++xn)
+   for (*b2 = 0; *b2 < 2; ++*b2)
+ s4 += xn;
+  if (s4 != 0)
+   b2 = ts;
+  ++ns;
+}
+}


Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.

2018-01-23 Thread Luis Machado

Hi,

On 01/23/2018 07:40 AM, Kyrill Tkachov wrote:

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:
The following patch adds an option to control software prefetching of 
memory

references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is 
benefit to
doing so. But, since this is all based on heuristics, it's not always 
the case

that we end up with better performance.

For Falkor there is also the problem of conflicts with the hardware 
prefetcher,
so we need to be more conservative in terms of what we issue software 
prefetch

hints for.

This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in 
behavior

for other targets and architectures.

I've regression-tested and bootstrapped it on aarch64-linux. No 
problems found.


Ok?



This also looks like a sensible approach to me with a caveat inline.
The same general comments as for patch [1/2] apply.
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h

index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+ strides.  */
+  const unsigned int prefetch_dynamic_strides;


I understand that the midend PARAMs are defined as integers, but I think
the backend tuning option here is better represented as a boolean as it 
really

is just a yes/no decision.

I started off with a boolean to be honest. Then i noticed the midend 
only used integers, which i restricted to the range of 0..1.


I'll change this locally to use booleans again.

Thanks!
Luis


Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-01-23 Thread Luis Machado

Hi Kyrill,

On 01/23/2018 07:32 AM, Kyrill Tkachov wrote:

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for 
bigger strides
that are more likely to benefit from prefetching. I've noticed a case 
in cpu2017

where we were issuing thousands of hints, for example.



I've noticed a large amount of prefetch hints being issued as well, but 
had not

analysed it further.



I've gathered some numbers for this. Some of the most extreme cases 
before both patches:


CPU2017

xalancbmk_s: 3755 hints
wrf_s: 10950 hints
parest_r: 8521 hints

CPU2006

gamess: 11377 hints
wrf: 3238 hints

After both patches:

CPU2017

xalancbmk_s: 1 hint
wrf_s: 20 hints
parest_r: 0 hints

CPU2006

gamess: 44 hints
wrf: 16 hints


* For processors that have a hardware prefetcher, like Falkor, it 
allows the
loop prefetch pass to defer prefetching of smaller (less than the 
threshold)
strides to the hardware prefetcher instead. This prevents conflicts 
between

the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in 
terms of

prefetch behavior for Falkor.


Do you, by any chance, have a link to the LLVM review that implemented 
that behavior?

It's okay if you don't, but I think it would be useful context.



I've dug it up. The base change was implemented here:

review: https://reviews.llvm.org/D17945
RFC: http://lists.llvm.org/pipermail/llvm-dev/2015-December/093514.html

And then target-specific changes were introduced later for specific 
processors.


One small difference in LLVM is the fact that the second parameter, 
prefetching of non-constant strides, is implicitly switched off if one 
sets the minimum stride length. My approach here makes that second 
parameter adjustable.


I've seen big gains due to prefetching of non-constant strides, but it 
tends to be tricky to control and usually comes together with 
significant regressions as well.


The fact that we potentially unroll loops along with issuing prefetch 
hints also makes things a bit erratic.




The default settings should guarantee no changes for existing targets. 
Those

are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?



Are there any benchmark numbers you can share?
I think this approach is sensible.



Comparing the previous, more aggressive, pass behavior with the new one 
i've seen a slight improvement for CPU2006, 0.15% for both INT and FP.


For CPU2017 the previous behavior was actually a bit harmful, regressing 
performance by about 1.2% in intspeed. The new behavior kept intspeed 
stable and slightly improved fpspeed by 0.15%.


The motivation for the future is to have better control of software 
prefetching so we can fine-tune the pass, either through generic loop 
prefetch code or by using the target-specific parameters.



Since your patch touches generic code as well as AArch64
code you'll need an approval from a midend maintainer as well as an 
AArch64 maintainer.
Also, GCC development is now in the regression fixing stage, so unless 
this fixes a regression

it may have to wait until GCC 9 development is opened.


That is my understanding. I thought i'd put this up for review anyway so 
people can chime in and provide their thoughts.


Thanks for the review.

Luis


Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure

2018-01-23 Thread Uros Bizjak
On Tue, Jan 23, 2018 at 1:54 PM, Uros Bizjak  wrote:
> On Tue, Jan 23, 2018 at 11:38 AM, Uros Bizjak  wrote:
>> On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak  wrote:
>>> On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor  wrote:
 On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak  wrote:
>
> The default "go build" compile options over-optimize the auxiliary
> executable, built in TestCrashDumpsAllThreads testcase
> (libgo/go/runtime/crash_unix_test.go). This over-optimization results
> in removal of the trivial summing loop and in the inlining of the
> main.loop function into main.$thunk0.
>
> The testcase expects backtrace that shows several instances of
> main.loop in the backtrace dump. When main.loop gets inlined into
> thunk, its name is not dumped anymore.
>
> The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu and
> alphev68-linux-gnu, where for the later target the patch fixed the
> mentioned failure.

 That sounds like a bug somewhere.  Even when one function gets inlined
 into another, its name should still be dumped.  This is implemented by
 report_inlined_functions in libbacktrace/dwarf.c.  While something
 like your patch may be needed, I'd like to more clearly understand why
 libbacktrace isn't working.
>>>
>>> If I manually compile the testcase from crash_unix_test.go wth gccgo
>>> -O2 (the asm is attached), the main.loop function is not even present
>>> in the dump.
>>
>> Digging deeper into the source, here is the problem:
>
> [...]
>
>> So, those $LBB labels are at the wrong place. During runtime, we get
>> (oh, why can't we display PC in hex?!):
>
> Following testcase:
>
> --cut here--
> package main
>
> func loop(i int, c chan bool) {
> close(c)
> for {
> for j := 0; j < 0x7fff; j++ {
> }
> }
> }
> --cut here--
>
> can be compiled with a crosscompiler to alpha-linux-gnu:
>
> ~/gcc-build-alpha/gcc/go1 -O2 -fkeep-static-functions l.go
>
> to generate assembly, where $LBB2 is placed after "lda" insn.

[...]

> I will open a PR:

PR 83992 [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83992

Uros.


Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust

2018-01-23 Thread Christophe Lyon
Hi Kyrill,

On 22 January 2018 at 11:48, Kyrill  Tkachov
 wrote:
> Hi all,
>
> This test has needlessly restrictive requirements. It tries to force a
> soft-float target and tries to run.
> This makes it unsupportable for any non-soft-float variant.
> In fact, the test can be a run-time test for any target, and only the
> scan-assembler tests are specific to
> -mfloat-abi=soft. So this patch makes the test always runnable and makes the
> scan-assembler checks predicable
> on the the new arm_sotftfloat effective target check.
>

Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to
be unsupported).
my logs only show:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped

on arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes)

Christophe

> Committing to trunk.
> Thanks,
> Kyrill
>
> 2018-01-22  Kyrylo Tkachov  
>
> * doc/sourcebuild.texi (arm_softfloat): Document.
>
> 2018-01-22  Kyrylo Tkachov  
>
> * lib/target-supports.exp (check_effective_target_arm_softfloat):
> New procedure.
> * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere.
> Adjust scan-assembler checks for soft-float.


Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure

2018-01-23 Thread Uros Bizjak
On Tue, Jan 23, 2018 at 11:38 AM, Uros Bizjak  wrote:
> On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak  wrote:
>> On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor  wrote:
>>> On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak  wrote:

 The default "go build" compile options over-optimize the auxiliary
 executable, built in TestCrashDumpsAllThreads testcase
 (libgo/go/runtime/crash_unix_test.go). This over-optimization results
 in removal of the trivial summing loop and in the inlining of the
 main.loop function into main.$thunk0.

 The testcase expects backtrace that shows several instances of
 main.loop in the backtrace dump. When main.loop gets inlined into
 thunk, its name is not dumped anymore.

 The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining.

 Patch was bootstrapped and regression tested on x86_64-linux-gnu and
 alphev68-linux-gnu, where for the later target the patch fixed the
 mentioned failure.
>>>
>>> That sounds like a bug somewhere.  Even when one function gets inlined
>>> into another, its name should still be dumped.  This is implemented by
>>> report_inlined_functions in libbacktrace/dwarf.c.  While something
>>> like your patch may be needed, I'd like to more clearly understand why
>>> libbacktrace isn't working.
>>
>> If I manually compile the testcase from crash_unix_test.go wth gccgo
>> -O2 (the asm is attached), the main.loop function is not even present
>> in the dump.
>
> Digging deeper into the source, here is the problem:

[...]

> So, those $LBB labels are at the wrong place. During runtime, we get
> (oh, why can't we display PC in hex?!):

Following testcase:

--cut here--
package main

func loop(i int, c chan bool) {
close(c)
for {
for j := 0; j < 0x7fff; j++ {
}
}
}
--cut here--

can be compiled with a crosscompiler to alpha-linux-gnu:

~/gcc-build-alpha/gcc/go1 -O2 -fkeep-static-functions l.go

to generate assembly, where $LBB2 is placed after "lda" insn.

main.loop:
...
$L2:
lda $1,-1($1)
$LBB2:
$LM6:
bne $1,$L2
br $31,$L3
$LBE2:
...

and in ._final, we can see that (note 52) is in wrong place. It should
be placed in front of (insn 13).

(code_label 14 6 12 2 (nil) [1 uses])
(note 12 14 13 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn:TI 13 12 52 (set (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70])
(plus:DI (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70])
(const_int -1 [0x]))) 7 {*adddi_internal}
 (nil))
(note 52 13 15 0x7f6b5f2791e0 NOTE_INSN_BLOCK_BEG)
(jump_insn:TI 15 52 45 (set (pc)
(if_then_else (ne (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70])
(const_int 0 [0]))
(label_ref:DI 14)
(pc))) "l.go":6 169 {*bcc_normal}
 (int_list:REG_BR_PROB 1062895956 (nil))
 -> 14)
(note 45 15 46 [bb 5] NOTE_INSN_BASIC_BLOCK)
(jump_insn:TI 46 45 47 (set (pc)
(label_ref 16)) 200 {jump}
 (nil)
 -> 16)
(barrier 47 46 32)
(note 32 47 54 NOTE_INSN_DELETED)
(note 54 32 0 0x7f6b5f2791e0 NOTE_INSN_BLOCK_END)

I will open a PR:

Uros.


[PATCH] Handle trailing arrays in ODR warning (PR lto/81440).

2018-01-23 Thread Martin Liška
Hi.

This removes false positive warning when having trailing array at the end of a 
struct.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/lto/ChangeLog:

2018-01-23  Martin Liska  

PR lto/81440
* lto-symtab.c (lto_symtab_merge): Handle and do not warn about
trailing arrays at the end of a struct.

gcc/testsuite/ChangeLog:

2018-01-23  Martin Liska  

PR lto/81440
* gcc.dg/lto/pr81440.h: New test.
* gcc.dg/lto/pr81440_0.c: New test.
* gcc.dg/lto/pr81440_1.c: New test.
---
 gcc/lto/lto-symtab.c | 25 +++--
 gcc/testsuite/gcc.dg/lto/pr81440.h   |  4 
 gcc/testsuite/gcc.dg/lto/pr81440_0.c |  9 +
 gcc/testsuite/gcc.dg/lto/pr81440_1.c |  6 ++
 4 files changed, 38 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440.h
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440_1.c


diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
index ee02a534714..0f0b958e98d 100644
--- a/gcc/lto/lto-symtab.c
+++ b/gcc/lto/lto-symtab.c
@@ -352,18 +352,31 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry)
 return false;
 
   if (DECL_SIZE (decl) && DECL_SIZE (prevailing_decl)
-  && !tree_int_cst_equal (DECL_SIZE (decl), DECL_SIZE (prevailing_decl))
+  && !tree_int_cst_equal (DECL_SIZE (decl), DECL_SIZE (prevailing_decl)))
+  {
+	if (!DECL_COMMON (decl) && !DECL_EXTERNAL (decl))
+	  return false;
+
+	tree type = TREE_TYPE (decl);
+
+	/* For record type, check for array at the end of the structure.  */
+	if (TREE_CODE (type) == RECORD_TYPE)
+	  {
+	tree field = TYPE_FIELDS (type);
+	while (DECL_CHAIN (field) != NULL_TREE)
+	  field = DECL_CHAIN (field);
+
+	return TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE;
+	  }
   /* As a special case do not warn about merging
 	 int a[];
 	 and
 	 int a[]={1,2,3};
 	 here the first declaration is COMMON
 	 and sizeof(a) == sizeof (int).  */
-  && ((!DECL_COMMON (decl) && !DECL_EXTERNAL (decl))
-	  || TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
-	  || TYPE_SIZE (TREE_TYPE (decl))
-	 != TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl)
-return false;
+	else if (TREE_CODE (type) == ARRAY_TYPE)
+	  return (TYPE_SIZE (decl) == TYPE_SIZE (TREE_TYPE (type)));
+  }
 
   return true;
 }
diff --git a/gcc/testsuite/gcc.dg/lto/pr81440.h b/gcc/testsuite/gcc.dg/lto/pr81440.h
new file mode 100644
index 000..d9e6c3da645
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr81440.h
@@ -0,0 +1,4 @@
+typedef struct {
+  int i;
+  int ints[];
+} struct_t;
diff --git a/gcc/testsuite/gcc.dg/lto/pr81440_0.c b/gcc/testsuite/gcc.dg/lto/pr81440_0.c
new file mode 100644
index 000..07f2a87da21
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr81440_0.c
@@ -0,0 +1,9 @@
+/* { dg-lto-do link } */
+
+#include "pr81440.h"
+
+extern struct_t my_struct;
+
+int main() {
+ return my_struct.ints[0];
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr81440_1.c b/gcc/testsuite/gcc.dg/lto/pr81440_1.c
new file mode 100644
index 000..d03533029c1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr81440_1.c
@@ -0,0 +1,6 @@
+#include "pr81440.h"
+
+struct_t my_struct = {
+ 20,
+ { 1, 2 }
+};



Re: [PATCH] Clean-up IPA profile dump output.

2018-01-23 Thread Martin Liška
On 01/23/2018 10:43 AM, Jan Hubicka wrote:
>> Hi.
>>
>> I'm aware in which development stage we are. However the patch is small and 
>> makes
>> dump files readable. Hope such patch can be accepted even now?
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-01-22  Martin Liska  
>>
>>  * tree-profile.c (tree_profiling): Print function header to
>>  aware reader which function we are working on.
>>  * value-prof.c (gimple_find_values_to_profile): Do not print
>>  not interesting value histograms.
> 
> OK.  How those non-interesting value histograms arrise?

Can happen if you have missing profile, then no value histograms are loaded.

Thanks for review, let me install it.
Martin

> 
> Honza
>> ---
>>  gcc/tree-profile.c | 4 
>>  gcc/value-prof.c   | 2 +-
>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>
>>
> 
>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
>> index 9d919062db1..f96bd4b9704 100644
>> --- a/gcc/tree-profile.c
>> +++ b/gcc/tree-profile.c
>> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "params.h"
>>  #include "stringpool.h"
>>  #include "attribs.h"
>> +#include "tree-pretty-print.h"
>>  
>>  static GTY(()) tree gcov_type_node;
>>  static GTY(()) tree tree_interval_profiler_fn;
>> @@ -671,6 +672,9 @@ tree_profiling (void)
>>  
>>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>  
>> +  if (dump_file)
>> +dump_function_header (dump_file, cfun->decl, dump_flags);
>> +
>>/* Local pure-const may imply need to fixup the cfg.  */
>>if (execute_fixup_cfg () & TODO_cleanup_cfg)
>>  cleanup_tree_cfg ();
>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
>> index b503320f188..16cdbd64f46 100644
>> --- a/gcc/value-prof.c
>> +++ b/gcc/value-prof.c
>> @@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values 
>> *values)
>>  default:
>>gcc_unreachable ();
>>  }
>> -  if (dump_file)
>> +  if (dump_file && hist->hvalue.stmt != NULL)
>>  {
>>fprintf (dump_file, "Stmt ");
>>print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM);
>>
> 



[C++ PATCH] Deprecate ARM-era for scopes

2018-01-23 Thread Nathan Sidwell
As discussed (https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01778.html) 
this patch deprecates the ARM-era for scope.


a) in c++98 mode with -fpermissive, there's now a deprecation note when 
we fix up something like

   for (int i = ...) {}
   ... i ... // out of scope use of i

b) -fno-for-scope gives a deprecated warning

I noticed that the flag showed signs of being tri-valued at some point, 
but it is no longer (I suspect at some point we only attempted #a if 
neither sense of -ffor-scope was given).  Cleaned that up.


Applying to trunk.

nathan
--
Nathan Sidwell
2018-01-23  Nathan Sidwell  

	gcc/cp/
	Deprecate ARM-era for scope handling
	* decl.c (poplevel): Flag_new_for_scope is a boolean-like.
	(cxx_init_decl_processing): Deprecate flag_new_for_scope being
	cleared.
	* name-lookup.c (check_for_out_of_scope_variable): Deprecate and
	cleanup handling.
	* semantics.c (begin_for_scope): Flag_new_for_scope is
	boolean-like.
	(finish_for_stmt, begin_range_for_stmt): Likewise.

	gcc/
	* doc/invoke.texi (ffor-scope): Deprecate.

	gcc/cp/
	* g++.dg/cpp0x/range-for10.C: Adjust.
	* g++.dg/ext/forscope1.C: Adjust.
	* g++.dg/ext/forscope2.C: Adjust.
	* g++.dg/template/for1.C: Adjust.

Index: cp/decl.c
===
--- cp/decl.c	(revision 256951)
+++ cp/decl.c	(working copy)
@@ -644,7 +644,7 @@ poplevel (int keep, int reverse, int fun
  in a init statement were in scope after the for-statement ended.
  We only use the new rules if flag_new_for_scope is nonzero.  */
   leaving_for_scope
-= current_binding_level->kind == sk_for && flag_new_for_scope == 1;
+= current_binding_level->kind == sk_for && flag_new_for_scope;
 
   /* Before we remove the declarations first check for unused variables.  */
   if ((warn_unused_variable || warn_unused_but_set_variable)
@@ -4094,6 +4094,8 @@ cxx_init_decl_processing (void)
   pop_namespace ();
 
   flag_noexcept_type = (cxx_dialect >= cxx17);
+  if (!flag_new_for_scope)
+warning (OPT_Wdeprecated, "%<-fno-for-scope%> is deprecated");
 
   c_common_nodes_and_builtins ();
 
Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 256951)
+++ cp/name-lookup.c	(working copy)
@@ -3231,7 +3231,9 @@ push_local_binding (tree id, tree decl,
standard.  If so, issue an error message.  If name lookup would
work in both cases, but return a different result, this function
returns the result of ANSI/ISO lookup.  Otherwise, it returns
-   DECL.  */
+   DECL.
+
+   FIXME: Scheduled for removal after GCC-8 is done.  */
 
 tree
 check_for_out_of_scope_variable (tree decl)
@@ -3252,16 +3254,16 @@ check_for_out_of_scope_variable (tree de
 shadowed = find_namespace_value (current_namespace, DECL_NAME (decl));
   if (shadowed)
 {
-  if (!DECL_ERROR_REPORTED (decl))
+  if (!DECL_ERROR_REPORTED (decl)
+	  && flag_permissive
+	  && warning (0, "name lookup of %qD changed", DECL_NAME (decl)))
 	{
-	  warning (0, "name lookup of %qD changed", DECL_NAME (decl));
-	  warning_at (DECL_SOURCE_LOCATION (shadowed), 0,
-		  "  matches this %qD under ISO standard rules",
-		  shadowed);
-	  warning_at (DECL_SOURCE_LOCATION (decl), 0,
-		  "  matches this %qD under old rules", decl);
-	  DECL_ERROR_REPORTED (decl) = 1;
+	  inform (DECL_SOURCE_LOCATION (shadowed),
+		  "matches this %qD under ISO standard rules", shadowed);
+	  inform (DECL_SOURCE_LOCATION (decl),
+		  "  matches this %qD under old rules", decl);
 	}
+  DECL_ERROR_REPORTED (decl) = 1;
   return shadowed;
 }
 
@@ -3279,26 +3281,25 @@ check_for_out_of_scope_variable (tree de
 {
   error ("name lookup of %qD changed for ISO % scoping",
 	 DECL_NAME (decl));
-  error ("  cannot use obsolete binding at %q+D because "
-	 "it has a destructor", decl);
+  inform (DECL_SOURCE_LOCATION (decl),
+	  "cannot use obsolete binding %qD because it has a destructor",
+	  decl);
   return error_mark_node;
 }
   else
 {
-  permerror (input_location, "name lookup of %qD changed for ISO % scoping",
+  permerror (input_location,
+		 "name lookup of %qD changed for ISO % scoping",
 	 DECL_NAME (decl));
   if (flag_permissive)
-permerror (DECL_SOURCE_LOCATION (decl),
-		   "  using obsolete binding at %qD", decl);
-  else
-	{
-	  static bool hint;
-	  if (!hint)
-	{
-	  inform (input_location, "(if you use %<-fpermissive%> G++ will accept your code)");
-	  hint = true;
-	}
-	}
+inform (DECL_SOURCE_LOCATION (decl),
+		"using obsolete binding %qD", decl);
+  static bool hint;
+  if (!hint)
+	inform (input_location, flag_permissive
+		? "this flexibility is deprecated and will be removed"
+		: "if you use %<-fpermissive%> G++ will accept your code");
+  hint = true;
 }
 
   return decl;
Index: cp/semantics.c
==

Re: [PATCH 4/5] Remove predictors that are unrealiable.

2018-01-23 Thread Martin Liška
On 01/23/2018 11:02 AM, Jan Hubicka wrote:
>> On 01/19/2018 12:57 PM, Martin Liška wrote:
>>> Yes, there's a huge difference in between CPU 2006 and 2017. Former has 63% 
>>> w/ dominant edges,
>>> and later one only 11%. It's caused by these 2 benchmarks with a high 
>>> coverage:
>>>
>>
>> Hi.
>>
>> I'm sending details about the 2 edges that influence the statistics 
>> significantly:
>>
>>> 500.perlbench_r: regexec.c.065i.profile:
>>>   negative return heuristics of edge 1368->1370: 2.0%  exec 2477714850 hit 
>>> 2429863555 (98.1%)
>>
>> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp)
>> -: 3485:{
>> 2477714850: 3486:CHECKPOINT lastcp;
>> 2477714850: 3487:REGEXP *const rx = reginfo->prog;
>> 2477714850: 3488:regexp *const prog = ReANY(rx);
>> 2477714850: 3489:SSize_t result;
>> [snip]
>> 2477714850: 8046:assert(!result ||  locinput - reginfo->strbeg >= 0);
>> 2477714850: 8047:return result ?  locinput - reginfo->strbeg : -1;
>> -:  8048:}
>>
>> As seen it return -1 if a regex is not found, which is in case of perlbench 
>> very likely branch.
>>
>>>
>>> and 523.xalancbmk_r:
>>> build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile:  
>>> negative return heuristics of edge 3->4: 2.0%  exec 1221735072 hit 
>>> 1221522453 (100.0%)
>>
>> 1221735072:   74:int NameDatatypeValidator::compare(const XMLCh* const lValue
>> -:   75:   , const XMLCh* const 
>> rValue
>> -:   76:   ,   MemoryManager*
>>  const)
>> -:   77:{
>> 1221735072:   78:return ( XMLString::equals(lValue, rValue)? 0 : -1);
>> -:   79:}
>> -:   80:
>>
>> IP profile dump file:
>>
>> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator * 
>> const this, const XMLCh * const lValue, const XMLCh * const rValue, struct 
>> MemoryManager * const D.17157)
>> {
>>   bool _1;
>>   int iftmp.0_2;
>>
>>[count: 1221735072]:
>>   # DEBUG BEGIN_STMT
>>   _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D));
>>   if (_1 != 0)
>> goto ; [0.02%]
>>   else
>> goto ; [99.98%]
>>
>>[count: 1221522453]:
>>
>>[count: 1221735072]:
>>   # iftmp.0_2 = PHI <0(2), -1(3)>
>>   return iftmp.0_2;
>>
>> }
>>
>> Likewise, XML values are more commonly non-equal.
>> Ok, so may I mark also negative return with PROB_EVEN to track it?
> 
> Well, if we start disabling predictors just because we can find benchmark 
> where they do not
> perform as expected, we will need to disable them all. It is nature of 
> heuristics to make
> mistakes, just they should do something useful for common code.
> It is not goal to make heuristics that makes no mistake.
> Heuristics is useful either if it have high precision even if it cover few
> branches (say noreturn), or if it have large coverage and still some hitrate 
> (say opcode,
> call or return based heuristics).
> 
> To make things bit more esoteric, in practice this is also bit weighted by 
> fact
> how much damage bad prediction does.  For example, call heuristics which 
> predict
> non-call path to be taken is likely going to do little damage. Even if call
> path is common it is heavy and hard to optimize by presence of call and thus 
> we
> don't loose that much in common scenarios.
> 
> In the second case:
> 00073 // 
> ---
> 00074 // Compare methods
> 00075 // 
> ---
> 00076 int NameDatatypeValidator::compare(const XMLCh* const lValue
> 00077, const XMLCh* const rValue
> 00078,   MemoryManager* const)
> 00079 {
> 00080 return ( XMLString::equals(lValue, rValue)? 0 : -1);
> 00081 }
> 
> I would say it is odd coding style.  I do not see why it is not declared
> bool and not returning true/false. 
> 
> First case seems bit non-standard too as it looks like usual cache lookup just
> the cache frequently fails.
> 
> I would be in favor of keeping the prediction with non-50% hitrate thus unless
> we run into some performance problems.
> If there are only two benchmarks out of say 20 we tried (in spec2000 and 2k17)
> it seems fine to me.
> 
> There is issue with the way we collect our data.  Using arithmetic mean across
> spec is optimizing for overall runtime of all spec benchmarks combined
> together.  We more want to look for safer values that do well for average
> benchmarks independently how many branches they do.  We could consider making
> the statistics script also collect geometric averages across different
> benchmarks. If we will see big difference between arithmetic mean and geomean
> we will know there is small subset of benchmarks which dominates and we could
> decide what to do with the probability.

Hello.

I fully agree that proper way to do statistics is to do a we

Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.

2018-01-23 Thread Tamar Christina
The 01/23/2018 11:30, Bernhard Reutner-Fischer wrote:
> On 23 January 2018 11:31:27 CET, Tamar Christina  
> wrote:
> >Hi All,
> >
> >This patch makes dg-cmp-results.sh reject the use of log files in the
> >comparison.
> >Often when given a log file dg-cmp-results will give incomplete/wrong
> >output and
> >using log instead of sum is one autocomplete tab away.
> >
> >Instead have the tool guard against such mistakes.
> 
> +if test "$OEXT" = "log"; then
> +echo " must be a sum file instead of log file."
> +exit 1
> +fi
> +
> +if test "$NEXT" = "log"; then
> +echo " must be a sum file instead of log file."
> 
> typo: new-file
> 

Thanks, Typo fixed.

> +exit 1
> +fi
>  
> >
> >Ok for trunk?
> >
> >Thanks,
> >Tamar
> >
> >contrib/
> >2018-01-23  Tamar Christina  
> >
> > * dg-cmp-results.sh: Reject log files from comparison.
> 

-- 
diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 5f2fed5ec3ff0c66d22bc07c84571568730fbcac..d1a7332660c35ca008d8e0d06eba8842758032c5 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -61,8 +61,20 @@ esac
 VARIANT="$1"
 OFILE="$2"
 OBASE=`basename "$2"`
+OEXT="${OBASE##*.}"
 NFILE="$3"
 NBASE=`basename "$3"`
+NEXT="${NBASE##*.}"
+
+if test "$OEXT" = "log"; then
+echo " must be a sum file instead of log file."
+exit 1
+fi
+
+if test "$NEXT" = "log"; then
+echo " must be a sum file instead of log file."
+exit 1
+fi
 
 echo "dg-cmp-results.sh: Verbosity is ${verbose}, Variant is \"${VARIANT}\""
 echo
@@ -82,11 +94,11 @@ fi
 unset temp
 
 # Copy out the old file's section 0.
-echo "Older log file: $OFILE"
+echo "Older summary file: $OFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $OFILE
 
 # Copy out the new file's section 0.
-echo "Newer log file: $NFILE"
+echo "Newer summary file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.



Remove explicit dg-do runs from gcc.dg/vect (PR 83889)

2018-01-23 Thread Richard Sandiford
The failures in this PR were from forcing { dg-do run } even when
vect.exp chooses options that are incompatible with the runtime.
The default vect.exp behaviour is to execute when possible, so there's
no need for a dg-do at all.

The patch removes other unconditional { dg-do run }s too.  Many of them
were already failing in the same way.

Also, the dg-do run condition in vect-reduc-or* seems unnecessary:
the test should run correctly whatever happens, and the scan tests
are already guarded properly.

Tested on aarch64-linux-gnu, arm-non-eabi, x86_64-linux-gnu and
powerpc64le-linux-gnu.  OK to install?

Richard


2018-01-23  Richard Sandiford  

gcc/testsuite/
PR testsuite/83889
* gcc.dg/vect/pr79920.c: Remove explicit dg-do run.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.dg/vect/pr80631-2.c: Likewise.
* gcc.dg/vect/pr81410.c: Likewise.
* gcc.dg/vect/pr81633.c: Likewise.
* gcc.dg/vect/pr81815.c: Likewise.
* gcc.dg/vect/pr82108.c: Likewise.
* gcc.dg/vect/pr83857.c: Likewise.
* gcc.dg/vect/vect-alias-check-8.c: Likewise.
* gcc.dg/vect/vect-alias-check-9.c: Likewise.
* gcc.dg/vect/vect-alias-check-10.c: Likewise.
* gcc.dg/vect/vect-alias-check-11.c: Likewise.
* gcc.dg/vect/vect-alias-check-12.c: Likewise.
* gcc.dg/vect/vect-reduc-11.c: Likewise.
* gcc.dg/vect/vect-tail-nomask-1.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-1.c: Remove dg-do run and use
dg-xfail-run-if instead.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
* gcc.dg/vect/vect-reduc-or_2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/pr79920.c
===
--- gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-15 12:38:45.039094423 +
+++ gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-23 11:29:38.977575495 +
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-O3 -fno-fast-math" } */
 
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/pr80631-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr80631-1.c   2018-01-13 17:59:52.122334084 
+
+++ gcc/testsuite/gcc.dg/vect/pr80631-1.c   2018-01-23 11:29:38.977575495 
+
@@ -1,5 +1,4 @@
 /* PR tree-optimization/80631 */
-/* { dg-do run } */
 
 #include "tree-vect.h"
 
Index: gcc/testsuite/gcc.dg/vect/pr80631-2.c
===
--- gcc/testsuite/gcc.dg/vect/pr80631-2.c   2017-12-14 00:04:52.323446529 
+
+++ gcc/testsuite/gcc.dg/vect/pr80631-2.c   2018-01-23 11:29:38.977575495 
+
@@ -1,5 +1,4 @@
 /* PR tree-optimization/80631 */
-/* { dg-do run } */
 
 #include "tree-vect.h"
 
Index: gcc/testsuite/gcc.dg/vect/pr81410.c
===
--- gcc/testsuite/gcc.dg/vect/pr81410.c 2017-07-27 10:37:55.334036950 +0100
+++ gcc/testsuite/gcc.dg/vect/pr81410.c 2018-01-23 11:29:38.977575495 +
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-require-effective-target vect_long_long } */
 
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/pr81633.c
===
--- gcc/testsuite/gcc.dg/vect/pr81633.c 2017-08-03 10:40:54.014105333 +0100
+++ gcc/testsuite/gcc.dg/vect/pr81633.c 2018-01-23 11:29:38.977575495 +
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 static double identity[4][4] = {{1, 0, 0, 0},
 {0, 1, 0, 0},
 {0, 0, 1, 0},
Index: gcc/testsuite/gcc.dg/vect/pr81815.c
===
--- gcc/testsuite/gcc.dg/vect/pr81815.c 2017-08-16 08:50:54.197549943 +0100
+++ gcc/testsuite/gcc.dg/vect/pr81815.c 2018-01-23 11:29:38.978575453 +
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 int __attribute__ ((noinline, noclone))
 f (int *x, int n)
 {
Index: gcc/testsuite/gcc.dg/vect/pr82108.c
===
--- gcc/testsuite/gcc.dg/vect/pr82108.c 2017-09-06 20:47:38.380589062 +0100
+++ gcc/testsuite/gcc.dg/vect/pr82108.c 2018-01-23 11:29:38.978575453 +
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-require-effective-target vect_float } */
 
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/pr83857.c
===
--- gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-16 15:13:24.937622282 +
+++ gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-23 11:29:38.978575453 +
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ffast-math" } */
 
 #define N 100
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
=

Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.

2018-01-23 Thread Bernhard Reutner-Fischer
On 23 January 2018 11:31:27 CET, Tamar Christina  
wrote:
>Hi All,
>
>This patch makes dg-cmp-results.sh reject the use of log files in the
>comparison.
>Often when given a log file dg-cmp-results will give incomplete/wrong
>output and
>using log instead of sum is one autocomplete tab away.
>
>Instead have the tool guard against such mistakes.

+if test "$OEXT" = "log"; then
+echo " must be a sum file instead of log file."
+exit 1
+fi
+
+if test "$NEXT" = "log"; then
+echo " must be a sum file instead of log file."

typo: new-file

+exit 1
+fi
 
>
>Ok for trunk?
>
>Thanks,
>Tamar
>
>contrib/
>2018-01-23  Tamar Christina  
>
>   * dg-cmp-results.sh: Reject log files from comparison.



Fix use of boolean_true/false_node (PR 83979)

2018-01-23 Thread Richard Sandiford
r255913 changed some constant_boolean_node calls to boolean_true_node
and boolean_false_node, which meant that the returned tree didn't
always have the right type.

Tested on aarch64-linux-gnu.  Probably bordering on obvious, but just
in case: OK to install?

Richard


2018-01-23  Richard Sandiford  

gcc/
PR tree-optimization/83979
* fold-const.c (fold_comparison): Use constant_boolean_node
instead of boolean_{true,false}_node.

gcc/testsuite/
PR tree-optimization/83979
* g++.dg/pr83979.c: New test.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c2018-01-16 15:13:19.643832679 +
+++ gcc/fold-const.c2018-01-23 11:23:59.982555852 +
@@ -8572,39 +8572,39 @@ fold_comparison (location_t loc, enum tr
{
case EQ_EXPR:
  if (known_eq (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_ne (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
case NE_EXPR:
  if (known_ne (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_eq (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
case LT_EXPR:
  if (known_lt (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_ge (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
case LE_EXPR:
  if (known_le (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_gt (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
case GE_EXPR:
  if (known_ge (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_lt (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
case GT_EXPR:
  if (known_gt (bitpos0, bitpos1))
-   return boolean_true_node;
+   return constant_boolean_node (true, type);
  if (known_le (bitpos0, bitpos1))
-   return boolean_false_node;
+   return constant_boolean_node (false, type);
  break;
default:;
}
Index: gcc/testsuite/g++.dg/pr83979.c
===
--- /dev/null   2018-01-22 18:46:35.983712806 +
+++ gcc/testsuite/g++.dg/pr83979.c  2018-01-23 11:23:59.982555852 +
@@ -0,0 +1,7 @@
+/* { dg-compile } */
+
+int
+foo (char* p)
+{
+  return p + 1000 < p;
+}


Re: [PATCH PR82604]Fix regression in ftree-parallelize-loops

2018-01-23 Thread Richard Biener
On Sat, Jan 20, 2018 at 4:10 PM, Bin.Cheng  wrote:
> On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng  wrote:
>> Hi,
>> This patch is supposed to fix regression caused by loop distribution when
>> ftree-parallelize-loops.  The reason is distributed memset call can't be
>> understood/analyzed in data reference analysis, as a result, parloop can
>> only parallelize the innermost 2-level loop nest.  Before distribution
>> change, parloop can parallelize the innermost 3-level loop nest, i.e,
>> more parallelization.
>> As commented in the PR, ideally, loop distribution should be able to
>> distribute memset call for 3-level loop nest.  Unfortunately this requires
>> sophisticated work proving equality between tree expressions which gcc
>> is not good at now.
>> Another fix is to improve data reference analysis so that memset call
>> can be supported.  We don't know how big this change is and it's definitely
>> not GCC 8 task.
>>
>> So this patch fixes the regression in a bit hacking way.  It first enables
>> 3-level loop nest distribution when flag_tree_parloops > 1.  Secondly, it
>> supports 3-level loop nest distribution for ZERO-ing stmt which can only
>> be distributed as a loop (nest) of memset, but can't be distributed as a
>> single memset.  The overall effect is ZERO-ing stmt will be distributed
>> to one loop deeper than now, so parloop can parallelize as before.
>>
>> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK if no errors?

Ok.

Thanks,
Richard.

> Test finished without error.  Also I checked
> -ftree-parallelize-loops=6 on AArch64 and can confirm the regression
> is resolved.
>
> Thanks,
> bin
>>
>> Thanks,
>> bin
>> 2018-01-19  Bin Cheng  
>>
>> PR tree-optimization/82604
>> * tree-loop-distribution.c (enum partition_kind): New enum item
>> PKIND_PARTIAL_MEMSET.
>> (partition_builtin_p): Support above new enum item.
>> (generate_code_for_partition): Ditto.
>> (compute_access_range): Differentiate cases that equality can be
>> proven at all loops, the innermost loops or no loops.
>> (classify_builtin_st, classify_builtin_ldst): Adjust call to above
>> function.  Set PKIND_PARTIAL_MEMSET for partition appropriately.
>> (finalize_partitions, distribute_loop): Don't fuse partition of
>> PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest.
>> (prepare_perfect_loop_nest): Distribute 3-level loop nest only if
>> parloop is enabled.


Re: Fix vect_float markup for a couple of tests (PR 83888)

2018-01-23 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Jan 22, 2018 at 6:55 PM, Richard Sandiford
>  wrote:
>> vect_float is true for arm*-*-* targets, but the support is only
>> available when -funsafe-math-optimizations is on.  This caused
>> failures in two tests that disable fast-math.
>>
>> The easiest fix seemed to be to add a new target selector for
>> "vect_float without special options".
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> OK to install?
>
> Ok - so arm float vectors are not IEEE compliant?

Not for subnormals apparently:

; Note that NEON operations don't support the full IEEE 754 standard: in
; particular, denormal values are flushed to zero.  This means that GCC cannot
; use those instructions for autovectorization, etc. unless
; -funsafe-math-optimizations is in effect (in which case flush-to-zero
; behavior is permissible).

Thanks,
Richard


Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure

2018-01-23 Thread Uros Bizjak
On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak  wrote:
> On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor  wrote:
>> On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak  wrote:
>>>
>>> The default "go build" compile options over-optimize the auxiliary
>>> executable, built in TestCrashDumpsAllThreads testcase
>>> (libgo/go/runtime/crash_unix_test.go). This over-optimization results
>>> in removal of the trivial summing loop and in the inlining of the
>>> main.loop function into main.$thunk0.
>>>
>>> The testcase expects backtrace that shows several instances of
>>> main.loop in the backtrace dump. When main.loop gets inlined into
>>> thunk, its name is not dumped anymore.
>>>
>>> The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining.
>>>
>>> Patch was bootstrapped and regression tested on x86_64-linux-gnu and
>>> alphev68-linux-gnu, where for the later target the patch fixed the
>>> mentioned failure.
>>
>> That sounds like a bug somewhere.  Even when one function gets inlined
>> into another, its name should still be dumped.  This is implemented by
>> report_inlined_functions in libbacktrace/dwarf.c.  While something
>> like your patch may be needed, I'd like to more clearly understand why
>> libbacktrace isn't working.
>
> If I manually compile the testcase from crash_unix_test.go wth gccgo
> -O2 (the asm is attached), the main.loop function is not even present
> in the dump.

Digging deeper into the source, here is the problem:

When inlined, the thunk reads:

0260 :
 260:   00 00 bb 27 ldahgp,0(t12)
260: GPDISP .text+0x4
 264:   00 00 bd 23 lda gp,0(gp)
 268:   08 00 10 a6 ldq a0,8(a0)
 26c:   00 00 7d a7 ldq t12,0(gp)
26c: ELF_LITERALruntime.closechan
 270:   f0 ff de 23 lda sp,-16(sp)
 274:   00 00 5e b7 stq ra,0(sp)
 278:   00 40 5b 6b jsr ra,(t12),27c 
278: LITUSE .text+0x3
278: HINT   runtime.closechan
 27c:   00 80 5f 24 ldaht1,-32768
 280:   01 05 e2 47 not t1,t0
 284:   00 00 fe 2f unop
 288:   1f 04 ff 47 nop
 28c:   00 00 fe 2f unop
 290:   ff ff 21 20 lda t0,-1(t0)
 294:   fe ff 3f f4 bne t0,290 
 298:   01 05 e2 47 not t1,t0
 29c:   fc ff ff c3 br  290 

and the (very tight) loop consists only of insns at 0x290 and 0x294.
However, the assembly dump of the loop reads:

$L23:
lda $1,-1($1)
$LBB53:
$LBB51:
$LBB49:
$LBB47:
.loc 1 13 31
bne $1,$L23
$LBE47:
$LBE49:
$LBE51:
$LBE53:

with

$Ldebug_ranges0:
.8byte  $LBB45-$Ltext0
.8byte  $LBE45-$Ltext0
.8byte  $LBB50-$Ltext0
.8byte  $LBE50-$Ltext0
.8byte  $LBB51-$Ltext0
.8byte  $LBE51-$Ltext0
.8byte  0
.8byte  0

So, those $LBB labels are at the wrong place. During runtime, we get
(oh, why can't we display PC in hex?!):

SIGQUIT: quit
PC=4831845072 m=0 sigcode=0

goroutine 7 [running]:
runtime.sighandler

/space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/signal_sighandler.go:104
runtime.sigtrampgo
/space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/signal_unix.go:312
runtime.sigtramp
/space/homedirs/uros/gcc-svn/trunk/libgo/runtime/go-signal.c:54
runtime.kickoff
/space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161

0x120001ad0

000120001aa0 :
   120001aa0:   02 00 bb 27 ldahgp,2(t12)
   120001aa4:   f0 a5 bd 23 lda gp,-23056(gp)
   120001aa8:   08 00 10 a6 ldq a0,8(a0)
   120001aac:   c8 80 7d a7 ldq t12,-32568(gp)
   120001ab0:   f0 ff de 23 lda sp,-16(sp)
   120001ab4:   00 00 5e b7 stq ra,0(sp)
   120001ab8:   00 40 5b 6b jsr ra,(t12),120001abc 
   120001abc:   00 80 5f 24 ldaht1,-32768
   120001ac0:   01 05 e2 47 not t1,t0
   120001ac4:   00 00 fe 2f unop
   120001ac8:   1f 04 ff 47 nop
   120001acc:   00 00 fe 2f unop
 > 120001ad0:   ff ff 21 20 lda t0,-1(t0)
   120001ad4:   fe ff 3f f4 bne t0,120001ad0 
   120001ad8:   01 05 e2 47 not t1,t0
   120001adc:   fc ff ff c3 br  120001ad0 

which is obviously out of corresponding debug range.

Contents of the .debug_ranges section:

Offset   BeginEnd
 000120001a7c 000120001a8c
 000120001a90 000120001a9c
 
 000120001a7c 000120001a8c
 000120001a90 000120001a9c
 
0030 000120001aa8 000120001ab0
0030 000120001ab8 000120001abc
0030 000120001ad4 000120001ad8   <<< this one!
0030 

Uros.


[PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.

2018-01-23 Thread Tamar Christina
Hi All,

This patch makes dg-cmp-results.sh reject the use of log files in the 
comparison.
Often when given a log file dg-cmp-results will give incomplete/wrong output and
using log instead of sum is one autocomplete tab away.

Instead have the tool guard against such mistakes.

Ok for trunk?

Thanks,
Tamar

contrib/
2018-01-23  Tamar Christina  

* dg-cmp-results.sh: Reject log files from comparison.

-- 
diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 5f2fed5ec3ff0c66d22bc07c84571568730fbcac..93461efbeff7c7d3b6051ab978cf97122dd5079d 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -61,8 +61,20 @@ esac
 VARIANT="$1"
 OFILE="$2"
 OBASE=`basename "$2"`
+OEXT="${OBASE##*.}"
 NFILE="$3"
 NBASE=`basename "$3"`
+NEXT="${NBASE##*.}"
+
+if test "$OEXT" = "log"; then
+echo " must be a sum file instead of log file."
+exit 1
+fi
+
+if test "$NEXT" = "log"; then
+echo " must be a sum file instead of log file."
+exit 1
+fi
 
 echo "dg-cmp-results.sh: Verbosity is ${verbose}, Variant is \"${VARIANT}\""
 echo
@@ -82,11 +94,11 @@ fi
 unset temp
 
 # Copy out the old file's section 0.
-echo "Older log file: $OFILE"
+echo "Older summary file: $OFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $OFILE
 
 # Copy out the new file's section 0.
-echo "Newer log file: $NFILE"
+echo "Newer summary file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.



Re: [PATCH] v2 -Warray-bounds: Fix false positive in some "switch" stmts (PR tree-optimization/83510)

2018-01-23 Thread Richard Biener
On Mon, Jan 22, 2018 at 9:05 PM, David Malcolm  wrote:
> On Fri, 2018-01-19 at 09:45 +0100, Richard Biener wrote:
>> On Fri, Jan 19, 2018 at 12:36 AM, David Malcolm 
>> wrote:
>> > PR tree-optimization/83510 reports that r255649 (for
>> > PR tree-optimization/83312) introduced a false positive for
>> > -Warray-bounds for array accesses within certain switch statements:
>> > those for which value-ranges allow more than one case to be
>> > reachable,
>> > but for which one or more of the VR-unreachable cases contain
>> > out-of-range array accesses.
>> >
>> > In the reproducer, after the switch in f is inlined into g, we have
>> > 3 cases
>> > for the switch (case 9, case 10-19, and default), within a loop
>> > that
>> > ranges from 0..9.
>> >
>> > With both the old and new code,
>> > vr_values::simplify_switch_using_ranges clears
>> > the EDGE_EXECUTABLE flag on the edge to the "case 10-19"
>> > block.  This
>> > happens during the dom walk within the substitute_and_fold_engine.
>> >
>> > With the old code, the clearing of that EDGE_EXECUTABLE flag led to
>> > the
>> >   /* Skip blocks that were found to be unreachable.  */
>> > code in the old implementation of vrp_prop::check_all_array_refs
>> > skipping
>> > the "case 10-19" block.
>> >
>> > With the new code, we have a second dom walk, and that dom_walker's
>> > ctor
>> > sets all edges to be EDGE_EXECUTABLE, losing that information.
>> >
>> > Then, dom_walker::before_dom_children (here, the subclass'
>> > check_array_bounds_dom_walker::before_dom_children) can return one
>> > edge, if
>> > there's a unique successor edge, and dom_walker::walk filters the
>> > dom walk
>> > to just that edge.
>> >
>> > Here we have two VR-valid edges (case 9 and default), and an VR-
>> > invalid
>> > successor edge (case 10-19).  There's no *unique* valid successor
>> > edge,
>> > and hence taken_edge is NULL, and the filtering in dom_walker::walk
>> > doesn't fire.
>> >
>> > Hence we've lost the filtering of the "case 10-19" BB, hence the
>> > false
>> > positive.
>> >
>> > The issue is that we have two dom walks: first within vr_values'
>> > substitute_and_fold_dom_walker (which has skip_unreachable_blocks
>> > == false),
>> > then another within vrp_prop::check_all_array_refs (with
>> > skip_unreachable_blocks == true).
>> >
>> > Each has different "knowledge" about ruling out edges due to value-
>> > ranges,
>> > but we aren't combining that information.  The former "knows" about
>> > out-edges at a particular control construct (e.g. at a switch), the
>> > latter
>> > "knows" about dominance, but only about unique successors (hence
>> > the
>> > problem when two out of three switch cases are valid).
>> >
>> > This patch combines the information by preserving the
>> > EDGE_EXECUTABLE
>> > flags from the first dom walk, and using it in the second dom walk,
>> > potentially rejecting additional edges.
>> >
>> > Doing so fixes the false positive.
>> >
>> > I attempted an alternative fix, merging the two dom walks into one,
>> > but
>> > that led to crashes in identify_jump_threads, so I went with this,
>> > as
>> > a less invasive fix.
>> >
>> > Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
>> > OK for trunk?
>>
>> Ok, but I think you need to update the domwalk construction in
>> graphite-scop-detection.c as well - did you test w/o graphite?
>
> Thanks.  I did test with graphite enabled; the use of default args meant I
> didn't have to touch that code.
>
> That said, I've become unhappy the two bool params (both defaulting
> to false) for dom_walker's ctor, especially given that they're
> really a tristate.
>
> So here's an alternative version of the patch, which, rather than adding
> a new bool, instead introduces a 3-valued enum to explicitly capture the valid
> possibilities.  Also, having it as an enum rather than booleans makes the
> meaning clearer, and makes the places that override the default more obvious.

Ah, much nicer indeed.

> (This version of the patch *did* require touching that graphite file).
>
>> grep might be your friend...
>
> (and indeed, with an enum, it's even more grep-friendly)
>
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu (with graphite).
> OK for trunk?
> (or did you prefer the earlier patch?)

Ok.

Thanks,
Richard.

>> Thanks,
>> Richard.
>
> [...snip...]
>
>
> gcc/ChangeLog:
> PR tree-optimization/83510
> * domwalk.c (set_all_edges_as_executable): New function.
> (dom_walker::dom_walker): Convert bool param
> "skip_unreachable_blocks" to enum reachability.  Move setup of
> edge flags to set_all_edges_as_executable and only do it when
> reachability is REACHABLE_BLOCKS.
> * domwalk.h (enum dom_walker::reachability): New enum.
> (dom_walker::dom_walker): Convert bool param
> "skip_unreachable_blocks" to enum reachability.
> (set_all_edges_as_executable): New decl.
> * graphite-scop-detection.c  (gather_bbs::gather_bbs): Conv

Re: [PATCH] Fix memory leaks in sbitmap.c selftests

2018-01-23 Thread Richard Biener
On Mon, Jan 22, 2018 at 8:51 PM, David Malcolm  wrote:
> "make selftest-valgrind" shows a few leaks in sbitmap.c's selftests;
> this patch fixes them.
>
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
> OK for trunk?

Ok.

> gcc/ChangeLog:
> * sbitmap.c (selftest::test_set_range): Fix memory leaks.
> (selftest::test_bit_in_range): Likewise.
> ---
>  gcc/sbitmap.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
> index cf46cb2..967868a 100644
> --- a/gcc/sbitmap.c
> +++ b/gcc/sbitmap.c
> @@ -897,6 +897,7 @@ test_set_range ()
>bitmap_set_range (s, 15, 1);
>ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 14));
>ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 15, 15));
> +  sbitmap_free (s);
>
>s = sbitmap_alloc (1024);
>bitmap_clear (s);
> @@ -914,6 +915,7 @@ test_set_range ()
>ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 512 + 64, 1023));
>ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512));
>ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512 + 63, 512 + 63));
> +  sbitmap_free (s);
>  }
>
>  /* Verify bitmap_bit_in_range_p functions for sbitmap.  */
> @@ -935,6 +937,8 @@ test_bit_in_range ()
>ASSERT_TRUE (bitmap_bit_in_range_p (s, 100, 100));
>ASSERT_TRUE (bitmap_bit_p (s, 100));
>
> +  sbitmap_free (s);
> +
>s = sbitmap_alloc (64);
>bitmap_clear (s);
>bitmap_set_bit (s, 63);
> @@ -942,6 +946,7 @@ test_bit_in_range ()
>ASSERT_TRUE (bitmap_bit_in_range_p (s, 1, 63));
>ASSERT_TRUE (bitmap_bit_in_range_p (s, 63, 63));
>ASSERT_TRUE (bitmap_bit_p (s, 63));
> +  sbitmap_free (s);
>
>s = sbitmap_alloc (1024);
>bitmap_clear (s);
> @@ -985,6 +990,7 @@ test_bit_in_range ()
>ASSERT_FALSE (bitmap_bit_in_range_p (s, 17, 31));
>ASSERT_FALSE (bitmap_bit_in_range_p (s, 49, 63));
>ASSERT_FALSE (bitmap_bit_in_range_p (s, 65, 1023));
> +  sbitmap_free (s);
>  }
>
>  /* Run all of the selftests within this file.  */
> --
> 1.8.5.3
>


Re: [PATCH] Add more test coverage to selftest::test_location_wrappers

2018-01-23 Thread Richard Biener
On Mon, Jan 22, 2018 at 8:52 PM, David Malcolm  wrote:
> This patch adds a few extra assertions to selftest::test_location_wrappers.
>
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
> OK for trunk?

Ok.

> gcc/ChangeLog:
> * tree.c (selftest::test_location_wrappers): Add more test
> coverage.
> ---
>  gcc/tree.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/tree.c b/gcc/tree.c
> index b3e93b8..c5baf08 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -14490,6 +14490,8 @@ test_location_wrappers ()
>  {
>location_t loc = BUILTINS_LOCATION;
>
> +  ASSERT_EQ (NULL_TREE, maybe_wrap_with_location (NULL_TREE, loc));
> +
>/* Wrapping a constant.  */
>tree int_cst = build_int_cst (integer_type_node, 42);
>ASSERT_FALSE (CAN_HAVE_LOCATION_P (int_cst));
> @@ -14500,6 +14502,14 @@ test_location_wrappers ()
>ASSERT_EQ (loc, EXPR_LOCATION (wrapped_int_cst));
>ASSERT_EQ (int_cst, tree_strip_any_location_wrapper (wrapped_int_cst));
>
> +  /* We shouldn't add wrapper nodes for UNKNOWN_LOCATION.  */
> +  ASSERT_EQ (int_cst, maybe_wrap_with_location (int_cst, UNKNOWN_LOCATION));
> +
> +  /* We shouldn't add wrapper nodes for nodes that CAN_HAVE_LOCATION_P.  */
> +  tree cast = build1 (NOP_EXPR, char_type_node, int_cst);
> +  ASSERT_TRUE (CAN_HAVE_LOCATION_P (cast));
> +  ASSERT_EQ (cast, maybe_wrap_with_location (cast, loc));
> +
>/* Wrapping a STRING_CST.  */
>tree string_cst = build_string (4, "foo");
>ASSERT_FALSE (CAN_HAVE_LOCATION_P (string_cst));
> --
> 1.8.5.3
>


Re: Fix vect_float markup for a couple of tests (PR 83888)

2018-01-23 Thread Richard Biener
On Mon, Jan 22, 2018 at 6:55 PM, Richard Sandiford
 wrote:
> vect_float is true for arm*-*-* targets, but the support is only
> available when -funsafe-math-optimizations is on.  This caused
> failures in two tests that disable fast-math.
>
> The easiest fix seemed to be to add a new target selector for
> "vect_float without special options".
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok - so arm float vectors are not IEEE compliant?

Richard.

> Richard
>
>
> 2018-01-22  Richard Sandiford  
>
> gcc/
> PR testsuite/83888
> * doc/sourcebuild.texi (vect_float): Say that the selector
> only describes the situation when -funsafe-math-optimizations is on.
> (vect_float_strict): Document.
>
> gcc/testsuite/
> PR testsuite/83888
> * lib/target-supports.exp (check_effective_target_vect_float): Say
> that the result only holds when -funsafe-math-optimizations is on.
> (check_effective_target_vect_float_strict): New procedure.
> * gcc.dg/vect/no-fast-math-vect16.c: Use vect_float_strict instead
> of vect_float.
> * gcc.dg/vect/vect-reduc-6.c: Likewise.
>
> Index: gcc/doc/sourcebuild.texi
> ===
> --- gcc/doc/sourcebuild.texi2018-01-22 17:51:03.860579049 +
> +++ gcc/doc/sourcebuild.texi2018-01-22 17:54:02.172848564 +
> @@ -1403,7 +1403,13 @@ The target's preferred vector alignment
>  alignment.
>
>  @item vect_float
> -Target supports hardware vectors of @code{float}.
> +Target supports hardware vectors of @code{float} when
> +@option{-funsafe-math-optimizations} is in effect.
> +
> +@item vect_float_strict
> +Target supports hardware vectors of @code{float} when
> +@option{-funsafe-math-optimizations} is not in effect.
> +This implies @code{vect_float}.
>
>  @item vect_int
>  Target supports hardware vectors of @code{int}.
> Index: gcc/testsuite/lib/target-supports.exp
> ===
> --- gcc/testsuite/lib/target-supports.exp   2018-01-22 17:51:03.817580787 
> +
> +++ gcc/testsuite/lib/target-supports.exp   2018-01-22 17:54:02.173848531 
> +
> @@ -5492,7 +5492,8 @@ proc check_effective_target_vect_long {
>  return $answer
>  }
>
> -# Return 1 if the target supports hardware vectors of float, 0 otherwise.
> +# Return 1 if the target supports hardware vectors of float when
> +# -funsafe-math-optimizations is enabled, 0 otherwise.
>  #
>  # This won't change for different subtargets so cache the result.
>
> @@ -5525,6 +5526,14 @@ proc check_effective_target_vect_float {
>  return $et_vect_float_saved($et_index)
>  }
>
> +# Return 1 if the target supports hardware vectors of float without
> +# -funsafe-math-optimizations being enabled, 0 otherwise.
> +
> +proc check_effective_target_vect_float_strict { } {
> +return [expr { [check_effective_target_vect_float]
> +  && ![istarget arm*-*-*] }]
> +}
> +
>  # Return 1 if the target supports hardware vectors of double, 0 otherwise.
>  #
>  # This won't change for different subtargets so cache the result.
> Index: gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c
> ===
> --- gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-13 
> 18:01:15.293116922 +
> +++ gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-22 
> 17:54:02.172848564 +
> @@ -1,4 +1,4 @@
> -/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_float_strict } */
>
>  #include 
>  #include "tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-13 18:01:15.294116882 
> +
> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-22 17:54:02.172848564 
> +
> @@ -1,4 +1,4 @@
> -/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_float_strict } */
>  /* { dg-additional-options "-fno-fast-math" } */
>
>  #include 


Re: Disable some patterns for fold-left reductions (PR 83965)

2018-01-23 Thread Richard Biener
On Mon, Jan 22, 2018 at 6:53 PM, Richard Sandiford
 wrote:
> In this PR we recognised a PLUS_EXPR as a fold-left reduction,
> then applied pattern matching to convert it to a WIDEN_SUM_EXPR.
> We need to keep the original code in this case since we implement
> the reduction using scalar rather than vector operations.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-22  Richard Sandiford  
>
> gcc/
> PR tree-optimization/83965
> * tree-vect-patterns.c (vect_reassociating_reduction_p): New function.
> (vect_recog_dot_prod_pattern, vect_recog_sad_pattern): Use it
> instead of checking only for a reduction.
> (vect_recog_widen_sum_pattern): Likewise.
>
> gcc/testsuite/
> PR tree-optimization/83965
> * gcc.dg/vect/pr83965.c: New test.
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-01-13 18:01:51.240735098 +
> +++ gcc/tree-vect-patterns.c2018-01-22 17:51:06.751462689 +
> @@ -217,6 +217,16 @@ vect_recog_temp_ssa_var (tree type, gimp
>return make_temp_ssa_name (type, stmt, "patt");
>  }
>
> +/* Return true if STMT_VINFO describes a reduction for which reassociation
> +   is allowed.  */
> +
> +static bool
> +vect_reassociating_reduction_p (stmt_vec_info stmt_vinfo)
> +{
> +  return (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
> + && STMT_VINFO_REDUC_TYPE (stmt_vinfo) != FOLD_LEFT_REDUCTION);
> +}
> +
>  /* Function vect_recog_dot_prod_pattern
>
> Try to find the following pattern:
> @@ -336,7 +346,7 @@ vect_recog_dot_prod_pattern (vec  {
>gimple *def_stmt;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>   && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
> return NULL;
>oprnd0 = gimple_assign_rhs1 (last_stmt);
> @@ -557,7 +567,7 @@ vect_recog_sad_pattern (vec *s
>  {
>gimple *def_stmt;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>   && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
> return NULL;
>plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
> @@ -1181,7 +1191,7 @@ vect_recog_widen_sum_pattern (vecif (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
>  return NULL;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>&& ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
>  return NULL;
>
> Index: gcc/testsuite/gcc.dg/vect/pr83965.c
> ===
> --- /dev/null   2018-01-22 13:31:44.608695204 +
> +++ gcc/testsuite/gcc.dg/vect/pr83965.c 2018-01-22 17:51:06.751462689 +
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-ftrapv" } */
> +
> +int
> +mac (const short *a, const short *b, int sqr, int *sum)
> +{
> +  int i;
> +  int dotp = *sum;
> +
> +  for (i = 0; i < 150; i++)
> +{
> +  dotp += b[i] * a[i];
> +  sqr += b[i] * b[i];
> +}
> +
> +  *sum = dotp;
> +  return sqr;
> +}


Re: [PACH][RFC] Pass -m -jN to gcc_build from gcc_release

2018-01-23 Thread Richard Biener
On Mon, 22 Jan 2018, Jeff Law wrote:

> On 01/22/2018 01:57 AM, Jakub Jelinek wrote:
> > On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote:
> >>
> >> Currently gcc_release builds GCC (for generating in-tree generated
> >> files) serially - that's prohibitly expensive and takges ages (>4h for 
> >> me).  I'm using (when I remember to edit gcc_release ...) -j6 without
> >> a problem for some years, thus the following proposal.
> >>
> >> Any recommendation for the default N?  4 might work just fine as well
> >> and no release manager should do without at least 4 cores...
> > 
> > Perhaps
> > num_cpus=1
> > if type -p getconf 2>/dev/null; then
> >   num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
> >   case "$num_cpus" in
> > '' | 0* | *[!0-9]*) num_cpus=1;;
> >   esac
> > fi
> > 
> > and "-j$num_cpus" ?
> Seems reasonable to me -- I believe we do something similar on our
> various internal scripts ({redhat,fedora}-rpm-config).
> 
> jeff

Ok, I'll commit the following then if it all goes well when doing
the final GCC 7.3 release.

Richard.

Index: maintainer-scripts/gcc_release
===
--- maintainer-scripts/gcc_release  (revision 256973)
+++ maintainer-scripts/gcc_release  (working copy)
@@ -209,8 +209,16 @@ EOF
 # on at least one platform.
 inform "Building compiler"
 OBJECT_DIRECTORY=../objdir
+num_cpus=1
+if type -p getconf 2>/dev/null; then
+  num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
+  case "$num_cpus" in
+   '' | 0* | *[!0-9]*) num_cpus=1;;
+  esac
+fi
 contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \
-  -c "--enable-generated-files-in-srcdir --disable-multilib" build || \
+  -c "--enable-generated-files-in-srcdir --disable-multilib" \
+  -m "-j$num_cpus" build || \
   error "Could not rebuild GCC"
   fi
 


Make unlikely bb discovery more robust

2018-01-23 Thread Jan Hubicka
Hi,
this is second part of fix for wrong partitining decisions.  It makes
probabily_never_executed to ignore scaled profiles because they may be
low for invalid reasons.  This does not seem to affect size of cold section
significantly (as it is predocminantly occupied by blocks with 0 count that
remains precise).

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* predict.c (probably_never_executed): Only use precise profile info.
(compute_function_frequency): Skip after inlining hack since we now
have quality checking.

Index: predict.c
===
--- predict.c   (revision 256973)
+++ predict.c   (working copy)
@@ -212,7 +212,12 @@ probably_never_executed (struct function
   gcc_checking_assert (fun);
   if (count == profile_count::zero ())
 return true;
-  if (count.initialized_p () && profile_status_for_fn (fun) == PROFILE_READ)
+  /* Do not trust adjusted counts.  This will make us to drop int cold section
+ code with low execution count as a result of inlining. These low counts
+ are not safe even with read profile and may lead us to dropping
+ code which actually gets executed into cold section of binary that is not
+ desirable.  */
+  if (count.precise_p () && profile_status_for_fn (fun) == PROFILE_READ)
 {
   int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
   if (count.apply_scale (unlikely_count_fraction, 1) >= profile_info->runs)
@@ -3759,15 +3764,10 @@ compute_function_frequency (void)
   return;
 }
 
-  /* Only first time try to drop function into unlikely executed.
- After inlining the roundoff errors may confuse us.
- Ipa-profile pass will drop functions only called from unlikely
- functions to unlikely and that is most of what we care about.  */
-  if (!cfun->after_inlining)
-{
-  node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
-  warn_function_cold (current_function_decl);
-}
+  node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
+  warn_function_cold (current_function_decl);
+  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa() == profile_count::zero ())
+return;
   FOR_EACH_BB_FN (bb, cfun)
 {
   if (maybe_hot_bb_p (cfun, bb))


Fix quality tracking for named probabilities

2018-01-23 Thread Jan Hubicka
Hi,
this is first patch of two which aim to fix function partitioning issues where
we put basic blocks to cold sections while we should not.

This patch fixes issues where we think we have precise profile information but
we don't because we scaled it by some of named probability values.  It is nature
of these that we use them in contexts where we don't know precise outcome.
So I changed them all to guessed.  This prevents us from wrong function
splitting decisions.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* profile-count.h (profile_probability::very_unlikely,
profile_probability::unlikely, profile_probability::even): Set
precision to guessed.
Index: profile-count.h
===
--- profile-count.h (revision 256973)
+++ profile-count.h (working copy)
@@ -164,7 +164,7 @@ public:
 {
   /* Be consistent with PROB_VERY_UNLIKELY in predict.h.  */
   profile_probability r
-= profile_probability::always ().apply_scale (1, 2000);
+= profile_probability::guessed_always ().apply_scale (1, 2000);
   r.m_val--;
   return r;
 }
@@ -172,13 +172,13 @@ public:
 {
   /* Be consistent with PROB_VERY_LIKELY in predict.h.  */
   profile_probability r
-= profile_probability::always ().apply_scale (1, 5);
+= profile_probability::guessed_always ().apply_scale (1, 5);
   r.m_val--;
   return r;
 }
   static profile_probability even ()
 {
-  return profile_probability::always ().apply_scale (1, 2);
+  return profile_probability::guessed_always ().apply_scale (1, 2);
 }
   static profile_probability very_likely ()
 {


Re: [PATCH 4/5] Remove predictors that are unrealiable.

2018-01-23 Thread Jan Hubicka
> On 01/19/2018 12:57 PM, Martin Liška wrote:
> > Yes, there's a huge difference in between CPU 2006 and 2017. Former has 63% 
> > w/ dominant edges,
> > and later one only 11%. It's caused by these 2 benchmarks with a high 
> > coverage:
> > 
> 
> Hi.
> 
> I'm sending details about the 2 edges that influence the statistics 
> significantly:
> 
> > 500.perlbench_r: regexec.c.065i.profile:
> >   negative return heuristics of edge 1368->1370: 2.0%  exec 2477714850 hit 
> > 2429863555 (98.1%)
> 
> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp)
> -: 3485:{
> 2477714850: 3486:CHECKPOINT lastcp;
> 2477714850: 3487:REGEXP *const rx = reginfo->prog;
> 2477714850: 3488:regexp *const prog = ReANY(rx);
> 2477714850: 3489:SSize_t result;
> [snip]
> 2477714850: 8046:assert(!result ||  locinput - reginfo->strbeg >= 0);
> 2477714850: 8047:return result ?  locinput - reginfo->strbeg : -1;
> -:  8048:}
> 
> As seen it return -1 if a regex is not found, which is in case of perlbench 
> very likely branch.
> 
> > 
> > and 523.xalancbmk_r:
> > build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile:  
> > negative return heuristics of edge 3->4: 2.0%  exec 1221735072 hit 
> > 1221522453 (100.0%)
> 
> 1221735072:   74:int NameDatatypeValidator::compare(const XMLCh* const lValue
> -:   75:   , const XMLCh* const rValue
> -:   76:   ,   MemoryManager* 
> const)
> -:   77:{
> 1221735072:   78:return ( XMLString::equals(lValue, rValue)? 0 : -1);
> -:   79:}
> -:   80:
> 
> IP profile dump file:
> 
> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator * 
> const this, const XMLCh * const lValue, const XMLCh * const rValue, struct 
> MemoryManager * const D.17157)
> {
>   bool _1;
>   int iftmp.0_2;
> 
>[count: 1221735072]:
>   # DEBUG BEGIN_STMT
>   _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D));
>   if (_1 != 0)
> goto ; [0.02%]
>   else
> goto ; [99.98%]
> 
>[count: 1221522453]:
> 
>[count: 1221735072]:
>   # iftmp.0_2 = PHI <0(2), -1(3)>
>   return iftmp.0_2;
> 
> }
> 
> Likewise, XML values are more commonly non-equal.
> Ok, so may I mark also negative return with PROB_EVEN to track it?

Well, if we start disabling predictors just because we can find benchmark where 
they do not
perform as expected, we will need to disable them all. It is nature of 
heuristics to make
mistakes, just they should do something useful for common code.
It is not goal to make heuristics that makes no mistake.
Heuristics is useful either if it have high precision even if it cover few
branches (say noreturn), or if it have large coverage and still some hitrate 
(say opcode,
call or return based heuristics).

To make things bit more esoteric, in practice this is also bit weighted by fact
how much damage bad prediction does.  For example, call heuristics which predict
non-call path to be taken is likely going to do little damage. Even if call
path is common it is heavy and hard to optimize by presence of call and thus we
don't loose that much in common scenarios.

In the second case:
00073 // ---
00074 // Compare methods
00075 // ---
00076 int NameDatatypeValidator::compare(const XMLCh* const lValue
00077, const XMLCh* const rValue
00078,   MemoryManager* const)
00079 {
00080 return ( XMLString::equals(lValue, rValue)? 0 : -1);
00081 }

I would say it is odd coding style.  I do not see why it is not declared
bool and not returning true/false. 

First case seems bit non-standard too as it looks like usual cache lookup just
the cache frequently fails.

I would be in favor of keeping the prediction with non-50% hitrate thus unless
we run into some performance problems.
If there are only two benchmarks out of say 20 we tried (in spec2000 and 2k17)
it seems fine to me.

There is issue with the way we collect our data.  Using arithmetic mean across
spec is optimizing for overall runtime of all spec benchmarks combined
together.  We more want to look for safer values that do well for average
benchmarks independently how many branches they do.  We could consider making
the statistics script also collect geometric averages across different
benchmarks. If we will see big difference between arithmetic mean and geomean
we will know there is small subset of benchmarks which dominates and we could
decide what to do with the probability.

Honza

> 
> Thanks,
> Martin
> 
> > 
> > Ideas what to do with the predictor for GCC 8 release?
> > Martin


Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")

2018-01-23 Thread Paolo Carlini

.. testing completed OK.

Just in case it wasn't obvious in my previous messages, when we are 
calling check_for_uninitialized_const_var from 
potential_constant_expression_1 we can't use CP_TYPE_CONST_P as gate for 
emitting diagnostic  - as just discovered - neither we can use 
var_in_constexpr_fn, which would cause many regressions for variables in 
constexpr functions, I remember cpp1y/pr63996.C for example. 
Conservatively, !DECL_NONTRIVIALLY_INITIALIZED_P supplemented by 
default_init_uninitialized_part appears to work fine.


Thanks,
Paolo.


Re: [PATCH] Clean-up IPA profile dump output.

2018-01-23 Thread Jan Hubicka
> Hi.
> 
> I'm aware in which development stage we are. However the patch is small and 
> makes
> dump files readable. Hope such patch can be accepted even now?
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-01-22  Martin Liska  
> 
>   * tree-profile.c (tree_profiling): Print function header to
>   aware reader which function we are working on.
>   * value-prof.c (gimple_find_values_to_profile): Do not print
>   not interesting value histograms.

OK.  How those non-interesting value histograms arrise?

Honza
> ---
>  gcc/tree-profile.c | 4 
>  gcc/value-prof.c   | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> 

> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 9d919062db1..f96bd4b9704 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "tree-pretty-print.h"
>  
>  static GTY(()) tree gcov_type_node;
>  static GTY(()) tree tree_interval_profiler_fn;
> @@ -671,6 +672,9 @@ tree_profiling (void)
>  
>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>  
> +  if (dump_file)
> + dump_function_header (dump_file, cfun->decl, dump_flags);
> +
>/* Local pure-const may imply need to fixup the cfg.  */
>if (execute_fixup_cfg () & TODO_cleanup_cfg)
>   cleanup_tree_cfg ();
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index b503320f188..16cdbd64f46 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values *values)
>   default:
> gcc_unreachable ();
>   }
> -  if (dump_file)
> +  if (dump_file && hist->hvalue.stmt != NULL)
>  {
> fprintf (dump_file, "Stmt ");
>print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM);
> 



Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.

2018-01-23 Thread Kyrill Tkachov

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:

The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.

For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.

This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.

I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

Ok?



This also looks like a sensible approach to me with a caveat inline.
The same general comments as for patch [1/2] apply.

Thanks,
Kyrill


2018-01-22  Luis Machado  

Introduce option to control whether the software prefetch pass should
issue prefetch hints for non-constant strides.

gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
: New const unsigned int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c| 11 +++
 gcc/doc/invoke.texi | 10 ++
 gcc/params.def  |  9 +
 gcc/params.h|  2 ++
 gcc/tree-ssa-loop-prefetch.c| 10 ++
 6 files changed, 45 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+ strides.  */
+  const unsigned int prefetch_dynamic_strides;


I understand that the midend PARAMs are defined as integers, but I think
the backend tuning option here is better represented as a boolean as it really
is just a yes/no decision.


   /* The minimum constant stride beyond which we should use prefetch
  hints for.  */
   const int minimum_stride;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0ed9f14..713b230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,  /* l1_cache_size  */
   -1,  /* l1_cache_line_size  */
   -1,  /* l2_cache_size  */
+  1,   /* prefetch_dynamic_strides */
   -1,  /* minimum_stride */
   -1   /* default_opt_level  */
 };
@@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
   -1,  /* l2_cache_size  */
+  1,   /* prefetch_dynamic_strides */
   -1,  /* minimum_stride */
   -1   /* default_opt_level  */
 };
@@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
   1024,/* l2_cache_size  */
+  0,   /* prefetch_dynamic_strides */
   2048,/* minimum_stride */
   3/* default_opt_level  */
 };
@@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,  /* l1_cache_size  */
   128, /* l1_cache_line_size  */
   16*1024, /* l2_cache_size  */
+  1,   /* prefetch_dynamic_strides */
   -1,  /* minimum_stride */
   3/* default_opt_level  */
 };
@@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,  /* l1_cache_size  */
   128, /* l1_cache_line_size  */
   -1,  /* l2_cache_siz

Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-01-23 Thread Kyrill Tkachov

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.



I've noticed a large amount of prefetch hints being issued as well, but had not
analysed it further.


* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.


Do you, by any chance, have a link to the LLVM review that implemented that 
behavior?
It's okay if you don't, but I think it would be useful context.



The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?



Are there any benchmark numbers you can share?
I think this approach is sensible.

Since your patch touches generic code as well as AArch64
code you'll need an approval from a midend maintainer as well as an AArch64 
maintainer.
Also, GCC development is now in the regression fixing stage, so unless this 
fixes a regression
it may have to wait until GCC 9 development is opened.

Thanks,
Kyrill


2018-01-22  Luis Machado  

Introduce option to limit software prefetching to known constant
strides above a specific threshold with the goal of preventing
conflicts with a hardware prefetcher.

gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c| 13 -
 gcc/doc/invoke.texi | 15 +++
 gcc/params.def  |  9 +
 gcc/params.h|  2 ++
 gcc/tree-ssa-loop-prefetch.c| 16 
 6 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc..8736bd9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+ hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 174310c..0ed9f14 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,  /* l1_cache_size  */
   -1,  /* l1_cache_line_size  */
   -1,  /* l2_cache_size  */
+  -1,  /* minimum_stride */
   -1   /* default_opt_level  */
 };

@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
   -1,  /* l2_cache_size  */
+  -1,  /* minimum_stride */
   -1   /* default_opt_level  */
 };

@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
   1024,/* l2_cache_size  */
-  -1   /* default_opt_level  */
+  2048,/* minimum_stride */
+  3/* default_opt_level  */
 };

 static const cpu_prefetch_tune thunderxt88_prefetch

[PATCH] PR 83975 Set character length to 0 if unknown

2018-01-23 Thread Janne Blomqvist
When associating a variable of type character, if the length of the
target isn't known, set it to zero rather than leaving it unset.  This
is not a complete fix for making associate of characters work
properly, but papers over an ICE in the middle-end. See PR 83344 for
more details.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

gcc/fortran/ChangeLog:

2018-01-23  Janne Blomqvist  

PR 83975
PR 83344
* resolve.c (resolve_assoc_var): Set character length to zero if
target length unknown.

gcc/testsuite/ChangeLog:

2018-01-23  Janne Blomqvist  

PR 83975
PR 83344
* gfortran.dg/associate_31.f90: New test.
---
 gcc/fortran/resolve.c  | 13 -
 gcc/testsuite/gfortran.dg/associate_31.f90 |  8 
 2 files changed, 16 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_31.f90

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index a2b892a..f3fc3ca 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8634,11 +8634,14 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
   if (!sym->ts.u.cl)
sym->ts.u.cl = target->ts.u.cl;
 
-  if (!sym->ts.u.cl->length && !sym->ts.deferred
- && target->expr_type == EXPR_CONSTANT)
-   sym->ts.u.cl->length
- = gfc_get_int_expr (gfc_charlen_int_kind,
- NULL, target->value.character.length);
+  if (!sym->ts.u.cl->length && !sym->ts.deferred)
+   /* TODO: Setting the length to zero if the expression isn't
+  constant is probably not correct, but at this point we
+  don't have any better idea what it should be either.  */
+   sym->ts.u.cl->length =
+ gfc_get_int_expr (gfc_charlen_int_kind, NULL,
+   target->expr_type == EXPR_CONSTANT ?
+   target->value.character.length : 0);
 }
 
   /* If the target is a good class object, so is the associate variable.  */
diff --git a/gcc/testsuite/gfortran.dg/associate_31.f90 
b/gcc/testsuite/gfortran.dg/associate_31.f90
new file mode 100644
index 000..75774c8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_31.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR 83975, see also PR 83344
+! Testcase contributed by Gerhard Steinmetz
+subroutine s(x)
+  character(*) :: x
+  associate (y => x)
+  end associate
+end subroutine s
-- 
2.7.4



[PATCH] Clean-up IPA profile dump output.

2018-01-23 Thread Martin Liška
Hi.

I'm aware in which development stage we are. However the patch is small and 
makes
dump files readable. Hope such patch can be accepted even now?

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Martin

gcc/ChangeLog:

2018-01-22  Martin Liska  

* tree-profile.c (tree_profiling): Print function header to
aware reader which function we are working on.
* value-prof.c (gimple_find_values_to_profile): Do not print
not interesting value histograms.
---
 gcc/tree-profile.c | 4 
 gcc/value-prof.c   | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)


diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 9d919062db1..f96bd4b9704 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-pretty-print.h"
 
 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
@@ -671,6 +672,9 @@ tree_profiling (void)
 
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  if (dump_file)
+	dump_function_header (dump_file, cfun->decl, dump_flags);
+
   /* Local pure-const may imply need to fixup the cfg.  */
   if (execute_fixup_cfg () & TODO_cleanup_cfg)
 	cleanup_tree_cfg ();
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index b503320f188..16cdbd64f46 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values *values)
 	default:
 	  gcc_unreachable ();
 	}
-  if (dump_file)
+  if (dump_file && hist->hvalue.stmt != NULL)
 {
 	  fprintf (dump_file, "Stmt ");
   print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM);



[PATCH, wwwdocs] Update GCC 8 release notes for NDS32 port

2018-01-23 Thread Chung-Ju Wu

Hi, Gerald,

In previous patch we add new options for NDS32 port:
  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01397.html

So I think I need to update GCC 8 release notes as well.


Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.30
diff -u -r1.30 changes.html
--- htdocs/gcc-8/changes.html   21 Jan 2018 20:01:31 -  1.30
+++ htdocs/gcc-8/changes.html   22 Jan 2018 06:28:56 -
@@ -348,7 +348,13 @@
 
 
 
-

+NDS32
+
+  
+New command-line options -mext-perf -mext-perf2 -mext-string
+have been added for performance extension instructions.
+  
+
 
 Nios II

 


Is this patch OK?


Best regards,
jasonwucj