Re: [patch,avr] Simplify genmultilib.awk

2017-05-23 Thread Georg-Johann Lay

On 22.05.2017 13:47, Denis Chertykov wrote:

2017-05-22 15:37 GMT+04:00 Georg-Johann Lay :

This patch simplifies genmultilib.awk so that it generates
MULTILIB_REQUIRED instead of the complement, MULTILIB_EXCEPTIONS.

The current awk script dates back to the days when we mapped
all the > 200 devices to the mmcu core and also parts of
--help=target to list all devices were generated from avr-mmcus.
That code rot is also cleaned up.  Moreover, the structure now allows
for simpler addition of new multilib options which are likely to come
in the near future.

The result of -print-multi-lib is:

.;
avr25;@mmcu=avr25
avr3;@mmcu=avr3
avr31;@mmcu=avr31
avr35;@mmcu=avr35
avr4;@mmcu=avr4
avr5;@mmcu=avr5
avr51;@mmcu=avr51
avr6;@mmcu=avr6
avrxmega2;@mmcu=avrxmega2
avrxmega4;@mmcu=avrxmega4
avrxmega5;@mmcu=avrxmega5
avrxmega6;@mmcu=avrxmega6
avrxmega7;@mmcu=avrxmega7
avrtiny;@mmcu=avrtiny
tiny-stack;@msp8
avr25/tiny-stack;@mmcu=avr25@msp8

hence unchanged.

Ok for trunk?

Johann


* config/avr/t-avr (AWK) [t-multilib]: Remove "-v FORMAT=Makefile"
command option from $(AWK) call.
* config/avr/genmultilib.awk: Simplify and rewrite so that it
generates MULTILIB_REQUIRED instead of MULTILIB_EXCEPTIONS.
[FORMAT]: Remove handling of variable.
* config/avr/t-multilib: Regenerate.


Approved.
Please commit.


Also committed the following change to make it work with less common
awk implementations:

https://gcc.gnu.org/r248357

Johann


Index: config/avr/genmultilib.awk
===
--- config/avr/genmultilib.awk  (revision 248332)
+++ config/avr/genmultilib.awk  (working copy)
@@ -123,7 +123,7 @@ BEGIN {
have[opts] = 1
# Some special handling for the default mmcu: Remove a
# leading "mmcu=avr2/" in order not to confuse genmultilib.
-   opts = gensub (/mmcu=avr2\//, "", 1, opts)
+   gsub (/^mmcu=avr2\//, "", opts)
if (opts != "mmcu=avr2")
m_required = m_required " \\\n\t" opts
 }





Re: Runtime checking of OpenACC parallelism dimensions clauses

2017-05-23 Thread Jakub Jelinek
On Thu, May 11, 2017 at 02:24:05PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> OK for trunk?
> 
> commit 0ba48b4faf85420fbe12971afdd6e0afe70778bb
> Author: Thomas Schwinge 
> Date:   Fri May 5 16:41:59 2017 +0200
> 
> Runtime checking of OpenACC parallelism dimensions clauses
> 
> libgomp/
> * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Rewrite.
> * testsuite/lib/libgomp.exp
> (check_effective_target_openacc_nvidia_accel_configured): New
> proc.
> * testsuite/libgomp.oacc-c++/c++.exp (check_effective_target_c)
> (check_effective_target_c++): New procs.
> * testsuite/libgomp.oacc-c/c.exp (check_effective_target_c)
> (check_effective_target_c++): Likewise.

Ok.

Jakub


Re: Web page for binaries

2017-05-23 Thread FX
Hi,

I’m suggesting we apply the following patch to provide links to macOS package 
managers, where users can download binaries for GCC on macOS. I have included 
the two major ones, Homebrew and MacPorts.

FX



mac.diff
Description: Binary data


mac.ChangeLog
Description: Binary data


Re: [testsuite, committed, PR65941] Add and use effective target rdrand

2017-05-23 Thread Rainer Orth
Hi Tom,

> On 05/11/2017 04:44 PM, Rainer Orth wrote:
>> Hi Tom,
>>
>>> 2017-05-01  Tom de Vries  
>>>
>>> PR testsuite/65941
>>> * lib/target-supports.exp (check_effective_target_rdrand): New proc.
>>
>> the new effective-target keyword needs documenting in sourcebuild.texi.
>
> Attached patch adds the missing documentation.

> 2017-05-22  Tom de Vries  
>
>   * doc/sourcebuild.texi: Document rdrand effective target.

Please include the section, something like

* doc/sourcebuild.texi (Effective-Target Keywords, Other
hardware attributes): Document rdrand.

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 84d9a22..dec9227 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1883,6 +1883,9 @@ Target supporting hardware divmod insn or divmod 
> libcall.
>  @item divmod_simode
>  Target supporting hardware divmod insn or divmod libcall for SImode.
>  
> +@item rdrand
> +Target supports i386/x86-64 insn rdrand.

We use x86 for 32 and 64-bit x86, plus markup...

   Target supports x86 @code{rdrand} instruction.

Besides, please keep the list sorted alphabetically (the last three are
already random; please fix that while you're at it).

Ok with those nits fixed.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: [PATCH] Match x86 family machine constraints section with constarints.md

2017-05-23 Thread Peryt, Sebastian
Gentle ping.

Thanks,
Sebastian

-Original Message-
From: Peryt, Sebastian 
Sent: Friday, April 28, 2017 11:31 AM
To: Sandra Loosemore ; gcc-patches@gcc.gnu.org
Cc: ubiz...@gmail.com; Koval, Julia ; 
kirill.yuk...@gmail.com
Subject: RE: [PATCH] Match x86 family machine constraints section with 
constarints.md

Hi,

Thank you for your comments. I edited my patch accordingly. As for some of your 
doubts:
- REX is  the opcode prefix to access 64-bit register extensions introduced in 
IA-32e mode.
- EVEX is the encoding prefix which applies to SIMD operating instructions 
operating on XMM, YMM and ZMM registers. It was introduced with AVX-512 
instructions.
- "number factor of four" that means that sources start in a multiple of 4 
boundary. This is used for some of instructions.

Also I'd like to add that this whole patch is strictly based on docstring parts 
of constraints that are present in config/i386/constraints.md but not in 
documentation (md.texi file). There is no new (new as in nonexistent in code) 
content.

I'm also adding Kirill Yukhin to CC, because I believe he is the correct person 
that can catch any technical errors if any has slipped-in.

Thanks,
Sebastian

-Original Message-
From: Sandra Loosemore [mailto:san...@codesourcery.com]
Sent: Thursday, April 27, 2017 10:17 PM
To: Peryt, Sebastian ; gcc-patches@gcc.gnu.org
Cc: ubiz...@gmail.com; Koval, Julia 
Subject: Re: [PATCH] Match x86 family machine constraints section with 
constarints.md

On 04/26/2017 08:29 AM, Peryt, Sebastian wrote:
> Hi,
>
> This patch updates x86 family machine constraints section in '16.8.5 
> Constraints for Particular Machines' section to match the ones in 
> 'config/i386/constraints.md'.
>
> gcc/
>   * doc/md.texi (Machine Constraints): Update x86 family machine 
> constraints
>  section to match 'config/i386/constraints.md'.
>
> Is it ok for trunk?

I have a few comments on grammar and markup, but I can't comment intelligently 
on whether the technical content is correct.

> @@ -4013,24 +4015,94 @@ Top of 80387 floating-point stack (@code{%st(0)}).
>  @item u
>  Second from top of 80387 floating-point stack (@code{%st(1)}).
>
> +@ifset INTERNALS
> +@item Yk
> +Any mask register that can be used as predicate, i.e. k1-k7.

s/predicate/a predicate/

Other places in this section use @code markup on literal register names.

> +
> +@item k
> +Any mask register.
> +@end ifset
> +
>  @item y
>  Any MMX register.
>
>  @item x
>  Any SSE register.
>
> +@item v
> +Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).
> +
> +@ifset INTERNALS
> +@item w
> +Any bound register.
> +@end ifset
> +
>  @item Yz
>  First SSE register (@code{%xmm0}).
>
>  @ifset INTERNALS
> -@item Y2
> -Any SSE register, when SSE2 is enabled.
> -
>  @item Yi
>  Any SSE register, when SSE2 and inter-unit moves are enabled.
>
> +@item Yj
> +Any SSE register, when SSE2 and inter-unit moves from vector registers are 
> enabled.
> +
>  @item Ym
>  Any MMX register, when inter-unit moves are enabled.
> +
> +@item Yn
> +Any MMX register, when inter-unit moves from vector registers are enabled.
> +
> +@item Yp
> +Any integer register when TARGET_PARTIAL_REG_STALL is disabled.

@code markup on that.

> +
> +@item Ya
> +Any integer register when zero extensions with AND are disabled.

I'm not sure what "AND" is, but it probably needs @code markup too.
> +
> +@item Yb
> +Any register that can be used as the GOT base when calling ___tls_get_addr:

@code{___tls_get_addr}

> +that is, any general register except @code{a} and @code{sp} 
> +registers, for -fno-plt if linker supports it. Otherwise, @code{b} register.

@option{-fno-plt}

> +
> +@item Yf
> +Any x87 register when 80387 FP arithmetic is enabled.

Is "FP" a literal feature name used in the processor documentation, or do you 
just mean "floating-point arithmetic" here?

> +
> +@item Yr
> +Lower SSE register when avoiding REX prefix and all SSE registers otherwise.

I don't know what "avoiding REX prefix" means, and don't see the string "REX" 
in any other GCC documentation.

> +
> +@item Yv
> +For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), 
> +otherwise any SSE register.

This should probably be "EVEX-encodable", whatever that means.

> +
> +@item Yh
> +Any EVEX encodable SSE register, which has number factor of four.

Same here, but what is "number factor of four"?  Also, if this is supposed to 
designate a subset of the EVEX-encodable SSE registers rather than describe all 
of them, you need "that" instead of "which".

> +
> +@item Bf
> +Flags register operand.
> +
> +@item Bg
> +GOT memory operand.
> +
> +@item Bm
> +Vector memory operand.
> +
> +@item Bc
> +Constant memory operand.
> +
> +@item Bn
> +Memory operand without REX prefix.
> +
> +@item Bs
> +Sibcall memory operand.
> +
> +@item Bw
> +Call memory operand.
> +
> +@item Bz
> +Constant call address operand.
> +
> +@item BC
> +SSE constant -1 operand.
>  @end ifset
>
>  @item I
> @@ -4068,11 +414

Re: Runtime checking of OpenACC parallelism dimensions clauses

2017-05-23 Thread Thomas Schwinge
Hi!

On Tue, 23 May 2017 10:25:12 +0200, Jakub Jelinek  wrote:
> On Thu, May 11, 2017 at 02:24:05PM +0200, Thomas Schwinge wrote:
> > OK for trunk?

> > Runtime checking of OpenACC parallelism dimensions clauses

> Ok.

Thanks.  As posted, committed to trunk in r248358:

commit 681ad5cef0c3153f1233ef178c01ad53e7b9c405
Author: tschwinge 
Date:   Tue May 23 09:16:05 2017 +

Runtime checking of OpenACC parallelism dimensions clauses

libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Rewrite.
* testsuite/lib/libgomp.exp
(check_effective_target_openacc_nvidia_accel_configured): New
proc.
* testsuite/libgomp.oacc-c++/c++.exp (check_effective_target_c)
(check_effective_target_c++): New procs.
* testsuite/libgomp.oacc-c/c.exp (check_effective_target_c)
(check_effective_target_c++): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248358 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |  11 +
 libgomp/testsuite/lib/libgomp.exp  |  12 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |   7 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 523 -
 libgomp/testsuite/libgomp.oacc-c/c.exp |   7 +
 5 files changed, 548 insertions(+), 12 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 8209f9f..8fd5f07 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,14 @@
+2017-05-23  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Rewrite.
+   * testsuite/lib/libgomp.exp
+   (check_effective_target_openacc_nvidia_accel_configured): New
+   proc.
+   * testsuite/libgomp.oacc-c++/c++.exp (check_effective_target_c)
+   (check_effective_target_c++): New procs.
+   * testsuite/libgomp.oacc-c/c.exp (check_effective_target_c)
+   (check_effective_target_c++): Likewise.
+
 2017-05-22  Jakub Jelinek  
 
PR middle-end/80809
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 501a860..ea3da2c 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -359,6 +359,18 @@ proc check_effective_target_offload_device_shared_as { } {
 } ]
 }
 
+# Return 1 if configured for nvptx offloading.
+
+proc check_effective_target_openacc_nvidia_accel_configured { } {
+global offload_targets
+if { ![string match "*,nvptx,*" ",$offload_targets,"] } {
+return 0
+}
+# PR libgomp/65099: Currently, we only support offloading in 64-bit
+# configurations.
+return [is-effective-target lp64]
+}
+
 # Return 1 if at least one nvidia board is present.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp 
libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 608b298..9beadd6 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -4,6 +4,13 @@ load_lib libgomp-dg.exp
 load_gcc_lib gcc-dg.exp
 load_gcc_lib torture-options.exp
 
+proc check_effective_target_c { } {
+return 0
+}
+proc check_effective_target_c++ { } {
+return 1
+}
+
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index f5766a4..d8af546 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -1,25 +1,524 @@
-/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
+   vector_length.  */
+
+#include 
+#include 
+
+/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
+   not behaving as expected for -O0.  */
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
+{
+  if (acc_on_device ((int) acc_device_host))
+return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+{
+  unsigned int r;
+  asm volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (r));
+  return r;
+}
+  else
+__builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
+{
+  if (acc_on_device ((int) acc_device_host))
+return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+{
+  unsigned int r;
+  asm volatile ("mov.u32 %0,%%tid.y;" : "=r" (r));
+  return r;
+}
+  else
+__builtin_abort ();
+}
+
+#pragma acc routine seq
+static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
+{
+  if (acc_on_device ((int) acc_device_host))
+return 0;
+  else if (acc_on_device ((int) acc_device_nvidia))
+{
+  unsigned int r;
+  asm volatile ("mov.u32 %0,%%tid.x;" : "=r" (r));
+  return r;
+}
+  else
+_

Re: OpenACC 2.5 kernels construct: num_gangs, num_workers, vector_length clauses

2017-05-23 Thread Jakub Jelinek
On Thu, May 11, 2017 at 02:26:51PM +0200, Thomas Schwinge wrote:
> OpenACC 2.5 kernels construct: num_gangs, num_workers, vector_length 
> clauses
> 
> gcc/c/
> * c-parser.c (OACC_KERNELS_CLAUSE_MASK): Add
> "PRAGMA_OACC_CLAUSE_NUM_GANGS", "PRAGMA_OACC_CLAUSE_NUM_WORKERS",
> "VECTOR_LENGTH".
> gcc/cp/
> * parser.c (OACC_KERNELS_CLAUSE_MASK): Add
> "PRAGMA_OACC_CLAUSE_NUM_GANGS", "PRAGMA_OACC_CLAUSE_NUM_WORKERS",
> "VECTOR_LENGTH".
> gcc/fortran/
> * openmp.c (OACC_KERNELS_CLAUSES): Add "OMP_CLAUSE_NUM_GANGS",
> "OMP_CLAUSE_NUM_WORKERS", "OMP_CLAUSE_VECTOR_LENGTH".
> gcc/
> * omp-offload.c (execute_oacc_device_lower): Remove the
> parallelism dimensions function attributes for unparallelized
> OpenACC kernels constructs.
> gcc/testsuite/
> * c-c++-common/goacc/parallel-dims-1.c: Update.
> * c-c++-common/goacc/parallel-dims-2.c: Likewise.
> * c-c++-common/goacc/routine-1.c: Likewise.
> * c-c++-common/goacc/uninit-dim-clause.c: Likewise.
> * g++.dg/goacc/template.C: Likewise.
> * gfortran.dg/goacc/kernels-tree.f95: Likewise.
> * gfortran.dg/goacc/routine-3.f90: Likewise.
> * gfortran.dg/goacc/sie.f95: Likewise.
> * gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
> libgomp/
> * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Update.
> * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
> * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.

Ok, thanks.

Jakub


[Patch, testsuite, committed] Fix bogus builtin-snprintf-warn-3.c failure for avr

2017-05-23 Thread Senthil Kumar Selvaraj
Hi,

  The below patch fixes failures in builtin-snprintf-warn-3.c for the
  avr target.

  The test declares a struct with an array member that has INT_MAX/32767
  elements. This causes a "type xxx is too large" error for targets like
  the avr, which have pointers smaller or equal to (16 bit) int size.

  Fixed by marking the test as unsupported for targets with ptr size <
  32. Committed as obvious.

Regards
Senthil

gcc/testsuite/

2017-05-23  Senthil Kumar Selvaraj  

* gcc.dg/tree-ssa/builtin-snprintf-warn-3.c: Require ptr32plus.

Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c (revision 
248360)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-warn-3.c (working copy)
@@ -1,6 +1,7 @@
 /* PR middle-end/79448 - unhelpful -Wformat-truncation=2 warning
{ dg-do compile }
-   { dg-options "-O2 -Wformat -Wformat-truncation=2 -ftrack-macro-expansion=0" 
} */
+   { dg-options "-O2 -Wformat -Wformat-truncation=2 -ftrack-macro-expansion=0" 
} 
+   { dg-require-effective-target ptr32plus } */
 
 typedef __SIZE_TYPE__  size_t;
 typedef __WCHAR_TYPE__ wchar_t;


Re: Make tree-ssa-strlen.c handle partial unterminated strings

2017-05-23 Thread Jakub Jelinek
On Wed, May 17, 2017 at 07:52:38AM +0100, Richard Sandiford wrote:
> 2017-05-17  Richard Sandiford  
> 
> gcc/
>   * tree-ssa-strlen.c (strinfo): Rename the length field to
>   nonzero_chars.  Add a full_string_p field.
>   (compare_nonzero_chars, zero_length_string_p): New functions.
>   (get_addr_stridx): Add an offset_out parameter.
>   Use compare_nonzero_chars.
>   (get_stridx): Update accordingly.  Use compare_nonzero_chars.
>   (new_strinfo): Update after above changes to strinfo.
>   (set_endptr_and_length): Set full_string_p.
>   (get_string_length): Update after above changes to strinfo.
>   (unshare_strinfo): Update call to new_strinfo.
>   (maybe_invalidate): Likewise.
>   (get_stridx_plus_constant): Change off to unsigned HOST_WIDE_INT.
>   Use compare_nonzero_chars and zero_string_p.  Treat nonzero_chars
>   as a uhwi instead of an shwi.  Update after above changes to
>   strinfo and new_strinfo.
>   (zero_length_string): Assert that chainsi contains full strings.
>   Use zero_length_string_p.  Update call to new_strinfo.
>   (adjust_related_strinfos): Update after above changes to strinfo.
>   Copy full_string_p from origsi.
>   (adjust_last_stmt): Use zero_length_string_p.
>   (handle_builtin_strlen): Update after above changes to strinfo and
>   new_strinfo.  Install the lhs as the string length if the previous
>   entry didn't describe a full string.
>   (handle_builtin_strchr): Update after above changes to strinfo
>   and new_strinfo.
>   (handle_builtin_strcpy): Likewise.
>   (handle_builtin_strcat): Likewise.
>   (handle_builtin_malloc): Likewise.
>   (handle_pointer_plus): Likewise.
>   (handle_builtin_memcpy): Likewise.  Track nonzero characters
>   that aren't necessarily followed by a nul terminator.
>   (handle_char_store): Likewise.
> 
> gcc/testsuite/
>   * gcc.dg/strlenopt-32.c: New testcase.
>   * gcc.dg/strlenopt-33.c: Likewise.
>   * gcc.dg/strlenopt-33g.c: Likewise.
>   * gcc.dg/strlenopt-34.c: Likewise.
>   * gcc.dg/strlenopt-35.c: Likewise.

Ok, with a small nit.

> @@ -501,8 +550,8 @@ set_endptr_and_length (location_t loc, s
>  static tree
>  get_string_length (strinfo *si)
>  {
> -  if (si->length)
> -return si->length;
> +  if (si->nonzero_chars)
> +return si->full_string_p ? si->nonzero_chars : NULL;

This should be NULL_TREE.
>  
>if (si->stmt)
>  {
> @@ -595,19 +644,19 @@ get_string_length (strinfo *si)
> for (strinfo *chainsi = verify_related_strinfos (si);
>  chainsi != NULL;
>  chainsi = get_next_strinfo (chainsi))
> - if (chainsi->length == NULL)
> + if (chainsi->nonzero_chars == NULL)

and this actually too (though it is preexisting).

For future work, it would be nice if we could handle not just
memcpy and single character stores, but also cases where a memcpy
is folded into a store of couple of adjacent bytes, say
MEM_REF[ptr, 0] = 0x12345678;
is storing 4 non-zero bytes, while = 0x345678; would be zero nonzero_chars +
full_string_p for big endian and 3 non-zero bytes plus zero byte on little
endian.  One could use native_encode_expr on the rhs and then determine the
nonzero count at the start and optional presence of a zero char afterwards.

Jakub


Re: [PATCH] Avoid signed overflow in num_get::_M_extract_int (PR libstdc++/67214)

2017-05-23 Thread Jonathan Wakely

On 22/05/17 11:08 +0100, Jonathan Wakely wrote:

On 20/05/17 15:10 +0800, Xi Ruoyao wrote:

On 2017-05-19 15:38 +0100, Jonathan Wakely wrote:

On 18/05/17 19:10 +0800, Xi Ruoyao wrote:

This UB has been hiding so long...


Indeed! Thanks for the patch.


2017-03-11  Xi Ruoyao  

PR libstdc++/67214
* include/bits/locale_facets.tcc (_M_extract_int):
  Add explicit conversion to avoid signed overflow.
---
 libstdc++-v3/include/bits/locale_facets.tcc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/locale_facets.tcc 
b/libstdc++-v3/include/bits/locale_facets.tcc
index 351190c..5f85d15 100644
--- a/libstdc++-v3/include/bits/locale_facets.tcc
+++ b/libstdc++-v3/include/bits/locale_facets.tcc
@@ -470,7 +470,8 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL
    bool __testoverflow = false;
    const __unsigned_type __max =
      (__negative && __gnu_cxx::__numeric_traits<_ValueT>::__is_signed)
-     ? -__gnu_cxx::__numeric_traits<_ValueT>::__min
+     ? -static_cast<__unsigned_type>(__gnu_cxx::
+    __numeric_traits<_ValueT>::__min)


Do we need to keep the negation, or can we just cast to
__unsigned_type?


For 2's complement we can just cast to __unsigned_type.  But for
clarity and other strange architectures I think we should keep
the negation.


https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
"GCC only supports two's complement integer types"

I doubt that's ever going to change, but keeping the negation also
doesn't do any harm. I'll test and commit this, thanks.




Here's what I committed, which also adds a typedef for the
__numeric_traits type, to make everything more readable.

Tested powerpc64le-linux. Committed to trunk.


commit effe4603accc56b08eda6453eb1abf752cfaafba
Author: Jonathan Wakely 
Date:   Tue May 23 11:11:41 2017 +0100

PR libstdc++/67214 Avoid signed overflow in num_get::_M_extract_int

2017-05-23  Xi Ruoyao  

	PR libstdc++/67214
	* include/bits/locale_facets.tcc (num_get::_M_extract_int): Add
	explicit conversion to avoid signed overflow.

diff --git a/libstdc++-v3/include/bits/locale_facets.tcc b/libstdc++-v3/include/bits/locale_facets.tcc
index 351190c..d7710e6 100644
--- a/libstdc++-v3/include/bits/locale_facets.tcc
+++ b/libstdc++-v3/include/bits/locale_facets.tcc
@@ -375,10 +375,10 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL
   _M_extract_int(_InIter __beg, _InIter __end, ios_base& __io,
 		 ios_base::iostate& __err, _ValueT& __v) const
   {
-typedef char_traits<_CharT>			 __traits_type;
+typedef char_traits<_CharT>			__traits_type;
 	using __gnu_cxx::__add_unsigned;
-	typedef typename __add_unsigned<_ValueT>::__type __unsigned_type;
-	typedef __numpunct_cache<_CharT> __cache_type;
+	typedef typename __add_unsigned<_ValueT>::__type__unsigned_type;
+	typedef __numpunct_cache<_CharT>__cache_type;
 	__use_cache<__cache_type> __uc;
 	const locale& __loc = __io._M_getloc();
 	const __cache_type* __lc = __uc(__loc);
@@ -463,15 +463,16 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL
 			  - __num_base::_S_izero : __base);
 
 	// Extract.
+	typedef __gnu_cxx::__numeric_traits<_ValueT> __num_traits;
 	string __found_grouping;
 	if (__lc->_M_use_grouping)
 	  __found_grouping.reserve(32);
 	bool __testfail = false;
 	bool __testoverflow = false;
 	const __unsigned_type __max =
-	  (__negative && __gnu_cxx::__numeric_traits<_ValueT>::__is_signed)
-	  ? -__gnu_cxx::__numeric_traits<_ValueT>::__min
-	  : __gnu_cxx::__numeric_traits<_ValueT>::__max;
+	  (__negative && __num_traits::__is_signed)
+	  ? -static_cast<__unsigned_type>(__num_traits::__min)
+	  : __num_traits::__max;
 	const __unsigned_type __smax = __max / __base;
 	__unsigned_type __result = 0;
 	int __digit = 0;
@@ -572,11 +573,10 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL
 	  }
 	else if (__testoverflow)
 	  {
-	if (__negative
-		&& __gnu_cxx::__numeric_traits<_ValueT>::__is_signed)
-	  __v = __gnu_cxx::__numeric_traits<_ValueT>::__min;
+	if (__negative && __num_traits::__is_signed)
+	  __v = __num_traits::__min;
 	else
-	  __v = __gnu_cxx::__numeric_traits<_ValueT>::__max;
+	  __v = __num_traits::__max;
 	__err = ios_base::failbit;
 	  }
 	else


Re: [Patch] SFINAE on is_same first in variant's _Tp&& constructor

2017-05-23 Thread Jonathan Wakely

On 22/05/17 16:14 -0400, Tim Song wrote:

On Mon, May 22, 2017 at 9:21 AM, Jonathan Wakely  wrote:

On 19/05/17 22:40 -0700, Tim Shen via libstdc++ wrote:


diff --git a/libstdc++-v3/include/std/variant
b/libstdc++-v3/include/std/variant
index 0e04a820d69..b9824a5182c 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -936,9 +936,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  noexcept((is_nothrow_move_constructible_v<_Types> && ...)) =
default;

  template, variant>>,
   typename =
enable_if_t<__exactly_once<__accepted_type<_Tp&&>>
- && is_constructible_v<__accepted_type<_Tp&&>,
_Tp&&>
- && !is_same_v, variant>>>
+ && is_constructible_v<__accepted_type<_Tp&&>,
_Tp&&>>>



Does this definitely short-circuit? I seem to recall a similar case
where either Clang or GCC (I think it was Clang) was evaluating the
second default template argument even though the first had produce a
substition failure.

If we need to guarantee it short-circuits then we'd want:

 template, variant>>,
 __bool_constant<
   __exactly_once<__accepted_type<_Tp&&>>
   && is_constructible_v<__accepted_type<_Tp&&>, 
_Tp&&>>>

i.e. use __and_< is-this-type, everything-else> where
"everything-else" still uses && to avoid making the instantiations too
deep.

Also, this is another place where we could use an __is_samey
trait that does is_same, U>.



The thing that needs to be short-circuited out is the evaluation of
__accepted_type<_Tp&&>, which starts the tower of turtles.


Ah I see.


The original patch does that (assuming core issue 1227's resolution),
but the __and_ version doesn't; __and_ only short circuits the
immediate parameter, not things used in forming it.


Then the original patch is OK for trunk and gcc-7-branch.

Thank you Tim and Tim for the explanations.



Re: Remove unused "default_kind" member from gcc/omp-low.c's "struct omp_context"

2017-05-23 Thread Jakub Jelinek
On Fri, Mar 31, 2017 at 05:05:29PM +0200, Thomas Schwinge wrote:
> It would appear that ever since the first version of gcc/omp-low.c got
> committed to GCC trunk (more than ten years ago), it extraced OpenMP
> default clause data that it doesn't actually use for anything.  OK to

The last use of that field went away with
https://gcc.gnu.org/ml/gcc-patches/2005-10/msg01828.html

> commit the following cleanup to trunk in next stage 1?
> 
> commit cd157ff348694009f4043b84f47de8c62774931f
> Author: Thomas Schwinge 
> Date:   Fri Mar 31 15:26:55 2017 +0200
> 
> Remove unused "default_kind" member from gcc/omp-low.c's "struct 
> omp_context"
> 
> gcc/
> * omp-low.c (struct omp_context): Remove "default_kind" member.
> Adjust all users.

Ok for trunk.

Jakub


Re: OpenACC 2.5 default (present) clause

2017-05-23 Thread Jakub Jelinek
On Fri, May 19, 2017 at 03:46:53PM +0200, Thomas Schwinge wrote:
> > > -  /* OpenMP clause: default.  */
> > > +  /* OpenACC clause: default ( none | present ).
> > > +
> > > + OpenMP clause: default ( firstprivate | none | private | shared ). 
> > > */
> > >OMP_CLAUSE_DEFAULT,
> > >  
> > >/* OpenACC/OpenMP clause: collapse (constant-integer-expression).  */
> > 
> > I think this hunk isn't needed (plus it is not accurate anyway).
> 
> Now you got me curious: why isn't it accurate?

Because the clause in OpenMP is default ( shared | none ) for C/C++
and default ( private | firstprivate | shared | none ) for Fortran.

Jakub


Re: libgomp nvptx plugin: Debugging output for cuInit failure

2017-05-23 Thread Jakub Jelinek
On Mon, May 22, 2017 at 02:57:24PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> OK for trunk?
> 
> commit 7b2f06b7d2fd23b20d81fda8be6ec7453e8b3fe3
> Author: Thomas Schwinge 
> Date:   Thu Dec 22 08:30:04 2016 +0100
> 
> libgomp nvptx plugin: Debugging output for cuInit failure
> 
> libgomp/
> * plugin/plugin-nvptx.c (nvptx_get_num_devices): Debugging output
> for cuInit failure.

Ok.

Jakub


[committed] Remove redundant quotes in c-parser.c

2017-05-23 Thread Marek Polacek
Applying to trunk.

2017-05-23  Marek Polacek  

* c-parser.c (c_parser_compound_statement_nostart): Remove redundant
quotes.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 2e01316..f3bcbee 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -4905,7 +4905,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
   if (parser->in_if_block)
 {
  mark_valid_location_for_stdc_pragma (save_valid_for_pragma);
-  error_at (loc, """expected %<}%> before %");
+ error_at (loc, "expected %<}%> before %");
   return;
 }
   else

Marek


SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Aldy Hernandez

Howdy!

For Andrew's upcoming on-demand range work I have created a range class 
for use in his engine.  Currently, the range class is only for SSA 
integers, but there is no reason why we couldn't adapt this for RTL or 
non-integer types at a later time.


The class can live outside of his work, as can be demonstrated by the 
attached patch.  With it, I was able to rewrite the post-VRP range 
information to use this class and get rid of VR_ANTI_RANGE throughout 
the compiler.  A VR_ANTI_RANGE of ~[5,10] becomes [-MIN,4][11,+MAX].


The internal structure of VRP is unchanged.  I have only changed the 
outward facing structures.


A good example of how much cleaner using an actual range rather than 
VR_ANTI_RANGEs is the change to get_size_range() in the attached patch. 
We remove an assortment of contortions into:


+  /* Remove negative numbers from the range.  */
+  irange positives;
+  range_positives (&positives, exptype, allow_zero ? 0 : 1);
+  if (positives.Intersect (*ir))
+{
+  /* Remove the unknown parts of a multi-range.
+This will transform [5,10][20,MAX] into [5,10].  */
+  if (positives.num_ranges () > 1
+ && positives.upper_bound () == wide_int (TYPE_MAX_VALUE 
(exptype)))

+   positives.remove_range (positives.num_ranges () - 1);
+
+  range[0] = wide_int_to_tree (exptype, positives.lower_bound ());
+  range[1] = wide_int_to_tree (exptype, positives.upper_bound ());
+}
+  else
+{
+  /* If removing the negative numbers didn't give us anything
+back, the entire range was negative. Leave things as they
+were, and let the caller sort it out.  */
+  range[0] = wide_int_to_tree (exptype, min);
+  range[1] = wide_int_to_tree (exptype, max);
 }

A few notes:

1. There are still two uses of VR_ANTI_RANGEs left after this patch: 
ip-cp.c and ipa-prop.c.  I didn't look too hard at these two passes, as 
they are using `struct value_range' which should only have been used 
*before* VRP finishes.  I can work on this as a follow-up.


2. I have not included ChangeLog entries, as I would hate to write them, 
and have the API or significant things change from under me as part of 
the review.  I can certainly cobble the ChangeLog entries as soon as/if 
we agree that this is an acceptable approach.


3. There are lots of unit tests to maintain sanity.  The cast ones in 
particular will definitely need refinement as Andrew's work stresses the 
class.


The attached patch passes bootstrap and tests.

I have also benchmarked an assortment of .ii files (from a previous 
bootstrap) with no noticeable difference in -O2 compilation time.  So, I 
would prefer to tackle any micro-optimizations either as a follow-up or 
if it actually becomes noticeable--. Yeah, yeah, I know.  Wishful 
thinking ;-).



Fire away!
Aldy

p.s. Please refer any comments as to why we need the class, or why we 
need on-demand range information to Andrew :-P.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ace3c2..5e48d6e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
print-rtl-function.o \
print-tree.o \
profile.o \
+   range.o \
read-md.o \
read-rtl.o \
read-rtl-function.o \
@@ -2484,6 +2485,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/gimple.h \
   $(srcdir)/gimple-ssa.h \
   $(srcdir)/tree-chkp.c \
+  $(srcdir)/range.h $(srcdir)/range.c \
   $(srcdir)/tree-ssanames.c $(srcdir)/tree-eh.c $(srcdir)/tree-ssa-address.c \
   $(srcdir)/tree-cfg.c $(srcdir)/tree-ssa-loop-ivopts.c \
   $(srcdir)/tree-dfa.c \
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f6c9c4..328f028 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "stringpool.h"
 #include "tree-vrp.h"
+#include "range.h"
 #include "tree-ssanames.h"
 #include "expmed.h"
 #include "optabs.h"
@@ -2894,6 +2895,52 @@ builtin_memcpy_read_str (void *data, HOST_WIDE_INT 
offset,
   return c_readstr (str + offset, mode);
 }
 
+/* If a range IR may have wrapped in such a way that we can guess the
+   range is positive, return TRUE and set PROBABLE_MAX_SIZE.
+   Otherwise, return FALSE and leave PROBABLE_MAX_SIZE unchanged.  */
+
+static bool
+range_may_have_wrapped (irange ir,
+   unsigned HOST_WIDE_INT *probable_max_size)
+{
+  /* Code like:
+
+   signed int n;
+   if (n < 100)
+ {
+   # RANGE [0, 99][0x8000, 0x]
+  _1 = (unsigned) n;
+  memcpy (a, b, _1)
+ }
+
+ Produce a range allowing negative values of N.  We can still use
+ the information and make a guess that N is not negative.  */
+  if (ir.num_ranges () != 2
+  || ir.lower_bound () != 0)
+return false;
+
+  tree type = ir.get_type ();
+  unsigned precision = TYPE_PRECISION (type);
+  gcc_assert (TYPE_UNSIGNED (type));
+
+  /* Build a ra

[C++ PATCH] using decls

2017-05-23 Thread Nathan Sidwell
This patch addresses namespace and local scope using declarations. 
Unlike the using directive case, we already had separate workers for 
these, they just needed a bit of cleanup to use the new iterator and 
make them ready for the pushdecl change that's coming.  I renamed them 
consistently with the directive name change I made yesterday.


There'll be a final change to these once the other pieces have landed.

nathan
--
Nathan Sidwell
2017-05-23  Nathan Sidwell  

	gcc/cp
	* cp-tree.h (OVL_P): New.
	* name-lookup.h (push_local_binding): Delete.
	(do_toplevel_using_decl, do_local_using_decl): Rename to ...
	(finish_namespace_using_decl, finish_local_using_decl): ... here
	* name-lookup.c (add_decl_to_level): Swap args.
	(pop_bindings_and_leave_scope): Look inside TREE_LIST.
	(diagnose_name_conflict): Check contexts are same for redecl.
	(update_local_overload): New.
	(compparms_for_decl_and_using): Rename to ...
	(matching_fn_p): ... here.
	(pushdecl_maybe_friend_1): Adjust add_decl_to_level,
	push_local_bindings call.
	(push_local_binding): Make static, replace FLAGS arg with
	IS_USING.
	(validate_nonmember_using_decl): Use OVL_FIRST.
	(do_nonmember_using_decl): Use in/out parameters.  Use
	lkp_iterator and simplify.
	(do_toplevel_using_decl, do_local_using_decl): Rename to ...
	(finish_namespace_using_decl, finish_local_using_decl): ... here.
	Adjust.
	(lookup_type_current_level): Delete.
	* parser.c (cp_parser_using_declaration): Adjust.
	* pt.c (tsubst_expr): Adjust.

	libcc1/
	* libcp1plugin.cc (plugin_add_using_decl): Call
	finish_namespace_using_decl.  Use assert not unreachable.

	gcc/testsuite/
	* g++.dg/lookup/using13.C: Adjust expected error.

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h	(revision 248338)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -671,6 +671,11 @@ typedef struct ptrmem_cst * ptrmem_cst_t
 /* The name of the overload set.  */
 #define OVL_NAME(NODE) DECL_NAME (OVL_FIRST (NODE))
 
+/* Whether this is a set of overloaded functions.  TEMPLATE_DECLS are
+   always wrapped in an OVERLOAD, so we don't need to check them
+   here.  */
+#define OVL_P(NODE) \
+  (TREE_CODE (NODE) == FUNCTION_DECL || TREE_CODE (NODE) == OVERLOAD)
 /* Whether this is a single member overload.  */
 #define OVL_SINGLE_P(NODE) \
   (TREE_CODE (NODE) != OVERLOAD || !OVL_CHAIN (NODE))
Index: gcc/cp/name-lookup.c
===
--- gcc/cp/name-lookup.c	(revision 248338)
+++ gcc/cp/name-lookup.c	(working copy)
@@ -38,6 +38,7 @@ static cp_binding_level *innermost_noncl
 static void set_identifier_type_value_with_scope (tree id, tree decl,
 		  cp_binding_level *b);
 static void set_namespace_binding (tree scope, tree name, tree val);
+static void push_local_binding (tree, tree, bool);
 
 /* The bindings for a particular name in a particular scope.  */
 
@@ -58,14 +59,13 @@ static void consider_binding_level (tree
 cp_binding_level *lvl,
 bool look_within_fields,
 enum lookup_name_fuzzy_kind kind);
-static tree lookup_type_current_level (tree);
 static tree push_using_directive (tree);
 static void diagnose_name_conflict (tree, tree);
 
 /* Add DECL to the list of things declared in B.  */
 
 static void
-add_decl_to_level (tree decl, cp_binding_level *b)
+add_decl_to_level (cp_binding_level *b, tree decl)
 {
   /* We used to record virtual tables as if they were ordinary
  variables, but no longer do so.  */
@@ -995,7 +995,13 @@ void
 pop_bindings_and_leave_scope (void)
 {
   for (tree t = get_local_decls (); t; t = DECL_CHAIN (t))
-pop_local_binding (DECL_NAME (t), t);
+{
+  tree decl = TREE_CODE (t) == TREE_LIST ? TREE_VALUE (t) : t;
+  tree name = OVL_NAME (decl);
+
+  pop_local_binding (name, decl);
+}
+
   leave_scope ();
 }
 
@@ -1179,7 +1185,8 @@ diagnose_name_conflict (tree decl, tree
   && (TREE_CODE (decl) != TYPE_DECL
 	  || (DECL_ARTIFICIAL (decl) && DECL_ARTIFICIAL (bval))
 	  || (!DECL_ARTIFICIAL (decl) && !DECL_ARTIFICIAL (bval)))
-  && !is_overloaded_fn (decl))
+  && !DECL_DECLARES_FUNCTION_P (decl)
+  && CP_DECL_CONTEXT (decl) == CP_DECL_CONTEXT (bval))
 error ("redeclaration of %q#D", decl);
   else
 error ("%q#D conflicts with a previous declaration", decl);
@@ -1199,6 +1206,56 @@ supplement_binding (cxx_binding *binding
   return ret;
 }
 
+/* Replace BINDING's current value on its scope's name list with
+   NEWVAL.  */
+
+static void
+update_local_overload (cxx_binding *binding, tree newval)
+{
+  tree *d;
+
+  for (d = &binding->scope->names; ; d = &TREE_CHAIN (*d))
+if (*d == binding->value)
+  {
+	/* Stitch new list node in.  */
+	*d = tree_cons (NULL_TREE, NULL_TREE, TREE_CHAIN (*d));
+	break;
+  }
+else if (TREE_CODE (*d) == TREE_LIST && TREE_VALUE (*d) == binding->value)
+  break;
+
+  TREE_VALUE (*d) = newval;
+}
+
+/* Compares the parameter-type-lists of ONE and TWO and
+

[PATCH] rs6000: Fix for separate shrink-wrapping for fp (PR80860, PR80843)

2017-05-23 Thread Segher Boessenkool
After my r248256, rs6000_components_for_bb allocates an sbitmap of size
only 32 while it can use up to 64.  This patch fixes it.  It moves the
n_components variable into the machine_function struct so that other
hooks can use it.

Tested on powerpc64-linux {-m32,-m64}.  Will also test on AIX before
committing.

I cannot reproduce PR80843 (I have trouble building a 32-bit hosted
compiler), but I suspect it is the same problem.


Segher


2017-05-23  Segher Boessenkool  

PR bootstrap/80860
PR bootstrap/80843
* config/rs6000/rs6000.c (struct machine_function): Add new field
n_components.
(rs6000_get_separate_components): Init that field, use it.
(rs6000_components_for_bb): Use the field.

---
 gcc/config/rs6000/rs6000.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a017afc..f45bb7a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -154,6 +154,8 @@ typedef struct GTY(()) machine_function
   bool split_stack_argp_used;
   /* Flag if r2 setup is needed with ELFv2 ABI.  */
   bool r2_setup_needed;
+  /* The number of components we use for separate shrink-wrapping.  */
+  int n_components;
   /* The components already handled by separate shrink-wrapping, which should
  not be considered by the prologue and epilogue.  */
   bool gpr_is_wrapped_separately[32];
@@ -29368,9 +29370,9 @@ rs6000_get_separate_components (void)
  Components 13..31 are the save/restore of GPR13..GPR31.
  Components 46..63 are the save/restore of FPR14..FPR31.  */
 
-  int n_components = 64;
+  cfun->machine->n_components = 64;
 
-  sbitmap components = sbitmap_alloc (n_components);
+  sbitmap components = sbitmap_alloc (cfun->machine->n_components);
   bitmap_clear (components);
 
   int reg_size = TARGET_32BIT ? 4 : 8;
@@ -29462,7 +29464,7 @@ rs6000_components_for_bb (basic_block bb)
   bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
   bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
 
-  sbitmap components = sbitmap_alloc (32);
+  sbitmap components = sbitmap_alloc (cfun->machine->n_components);
   bitmap_clear (components);
 
   /* A register is used in a bb if it is in the IN, GEN, or KILL sets.  */
-- 
1.9.3



Re: C/C++ OpenACC: acc_pcopyin, acc_pcreate

2017-05-23 Thread Jakub Jelinek
On Mon, May 22, 2017 at 04:26:48PM +0200, Thomas Schwinge wrote:
> In , we currently describe acc_pcopyin, acc_pcreate as "old
> names", but they're not "old" but really "alternative names", with the
> intention to provide them at symbol level, not via "#define"s.  OK for
> trunk?

No.

> * libgomp.map (OACC_2.0): Add "acc_pcopyin", and "acc_pcreate".

> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -335,6 +335,7 @@ OACC_2.0 {
>   acc_copyin_64_h_;
>   acc_copyin_array_h_;
>   acc_present_or_copyin;
> + acc_pcopyin;
>   acc_present_or_copyin_32_h_;
>   acc_present_or_copyin_64_h_;
>   acc_present_or_copyin_array_h_;
> @@ -343,6 +344,7 @@ OACC_2.0 {
>   acc_create_64_h_;
>   acc_create_array_h_;
>   acc_present_or_create;
> + acc_pcreate;
>   acc_present_or_create_32_h_;
>   acc_present_or_create_64_h_;
>   acc_present_or_create_array_h_;

This is just wrong, new symbols should never be added to an existing symbol
version after a GCC version with that symbol version has been released.
You can add it into OACC_2.0.1, or OACC_1.0, or whatever else, but certainly
not into OACC_2.0.

Another option is just to use something like glibc's sys/cdefs.h
__REDIRECT_NTH macro (including the __USER_LABEL_PREFIX__ stuff)
and just declare those functions as having the asm name of the corresponding
alias.  The openacc.h header is for use with GCC only anyway, right?
Unless OpenACC allows one to declare those functions himself and expect
to be able to call them...

Jakub


Re: Translate libgomp.oacc-c-c++-common/lib-32.c into Fortran (was: C/C++ OpenACC: acc_pcopyin, acc_pcreate)

2017-05-23 Thread Jakub Jelinek
On Mon, May 22, 2017 at 04:29:51PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 22 May 2017 16:26:48 +0200, I wrote:
> > C/C++ OpenACC: acc_pcopyin, acc_pcreate
> 
> > libgomp/
> > [...]
> > * testsuite/libgomp.oacc-c-c++-common/lib-38.c: Remove, merging
> > its content into...
> > * testsuite/libgomp.oacc-c-c++-common/lib-32.c: ... this file.
> > Extend testing.
> 
> ... which I then translated into two Fortran variants; OK for trunk?
> 
> commit 04ee44504e6256ee95ae3c6ba91a5960265b3261
> Author: Thomas Schwinge 
> Date:   Wed Apr 19 15:38:51 2017 +0200
> 
> Translate libgomp.oacc-c-c++-common/lib-32.c into Fortran
> 
> libgomp/
> * testsuite/libgomp.oacc-fortran/lib-32-1.f: New file.
> * testsuite/libgomp.oacc-fortran/lib-32-2.f: Likewise.

Ok.

Jakub


Re: Implementing OpenACC's Fortran module

2017-05-23 Thread Jakub Jelinek
On Mon, May 22, 2017 at 05:23:25PM +0200, Thomas Schwinge wrote:
> --- libgomp/openacc_lib.h
> +++ libgomp/openacc_lib.h
> @@ -191,23 +191,9 @@
>end interface
>  
>interface acc_pcopyin
> -subroutine acc_pcopyin_32_h (a, len)
> -  use iso_c_binding, only: c_int32_t
> -  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> -  type (*), dimension (*) :: a
> -  integer (c_int32_t) len
> -end subroutine
> -
> -subroutine acc_pcopyin_64_h (a, len)
> -  use iso_c_binding, only: c_int64_t
> -  !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> -  type (*), dimension (*) :: a
> -  integer (c_int64_t) len
> -end subroutine
> -
> -subroutine acc_pcopyin_array_h (a)
> -  type (*), dimension (..), contiguous :: a
> -  end subroutine
> +procedure :: acc_present_or_copyin_32_h
> +procedure :: acc_present_or_copyin_64_h
> +procedure :: acc_present_or_copyin_array_h
>end interface

My Fortran knowledge is limited, does this actually provide
the interfaces of those procedures (what arguments they have
and other their properties), or shall it instead use
subroutine acc_present_or_copyin_32_h (a, len)
  ...
end subroutine
etc. there?

Jakub


Re: OpenACC 1.0 compatibility: acc_async_wait, acc_async_wait_all

2017-05-23 Thread Jakub Jelinek
On Mon, May 22, 2017 at 07:42:38PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> For OpenACC 1.0 compatibility, we need to provide the aliases
> acc_async_wait, and acc_async_wait_all for acc_wait, and acc_wait_all,
> respectively.  OK for trunk?
> 
> commit 0f8302913db6d2c23804a3463c51a47e623e76b2
> Author: Thomas Schwinge 
> Date:   Mon May 22 19:22:24 2017 +0200
> 
> OpenACC 1.0 compatibility: acc_async_wait, acc_async_wait_all
> 
> libgomp/
> * openacc.h (acc_async_wait, acc_async_wait_all): New prototypes.
> * libgomp.map (OACC_2.0): Add these.
> * oacc-async.c (acc_async_wait, acc_async_wait_all): New aliases
> for "acc_wait", and "acc_wait_all", respectively.
> * openacc.f90 (acc_async_wait, acc_async_wait_all): New interfaces
> for "acc_wait", and "acc_wait_all", respectively.
> * openacc_lib.h (acc_async_wait, acc_async_wait_all): Likewise.
> * libgomp.texi (acc_wait, acc_wait_all): Update.
> * testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Update.
> * testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: New file.
> * testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.

All I've said about acc_pcopyin applies here too.  Except that OACC_1.0
as symbol version might look like a good idea here.

Jakub


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Nathan Sidwell

On 05/23/2017 06:48 AM, Aldy Hernandez wrote:

The class can live outside of his work, as can be demonstrated by the 
attached patch.  With it, I was able to rewrite the post-VRP range 
information to use this class and get rid of VR_ANTI_RANGE throughout 
the compiler.  A VR_ANTI_RANGE of ~[5,10] becomes [-MIN,4][11,+MAX].


Seems useful.


+  /* Remove negative numbers from the range.  */
+  irange positives;
+  range_positives (&positives, exptype, allow_zero ? 0 : 1);


'allow_zero ? 0 : 1' looks mighty strange. I know that's a nit, but you 
placed it front and centre!



+  if (positives.Intersect (*ir))


I notice you have a number of Uppercase member fns ...

nathan

--
Nathan Sidwell


Re: [PATCH] Implement non-trivial std::random_device::entropy (PR libstdc++/67578)

2017-05-23 Thread Jonathan Wakely

On 22/05/17 22:10 +0800, Xi Ruoyao wrote:

In very old Linux kernel (1.3.x) there is random.h but not RNDGETENTCNT.
The random.h without RNDGETENTCNT was only used in the kernel internally.
But at that time Linux Makefile didn't have "headers_install" target
so user may copy all headers (including the internal ones) to /usr/include.
It seems we should check RNDGETENTCNT explicitly if we want to support
1.3.x.


OK, I think that's ancient enough that we don't need to care about it,
but on the other hand it's not impossible to believe that somebody
could be using a custom Linux kernel that removed it. So let's keep
the test for it.




Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Jakub Jelinek
On Tue, May 23, 2017 at 06:48:15AM -0400, Aldy Hernandez wrote:
> --- a/gcc/tree-ssanames.h
> +++ b/gcc/tree-ssanames.h
> @@ -45,14 +45,12 @@ struct GTY(()) ptr_info_def
>unsigned int misalign;
>  };
>  
> -/* Value range information for SSA_NAMEs representing non-pointer variables. 
>  */
> -
> -struct GTY ((variable_size)) range_info_def {
> -  /* Minimum, maximum and nonzero bits.  */
> -  TRAILING_WIDE_INT_ACCESSOR (min, ints, 0)
> -  TRAILING_WIDE_INT_ACCESSOR (max, ints, 1)
> -  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 2)
> -  trailing_wide_ints <3> ints;
> +/* Used bits information for SSA_NAMEs representing non-pointer variables.  
> */
> +
> +struct GTY ((variable_size)) nonzero_bits_def {
> +  /* Mask representing which bits are known to be used in an integer.  */
> +  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 0)
> +  trailing_wide_ints <1> ints;
>  };
>  
>  
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1416,11 +1413,15 @@ struct GTY(()) tree_ssa_name {
>union ssa_name_info_type {
>  /* Pointer attributes used for alias analysis.  */
>  struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
> -/* Value range attributes used for zero/sign extension elimination.  */
> -struct GTY ((tag ("1"))) range_info_def *range_info;
> +/* Value range attributes.  */
> +class GTY ((tag ("1"))) irange *range_info;
>} GTY ((desc ("%1.typed.type ?" \
>   "!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;
>  
> +  /* For non-pointer types, this specifies a mask for the bits that
> + are known to be set.  */
> +  struct nonzero_bits_def *nonzero_bits;
> +
>/* Immediate uses list for this SSA_NAME.  */
>struct ssa_use_operand_t imm_uses;
>  };

I'm worried a lot here about compile time memory usage increase, especially
with EVRP and IPA-VRP and even more so with LTO.
The changes mean that for every SSA_NAME of any kind we now need 8 more
bytes, and for every SSA_NAME that has range info (typically both range info
and nonzero_bits; in the old implementation the two were tied together for a
good reason, many ranges also imply some non-zero bits and from non-zero
bits one can in many cases derive a range) much more than that (through
the nonzero_bits_def you get all the overhead of trailing_wide_ints -
the 3 fields it has, just save on the actual 2 lengths and 2 HWI sets,
but then you add 3x 8 bytes and 6x size of the whole wide_int which is
from what I understood not really meant as the memory storage of wide ints
in data structures, but as something not memory efficient you can work
quickly on (so ideal for automatic variables in functions that work with
those).  So it is a 8 byte growth for SSA_NAMEs without ranges and
8+3*8+6*32-2*4-2*8*x growth for SSA_NAMEs with ranges if x is the number
of HWIs needed to represent the integers of that type (so most commonly
x=1).  For x=1 8+3*8+6*32-2*4-2*8*x is 200 bytes growth.
With say 1000 SSA_NAMEs, 500 of them with ranges, that will be
already a 1GB difference, dunno how many SSA_NAMEs are there e.g. in firefox
LTO build.
Can the nonzero_bits stuff be folded back into irange (and have code to
maintain nonzero_bits in addition to ranges as before (from setting a range
compute or update nonzero_bits and vice versa)?  Can you use
trailing_wide_int for the irange storage of the values?  Can you allocate
only as many HWIs as you really need, rather than always 6?
Again, it can be different between a class you use for accessing the
information and manipulating it and for actual underlying storage on
SSA_NAMEs.

> --- /dev/null
> +++ b/gcc/range.h
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#ifndef GCC_RANGE_H
> +#define GCC_RANGE_H
> +#define MAX_RANGES   6
> +
> +typedef class irange *irange_p;
> +enum irange_type { RANGE_PLAIN, RANGE_INVERT };
> +
> +class GTY(()) irange
> +{
> + private:
> +  bool overflow;
> +  size_t n;
> +  void prepend (wide_int x, wide_int y);
> +  void append (wide_int x, wide_int y);
> +  void remove (unsigned i, unsigned j);
> +  void canonicalize ();
> +  /* This is stupid.  These two should be private, but the GTY
> + machinery can't look inside an irange.  */
> + public:
> +  tree type;
> +  wide_int bounds[MAX_RANGES];
> +
> +public:
...
> +  void Union (wide_int x, wide_int y);
> +  bool Union (const irange &r);
> +  bool Union (const irange &r1, const irange &r2);

Do we really want methods starting with capital letters?
I understand why you can't use union, but I don't think we use CamelCase
anywhere.

Jakub


[PING][RFC, PATCH][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-05-23 Thread Maxim Ostapenko

Hi,

could someone take a look on this patch: 
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01374.html ?


Thanks,
-Maxim


Avoid ICE in fnsummary on complex predicates

2017-05-23 Thread Jan Hubicka
Hi,
as noticed when bootstrapping ppc, it may happen that !exec does not imply
nonconst becuase predicates are computed conservatively and we can throw away
some information.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* ipa-fnsummary.c (estimate_node_size_and_time): Do not sanity check
that nonconst implies exec.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 248365)
+++ ipa-fnsummary.c (working copy)
@@ -2738,11 +2738,14 @@ estimate_node_size_and_time (struct cgra
 
   for (i = 0; vec_safe_iterate (info->size_time_table, i, &e); i++)
 {
-  bool nonconst = e->nonconst_predicate.evaluate (possible_truths);
   bool exec = e->exec_predicate.evaluate (nonspec_possible_truths);
-  gcc_assert (!nonconst || exec);
+
+  /* Because predicates are conservative, it can happen that nonconst is 1
+but exec is 0.  */
   if (exec)
 {
+  bool nonconst = e->nonconst_predicate.evaluate (possible_truths);
+
  gcc_checking_assert (e->time >= 0);
  gcc_checking_assert (time >= 0);
 


[PATCH][www] Deprecate MPX for GCC 7

2017-05-23 Thread Richard Biener

Appearantly we forgot to officially deprecate MPX in GCC 7 changes.html.

The following adds the missing note.

Ok?

Thanks,
Richard.

2017-05-23  Richard Biener  

* gcc-7/changes.html: Deprecate MPX.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.85
diff -u -r1.85 changes.html
--- changes.html10 May 2017 11:39:41 -  1.85
+++ changes.html23 May 2017 12:48:52 -
@@ -1002,6 +1002,8 @@
   (4FMAPS), AVX-512 Vector Neural Network Instructions Word variable precision
   (4VNNIW), AVX-512 Vector Population Count (VPOPCNTDQ) and Software
   Guard Extensions (SGX) ISA extensions has been added.
+  Support for the Intel memory protection extension (MPX) has been
+  deprecated.
 
 
 


Re: [RFC, testsuite] Add dg-save-linenr

2017-05-23 Thread Tom de Vries

On 05/22/2017 06:55 PM, Tom de Vries wrote:

Besides, it may be worthwhile contributing/suggesting this upstream.


Will do.


Sent 'relative line numbers and dg-line directive' to dejagnu ml ( 
http://lists.gnu.org/archive/html/dejagnu/2017-05/msg0.html ).


Thanks,
- Tom


Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-05-23 Thread Prathamesh Kulkarni
On 19 May 2017 at 19:02, Jan Hubicka  wrote:
>>
>> * LTO and memory management
>> This is a general question about LTO and memory management.
>> IIUC the following sequence takes place during normal LTO:
>> LGEN: generate_summary, write_summary
>> WPA: read_summary, execute ipa passes, write_opt_summary
>>
>> So I assumed it was OK in LGEN to allocate return_callees_map in
>> generate_summary and free it in write_summary and during WPA, allocate
>> return_callees_map in read_summary and free it after execute (since
>> write_opt_summary does not require return_callees_map).
>>
>> However with fat LTO, it seems the sequence changes for LGEN with
>> execute phase takes place after write_summary. However since
>> return_callees_map is freed in pure_const_write_summary and
>> propagate_malloc() accesses it in execute stage, it results in
>> segmentation fault.
>>
>> To work around this, I am using the following hack in 
>> pure_const_write_summary:
>> // FIXME: Do not free if -ffat-lto-objects is enabled.
>> if (!global_options.x_flag_fat_lto_objects)
>>   free_return_callees_map ();
>> Is there a better approach for handling this ?
>
> I think most passes just do not free summaries with -flto.  We probably want
> to fix it to make it possible to compile multiple units i.e. from plugin by
> adding release_summaries method...
> So I would say it is OK to do the same as others do and leak it with -flto.
>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>> index e457166ea39..724c26e03f6 100644
>> --- a/gcc/ipa-pure-const.c
>> +++ b/gcc/ipa-pure-const.c
>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-scalar-evolution.h"
>>  #include "intl.h"
>>  #include "opts.h"
>> +#include "ssa.h"
>>
>>  /* Lattice values for const and pure functions.  Everything starts out
>> being const, then may drop to pure and then neither depending on
>> @@ -69,6 +70,15 @@ enum pure_const_state_e
>>
>>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>>
>> +enum malloc_state_e
>> +{
>> +  PURE_CONST_MALLOC_TOP,
>> +  PURE_CONST_MALLOC,
>> +  PURE_CONST_MALLOC_BOTTOM
>> +};
>
> It took me a while to work out what PURE_CONST means here :)
> I would just call it something like STATE_MALLOC_TOP... or so.
> ipa_pure_const is outdated name from the time pass was doing only
> those two.
>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>>
>>  static vec funct_state_vec;
>>
>> +/* A map from node to subset of callees. The subset contains those callees
>> + * whose return-value is returned by the node. */
>> +static hash_map< cgraph_node *, vec* > *return_callees_map;
>> +
>
> Hehe, a special case of return jump function.  We ought to support those more 
> generally.
> How do you keep it up to date over callgraph changes?
>> @@ -921,6 +1055,23 @@ end:
>>if (TREE_NOTHROW (decl))
>>  l->can_throw = false;
>>
>> +  if (ipa)
>> +{
>> +  vec v = vNULL;
>> +  l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
>> +  if (DECL_IS_MALLOC (decl))
>> + l->malloc_state = PURE_CONST_MALLOC;
>> +  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
>> + {
>> +   l->malloc_state = PURE_CONST_MALLOC_TOP;
>> +   vec *callees_p = new vec (vNULL);
>> +   for (unsigned i = 0; i < v.length (); ++i)
>> + callees_p->safe_push (v[i]);
>> +   return_callees_map->put (fn, callees_p);
>> + }
>> +  v.release ();
>> +}
>> +
>
> I would do non-ipa variant, too.  I think most attributes can be detected 
> that way
> as well.
>
> The patch generally makes sense to me.  It would be nice to make it easier to 
> write such
> a basic propagators across callgraph (perhaps adding a template doing the 
> basic
> propagation logic). Also I think you need to solve the problem with keeping 
> your
> summaries up to date across callgraph node removal and duplications.
Thanks for the suggestions, I will try to address them in a follow-up patch.
IIUC, I would need to modify ipa-pure-const cgraph hooks -
add_new_function, remove_node_data, duplicate_node_data
to keep return_callees_map up-to-date across callgraph node insertions
and removal ?

Also, if instead of having a separate data-structure like return_callees_map,
should we rather have a flag within cgraph_edge, which marks that the
caller may return the value of the callee ?

Thanks,
Prathamesh
>
> Honza


Ping ** 0.8571 [patch, fortran] Handle MATMUL(TRANSPOSE(A),B) in inline matmul

2017-05-23 Thread Thomas Koenig

Hi,


after receiving no negative feedback on my RFC patch, I have deciced
to submit the patch.

The attached patch handles MATMUL(TRANSPOSE(A),B) in inlining matmul.
Speed is a bit faster than the library version.

Regression-tested.  OK for trunk?


Ping.


Re: PR80806

2017-05-23 Thread Prathamesh Kulkarni
On 22 May 2017 at 10:03, Jeff Law  wrote:
> On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:
>> Hi,
>> The attached patch tries to fix PR80806 by warning when a variable is
>> set using memset (and friends) but not used. I chose to warn in dse
>> pass since dse would detect if the variable passed as 1st argument is
>> a dead store. Does this approach look OK ?
>>
>> There were following fallouts observed during bootstrap build:
>>
>> * double-int.c (div_and_round_double):
>> Warning emitted 'den' set but not used for following call to memset:
>> memset (den, 0, sizeof den);
>>
>> I assume the warning is correct since there's immediately call to:
>> encode (den, lden, hden);
>>
>> and encode overwrites all the contents of den.
>> Should the above call to memset be removed from the source ?
> Unsure.  Yes, it's dead, but from a readability standpoint should it
> stay?  I don't know.  This one isn't very clear.
>
>
>>
>> * tree-streamer.c (streamer_check_handled_ts_structures)
>> The function defines a local array bool handled_p[LAST_TS_ENUM];
>> and the warning is emitted for:
>> memset (&handled_p, 0, sizeof (handled_p));
>>
>> That's because the function then initializes each element of the array
>> handled_p to true
>> making the memset call redundant.
>> I am not sure if warning for the above case is a good idea ? The call
>> to memset() seems deliberate, to initialize all elements to 0, and
>> later assert checks if all the elements were explicitly set to true.
> This one looks like it should stay to me.  It's there for defensive
> purposes to catch cases if programming errors.
>
>
>>
>> * value-prof.c (free_hist):
>> Warns for the call to memset:
>>
>> static int
>> free_hist (void **slot, void *data ATTRIBUTE_UNUSED)
>> {
>>   histogram_value hist = *(histogram_value *) slot;
>>   free (hist->hvalue.counters);
>>   if (flag_checking)
>> memset (hist, 0xab, sizeof (*hist));
>>   free (hist);
>>   return 1;
>> }
>>
>> Assuming flag_checking was true, the call to memset would be dead
>> anyway since it would be immediately freed ? Um, I don't understand
>> the purpose of memset in the above function.
> It's been like that from the day the code was introduced (initially
> surrounded with an ENABLE_CHECKING.  Given the call to free, the memset
> is going to get removed.This one probably falls into the "should
> just be removed" bucket.
>
>
>>
>> * testsuite fallout
>> I verified regressing test-cases were not false positives and added
>> -Wno-unused-but-set-variable.
> I think the real question is do we want to warn here.   I can certainly
> see both sides of the issue.
Hi Jeff,
Thanks for the suggestions. I agree that warning for the above cases
in tree-streamer.c and double-int.c may not be ideal.
Should we perhaps only warn if there's a single use of the pointer in
the call to memset (and friends) like in the test-case
reported in PR ? But then I fear the warning may end up being a bit artificial.

Thanks,
Prathamesh
>
> jeff
>


remove call to memset in value-prof.c:free_hist()

2017-05-23 Thread Prathamesh Kulkarni
Hi Jeff,
As discussed in the other thread, this patch removes dead call to
memset in free_hist().
Bootstrap+tested on x86_64-unknown-linux-gnu.
Cross-tested on arm*-*-*, aarch64*-*-*.
OK for trunk ?

Thanks,
Prathamesh
2017-05-23  Prathamesh Kulkarni  

* value-prof.c (free_hist): Remove call to memset and the enclosing if
condition.

diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 1ce0fda0ed2..7366070d97a 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -564,8 +564,6 @@ free_hist (void **slot, void *data ATTRIBUTE_UNUSED)
 {
   histogram_value hist = *(histogram_value *) slot;
   free (hist->hvalue.counters);
-  if (flag_checking)
-memset (hist, 0xab, sizeof (*hist));
   free (hist);
   return 1;
 }


[ping * 2] PR78736: New C warning -Wenum-conversion

2017-05-23 Thread Prathamesh Kulkarni
Hi,
I would like to ping this patch for review:
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00775.html

Thanks,
Prathamesh


Re: [PATCH] Introduce symtab_node::dump_{asm_,}name functions.

2017-05-23 Thread Martin Liška

On 05/19/2017 04:41 PM, Jan Hubicka wrote:

We used to have fixed size cyclic buffer for those strings, so it was safe
to invoke name/asm_name few times and still expect the result to be there.
I wonder what happened to those?  Using ggc is bit ugly here, but I suppose
it is OK for debug output...



Ok, I see, I'm going to install the patch and we'll see how that can
impact memory usage at the end of GCC 8 stage 4.

Martin


Re: [PATCH], PR target/80718, Improve PowerPC splat double word

2017-05-23 Thread Richard Sandiford
Michael Meissner  writes:
> When I was comparing spec 2006 numbers between GCC 6.3 and 7.1, there was one
> benchmark that was noticeably slower (milc).  In looking at the code 
> generated,
> the #1 hot function (mult_adj_su3_mat_vec) had some cases where automatic
> vectorization generated splat of double from memory.
>
> The register allocator did not use the load with splat instruction (LXVDSX)
> because all of the loads were register+offset.  For the scalar values that it
> could load into the FPR registers, it used the normal register+offset load
> (d-form).  For the other scalar values that would wind up in the traditional
> Altivec registers, the register allocator decided to load up the value into a
> GPR register and do a direct move.
>
> Now, it turns out that while the above code is inefficient, it is not a cause
> for slow down of the milc benchmark.  However there might be other places 
> where
> using a load, direct move, and double word permute are causing a performance
> problem, so I made this patch.
>
> The patch splits the splat into a register splat and a memory splat.  This
> forces the register allocator to convert the load to the indexed form which 
> the
> LXVDSX instruction uses.  I did a spec 2006 run with these changes, and there
> were no significant performance differences with this patch.
>
> In the mult_adj_su3_mat_vec function, there were previously 5 GPR loads, 
> direct
> move, and permute sequences along with one LXVDSK.  With this patch, those GPR
> loads have been replaced with LXVDSKs.

It sounds like that might create the opposite problem, in that if the RA
ends up having to spill the input operand of a register splat, it'll be
forced to keep the instruction as a register splat and load the spill
slot into a temporary register.

I thought the advice was normally to do what the port did before the
patch and present the register and memory versions as alternatives of a
single pattern.  If there's no current way of getting efficient code in
that case then maybe we need to tweak some of the costing infrastructure...

Thanks,
Richard


Re: [PATCH][www] Deprecate MPX for GCC 7

2017-05-23 Thread Jeff Law
On 05/23/2017 06:50 AM, Richard Biener wrote:
> 
> Appearantly we forgot to officially deprecate MPX in GCC 7 changes.html.
> 
> The following adds the missing note.
> 
> Ok?
> 
> Thanks,
> Richard.
> 
> 2017-05-23  Richard Biener  
> 
>   * gcc-7/changes.html: Deprecate MPX.
I don't recall deprecating MPX.  I do recall deprecating Cilk+.

There's certainly folks at Intel fixing MPX related issues for GCC.

jeff


Re: [PATCH] [Aarch64] Variable shift count truncation issues

2017-05-23 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Richard Sandiford wrote:
>
>> Insn patterns shouldn't check can_create_pseudo_p, because there's no
>> guarantee that the associated split happens before RA.  In this case it
>> should be safe to reuse operand 0 after RA if you change it to:
>
> The goal is to only create and split this pattern before register allocation.
> It's a transient pattern, combine creates it, and it is split immediately 
> after.
>
> Maybe !reload_completed is clearer as that is what several other
> define_insn_and_split patterns do?

But the concept of a transient pattern doesn't really exist.  We shouldn't
rely for correctness on a split being applied before RA.

If all we want to do is match and split something that combine generates,
it would be better to do it as a pure define_split.

Thanks,
Richard


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread David Malcolm
On Tue, 2017-05-23 at 14:11 +0200, Jakub Jelinek wrote:
> On Tue, May 23, 2017 at 06:48:15AM -0400, Aldy Hernandez wrote:

[...]

> > --- /dev/null
> > +++ b/gcc/range.h
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +.  */
> > +
> > +#ifndef GCC_RANGE_H
> > +#define GCC_RANGE_H
> > +#define MAX_RANGES 6
> > +
> > +typedef class irange *irange_p;
> > +enum irange_type { RANGE_PLAIN, RANGE_INVERT };

The irange class needs a leading comment, which would go a long way to
documenting the patch as a whole, I think (i.e. I'm not just nit
-picking here :) )

> > +class GTY(()) irange
> > +{
> > + private:
> > +  bool overflow;
> > +  size_t n;
> > +  void prepend (wide_int x, wide_int y);
> > +  void append (wide_int x, wide_int y);
> > +  void remove (unsigned i, unsigned j);
> > +  void canonicalize ();
> > +  /* This is stupid.  These two should be private, but the GTY
> > + machinery can't look inside an irange.  */
> > + public:
> > +  tree type;
> > +  wide_int bounds[MAX_RANGES];
> > +
> > +public:
> ...
> > +  void Union (wide_int x, wide_int y);
> > +  bool Union (const irange &r);
> > +  bool Union (const irange &r1, const irange &r2);
> 
> Do we really want methods starting with capital letters?
> I understand why you can't use union, but I don't think we use
> CamelCase
> anywhere.

FWIW in the JIT, I have a class switch_ (i.e. with a trailing
underscore).  Maybe we should use trailing underscores for names that
clash with reserved words?

Dave


[PATCH] [i386] Recompute the frame layout less often

2017-05-23 Thread Bernd Edlinger
Hi,

this is the latest version of my patch.

As already said, it attempts to compute
the frame layout only when relevant data have
changed.

Apologies for doing more clean-up on Daniel's
patch than absolutely necessary, but ...

Bootstrap and reg-tested successfully on
x86_64-pc-linux-gnu with unix\{,-m32\}.
Is it OK for trunk?


Thanks
Bernd.
2017-05-23  Bernd Edlinger  

* config/i386/i386.c (x86_64_ms_sysv_extra_clobbered_registers): Make
static.
(xlogue_layout::get_stack_space_used, xlogue_layout::s_instances,
xlogue_layout::get_instance, logue_layout::xlogue_layout,
sp_valid_at, fp_valid_at, choose_basereg): Formatting.
(xlogue_layout::get_stub_rtx): Make static.
(xlogue_layout::get_stub_name): Avoid const-cast, make static.
(xlogue_layout::compute_stub_managed_regs): Rename to...
(xlogue_layout::count_stub_managed_regs): ...this.
(xlogue_layout::is_stub_managed_reg): New function.
(xlogue_layout::m_stub_names): Rename to...
(xlogue_layout::s_stub_names): ...this, make static.
(xlogue_layout::STUB_INDEX_OFFSET, xlogue_layout::MIN_REGS,
xlogue_layout::MAX_REGS, xlogue_layout::MAX_EXTRA_REGS,
xlogue_layout::VARIANT_COUNT, xlogue_layout::STUB_NAME_MAX_LEN,
xlogue_layout::s_stub_names): Instantiate statics.
(stub_managed_regs): Remove.
(ix86_save_reg): Use xlogue_layout::compute_stub_managed_regs.
(disable_call_ms2sysv_xlogues): Rename to...
(warn_once_call_ms2sysv_xlogues): ...this, and warn only once.
(ix86_initial_elimination_offset, ix86_expand_call): Fix call_ms2sysv
warning logic.
(ix86_static_chain): Make sure that ix86_static_chain_on_stack can't
change after reload_completed.
(ix86_can_use_return_insn_p): Use the ix86_frame data structure
directly.
(ix86_expand_prologue): Likewise.
(ix86_expand_epilogue): Likewise.
(ix86_expand_split_stack_prologue): Likewise.
(ix86_compute_frame_layout): Remove frame parameter ...
(TARGET_COMPUTE_FRAME_LAYOUT): ... and export it as a target hook.
(ix86_finalize_stack_realign_flags): Call ix86_compute_frame_layout
only if necessary.
(ix86_init_machine_status): Don't set use_fast_prologue_epilogue_nregs.
(ix86_frame): Move from here ...
* config/i386/i386.h (ix86_frame): ... to here.
(machine_function): Remove use_fast_prologue_epilogue_nregs, cache the
complete ix86_frame data structure instead.  Remove some_ld_name.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 248341)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+		 [NUM_X86_64_MS_CLOBBERED_REGS] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -2472,25 +2474,26 @@ class xlogue_layout {
 return m_regs[reg];
   }
 
-  const char *get_stub_name (enum xlogue_stub stub,
-			 unsigned n_extra_args) const;
+  static const char *get_stub_name (enum xlogue_stub stub,
+unsigned n_extra_args);
+
   /* Returns an rtx for the stub's symbol based upon
1.) the specified stub (save, restore or restore_ret) and
2.) the value of cfun->machine->call_ms2sysv_extra_regs and
3.) rather or not stack alignment is being performed.  */
-  rtx get_stub_rtx (enum xlogue_stub stub) const;
+  static rtx get_stub_rtx (enum xlogue_stub stub);
 
   /* Returns the amount of stack space (including padding) that the stub
  needs to store registers based upon data in the machine_function.  */
   HOST_WIDE_INT get_stack_space_used () const
   {
-const struct machine_function &m = *cfun->machine;
-unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+const struct machine_function *m = cfun->machine;
+unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
-gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
 return m_regs[last_reg].offset
-	+ (m.call_ms2sysv_pad_out ? 8 : 0)
-	+ STUB_INDEX_OFFSET;
+	   + (m->call_ms2sysv_pad_out ? 8 : 0)
+	   + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -2500,7 +2503,8 @@ class xlogue_layout {
   }
 
   static const struct xlogue_layout &get_instance ();
-  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+  static unsigned count_stub_managed_regs ();
+  static bool is_stub_managed_reg (unsigned regno, unsigned count);
 
   static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
   static const u

Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Jakub Jelinek
On Tue, May 23, 2017 at 10:29:58AM -0400, David Malcolm wrote:
> > Do we really want methods starting with capital letters?
> > I understand why you can't use union, but I don't think we use
> > CamelCase
> > anywhere.
> 
> FWIW in the JIT, I have a class switch_ (i.e. with a trailing
> underscore).  Maybe we should use trailing underscores for names that
> clash with reserved words?

union_ and not_ is just fine with me.

Jakub


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Andrew MacLeod

On 05/23/2017 08:11 AM, Jakub Jelinek wrote:

On Tue, May 23, 2017 at 06:48:15AM -0400, Aldy Hernandez wrote:

--- a/gcc/tree-ssanames.h
+++ b/gcc/tree-ssanames.h
@@ -45,14 +45,12 @@ struct GTY(()) ptr_info_def
unsigned int misalign;
  };
  
-/* Value range information for SSA_NAMEs representing non-pointer variables.  */

-
-struct GTY ((variable_size)) range_info_def {
-  /* Minimum, maximum and nonzero bits.  */
-  TRAILING_WIDE_INT_ACCESSOR (min, ints, 0)
-  TRAILING_WIDE_INT_ACCESSOR (max, ints, 1)
-  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 2)
-  trailing_wide_ints <3> ints;
+/* Used bits information for SSA_NAMEs representing non-pointer variables.  */
+
+struct GTY ((variable_size)) nonzero_bits_def {
+  /* Mask representing which bits are known to be used in an integer.  */
+  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 0)
+  trailing_wide_ints <1> ints;
  };
  
  
--- a/gcc/tree-core.h

+++ b/gcc/tree-core.h
@@ -1416,11 +1413,15 @@ struct GTY(()) tree_ssa_name {
union ssa_name_info_type {
  /* Pointer attributes used for alias analysis.  */
  struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
-/* Value range attributes used for zero/sign extension elimination.  */
-struct GTY ((tag ("1"))) range_info_def *range_info;
+/* Value range attributes.  */
+class GTY ((tag ("1"))) irange *range_info;
} GTY ((desc ("%1.typed.type ?" \
"!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;
  
+  /* For non-pointer types, this specifies a mask for the bits that

+ are known to be set.  */
+  struct nonzero_bits_def *nonzero_bits;
+
/* Immediate uses list for this SSA_NAME.  */
struct ssa_use_operand_t imm_uses;
  };

I'm worried a lot here about compile time memory usage increase, especially
with EVRP and IPA-VRP and even more so with LTO.
The changes mean that for every SSA_NAME of any kind we now need 8 more
bytes, and for every SSA_NAME that has range info (typically both range info
and nonzero_bits; in the old implementation the two were tied together for a
good reason, many ranges also imply some non-zero bits and from non-zero
bits one can in many cases derive a range) much more than that (through


As follow on work we have discussed an interface which would be able to 
calculate a bitmask (for either zeros or ones) from a range and vice 
versa.. Meaning we wouldn't to need to store both unless the ssa_name is 
used in such a way that it generates both.  ie.  when we create a range 
from a bit operation, we simply store the bit version and not a range, 
otherwise we create an irange but not a bitrange. When you query and ask 
the ssa_name for an irange,  if there is no irange and there is a bit 
pattern, we can generate the irange from that on demand.Likewise, if 
there is an irange and no bit pattern, we can generate any known on or 
off bits from the irange. The only time there would be both would be 
when we can somehow find some real benefit from having both.. I doubt 
that would be very common.


I think aldy simply copied the bitrange stuff wholesale in a separate 
array just to get it working..  converting between the two was follow on 
work to make things more efficient once it was fundamentally working.




  
the nonzero_bits_def you get all the overhead of trailing_wide_ints -

the 3 fields it has, just save on the actual 2 lengths and 2 HWI sets,
but then you add 3x 8 bytes and 6x size of the whole wide_int which is
from what I understood not really meant as the memory storage of wide ints
in data structures, but as something not memory efficient you can work
quickly on (so ideal for automatic variables in functions that work with
those).  So it is a 8 byte growth for SSA_NAMEs without ranges and
8+3*8+6*32-2*4-2*8*x growth for SSA_NAMEs with ranges if x is the number
of HWIs needed to represent the integers of that type (so most commonly
x=1).  For x=1 8+3*8+6*32-2*4-2*8*x is 200 bytes growth.
With say 1000 SSA_NAMEs, 500 of them with ranges, that will be
already a 1GB difference, dunno how many SSA_NAMEs are there e.g. in firefox
LTO build.
why don't we actually try measuring it and see if it is noticeable? and 
if it is, where the issues really are. You really think we have relevant 
range info  for 50% of the ssa_names in a program?  Id be surprised.  
Again, thats something we should probably measure and see whats 
typical.  the range table should only be used when we can refine the 
global range to something other than range_for_type().


We can also change the representation of the range to be 2 ranges like 
it is today..  we're just defaulting to 3 because it seems like it may 
carry more useful information some times.   The long range plan is that 
this can be tweaked at runtime to only use as much as we need or ant to 
represent a range..  allowing an increase in the range precision for 
specific optimizations that might care, and decreasing it the rest of 
the time.  None of the external API e

Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Andrew MacLeod

On 05/23/2017 10:34 AM, Jakub Jelinek wrote:

On Tue, May 23, 2017 at 10:29:58AM -0400, David Malcolm wrote:

Do we really want methods starting with capital letters?
I understand why you can't use union, but I don't think we use
CamelCase
anywhere.

FWIW in the JIT, I have a class switch_ (i.e. with a trailing
underscore).  Maybe we should use trailing underscores for names that
clash with reserved words?

union_ and not_ is just fine with me.

Jakub


I was also going to ask if we had any rules regarding using a trailing _ 
for the union, intersect and not routines :-)  I'm fine with that as 
well   I'd do intersect_ as wel for some consistency



Andrew



Re: [PATCH GCC]Check tieable TRUNCATE only if inner rtx is register

2017-05-23 Thread Richard Sandiford
Bin Cheng  writes:
> Hi,
> Revision 247881 possibly exposes bug in RTL or ARM backend, while the
> change itself
> may be incomplete too.  Given a TRUNCATE rtx with complicated sub-rtx,
> backend may
> want to know the complicated sub-rtx happens in context of TRUNCATE in
> order to give
> different costs.  This patch adds restriction only checking tieable
> TRUNCATE if its operand
> is register.  This is enough for middle-end, it builds up (truncate:SI
> (reg:DI)) in order to setup
> conversion cost for GIMPLE optimizations.  On the other hand, targets
> like i386/arm don't
> handle register truncation at the moment.
>
> Note, this patch minimize the impact of cost change and workaround
> PR80754 for now.
> So far the effect of change in r247881 is to allow smulsi3_highpart_v6
> pattern to be used
> rather than smullsidi, which means it actually gives RA more free in
> choosing registers.
> If it's because of the cost change that smulsi3_highpart_v6 is now
> preferred, it could be
> suggesting that r247881 is good.  Anyway, it's just my guess before
> looking into rtl passes.
>
> Bootstrap and test ongoing.  Is it OK?
>
> 2017-05-13  Bin Cheng  
>
>   * rtlanal.c (rtx_cost): Check tieable TRUNCATE only if inner rtx
>   is register.
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index d9f57c3..b211efb 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -4165,7 +4165,10 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code 
> outer_code,
>break;
>  
>  case TRUNCATE:
> -  if (MODES_TIEABLE_P (mode, GET_MODE (XEXP (x, 0
> +  /* If inner rtx isn't register, fall through and call target hook.
> +  Backend may want to know that sub-rtx is in truncate.  */
> +  if (REG_P (XEXP (x, 0))
> +   && MODES_TIEABLE_P (mode, GET_MODE (XEXP (x, 0
>   {
> total = 0;
> break;

Even for !REG_P (i.e. even when you still call the hook), 0 seems like
a better default to pass than COSTS_N_INSNS (1).  Would it be worth
guarding just the break with REG_P?

Thanks,
Richard


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Jakub Jelinek
On Tue, May 23, 2017 at 10:38:44AM -0400, Andrew MacLeod wrote:
> As follow on work we have discussed an interface which would be able to
> calculate a bitmask (for either zeros or ones) from a range and vice versa..

Sometimes the range vs. nonzero_bits info is redundant, you can compute
one from the other or vice versa without any information loss, but in many
cases it is not redundant, you can have e.g. known zero least significant or
some middle bits, or the range boundaries are not powers of two or powers of
two - 1.  The point is that with the coexistence of both it can be gradually
improved, first you e.g. find some range, then CCP can use the corresponding
nonzero_bits knowledge from that range in bitwise mask propagation, then you
notice some other bits are known to be zero, perhaps from that adjust the
value range again if possible, ...

> why don't we actually try measuring it and see if it is noticeable? and if
> it is, where the issues really are. You really think we have relevant range
> info  for 50% of the ssa_names in a program?  Id be surprised.  Again, thats
> something we should probably measure and see whats typical.  the range table
> should only be used when we can refine the global range to something other
> than range_for_type().

I'm not used to build firefox and various other large C++ projects with LTO
regularly, so it would be harder for me to get that stuff working and
measure it, but I think e.g. Honza or Martin L. do that often and could
provide hints on what is needed to test that, then we can have exact
numbers.

> We can also change the representation of the range to be 2 ranges like it is

Well, the current representation is just 1 range (2 numbers) + 1 extra
number for the nonzero_bits bitmask.

> More importantly, when the on-demand range work comes online it will end the
> proliferation of SSA_NAMES which carry ASSERT_EXPR information.  We won't
> see the trail of new  ssa_names the ASSET_EXPR chains create. I suspect that
> will eliminate storing global ranges and bits for most SSA names, which is
> likely to make this class a win regardless.

Are you sure it is desirable to recompute it in all cases, rather than just
use what you have recorded and recompute what you don't have computed yet?
E.g. various ranges come from if (cond) __builtin_unreachable (); style
asserts, we do want to drop them at some point and the only spot to keep
that info afterwards is SSA_NAME range info.

> Im not familiar with the details of how wide_int and host vs target integers
> really works. I don't think Aldy is either.  If there is a more efficient
> way to store an integer fr this use case then by all means we can do that.
> to get things working we just used what was easily available.   if there is
> a more efficient way to represent it, perhaps it makes sense to create a
> class for a"memory_efficient_wide_int" and allow it to convert back and
> forth from a wide_int as needed?

You want to talk to Richard Sandiford mainly here I guess.  There are many
flavors of wide ints, for different purposes.

Jakub


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Richard Sandiford
Andrew MacLeod  writes:
> On 05/23/2017 08:11 AM, Jakub Jelinek wrote:
>> On Tue, May 23, 2017 at 06:48:15AM -0400, Aldy Hernandez wrote:
>>> --- a/gcc/tree-ssanames.h
>>> +++ b/gcc/tree-ssanames.h
>>> @@ -45,14 +45,12 @@ struct GTY(()) ptr_info_def
>>> unsigned int misalign;
>>>   };
>>>   
>>> -/* Value range information for SSA_NAMEs representing non-pointer 
>>> variables.  */
>>> -
>>> -struct GTY ((variable_size)) range_info_def {
>>> -  /* Minimum, maximum and nonzero bits.  */
>>> -  TRAILING_WIDE_INT_ACCESSOR (min, ints, 0)
>>> -  TRAILING_WIDE_INT_ACCESSOR (max, ints, 1)
>>> -  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 2)
>>> -  trailing_wide_ints <3> ints;
>>> +/* Used bits information for SSA_NAMEs representing non-pointer variables. 
>>>  */
>>> +
>>> +struct GTY ((variable_size)) nonzero_bits_def {
>>> +  /* Mask representing which bits are known to be used in an integer.  */
>>> +  TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 0)
>>> +  trailing_wide_ints <1> ints;
>>>   };
>>>   
>>>   
>>> --- a/gcc/tree-core.h
>>> +++ b/gcc/tree-core.h
>>> @@ -1416,11 +1413,15 @@ struct GTY(()) tree_ssa_name {
>>> union ssa_name_info_type {
>>>   /* Pointer attributes used for alias analysis.  */
>>>   struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
>>> -/* Value range attributes used for zero/sign extension elimination.  */
>>> -struct GTY ((tag ("1"))) range_info_def *range_info;
>>> +/* Value range attributes.  */
>>> +class GTY ((tag ("1"))) irange *range_info;
>>> } GTY ((desc ("%1.typed.type ?" \
>>> "!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;
>>>   
>>> +  /* For non-pointer types, this specifies a mask for the bits that
>>> + are known to be set.  */
>>> +  struct nonzero_bits_def *nonzero_bits;
>>> +
>>> /* Immediate uses list for this SSA_NAME.  */
>>> struct ssa_use_operand_t imm_uses;
>>>   };
>> I'm worried a lot here about compile time memory usage increase, especially
>> with EVRP and IPA-VRP and even more so with LTO.
>> The changes mean that for every SSA_NAME of any kind we now need 8 more
>> bytes, and for every SSA_NAME that has range info (typically both range info
>> and nonzero_bits; in the old implementation the two were tied together for a
>> good reason, many ranges also imply some non-zero bits and from non-zero
>> bits one can in many cases derive a range) much more than that (through
>
> As follow on work we have discussed an interface which would be able to 
> calculate a bitmask (for either zeros or ones) from a range and vice 
> versa.. Meaning we wouldn't to need to store both unless the ssa_name is 
> used in such a way that it generates both.  ie.  when we create a range 
> from a bit operation, we simply store the bit version and not a range, 
> otherwise we create an irange but not a bitrange. When you query and ask 
> the ssa_name for an irange,  if there is no irange and there is a bit 
> pattern, we can generate the irange from that on demand.Likewise, if 
> there is an irange and no bit pattern, we can generate any known on or 
> off bits from the irange. The only time there would be both would be 
> when we can somehow find some real benefit from having both.. I doubt 
> that would be very common.
>
> I think aldy simply copied the bitrange stuff wholesale in a separate 
> array just to get it working..  converting between the two was follow on 
> work to make things more efficient once it was fundamentally working.
>
>
>
>>   
>> the nonzero_bits_def you get all the overhead of trailing_wide_ints -
>> the 3 fields it has, just save on the actual 2 lengths and 2 HWI sets,
>> but then you add 3x 8 bytes and 6x size of the whole wide_int which is
>> from what I understood not really meant as the memory storage of wide ints
>> in data structures, but as something not memory efficient you can work
>> quickly on (so ideal for automatic variables in functions that work with
>> those).  So it is a 8 byte growth for SSA_NAMEs without ranges and
>> 8+3*8+6*32-2*4-2*8*x growth for SSA_NAMEs with ranges if x is the number
>> of HWIs needed to represent the integers of that type (so most commonly
>> x=1).  For x=1 8+3*8+6*32-2*4-2*8*x is 200 bytes growth.
>> With say 1000 SSA_NAMEs, 500 of them with ranges, that will be
>> already a 1GB difference, dunno how many SSA_NAMEs are there e.g. in firefox
>> LTO build.
> why don't we actually try measuring it and see if it is noticeable? and 
> if it is, where the issues really are. You really think we have relevant 
> range info  for 50% of the ssa_names in a program?  Id be surprised.  
> Again, thats something we should probably measure and see whats 
> typical.  the range table should only be used when we can refine the 
> global range to something other than range_for_type().
>
> We can also change the representation of the range to be 2 ranges like 
> it is today..  we're just defaulting to 3 because it seems like it may 
> carry more u

Re: [RFC, testsuite] Add dg-save-linenr

2017-05-23 Thread Mike Stump
On May 22, 2017, at 9:55 AM, Tom de Vries  wrote:
> 
> Attached patch adds the missing documentation.
> 
> It looks like this in gccint.info:
> ...
> '{ dg-line LINENUMVAR }'
> This DejaGnu directive sets the variable LINENUMVAR to the line
> number of the source line.  The variable LINENUMVAR can then be
> used in subsequent 'dg-error', 'dg-warning', 'dg-message' and
> 'dg-bogus' directives.  For example:
> 
>  int a;   /* { dg-line first_def_a } */
>  float a; /* { dg-error "conflicting types of" } */
>  /* { dg-message "previous declaration of" "" { target *-*-* } 
> first_def_a } */
> ...
> 
> Note: AFAIK, dg-line does not work in the gnat testsuite. This is similar to 
> PR80219 for relative line numbers. I'm not sure if we should mention this 
> here, or how.
> 
> OK for trunk like this?

Ok.

Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Andrew MacLeod

On 05/23/2017 10:54 AM, Jakub Jelinek wrote:

On Tue, May 23, 2017 at 10:38:44AM -0400, Andrew MacLeod wrote:

As follow on work we have discussed an interface which would be able to
calculate a bitmask (for either zeros or ones) from a range and vice versa..

Sometimes the range vs. nonzero_bits info is redundant, you can compute
one from the other or vice versa without any information loss, but in many
cases it is not redundant, you can have e.g. known zero least significant or
some middle bits, or the range boundaries are not powers of two or powers of
two - 1.  The point is that with the coexistence of both it can be gradually
improved, first you e.g. find some range, then CCP can use the corresponding
nonzero_bits knowledge from that range in bitwise mask propagation, then you
notice some other bits are known to be zero, perhaps from that adjust the
value range again if possible, ...


Right, but we do only need to store both for those cases which we are 
actually refining the info.  and via a central manager of the 
information it can know when that is a thing to do and do it.  I also 
hope to ad some of the bit inormation ot the on demand system.. tat'll 
come later tho.I simply maintain that the vast majority of ssa_names 
are unlikely to need both, not that we never need both.   This seems 
like statistics that shouldn't be too hard to find with some simple 
looking at the data at the end of compilation or various passes.





why don't we actually try measuring it and see if it is noticeable? and if
it is, where the issues really are. You really think we have relevant range
info  for 50% of the ssa_names in a program?  Id be surprised.  Again, thats
something we should probably measure and see whats typical.  the range table
should only be used when we can refine the global range to something other
than range_for_type().

I'm not used to build firefox and various other large C++ projects with LTO
regularly, so it would be harder for me to get that stuff working and
measure it, but I think e.g. Honza or Martin L. do that often and could
provide hints on what is needed to test that, then we can have exact
numbers.
that seems reasonable.  I think we ought to look at the low hanging 
fruit for making it more efficient to start with before hassling them or 
that.. unless it is trivial.   It would be useful t have input from 
"heavy" application users if there is any/much impact.

We can also change the representation of the range to be 2 ranges like it is

Well, the current representation is just 1 range (2 numbers) + 1 extra
number for the nonzero_bits bitmask.


oh yeah,right but anti range is evil. Its worth losing a little bit 
of memory to get rid of that, imo  :-)





More importantly, when the on-demand range work comes online it will end the
proliferation of SSA_NAMES which carry ASSERT_EXPR information.  We won't
see the trail of new  ssa_names the ASSET_EXPR chains create. I suspect that
will eliminate storing global ranges and bits for most SSA names, which is
likely to make this class a win regardless.

Are you sure it is desirable to recompute it in all cases, rather than just
use what you have recorded and recompute what you don't have computed yet?
E.g. various ranges come from if (cond) __builtin_unreachable (); style
asserts, we do want to drop them at some point and the only spot to keep
that info afterwards is SSA_NAME range info.


I think it will be desirable in most cases.  The on-demand system does 
*not* use assert_exprs.   It will make them mostly obsolete, and the 
goal is to eventually eliminate them from the IL completely.


Most assert_exprs exist to provide ranges on sub-branches of the CFG.  
These we are likely to be able to simply replace with the on-demand 
mechanism.  I suspect we'll find some cases where we find its more 
useful to have a new global ssa_name which carries information,  but I 
expect the situation which requires that to be fairly rare.


There will be a cycle where I have to identify cases we missing and 
catch those... or introduce a new ssa_name to handle the situation until 
such time that it can be addressed. I hope to get thru most of that in 
this stage 1

Im not familiar with the details of how wide_int and host vs target integers
really works. I don't think Aldy is either.  If there is a more efficient
way to store an integer fr this use case then by all means we can do that.
to get things working we just used what was easily available.   if there is
a more efficient way to represent it, perhaps it makes sense to create a
class for a"memory_efficient_wide_int" and allow it to convert back and
forth from a wide_int as needed?

You want to talk to Richard Sandiford mainly here I guess.  There are many
flavors of wide ints, for different purposes.

I did not know that :-)   Aldy, maybe you should have a chat :-)

Andrew


Make the OpenACC C++ acc_on_device wrapper "always inline" (was: [openacc] on_device fix)

2017-05-23 Thread Thomas Schwinge
Hi!

On Thu, 29 Oct 2015 17:22:46 -0700, Nathan Sidwell  wrote:
> acc_on_device and it's builtin had a conflict.  The function formally takes 
> an 
> enum argument, but the builtin takes an int -- primarily to avoid the 
> compiler 
> having to generate the enum  type internally.
> 
> This works fine for C,  where the external declaration of the function (in 
> openacc.h) matches up with the builtin, and we optimize the builtin as 
> expected.
> 
> It fails for C++ where the builtin doesn't match the declaration in the 
> header. 
>   We end up with emitting a call to acc_on_device,  which is resolved by 
> libgomp.  Unfortunately that means we fail to optimize.  [...]

> [Nathan's trunk r229562] leaves things unchanged for C --  declare a function 
> with an enum arg. 
>   But for C++ we the extern "C" declaration takes an int -- and therefore 
> matches the builtin.  We insert an inline wrapper that takes an enum 
> argument. 
> Because of C++'s overload resolution both the wrapper and the int-taking 
> declaration can have the same source name.

> --- libgomp/openacc.h (revision 229535)
> +++ libgomp/openacc.h (working copy)

> -int acc_on_device (acc_device_t) __GOACC_NOTHROW;
> +#ifdef __cplusplus
> +int acc_on_device (int __arg) __GOACC_NOTHROW;
> +#else
> +int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW;
> +#endif

>  #ifdef __cplusplus
>  }
> +
> +/* Forwarding function with correctly typed arg.  */
> +
> +inline int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW
> +{
> +  return acc_on_device ((int) __arg);
> +}
>  #endif

> --- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c   
> (revision 0)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c   
> (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include 
> +
> +int Foo (acc_device_t x)
> +{
> +  return acc_on_device (x);
> +}
> +
> +/* { dg-final { scan-assembler-not "acc_on_device" } } */

As a user, I'd expect that when compiling such code with "-O0" instead of
"-O2", but adding "__attribute__ ((optimize ("O2")))" to "Foo", that I'd
then get "acc_on_device" expanded as a builtin, and no calls to the
"acc_on_device library function.  In C++ that is currently not working,
because the "Forwarding function with correctly typed arg" (cited above)
doesn't "inherit" that "optimize" attribute.  Making that one "always
inline" resolves the problem.  Also I cleaned up and extended testing
some more.  OK for trunk?

commit 9cc3a384c17e9f692f7864c604d2e2f9fbf0bac9
Author: Thomas Schwinge 
Date:   Tue May 23 13:21:14 2017 +0200

Make the OpenACC C++ acc_on_device wrapper "always inline"

libgomp/
* openacc.h [__cplusplus] (acc_on_device): Mark as "always
inline".
* testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Remove
file; test cases already present...
* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: ... in
this file.  Update.
* testsuite/libgomp.oacc-c-c++-common/acc-on-device.c: Remove
file; test cases now present...
* testsuite/libgomp.oacc-c-c++-common/acc_on_device-2.c: ... in
this new file.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Update.
---
 libgomp/openacc.h  |  3 +-
 .../libgomp.oacc-c-c++-common/acc-on-device-2.c| 22 -
 .../libgomp.oacc-c-c++-common/acc-on-device.c  | 12 ---
 .../libgomp.oacc-c-c++-common/acc_on_device-1.c| 38 +-
 .../libgomp.oacc-c-c++-common/acc_on_device-2.c| 21 
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 14 
 6 files changed, 52 insertions(+), 58 deletions(-)

diff --git libgomp/openacc.h libgomp/openacc.h
index 137e2c1..266f559 100644
--- libgomp/openacc.h
+++ libgomp/openacc.h
@@ -121,7 +121,8 @@ int acc_set_cuda_stream (int, void *) __GOACC_NOTHROW;
 /* Forwarding function with correctly typed arg.  */
 
 #pragma acc routine seq
-inline int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW
+inline __attribute__ ((__always_inline__)) int
+acc_on_device (acc_device_t __arg) __GOACC_NOTHROW
 {
   return acc_on_device ((int) __arg);
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c
deleted file mode 100644
index bfcb67d..000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c
+++ /dev/null
@@ -1,22 +0,0 @@
-/* Test the acc_on_device library function. */
-/* { dg-additional-options "-fno-builtin-acc_on_device" } */
-
-#include 
-
-int main ()
-{
-  int dev;
-  
-#pragma acc parallel copyout (dev)
-  {
-dev = acc_on_device (acc_device_not_host);
-  }
-
-  int expect = 1;
-  
-#if  ACC_DEVICE_TYPE_host
-  expect = 0;
-#endif
-  
-  return dev != expect;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-devi

Re: [PATCH] Match x86 family machine constraints section with constarints.md

2017-05-23 Thread Sandra Loosemore

On 04/28/2017 03:30 AM, Peryt, Sebastian wrote:

Hi,

Thank you for your comments. I edited my patch accordingly. As for some of your 
doubts:
- REX is  the opcode prefix to access 64-bit register extensions introduced in 
IA-32e mode.
- EVEX is the encoding prefix which applies to SIMD operating instructions 
operating on XMM, YMM and ZMM registers. It was introduced with AVX-512 
instructions.
- "number factor of four" that means that sources start in a multiple of 4 
boundary. This is used for some of instructions.

Also I'd like to add that this whole patch is strictly based on docstring parts 
of constraints that are present in config/i386/constraints.md but not in 
documentation (md.texi file). There is no new (new as in nonexistent in code) 
content.

I'm also adding Kirill Yukhin to CC, because I believe he is the correct person 
that can catch any technical errors if any has slipped-in.


The grammar/markup/etc are OK now, but I can't comment on technical 
correctness of the information.


-Sandra



Re: OpenACC 2.5 kernels construct: num_gangs, num_workers, vector_length clauses

2017-05-23 Thread Thomas Schwinge
Hi!

On Tue, 23 May 2017 11:48:09 +0200, Jakub Jelinek  wrote:
> On Thu, May 11, 2017 at 02:26:51PM +0200, Thomas Schwinge wrote:
> > OpenACC 2.5 kernels construct: num_gangs, num_workers, vector_length 
> > clauses

> Ok, thanks.

Thanks.  As posted, committed to trunk in r248370:

commit 9d5c2cca06bf15e6eff22c3a4f6e1cf0072645e5
Author: tschwinge 
Date:   Tue May 23 15:47:32 2017 +

OpenACC 2.5 kernels construct: num_gangs, num_workers, vector_length clauses

gcc/c/
* c-parser.c (OACC_KERNELS_CLAUSE_MASK): Add
"PRAGMA_OACC_CLAUSE_NUM_GANGS", "PRAGMA_OACC_CLAUSE_NUM_WORKERS",
"VECTOR_LENGTH".
gcc/cp/
* parser.c (OACC_KERNELS_CLAUSE_MASK): Add
"PRAGMA_OACC_CLAUSE_NUM_GANGS", "PRAGMA_OACC_CLAUSE_NUM_WORKERS",
"VECTOR_LENGTH".
gcc/fortran/
* openmp.c (OACC_KERNELS_CLAUSES): Add "OMP_CLAUSE_NUM_GANGS",
"OMP_CLAUSE_NUM_WORKERS", "OMP_CLAUSE_VECTOR_LENGTH".
gcc/
* omp-offload.c (execute_oacc_device_lower): Remove the
parallelism dimensions function attributes for unparallelized
OpenACC kernels constructs.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims-1.c: Update.
* c-c++-common/goacc/parallel-dims-2.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
* g++.dg/goacc/template.C: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/routine-3.f90: Likewise.
* gfortran.dg/goacc/sie.f95: Likewise.
* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Update.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248370 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |   6 +
 gcc/c/ChangeLog|   6 +
 gcc/c/c-parser.c   |   3 +
 gcc/cp/ChangeLog   |   6 +
 gcc/cp/parser.c|   3 +
 gcc/fortran/ChangeLog  |   5 +
 gcc/fortran/openmp.c   |   3 +-
 gcc/omp-offload.c  |   9 ++
 gcc/testsuite/ChangeLog|  12 ++
 gcc/testsuite/c-c++-common/goacc/parallel-dims-1.c |   3 +
 gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c | 152 +++--
 gcc/testsuite/c-c++-common/goacc/routine-1.c   |   7 +
 .../c-c++-common/goacc/uninit-dim-clause.c |  20 ++-
 gcc/testsuite/g++.dg/goacc/template.C  |   4 +
 gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95   |   6 +-
 gcc/testsuite/gfortran.dg/goacc/routine-3.f90  |   6 +
 gcc/testsuite/gfortran.dg/goacc/sie.f95|  86 +++-
 .../gfortran.dg/goacc/uninit-dim-clause.f95|  18 ++-
 libgomp/ChangeLog  |   4 +
 .../libgomp.oacc-c-c++-common/kernels-loop-2.c |  21 ++-
 .../libgomp.oacc-c-c++-common/parallel-dims.c  |  35 +
 .../libgomp.oacc-fortran/kernels-loop-2.f95|  13 +-
 22 files changed, 397 insertions(+), 31 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index d2e846e..b38a31d 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-05-23  Thomas Schwinge  
+
+   * omp-offload.c (execute_oacc_device_lower): Remove the
+   parallelism dimensions function attributes for unparallelized
+   OpenACC kernels constructs.
+
 2017-05-23  Martin Liska  
 
* cgraph.c (cgraph_node::get_create): Use symtab_node::dump_{asm_,}name
diff --git gcc/c/ChangeLog gcc/c/ChangeLog
index d768d93..cb04d4a 100644
--- gcc/c/ChangeLog
+++ gcc/c/ChangeLog
@@ -1,3 +1,9 @@
+2017-05-23  Thomas Schwinge  
+
+   * c-parser.c (OACC_KERNELS_CLAUSE_MASK): Add
+   "PRAGMA_OACC_CLAUSE_NUM_GANGS", "PRAGMA_OACC_CLAUSE_NUM_WORKERS",
+   "VECTOR_LENGTH".
+
 2017-05-23  Marek Polacek  
 
* c-parser.c (c_parser_compound_statement_nostart): Remove redundant
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index f3bcbee..03c711b 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -13984,11 +13984,14 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, 
char *p_name,
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_GANGS)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_WORKERS) \
| (OMP_CLAU

[PATCH 0/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
The last version of the patch series caused some regressions for ppc64.
This was largely due to incorrect handling of unsupportable alignment
and should be fixed with the new version.

p2 and p5 have not changed but I'm posting the whole series again for
reference.  p1 only changed comment wording, p3 was adapted to apply on
the trunk.

No regressions on s390x, x86-64 and ppc64.  Bootstrapped.

Regards
 Robin



[PATCH 1/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog:

2017-05-23  Robin Dapp  

* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Create DR_HAS_NEGATIVE_STEP.
(vect_update_misalignment_for_peel): Define DR_MISALIGNMENT.
(vect_enhance_data_refs_alignment): Use.
(vect_duplicate_ssa_name_ptr_info): Use.
* tree-vectorizer.h (dr_misalignment): Use.
(known_alignment_for_access_p): Use.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 67cc969..874fdb5 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -717,7 +717,7 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 loop = LOOP_VINFO_LOOP (loop_vinfo);
 
   /* Initialize misalignment to unknown.  */
-  SET_DR_MISALIGNMENT (dr, -1);
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 
   if (tree_fits_shwi_p (DR_STEP (dr)))
 misalign = DR_INIT (dr);
@@ -957,8 +957,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 }
 
   if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment to -1.\n");
-  SET_DR_MISALIGNMENT (dr, -1);
+dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment " \
+		 "to unknown (-1).\n");
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 }
 
 
@@ -1526,32 +1527,31 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 {
   if (known_alignment_for_access_p (dr))
 {
-  unsigned int npeel_tmp;
+  unsigned int npeel_tmp = 0;
 	  bool negative = tree_int_cst_compare (DR_STEP (dr),
 		size_zero_node) < 0;
 
-  /* Save info about DR in the hash table.  */
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   nelements = TYPE_VECTOR_SUBPARTS (vectype);
   mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
 TREE_TYPE (DR_REF (dr;
-  npeel_tmp = (negative
-			   ? (mis - nelements) : (nelements - mis))
-		  & (nelements - 1);
+	  if (DR_MISALIGNMENT (dr) != 0)
+		npeel_tmp = (negative ? (mis - nelements)
+			 : (nelements - mis)) & (nelements - 1);
 
   /* For multiple types, it is possible that the bigger type access
  will have more than one peeling option.  E.g., a loop with two
  types: one of size (vector size / 4), and the other one of
  size (vector size / 8).  Vectorization factor will 8.  If both
- access are misaligned by 3, the first one needs one scalar
+ accesses are misaligned by 3, the first one needs one scalar
  iteration to be aligned, and the second one needs 5.  But the
 		 first one will be aligned also by peeling 5 scalar
  iterations, and in that case both accesses will be aligned.
  Hence, except for the immediate peeling amount, we also want
  to try to add full vector size, while we don't exceed
  vectorization factor.
- We do this automatically for cost model, since we calculate cost
- for every peeling option.  */
+ We do this automatically for cost model, since we calculate
+		 cost for every peeling option.  */
   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 		{
 		  if (STMT_SLP_TYPE (stmt_info))
@@ -1559,17 +1559,15 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		  = (vf * GROUP_SIZE (stmt_info)) / nelements;
 		  else
 		possible_npeel_number = vf / nelements;
-		}
 
-  /* Handle the aligned case. We may decide to align some other
- access, making DR unaligned.  */
-  if (DR_MISALIGNMENT (dr) == 0)
-{
-  npeel_tmp = 0;
-  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
-possible_npeel_number++;
-}
+		  /* NPEEL_TMP is 0 when there is no misalignment, also allow
+		 peeling off NELEMENTS below.  */
+		  if (DR_MISALIGNMENT (dr) == 0)
+		possible_npeel_number++;
+		}
 
+	  /* Save info about DR in the hash table.  Also include peeling
+	 amounts according to the explanation above.  */
   for (j = 0; j < possible_npeel_number; j++)
 {
   vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
@@ -4182,7 +4180,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
   duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
   unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
   int misalign = DR_MISALIGNMENT (dr);
-  if (misalign == -1)
+  if (misalign == DR_MISALIGNMENT_UNKNOWN)
 mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
   else
 set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.

Re: PR80806

2017-05-23 Thread Martin Sebor

On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:

Hi,
The attached patch tries to fix PR80806 by warning when a variable is
set using memset (and friends) but not used. I chose to warn in dse
pass since dse would detect if the variable passed as 1st argument is
a dead store. Does this approach look OK ?


Detecting -Wunused-but-set-variable in the optimizer means that
the warning will not be issued without optimization.  It also
means that the warning will trigger in cases where the variable
is used conditionally and the condition is subject to constant
propagation.  For instance:

  void sink (void*);

  void test (int i)
  {
  char buf[10];   // -Wunused-but-set-variable
  memset (buf, 0, sizeof(buf));

  if (i)
sink (buf);
  }

  void f (void)
  {
  test (0);
  }

I suspect this would be considered a false positive by most users.
In my view, it would be more in line with the design of the warning
to enhance the front end to detect this case, and it would avoid
these issues.

I have a patch that does that.  Rather than checking the finite
set of known built-in functions like memset that are known not
to read the referenced object, I took the approach of adding
a new  function attribute (I call it write-only) and avoiding
setting the DECL_READ_P flag for DECLs that are passed to
function arguments decorated with the attribute.  That makes
it possible to issue the warning even if the variable is passed
to ordinary (non-built-in) functions like getline(), and should
open up optimization opportunities beyond built-ins.  The only
wrinkle is that the front end sets DECL_READ_P even for uses that
aren't reads such as a sizeof expression, so while an otherwise
unused buf is diagnosed given a call to memset(buf, 0, 10), it
isn't diagnosed if a call is made to memset(buf, 0, sizeof buf).
I am yet to see what impact not setting DECL_READ_P would have
when the decl is used without being evaluated.  (In any event,
setting DECL_READ_P on a use that doesn't involve reading the
DECL doesn't seem right.)

I attach what I have so far in case you would like to check it
out.  I think you have more experience with DSE than me so I'd
be interested in your thoughts on making use of the attribute
for optimization.  (Another couple attributes I'm considering
to complement write-only is read-only and read-write, also
with the hope of improving both warnings and code generation.
Ideas on those would be welcome as well.)



There were following fallouts observed during bootstrap build:

* double-int.c (div_and_round_double):
Warning emitted 'den' set but not used for following call to memset:
memset (den, 0, sizeof den);

I assume the warning is correct since there's immediately call to:
encode (den, lden, hden);

and encode overwrites all the contents of den.
Should the above call to memset be removed from the source ?


IIUC, this seems to be a more involved example of the simple one
above.  AFAICS, the buffer is subsequently read in the function,
but only conditionally.

That said, since encode() overwrites the whole buffer right after
it has been cleared by memset, I would think that a warning pointing
that out could be helpful (although I'm not sure -Wunused is the
right warning to issue).



* tree-streamer.c (streamer_check_handled_ts_structures)
The function defines a local array bool handled_p[LAST_TS_ENUM];
and the warning is emitted for:
memset (&handled_p, 0, sizeof (handled_p));

That's because the function then initializes each element of the array
handled_p to true
making the memset call redundant.
I am not sure if warning for the above case is a good idea ? The call
to memset() seems deliberate, to initialize all elements to 0, and
later assert checks if all the elements were explicitly set to true.


Right.  Warning on this function doesn't seem right since
the variable is subsequently used in the test loop.



* value-prof.c (free_hist):
Warns for the call to memset:

static int
free_hist (void **slot, void *data ATTRIBUTE_UNUSED)
{
  histogram_value hist = *(histogram_value *) slot;
  free (hist->hvalue.counters);
  if (flag_checking)
memset (hist, 0xab, sizeof (*hist));
  free (hist);
  return 1;
}

Assuming flag_checking was true, the call to memset would be dead
anyway since it would be immediately freed ? Um, I don't understand
the purpose of memset in the above function.


I'm guessing it's an effort to invalidate the memory to detect
subsequent accesses to its contents.  Warning on such code would
be valuable because it's a common misconception that a memset
can be used to wipe sensitive data (e.g., passwords) from memory
before freeing it, to prevent them from leaking into compromised
stack frames (or core files).  But it should be a warning that's
distinct from -Wunused.

Martin
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 38fb1bb8..24e6f6a 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -113,6 +113,7 @@ DEF_ATTR_IDENT (ATTR_TM_REGPARM, 

[PATCH 2/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog:

2017-05-23  Robin Dapp  

* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Rename.
(vect_get_peeling_costs_all_drs): Create function.
(vect_peeling_hash_get_lowest_cost):
Use vect_get_peeling_costs_all_drs.
(vect_peeling_supportable): Create function.
(vect_enhance_data_refs_alignment): Use
vect_peeling_supportable.


diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 874fdb5..fe398ea 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -903,7 +903,11 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 }
 
 
-/* Function vect_update_misalignment_for_peel
+/* Function vect_update_misalignment_for_peel.
+   Sets DR's misalignment
+   - to 0 if it has the same alignment as DR_PEEL,
+   - to the misalignment computed using NPEEL if DR's salignment is known,
+   - to -1 (unknown) otherwise.
 
DR - the data reference whose misalignment is to be adjusted.
DR_PEEL - the data reference whose misalignment is being made
@@ -916,7 +920,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
struct data_reference *dr_peel, int npeel)
 {
   unsigned int i;
-  vec same_align_drs;
+  vec same_aligned_drs;
   struct data_reference *current_dr;
   int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr;
   int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel;
@@ -932,9 +936,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 
   /* It can be assumed that the data refs with the same alignment as dr_peel
  are aligned in the vector loop.  */
-  same_align_drs
+  same_aligned_drs
 = STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (DR_STMT (dr_peel)));
-  FOR_EACH_VEC_ELT (same_align_drs, i, current_dr)
+  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
 {
   if (current_dr != dr)
 continue;
@@ -1234,27 +1238,23 @@ vect_peeling_hash_get_most_frequent (_vect_peel_info **slot,
   return 1;
 }
 
+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */
 
-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-   _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+unsigned int *inside_cost,
+unsigned int *outside_cost,
+stmt_vector_for_cost *body_cost_vec,
+unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  struct data_reference *dr;
-  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
 
-  prologue_cost_vec.create (2);
-  body_cost_vec.create (2);
-  epilogue_cost_vec.create (2);
+  unsigned i;
+  data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
@@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
+  int save_misalignment;
   save_misalignment = DR_MISALIGNMENT (dr);
-  vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-  vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
- &body_cost_vec);
+  if (dr == dr0 && npeel == vf / 2)
+	SET_DR_MISALIGNMENT (dr, 0);
+  else
+	vect_update_misalignment_for_peel (dr, dr0, npeel);
+  vect_get_data_access_cost (dr, inside_cost, outside_cost,
+ body_cost_vec);
   SET_DR_MISALIGNMENT (dr, save_misalignment);
 }
+}
+
+/* Traverse peeling hash table and calculate cost for each peeling option.
+   Find the one with the lowest cost.  */
+
+int
+vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
+   _vect_peel_extended_info *min)
+{
+  vect_peel_info elem = *slot;
+  int dummy;
+  unsigned int inside_cost = 0, outside_cost = 0;
+  gimple *stmt = DR_STMT (elem->dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec,
+		   epilogue_cost_vec;
+
+  prologue_cost_vec.create (2);
+  body_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+  &body_cost_vec, elem->npeel, 0);
 
   outside_cost += vect_get_known_peeling_cost
 (loop_vinfo, elem->npeel, &dummy,
@@ -1292,7 +1320,8 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
   epilogue_cost_vec.release ();
 
   if (inside_cost < min->inside_cost
-  || (inside_cost == min-

[PATCH 3/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog:

2017-05-23  Robin Dapp  

* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peeling info and set costs to zero for unlimited cost
model.
(vect_enhance_data_refs_alignment): Also inspect all datarefs
with unknown misalignment. Compute and costs for unknown
misalignment, compare them to the costs for known misalignment
and choose the cheapest for peeling.


diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index fe398ea..8cd6edd 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
choosing an option with the lowest cost (if cost model is enabled) or the
option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table *peeling_htab,
    loop_vec_info loop_vinfo,
unsigned int *npeel,
@@ -1365,11 +1365,13 @@ vect_peeling_hash_choose_best_peeling (hash_table *peeling_hta
res.peel_info.count = 0;
peeling_htab->traverse <_vect_peel_extended_info *,
 	   		   vect_peeling_hash_get_most_frequent> (&res);
+   res.inside_cost = 0;
+   res.outside_cost = 0;
  }
 
*npeel = res.peel_info.npeel;
*body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,6 +1520,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
+  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1525,7 +1528,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1651,11 +1655,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   npeel_tmp += nelements;
 }
 
-  all_misalignments_unknown = false;
-  /* Data-ref that was chosen for the case that all the
- misalignments are unknown is not relevant anymore, since we
- have a data-ref with known alignment.  */
-  dr0 = NULL;
+	  one_misalignment_known = true;
 }
   else
 {
@@ -1663,35 +1663,32 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
  peeling for data-ref that has the maximum number of data-refs
  with the same alignment, unless the target prefers to align
  stores over load.  */
-  if (all_misalignments_unknown)
-{
-		  unsigned same_align_drs
-		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-  if (!dr0
-		  || same_align_drs_max < same_align_drs)
-{
-  same_align_drs_max = same_align_drs;
-  dr0 = dr;
-}
-		  /* For data-refs with the same number of related
-		 accesses prefer the one where the misalign
-		 computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
-		{
-		  struct loop *ivloop0, *ivloop;
-		  ivloop0 = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr0));
-		  ivloop = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr));
-		  if ((ivloop && !ivloop0)
-			  || (ivloop && ivloop0
-			  && flow_loop_nested_p (ivloop, ivloop0)))
-			dr0 = dr;
-		}
+	  unsigned same_align_drs
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+	  if (!dr0
+		  || same_align_drs_max < same_align_drs)
+		{
+		  same_align_drs_max = same_align_drs;
+		  dr0 = dr;
+		}
+	  /* For data-refs with the same number of related
+		 accesses prefer the one where the misalign
+		 computation will be invariant in the outermost loop.  */
+	  else if (same_align_drs_max == same_align_drs)
+		{
+		  struct loop *ivloop0, *ivloop;
+		  ivloop0 = outermost_invariant_loop_for_expr
+		(loop, DR_BASE_ADDRESS (dr0));
+		  ivloop = outermost_invariant_loop_for_expr
+		(loop, DR_BASE_ADDRESS (dr));
+		  if ((ivloop && !ivloop0)
+		  || (ivloop && ivloop0
+			  && flow_loop_nested_p (ivloop, ivloop0)))
+		dr0 = dr;
+		}
 
-  if (!first_store && DR_IS_WRITE (dr))
-first_store = dr;
-}
+	  if (!first_store && DR_IS_WRITE (

[PATCH 4/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog:

2017-05-23  Robin Dapp  

* tree-vect-data-refs.c (vect_get_data_access_cost):
Workaround for SLP handling.
(vect_enhance_data_refs_alignment):
Compute costs for doing no peeling at all, compare to the best
peeling costs so far and avoid peeling if cheaper.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8cd6edd..05f944a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1134,7 +1134,7 @@ vect_get_data_access_cost (struct data_reference *dr,
   int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  int ncopies = vf / nunits;
+  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */
 
   if (DR_IS_READ (dr))
 vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
@@ -1520,7 +1520,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
-  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1530,6 +1529,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   unsigned int npeel = 0;
   bool one_misalignment_known = false;
   bool one_misalignment_unknown = false;
+  bool one_dr_unsupportable = false;
+  struct data_reference *unsupportable_dr = NULL;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1687,20 +1688,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		dr0 = dr;
 		}
 
-	  if (!first_store && DR_IS_WRITE (dr))
-		first_store = dr;
+	  one_misalignment_unknown = true;
 
-  /* If there are both known and unknown misaligned accesses in the
- loop, we choose peeling amount according to the known
- accesses.  */
-  if (!supportable_dr_alignment)
-{
-  dr0 = dr;
-  if (!first_store && DR_IS_WRITE (dr))
-first_store = dr;
-}
+	  /* Check for data refs with unsupportable alignment that
+	 can be peeled.  */
+	  if (!supportable_dr_alignment)
+	  {
+		one_dr_unsupportable = true;
+		unsupportable_dr = dr;
+	  }
 
-	  one_misalignment_unknown = true;
+	  if (!first_store && DR_IS_WRITE (dr))
+		first_store = dr;
 }
 }
   else
@@ -1721,81 +1720,85 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   || loop->inner)
 do_peeling = false;
 
-  unsigned int unknown_align_inside_cost = INT_MAX;
-  unsigned int unknown_align_outside_cost = INT_MAX;
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  struct _vect_peel_extended_info best_peel;
+
+  peel_for_unknown_alignment.inside_cost = INT_MAX;
+  peel_for_unknown_alignment.outside_cost = INT_MAX;
+  peel_for_unknown_alignment.peel_info.count = 0;
 
   if (do_peeling
-  && one_misalignment_unknown
-  && vect_supportable_dr_alignment (dr0, false))
+  && one_misalignment_unknown)
 {
   /* Check if the target requires to prefer stores over loads, i.e., if
  misaligned stores are more expensive than misaligned loads (taking
  drs with same alignment into account).  */
-  if (first_store && DR_IS_READ (dr0))
-{
-	  unsigned int load_inside_cost = 0;
-	  unsigned int load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0;
-	  unsigned int store_outside_cost = 0;
-	  stmt_vector_for_cost dummy;
-	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (dr0,
-	  &load_inside_cost,
-	  &load_outside_cost,
-	  &dummy, vf / 2, vf);
-	  dummy.release ();
-
+  unsigned int load_inside_cost = 0;
+  unsigned int load_outside_cost = 0;
+  unsigned int store_inside_cost = 0;
+  unsigned int store_outside_cost = 0;
+
+  stmt_vector_for_cost dummy;
+  dummy.create (2);
+  vect_get_peeling_costs_all_drs (dr0,
+  &load_inside_cost,
+  &load_outside_cost,
+  &dummy, vf / 2, vf);
+  dummy.release ();
+
+  if (first_store)
+	{
 	  dummy.create (2);
 	  vect_get_peeling_costs_all_drs (first_store,
 	  &store_inside_cost,
 	  &store_outside_cost,
 	  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+  else
+	{
+	  store_inside_cost = INT_MAX;
+	  store_outside_cost = INT_MAX;
+	}
 
-  if (load_inside_cost > store_inside_cost
-  || (load_inside_cost == store_inside_cost
-		  && load_outside_cost > store_outside_cost))
-	{
-	  dr0 = first_store;
-	  unknown_align_inside_cost = store_inside_cost;
-	  unknown_align_outside_cost = store_outs

[PATCH 5/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/testsuite/ChangeLog:

2017-05-23  Robin Dapp  

* gcc.target/s390/vector/vec-nopeel-2.c: New test.


diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 000..9b67793
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target s390_vx } */
+/* { dg-options "-O2 -mzarch -march=z13 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+
+void foo(int *restrict a, int *restrict b, unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i++)
+b[i] = a[i] * 2 + 1;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" } } */


Re: Remove unused "default_kind" member from gcc/omp-low.c's "struct omp_context"

2017-05-23 Thread Thomas Schwinge
Hi!

On Tue, 23 May 2017 12:36:18 +0200, Jakub Jelinek  wrote:
> On Fri, Mar 31, 2017 at 05:05:29PM +0200, Thomas Schwinge wrote:
> > Remove unused "default_kind" member from gcc/omp-low.c's "struct 
> > omp_context"

> Ok for trunk.

Thanks.  As posted, committed to trunk in r248372:

commit 093c94dd910ff55efb5e1bb38cc8020ee488c1e9
Author: tschwinge 
Date:   Tue May 23 16:00:56 2017 +

Remove unused "default_kind" member from gcc/omp-low.c's "struct 
omp_context"

gcc/
* omp-low.c (struct omp_context): Remove "default_kind" member.
Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248372 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  3 +++
 gcc/omp-low.c | 12 +---
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index b38a31d..b626515 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,8 @@
 2017-05-23  Thomas Schwinge  
 
+   * omp-low.c (struct omp_context): Remove "default_kind" member.
+   Adjust all users.
+
* omp-offload.c (execute_oacc_device_lower): Remove the
parallelism dimensions function attributes for unparallelized
OpenACC kernels constructs.
diff --git gcc/omp-low.c gcc/omp-low.c
index 968075c..9a16248 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -112,10 +112,6 @@ struct omp_context
  otherwise.  */
   gimple *simt_stmt;
 
-  /* What to do with variables with implicitly determined sharing
- attributes.  */
-  enum omp_clause_default_kind default_kind;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -1162,10 +1158,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
  install_var_field (decl, by_ref, 3, ctx);
  break;
 
-   case OMP_CLAUSE_DEFAULT:
- ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
- break;
-
case OMP_CLAUSE_FINAL:
case OMP_CLAUSE_IF:
case OMP_CLAUSE_NUM_THREADS:
@@ -1332,6 +1324,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
case OMP_CLAUSE_SEQ:
case OMP_CLAUSE_TILE:
case OMP_CLAUSE__SIMT_:
+   case OMP_CLAUSE_DEFAULT:
  break;
 
case OMP_CLAUSE_ALIGNED:
@@ -1826,7 +1819,6 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context 
*outer_ctx)
   if (taskreg_nesting_level > 1)
 ctx->is_nested = true;
   ctx->field_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
-  ctx->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   ctx->record_type = lang_hooks.types.make_type (RECORD_TYPE);
   name = create_tmp_var_name (".omp_data_s");
   name = build_decl (gimple_location (stmt),
@@ -1875,7 +1867,6 @@ scan_omp_task (gimple_stmt_iterator *gsi, omp_context 
*outer_ctx)
   if (taskreg_nesting_level > 1)
 ctx->is_nested = true;
   ctx->field_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
-  ctx->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   ctx->record_type = lang_hooks.types.make_type (RECORD_TYPE);
   name = create_tmp_var_name (".omp_data_s");
   name = build_decl (gimple_location (stmt),
@@ -2390,7 +2381,6 @@ scan_omp_target (gomp_target *stmt, omp_context 
*outer_ctx)
 
   ctx = new_omp_context (stmt, outer_ctx);
   ctx->field_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
-  ctx->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   ctx->record_type = lang_hooks.types.make_type (RECORD_TYPE);
   name = create_tmp_var_name (".omp_data_t");
   name = build_decl (gimple_location (stmt),


Grüße
 Thomas


Re: OpenACC 2.5 default (present) clause

2017-05-23 Thread Thomas Schwinge
Hi!

On Tue, 23 May 2017 12:40:06 +0200, Jakub Jelinek  wrote:
> On Fri, May 19, 2017 at 03:46:53PM +0200, Thomas Schwinge wrote:
> > > > -  /* OpenMP clause: default.  */
> > > > +  /* OpenACC clause: default ( none | present ).
> > > > +
> > > > + OpenMP clause: default ( firstprivate | none | private | shared 
> > > > ). */
> > > >OMP_CLAUSE_DEFAULT,
> > > >  
> > > >/* OpenACC/OpenMP clause: collapse (constant-integer-expression).  */
> > > 
> > > I think this hunk isn't needed (plus it is not accurate anyway).
> > 
> > Now you got me curious: why isn't it accurate?
> 
> Because the clause in OpenMP is default ( shared | none ) for C/C++
> and default ( private | firstprivate | shared | none ) for Fortran.

So, the union of the C/C++ and Fortran variants is "private |
firstprivate | shared | none", and sorting these alphabetically, you'd
get "firstprivate | none | private | shared" as I posted?  Anyway, no
point on spending more time on this.  ;-)


Grüße
 Thomas


[C++ PATCH] break more out of pushdecl

2017-05-23 Thread Nathan Sidwell

this patch breaks some more workers out of pushdecl.

1) in function-scope pushdecl may need to set the decl's context. 
pushdecl had some funky code there to cope with pushing a function decl 
in its own context -- that's no longer needed with my changes to 
start_preparsed_function to get things in the correct order.  But there 
is still some tweaking that needs doing.


2) There are some special rules for function-scope extern decls, they 
can inherit linkage if they have a decl in the enclosing scope.  That 
code was tangled up with that checking for a matching decl, but is 
better separated out, which this patch does too.


3) also introduced a DECL_HIDDEN_P, a useful predicate.

nathan
--
Nathan Sidwell
2017-05-23  Nathan Sidwell  

	gcc/cp/
	* cp-tree.h (DECL_HIDDEN_P): New.
	* name-lookup.c (set_decl_context,
	set_local_extern_decl_linkage):	New, broken out of ...
	(pushdecl_maybe_friend_1): ... here.  Call them.

	gcc/testsuite/
	* g++.dg/parse/ctor9.C: Adjust expected error.

Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 248364)
+++ cp/cp-tree.h	(working copy)
@@ -3775,6 +3775,11 @@ more_aggr_init_expr_args_p (const aggr_i
   (DECL_LANG_SPECIFIC (TYPE_FUNCTION_OR_TEMPLATE_DECL_CHECK (NODE)) \
->u.base.anticipated_p)
 
+/* Is DECL NODE a hidden name?  */
+#define DECL_HIDDEN_P(NODE) \
+  (DECL_LANG_SPECIFIC (NODE) && TYPE_FUNCTION_OR_TEMPLATE_DECL_P (NODE) \
+   && DECL_ANTICIPATED (NODE))
+
 /* True if this is a hidden class type.*/
 #define TYPE_HIDDEN_P(NODE) \
   (DECL_LANG_SPECIFIC (TYPE_NAME (NODE)) \
Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 248364)
+++ cp/name-lookup.c	(working copy)
@@ -1534,6 +1534,137 @@ check_local_shadow (tree decl)
   inform (DECL_SOURCE_LOCATION (shadowed), "shadowed declaration is here");
 }
 
+
+/* DECL is being pushed inside function CTX.  Set its context, if
+   needed.  */
+
+static void
+set_decl_context_in_fn (tree ctx, tree decl)
+{
+  if (!DECL_CONTEXT (decl)
+  /* A local declaration for a function doesn't constitute
+	 nesting.  */
+  && TREE_CODE (decl) != FUNCTION_DECL
+  /* A local declaration for an `extern' variable is in the
+	 scope of the current namespace, not the current
+	 function.  */
+  && !(VAR_P (decl) && DECL_EXTERNAL (decl))
+  /* When parsing the parameter list of a function declarator,
+	 don't set DECL_CONTEXT to an enclosing function.  When we
+	 push the PARM_DECLs in order to process the function body,
+	 current_binding_level->this_entity will be set.  */
+  && !(TREE_CODE (decl) == PARM_DECL
+	   && current_binding_level->kind == sk_function_parms
+	   && current_binding_level->this_entity == NULL))
+DECL_CONTEXT (decl) = ctx;
+
+  /* If this is the declaration for a namespace-scope function,
+ but the declaration itself is in a local scope, mark the
+ declaration.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL && DECL_NAMESPACE_SCOPE_P (decl))
+DECL_LOCAL_FUNCTION_P (decl) = 1;
+}
+
+/* DECL is a local-scope decl with linkage.  SHADOWED is true if the
+   name is already bound at the current level.
+
+   [basic.link] If there is a visible declaration of an entity with
+   linkage having the same name and type, ignoring entities declared
+   outside the innermost enclosing namespace scope, the block scope
+   declaration declares that same entity and receives the linkage of
+   the previous declaration.
+
+   Also, make sure that this decl matches any existing external decl
+   in the enclosing namespace.  */
+
+static void
+set_local_extern_decl_linkage (tree decl, bool shadowed)
+{
+  tree ns_value = decl; /* Unique marker.  */
+
+  if (!shadowed)
+{
+  tree loc_value = innermost_non_namespace_value (DECL_NAME (decl));
+  if (!loc_value)
+	{
+	  ns_value
+	= get_namespace_binding (current_namespace, DECL_NAME (decl));
+	  loc_value = ns_value;
+	}
+  if (loc_value == error_mark_node)
+	loc_value = NULL_TREE;
+
+  for (ovl_iterator iter (loc_value); iter; ++iter)
+	if (!DECL_HIDDEN_P (*iter)
+	&& (TREE_STATIC (*iter) || DECL_EXTERNAL (*iter))
+	&& decls_match (*iter, decl))
+	  {
+	/* The standard only says that the local extern inherits
+	   linkage from the previous decl; in particular, default
+	   args are not shared.  Add the decl into a hash table to
+	   make sure only the previous decl in this case is seen
+	   by the middle end.  */
+	struct cxx_int_tree_map *h;
+
+	/* We inherit the outer decl's linkage.  But we're a
+	   different decl.  */
+	TREE_PUBLIC (decl) = TREE_PUBLIC (*iter);
+
+	if (cp_function_chain->extern_decl_map == NULL)
+	  cp_function_chain->extern_decl_map
+		= hash_table::create_ggc (20);
+
+	h = ggc_alloc ();
+	h->uid = DECL_UID (decl);
+	h->to = *iter;
+	cxx_int_tree_map **loc = cp_function_chain->extern_decl_map
+

Re: [PATCH v2] Implement non-trivial std::random_device::entropy (PR libstdc++/67578)

2017-05-23 Thread Jonathan Wakely

On 22/05/17 22:28 +0800, Xi Ruoyao wrote:

The new patch is attached.  Just merged the patches you sent and fixed
the ChangeLog of gnu.ver and testsuite_abi.cc.

(For fun:  I had mistakenly attached the Vim .swp file of the patch
and almost sent it. :-p)


Ha, that *definitely* wouldnt' have applied, but I think I'd have
noticed the reason sooner ;-)

I've committed the patch now - thanks.



@@ -334,6 +335,7 @@ compatible.
GCC 5.1.0: GLIBCXX_3.4.21, CXXABI_1.3.9
GCC 6.1.0: GLIBCXX_3.4.22, CXXABI_1.3.10
GCC 7.1.0: GLIBCXX_3.4.23, CXXABI_1.3.11
+GCC 8.0.0: GLIBCXX_3.4.24, CXXABI_1.3.10


Oops, this should have been CXXABI_1.3.11, I fixed that before
committing it.


+if (static_cast(ent) > sizeof(result_type) * 8)
+  return static_cast(sizeof(result_type) * 8);


After committing it I realised this should use __CHAR_BIT__ instead of
hardcoding 8, which I'll fix shortly.




[PATCH GCC][1/6]Move compare_tree to tree.c and expose the interface.

2017-05-23 Thread Bin Cheng
Hi,
This patch set factors out runtime alias check code from tree-vect-data-refs.c
and tree-vect-loop-manip.c as general interfaces in tree-data-ref.c.  With this
change other optimizers like tree loop distribution could version loop wrto the
runtime alias checks.  During this work, I also found current code has issues
with negative DR_STEP.  This patch set fixes the issue as tracked in PR80815.

This is the first patch simply moves compare_tree to tree.c and exposes it.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin

2017-05-22  Bin Cheng  

* tree-vect-data-refs.c (compare_tree): Move ...
* tree.c (compare_tree): ... to here.
* tree.h (compare_tree): New decalaration.From 2b993c1aaada767cbbda4e73010d48b0b8347ef3 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 19 May 2017 12:49:38 +0100
Subject: [PATCH 1/6] compare_tree-interface-20170515.txt

---
 gcc/tree-vect-data-refs.c | 77 ---
 gcc/tree.c| 74 +
 gcc/tree.h|  1 +
 3 files changed, 75 insertions(+), 77 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 67cc969..7b835ae 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2574,83 +2574,6 @@ vect_analyze_data_ref_access (struct data_reference *dr)
   return vect_analyze_group_access (dr);
 }
 
-
-
-/*  A helper function used in the comparator function to sort data
-references.  T1 and T2 are two data references to be compared.
-The function returns -1, 0, or 1.  */
-
-static int
-compare_tree (tree t1, tree t2)
-{
-  int i, cmp;
-  enum tree_code code;
-  char tclass;
-
-  if (t1 == t2)
-return 0;
-  if (t1 == NULL)
-return -1;
-  if (t2 == NULL)
-return 1;
-
-  STRIP_NOPS (t1);
-  STRIP_NOPS (t2);
-
-  if (TREE_CODE (t1) != TREE_CODE (t2))
-return TREE_CODE (t1) < TREE_CODE (t2) ? -1 : 1;
-
-  code = TREE_CODE (t1);
-  switch (code)
-{
-/* For const values, we can just use hash values for comparisons.  */
-case INTEGER_CST:
-case REAL_CST:
-case FIXED_CST:
-case STRING_CST:
-case COMPLEX_CST:
-case VECTOR_CST:
-  {
-   hashval_t h1 = iterative_hash_expr (t1, 0);
-   hashval_t h2 = iterative_hash_expr (t2, 0);
-   if (h1 != h2)
- return h1 < h2 ? -1 : 1;
-   break;
-  }
-
-case SSA_NAME:
-  cmp = compare_tree (SSA_NAME_VAR (t1), SSA_NAME_VAR (t2));
-  if (cmp != 0)
-   return cmp;
-
-  if (SSA_NAME_VERSION (t1) != SSA_NAME_VERSION (t2))
-   return SSA_NAME_VERSION (t1) < SSA_NAME_VERSION (t2) ? -1 : 1;
-  break;
-
-default:
-  tclass = TREE_CODE_CLASS (code);
-
-  /* For var-decl, we could compare their UIDs.  */
-  if (tclass == tcc_declaration)
-   {
- if (DECL_UID (t1) != DECL_UID (t2))
-   return DECL_UID (t1) < DECL_UID (t2) ? -1 : 1;
- break;
-   }
-
-  /* For expressions with operands, compare their operands recursively.  */
-  for (i = TREE_OPERAND_LENGTH (t1) - 1; i >= 0; --i)
-   {
- cmp = compare_tree (TREE_OPERAND (t1, i), TREE_OPERAND (t2, i));
- if (cmp != 0)
-   return cmp;
-   }
-}
-
-  return 0;
-}
-
-
 /* Compare two data-references DRA and DRB to group them into chunks
suitable for grouping.  */
 
diff --git a/gcc/tree.c b/gcc/tree.c
index b76b521..c9b0615 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7635,6 +7635,80 @@ compare_tree_int (const_tree t, unsigned HOST_WIDE_INT u)
 return 1;
 }
 
+/*  A helper function computes order between two tree epxressions T1 and T2.
+This is used in comparator functions sorting objects based on the order
+of tree expressions.  The function returns -1, 0, or 1.  */
+
+int
+compare_tree (tree t1, tree t2)
+{
+  int i, cmp;
+  enum tree_code code;
+  char tclass;
+
+  if (t1 == t2)
+return 0;
+  if (t1 == NULL)
+return -1;
+  if (t2 == NULL)
+return 1;
+
+  STRIP_NOPS (t1);
+  STRIP_NOPS (t2);
+
+  if (TREE_CODE (t1) != TREE_CODE (t2))
+return TREE_CODE (t1) < TREE_CODE (t2) ? -1 : 1;
+
+  code = TREE_CODE (t1);
+  switch (code)
+{
+/* For const values, we can just use hash values for comparisons.  */
+case INTEGER_CST:
+case REAL_CST:
+case FIXED_CST:
+case STRING_CST:
+case COMPLEX_CST:
+case VECTOR_CST:
+  {
+   hashval_t h1 = iterative_hash_expr (t1, 0);
+   hashval_t h2 = iterative_hash_expr (t2, 0);
+   if (h1 != h2)
+ return h1 < h2 ? -1 : 1;
+   break;
+  }
+
+case SSA_NAME:
+  cmp = compare_tree (SSA_NAME_VAR (t1), SSA_NAME_VAR (t2));
+  if (cmp != 0)
+   return cmp;
+
+  if (SSA_NAME_VERSION (t1) != SSA_NAME_VERSION (t2))
+   return SSA_NAME_VERSION (t1) < SSA_NAME_VERSION (t2) ? -1 : 1;
+  break;
+
+default:
+  tclass = TREE_CODE_CLASS (code);
+
+  /* For var-decl, we could compare t

[PATCH GCC][2/6]Factor out code pruning runtime alias checks

2017-05-23 Thread Bin Cheng
Hi,
This is the second patch in the set, it factors out code pruning runtime alias
checks from file tree-vect-data-refs.c to tree-data-ref.c.  It also introduces
new interface prune_runtime_alias_test_list so that we can use it in pass like
loop distribution.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-22  Bin Cheng  

* tree-vect-data-refs.c (Operator==, comp_dr_with_seg_len_pair):
Move from ...
* tree-data-ref.c (Operator==, comp_dr_with_seg_len_pair): To here.
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Factor
out code pruning runtime alias checks.
* tree-data-ref.c (prune_runtime_alias_test_list): New function
factored out from above.
* tree-vectorizer.h (struct dr_with_seg_len, dr_with_seg_len_pair_t):
Move from ...
* tree-data-ref.h (struct dr_with_seg_len, dr_with_seg_len_pair_t):
... to here.
(prune_runtime_alias_test_list): New decalaration.From 030967c7b1d6af95f2becd1159dad7823cdaecd9 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 23 May 2017 17:11:19 +0100
Subject: [PATCH 2/6] prune-runtime-alias-check-20170516

---
 gcc/tree-data-ref.c   | 233 ++
 gcc/tree-data-ref.h   |  28 ++
 gcc/tree-vect-data-refs.c | 228 +
 gcc/tree-vectorizer.h |  28 --
 4 files changed, 263 insertions(+), 254 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 2480f4e..a5f8c1c 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1107,6 +1107,239 @@ create_data_ref (loop_p nest, loop_p loop, tree memref, 
gimple *stmt,
   return dr;
 }
 
+/* Operator == between two dr_with_seg_len objects.
+
+   This equality operator is used to make sure two data refs
+   are the same one so that we will consider to combine the
+   aliasing checks of those two pairs of data dependent data
+   refs.  */
+
+static bool
+operator == (const dr_with_seg_len& d1,
+const dr_with_seg_len& d2)
+{
+  return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
+ DR_BASE_ADDRESS (d2.dr), 0)
+  && compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
+  && compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
+  && compare_tree (d1.seg_len, d2.seg_len) == 0;
+}
+
+/* Comparison function for sorting objects of dr_with_seg_len_pair_t
+   so that we can combine aliasing checks in one scan.  */
+
+static int
+comp_dr_with_seg_len_pair (const void *pa_, const void *pb_)
+{
+  const dr_with_seg_len_pair_t* pa = (const dr_with_seg_len_pair_t *) pa_;
+  const dr_with_seg_len_pair_t* pb = (const dr_with_seg_len_pair_t *) pb_;
+  const dr_with_seg_len &a1 = pa->first, &a2 = pa->second;
+  const dr_with_seg_len &b1 = pb->first, &b2 = pb->second;
+
+  /* For DR pairs (a, b) and (c, d), we only consider to merge the alias checks
+ if a and c have the same basic address snd step, and b and d have the same
+ address and step.  Therefore, if any a&c or b&d don't have the same 
address
+ and step, we don't care the order of those two pairs after sorting.  */
+  int comp_res;
+
+  if ((comp_res = compare_tree (DR_BASE_ADDRESS (a1.dr),
+   DR_BASE_ADDRESS (b1.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_BASE_ADDRESS (a2.dr),
+   DR_BASE_ADDRESS (b2.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_STEP (a1.dr), DR_STEP (b1.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_STEP (a2.dr), DR_STEP (b2.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_OFFSET (a1.dr), DR_OFFSET (b1.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_INIT (a1.dr), DR_INIT (b1.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_OFFSET (a2.dr), DR_OFFSET (b2.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_INIT (a2.dr), DR_INIT (b2.dr))) != 0)
+return comp_res;
+
+  return 0;
+}
+
+/* Merge alias checks recorded in ALIAS_PAIRS and remove redundant ones.
+   FACTOR is number of iterations that each data reference is accessed.
+
+   Basically, for each pair of dependent data refs store_ptr_0 & load_ptr_0,
+   we create an expression:
+
+   ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
+   || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
+
+   for aliasing checks.  However, in some cases we can decrease the number
+   of checks by combining two checks into one.  For example, suppose we have
+   another pair of data refs store_ptr_0 & load_ptr_1, and if the following
+   condition is satisfied:
+
+   load_ptr_0 < load_ptr_1  &&
+   load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
+
+   (this condition means, in each iteration of vectorized loop, the accessed
+   memory of store_ptr_0 cannot be between the memory 

[PATCH GCC][4/6]Relax minimal segment length of DR_B for merging alias check

2017-05-23 Thread Bin Cheng
Hi,
As commented in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80815#c1,
We can relax minimal segment length of DR_B for merging.  With this change,
the new test can be improved to only one alias check.  Note the
condition is still accurate after this patch, it won't introduce false
alias.
Bootstrap and test on x86_64 and AArch64, is it OK?

2017-05-22  Bin Cheng  

* tree-data-ref.c (prune_runtime_alias_test_list): Relax minimal
segment length for dr_b.

gcc/testsuite/ChangeLog
2017-05-22  Bin Cheng  

* gcc.dg/vect/pr80815-3.c: New test.From 8a570eb93cfaff6fcecdce6b91dd665a81d38e29 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 22 May 2017 11:34:18 +0100
Subject: [PATCH 4/6] minimal-seg-length-for-dr_b-20170516.txt

---
 gcc/testsuite/gcc.dg/vect/pr80815-3.c | 45 +++
 gcc/tree-data-ref.c   |  5 +++-
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr80815-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-3.c 
b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
new file mode 100644
index 000..dae01fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+int arr[2048];
+int res[100] = { 2148, 2146, 2144, 2142, 2140, 2138, 2136, 2134, 2132, 2130,
+2128, 2126, 2124, 2122, 2120, 2118, 2116, 2114, 2112, 2110,
+2108, 2106, 2104, 2102, 2100, 2098, 2096, 2094, 2092, 2090,
+2088, 2086, 2084, 2082, 2080, 2078, 2076, 2074, 2072, 2070,
+2068, 2066, 2064, 2062, 2060, 2058, 2056, 2054, 3078, 2050};
+
+__attribute__ ((noinline)) int
+foo (int *a, int *b, int len)
+{
+  int i;
+  int *a1 = a;
+  int *a0 = a1 - 4;
+  for (i = 0; i < len; i++)
+{
+  *b = *a0 + *a1;
+  b--;
+  a0++;
+  a1++;
+}
+  return 0;
+}
+
+int main (void)
+{
+  int *a = &arr[1027];
+  int *b = &arr[1024];
+
+  int i;
+  for (i = 0; i < 2048; i++)
+arr[i] = i;
+
+  foo (a, b, 50);
+
+  for (i = 975; i < 1025; i++)
+if (arr[i] != res[i - 975])
+  abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "improved number of alias checks from \[0-9\]* 
to 1" "vect" } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index f0799d9..5d9054d 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1286,7 +1286,10 @@ prune_runtime_alias_test_list 
(vec *alias_pairs,
min_seg_len_b = 0 - min_seg_len_b;
}
  else
-   min_seg_len_b = factor;
+   {
+ min_seg_len_b = factor;
+ min_seg_len_b *= absu_hwi (tree_to_shwi (DR_STEP (dr_b1->dr)));
+   }
 
  /* Now we try to merge alias check dr_a1 & dr_b and dr_a2 & dr_b.
 
-- 
1.9.1



[PATCH GCC][3/6]Fix PR80815 by handling negative DR_STEPs in runtime alias check

2017-05-23 Thread Bin Cheng
Hi,
This patch fixes PR80815 in which negative DR_STEP is mis-handled.  It does 
below:
  1) Reorder three cases in which we merge alias checks, in order like:
   old_case_A   ->  new_case_B
   old_case_B   ->  new_case_C (and removed as described in 3))
   old_case_C   ->  new_case_A
 This is because new_case_1 is accurate check that doesn't introduce false 
alias,
 while the other two does.
  2) Explicitly comment that Case A and B combined together can merge all 
dr_a1&dr_b and
 dr_a2&dr_b pairs if SEGMENT_LEN_A is constant.  We still keep Case A/B 
separately
 for clarity, also because Case A doesn't introduces false alias, while B 
does.
  3) Remove the old_case_C:
  /* Generally the new segment length is the maximum of the
 left segment size and the right segment size plus the distance.
 ???  We can also build tree MAX_EXPR here but it's not clear this
 is profitable.  */
  else if (tree_fits_uhwi_p (dr_a1->seg_len)
   && tree_fits_uhwi_p (dr_a2->seg_len))
 This check is actually covered by A and B.
  4) Handle negative DR_STEPs explicitly.
  5) Add two tests illustrating wrong alias checking issue.

Bootstrap and test on x86_64 and AArch64, is it OK?

BTW, I tried to keep the change as minimal as possible, but ended up with quite
amount refactoring because the old cases are somehow duplicated/complicated,
and not fit to negative DR_STEPs handling.

Thanks,
bin
2017-05-22  Bin Cheng  

PR tree-optimization/80815
* tree-data-ref.c (prune_runtime_alias_test_list): Simplify condition
for merging runtime alias checks.  Handle negative DR_STEPs.

gcc/testsuite/ChangeLog
2017-05-22  Bin Cheng  

PR tree-optimization/80815
* gcc.dg/vect/pr80815-1.c: New test.
* gcc.dg/vect/pr80815-2.c: New test.From af1dda073f8bd1a371cd1207fcaf467be0dc1106 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 19 May 2017 18:46:14 +0100
Subject: [PATCH 3/6] negative-step-alias-check-20170516.txt

---
 gcc/testsuite/gcc.dg/vect/pr80815-1.c |  38 ++
 gcc/testsuite/gcc.dg/vect/pr80815-2.c |  46 
 gcc/tree-data-ref.c   | 131 +++---
 3 files changed, 175 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr80815-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr80815-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-1.c 
b/gcc/testsuite/gcc.dg/vect/pr80815-1.c
new file mode 100644
index 000..98c06c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-1.c
@@ -0,0 +1,38 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+int arr[2048];
+
+__attribute__ ((noinline)) int
+foo (int *a, int *b)
+{
+  int i;
+  int *a1 = a;
+  int *a0 = a1 - 512;
+  for (i = 0; i < 500; i++)
+{
+  *b = *a0 + *a1;
+  b++;
+  a0--;
+  a1--;
+}
+  return 0;
+}
+
+int main (void)
+{
+  int *a = &arr[1027];
+  int *b = &arr[1024];
+
+  int i;
+  for (i = 0; i < 2048; i++)
+arr[i] = i;
+
+  foo (a, b);
+
+  if (arr[1026] != 2053 || arr[1027] != 2054)
+abort ();
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-2.c 
b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
new file mode 100644
index 000..83557da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
@@ -0,0 +1,46 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+int arr[2048];
+int res[100] = { 13198, 13224, 12735, 12760, 12270, 12294,
+11803, 11826, 11334, 11356, 10863, 10884,
+10390, 10410, 9915, 9934, 9438, 9456,
+8959, 8976, 8478, 8494, 7995, 8010,
+7510, 7524, 7023, 7036, 6534, 6546,
+6043, 6054, 5550, 5560, 5055, 5064,
+4558, 4566, 4059, 4066, 3558, 3564,
+3055, 3060, 2550, 2554, 2043, 0};
+
+__attribute__ ((noinline)) int
+foo (int *a, int *b)
+{
+  int i;
+  int *a1 = a;
+  int *a0 = a1 - 512;
+  for (i = 0; i < 50; i++)
+{
+  *b = *a0 + *a1;
+  b--;
+  a0--;
+  a1--;
+}
+  return 0;
+}
+
+int main (void)
+{
+  int *a = &arr[1024];
+  int *b = &arr[1022];
+
+  int i;
+  for (i = 0; i < 2048; i++)
+arr[i] = i;
+
+  foo (a, b);
+
+  for (i = 973; i < 1020; i++)
+if (arr[i] != res[i - 973])
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a5f8c1c..f0799d9 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1259,6 +1259,15 @@ prune_runtime_alias_test_list 
(vec *alias_pairs,
  != tree_int_cst_compare (DR_STEP (dr_a2->dr), size_zero_node))
continue;
 
+ bool neg_step
+   = (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node) < 0);
+
+ /* DR_A1 and DR_A2 must have the same step if it's negative.  */
+ if (neg_step
+ && tree_int_cst_compare (DR_STEP (dr_a1->dr),
+ 

[PATCH GCC][5/6]Change parameters to make functions independent to vect structure

2017-05-23 Thread Bin Cheng
Hi,
This is a simple patch changing parameter to below functions:
  create_intersect_range_checks_index
  create_intersect_range_checks
as well as updating calls to these functions.  This is to make
them independent to vectorizer's data structure.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-22  Bin Cheng  

* tree-vect-loop-manip.c (create_intersect_range_checks_index): Pass
in parameter loop, rather than loop_vinfo.
(create_intersect_range_checks): Ditto.
(vect_create_cond_for_alias_checks): Update call to above function.From a2c40984d2fb57e60f4d04f22c86b4dbbced425b Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 22 May 2017 12:03:45 +0100
Subject: [PATCH 5/6] parameter-20170516.txt

---
 gcc/tree-vect-loop-manip.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index f48336b..62b1fe8 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2044,11 +2044,11 @@ vect_create_cond_for_align_checks (loop_vec_info 
loop_vinfo,
 *cond_expr = part_cond_expr;
 }
 
-/* Given two data references and segment lengths described by DR_A and DR_B,
-   create expression checking if the two addresses ranges intersect with
-   each other based on index of the two addresses.  This can only be done
-   if DR_A and DR_B referring to the same (array) object and the index is
-   the only difference.  For example:
+/* Given LOOP's two data references and segment lengths described by DR_A
+   and DR_B, create expression checking if the two addresses ranges intersect
+   with each other based on index of the two addresses.  This can only be
+   done if DR_A and DR_B referring to the same (array) object and the index
+   is the only difference.  For example:
 
DR_A   DR_B
   data-ref arr[i] arr[j]
@@ -2070,7 +2070,7 @@ vect_create_cond_for_align_checks (loop_vec_info 
loop_vinfo,
Note evolution step of index needs to be considered in comparison.  */
 
 static bool
-create_intersect_range_checks_index (loop_vec_info loop_vinfo, tree *cond_expr,
+create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
 const dr_with_seg_len& dr_a,
 const dr_with_seg_len& dr_b)
 {
@@ -2109,7 +2109,6 @@ create_intersect_range_checks_index (loop_vec_info 
loop_vinfo, tree *cond_expr,
   unsigned HOST_WIDE_INT niter_len2 = (seg_len2 + abs_step - 1) / abs_step;
 
   unsigned int i;
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
 {
   tree access1 = DR_ACCESS_FN (dr_a.dr, i);
@@ -2186,12 +2185,12 @@ create_intersect_range_checks_index (loop_vec_info 
loop_vinfo, tree *cond_expr,
  || (DR_B_addr_0 + DER_B_segment_length_0) <= DR_A_addr_0))  */
 
 static void
-create_intersect_range_checks (loop_vec_info loop_vinfo, tree *cond_expr,
+create_intersect_range_checks (struct loop *loop, tree *cond_expr,
   const dr_with_seg_len& dr_a,
   const dr_with_seg_len& dr_b)
 {
   *cond_expr = NULL_TREE;
-  if (create_intersect_range_checks_index (loop_vinfo, cond_expr, dr_a, dr_b))
+  if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
 return;
 
   tree segment_length_a = dr_a.seg_len;
@@ -2263,6 +2262,7 @@ vect_create_cond_for_alias_checks (loop_vec_info 
loop_vinfo, tree * cond_expr)
   if (comp_alias_ddrs.is_empty ())
 return;
 
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
 {
   const dr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
@@ -2279,7 +2279,7 @@ vect_create_cond_for_alias_checks (loop_vec_info 
loop_vinfo, tree * cond_expr)
}
 
   /* Create condition expression for each pair data references.  */
-  create_intersect_range_checks (loop_vinfo, &part_cond_expr, dr_a, dr_b);
+  create_intersect_range_checks (loop, &part_cond_expr, dr_a, dr_b);
   if (*cond_expr)
*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
  *cond_expr, part_cond_expr);
-- 
1.9.1



[PATCH GCC][6/6]Factor out code generating runtime alias checks

2017-05-23 Thread Bin Cheng
Hi,
This patch factors out code generating runtime alias check from 
tree-vect-data-refs.c
to tree-data-ref.c, as well as introduces new interface 
create_runtime_alias_checks for
later use.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin
2017-05-22  Bin Cheng  

* tree-vect-loop-manip.c (create_intersect_range_checks_index)
(create_intersect_range_checks): Move from ...
* tree-data-ref.c (create_intersect_range_checks_index)
(create_intersect_range_checks): ... to here.
(create_runtime_alias_checks): New function factored from ...
* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): ...
here.  Call above function.
* tree-data-ref.h (create_runtime_alias_checks): New function.From 658a376b8ba5c73596f59c963077815f75d0e4db Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 23 May 2017 17:13:05 +0100
Subject: [PATCH 6/6] create-runtime-alias-check-20170516.txt

---
 gcc/tree-data-ref.c| 225 +
 gcc/tree-data-ref.h|   2 +
 gcc/tree-vect-loop-manip.c | 216 +--
 3 files changed, 229 insertions(+), 214 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 5d9054d..826d064 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1394,6 +1394,231 @@ prune_runtime_alias_test_list 
(vec *alias_pairs,
 }
 }
 
+/* Given LOOP's two data references and segment lengths described by DR_A
+   and DR_B, create expression checking if the two addresses ranges intersect
+   with each other based on index of the two addresses.  This can only be
+   done if DR_A and DR_B referring to the same (array) object and the index
+   is the only difference.  For example:
+
+   DR_A   DR_B
+  data-ref arr[i] arr[j]
+  base_object  arrarr
+  index{i_0, +, 1}_loop   {j_0, +, 1}_loop
+
+   The addresses and their index are like:
+
+|<- ADDR_A->|  |<- ADDR_B->|
+ --->
+|   |   |   |   |  |   |   |   |   |
+ --->
+i_0 ... i_0+4  j_0 ... j_0+4
+
+   We can create expression based on index rather than address:
+
+ (i_0 + 4 < j_0 || j_0 + 4 < i_0)
+
+   Note evolution step of index needs to be considered in comparison.  */
+
+static bool
+create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
+const dr_with_seg_len& dr_a,
+const dr_with_seg_len& dr_b)
+{
+  if (integer_zerop (DR_STEP (dr_a.dr))
+  || integer_zerop (DR_STEP (dr_b.dr))
+  || DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
+return false;
+
+  if (!tree_fits_uhwi_p (dr_a.seg_len) || !tree_fits_uhwi_p (dr_b.seg_len))
+return false;
+
+  if (!tree_fits_shwi_p (DR_STEP (dr_a.dr)))
+return false;
+
+  if (!operand_equal_p (DR_BASE_OBJECT (dr_a.dr), DR_BASE_OBJECT (dr_b.dr), 0))
+return false;
+
+  if (!operand_equal_p (DR_STEP (dr_a.dr), DR_STEP (dr_b.dr), 0))
+return false;
+
+  gcc_assert (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST);
+
+  bool neg_step = tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0;
+  unsigned HOST_WIDE_INT abs_step
+= absu_hwi (tree_to_shwi (DR_STEP (dr_a.dr)));
+
+  unsigned HOST_WIDE_INT seg_len1 = tree_to_uhwi (dr_a.seg_len);
+  unsigned HOST_WIDE_INT seg_len2 = tree_to_uhwi (dr_b.seg_len);
+  /* Infer the number of iterations with which the memory segment is accessed
+ by DR.  In other words, alias is checked if memory segment accessed by
+ DR_A in some iterations intersect with memory segment accessed by DR_B
+ in the same amount iterations.
+ Note segnment length is a linear function of number of iterations with
+ DR_STEP as the coefficient.  */
+  unsigned HOST_WIDE_INT niter_len1 = (seg_len1 + abs_step - 1) / abs_step;
+  unsigned HOST_WIDE_INT niter_len2 = (seg_len2 + abs_step - 1) / abs_step;
+
+  unsigned int i;
+  for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
+{
+  tree access1 = DR_ACCESS_FN (dr_a.dr, i);
+  tree access2 = DR_ACCESS_FN (dr_b.dr, i);
+  /* Two indices must be the same if they are not scev, or not scev wrto
+current loop being vecorized.  */
+  if (TREE_CODE (access1) != POLYNOMIAL_CHREC
+ || TREE_CODE (access2) != POLYNOMIAL_CHREC
+ || CHREC_VARIABLE (access1) != (unsigned)loop->num
+ || CHREC_VARIABLE (access2) != (unsigned)loop->num)
+   {
+ if (operand_equal_p (access1, access2, 0))
+   continue;
+
+ return false;
+   }
+  /* The two indices must have the same step.  */
+  if (!operand_equal_p (CHREC_RIGHT (access1), CHREC_RIGHT (acce

Ipa function summary pass

2017-05-23 Thread Jan Hubicka
Hi,
this patch finishes the breakup of ipa-inline and function analysis.
The analysis is now done by separate pass and I will work on cleaning
up the interfaces now.

Honza

* cgraphunit.c (symbol_table::process_new_functions): Update.
* ipa-fnsummary.c (pass_data_inline_parameters): Remove.
(inline_generate_summary): Rename to ...
(ipa_fn_summary_generate): ... this one.
(inline_read_summary): Rename to ...
(ipa_fn_summary_read): ... this one.
(inline_write_summary): Rename to ...
(ipa_fn_summary_write): ... this one.
(inline_free_summary): Rename to ...
(ipa_free_fn_summary): ... this one.
(pass_data_local_fn_summary, pass_local_fn_summary,
make_pass_local_fn_summary, pass_data_ipa_free_fn_summary,
pass_ipa_free_fn_summary, make_pass_ipa_free_fn_summary,
pass_data_ipa_fn_summary, pass_ipa_fn_summary,
make_pass_ipa_fn_summary): New.
* ipa-fnsummary.h (inline_generate_summary, inline_read_summary,
inline_write_summary, inline_free_summary): Remove.
(ipa_free_fn_summary) : New.
* ipa-inline.c (ipa_inline): Update.
(pass_ipa_inline): Do not generate summaries.
* ipa.c (pass_data_ipa_free_fn_summary, pass_ipa_free_fn_summary):
Remove.
* passes.def: Replace pass_inline_parameters by pass_local_fn_summary
and add pass_ipa_fn_summary.
* tree-pass.h (make_pass_ipa_fn_summary, make_pass_local_fn_summary):
New.
(make_pass_inline_parameters): Remove.

* lto.c (do_whole_program_analysis): Replace inline_free_summary
by ipa_free_fn_summary.

* gcc.dg/ipa/ctor-empty-1.c: Update template.
* gcc.dg/ipa/inline-5.c: Likewise.
* gfortran.dg/pr48636.f90: Likewise.
Index: cgraphunit.c
===
--- cgraphunit.c(revision 248365)
+++ cgraphunit.c(working copy)
@@ -339,7 +339,7 @@ symbol_table::process_new_functions (voi
 and splitting.  This is redundant for functions added late.
 Just throw away whatever it did.  */
  if (!summaried_computed)
-   inline_free_summary ();
+   ipa_free_fn_summary ();
}
  else if (ipa_fn_summaries != NULL)
compute_fn_summary (node, true);
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 248366)
+++ ipa-fnsummary.c (working copy)
@@ -2504,46 +2504,6 @@ compute_fn_summary_for_current (void)
   return 0;
 }
 
-namespace {
-
-const pass_data pass_data_inline_parameters =
-{
-  GIMPLE_PASS, /* type */
-  "inline_param", /* name */
-  OPTGROUP_INLINE, /* optinfo_flags */
-  TV_INLINE_PARAMETERS, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  0, /* todo_flags_finish */
-};
-
-class pass_inline_parameters : public gimple_opt_pass
-{
-public:
-  pass_inline_parameters (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_inline_parameters, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  opt_pass * clone () { return new pass_inline_parameters (m_ctxt); }
-  virtual unsigned int execute (function *)
-{
-  return compute_fn_summary_for_current ();
-}
-
-}; // class pass_inline_parameters
-
-} // anon namespace
-
-gimple_opt_pass *
-make_pass_inline_parameters (gcc::context *ctxt)
-{
-  return new pass_inline_parameters (ctxt);
-}
-
-
 /* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS,
KNOWN_CONTEXTS and KNOWN_AGGS.  */
 
@@ -3207,8 +3167,8 @@ ipa_fn_summary_t::insert (struct cgraph_
 
 /* Note function body size.  */
 
-void
-inline_generate_summary (void)
+static void
+ipa_fn_summary_generate (void)
 {
   struct cgraph_node *node;
 
@@ -3226,7 +3186,7 @@ inline_generate_summary (void)
   ipa_fn_summaries->enable_insertion_hook ();
 
   ipa_register_cgraph_hooks ();
-  inline_free_summary ();
+  ipa_free_fn_summary ();
 
   FOR_EACH_DEFINED_FUNCTION (node)
 if (!node->alias)
@@ -3358,8 +3318,8 @@ inline_read_section (struct lto_file_dec
and inliner, so when ipa-cp is active, we don't need to write them
twice.  */
 
-void
-inline_read_summary (void)
+static void
+ipa_fn_summary_read (void)
 {
   struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
   struct lto_file_decl_data *file_data;
@@ -3419,8 +3379,8 @@ write_ipa_call_summary (struct output_bl
Jump functions are shared among ipa-cp and inliner, so when ipa-cp is
active, we don't need to write them twice.  */
 
-void
-inline_write_summary (void)
+static void
+ipa_fn_summary_write (void)
 {
   struct output_block *ob = create_output_block (LTO_section_ipa_fn_summary);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -3510,7 +3470,7 @@ inline_write_summary (void)
 /* 

[C++ PATCH] fix 80866

2017-05-23 Thread Nathan Sidwell
This patch fixes 80866, a regression I introduced.  The parser stashes 
template-ids when tentatively parsing, such ids can contain lookups.  We 
need to mark the lookup as 'kept', otherwise the first parse could say 
'I'm done with this' and recycle it prematurely.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-05-23  Nathan Sidwell  

	PR c++/80866
	* parser.c (cp_parser_template_id): Keep the lookup when stashing
	the template_id.

	PR c++/80866
	* g++.dg/parse/pr80866.C: New.

Index: cp/parser.c
===
--- cp/parser.c	(revision 248372)
+++ cp/parser.c	(working copy)
@@ -15570,6 +15570,11 @@ cp_parser_template_id (cp_parser *parser
 	= make_location (token->location, token->location, finish_loc);
   token->location = combined_loc;
 
+  /* We must mark the lookup as kept, so we don't throw it away on
+	 the first parse.  */
+  if (is_overloaded_fn (template_id))
+	lookup_keep (get_fns (template_id), true);
+
   /* Retrieve any deferred checks.  Do not pop this access checks yet
 	 so the memory will not be reclaimed during token replacing below.  */
   token->u.tree_check_value = ggc_cleared_alloc ();
Index: testsuite/g++.dg/parse/pr80866.C
===
--- testsuite/g++.dg/parse/pr80866.C	(revision 0)
+++ testsuite/g++.dg/parse/pr80866.C	(working copy)
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++11 } }
+// PR 80866 recycled a lookup too soon.
+
+void pow();
+namespace math {
+  template  void pow(T);
+}
+using namespace math;
+
+decltype(pow<>(0)) z();


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Martin Sebor

--- /dev/null
+++ b/gcc/range.h
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_RANGE_H
+#define GCC_RANGE_H
+#define MAX_RANGES 6
+
+typedef class irange *irange_p;
+enum irange_type { RANGE_PLAIN, RANGE_INVERT };
+
+class GTY(()) irange
+{
+ private:
+  bool overflow;
+  size_t n;


Since space usage is of concern, defining the biggest member first
and using a smaller type should help as the first step.  Say:

  tree type;
  wide_int bounds[MAX_RANGES];
  unsigned char n: 7;   // 127 > MAX_RANGES
  bool overflow: 1 ;

There may be other tricks to play with the array although turning
it into a flexible array member will defeat the bit-field gain.


+  void prepend (wide_int x, wide_int y);
+  void append (wide_int x, wide_int y);
+  void remove (unsigned i, unsigned j);
+  void canonicalize ();
+  /* This is stupid.  These two should be private, but the GTY
+ machinery can't look inside an irange.  */
+ public:
+  tree type;
+  wide_int bounds[MAX_RANGES];
+
+public:

...

+  void Union (wide_int x, wide_int y);
+  bool Union (const irange &r);
+  bool Union (const irange &r1, const irange &r2);


Do we really want methods starting with capital letters?
I understand why you can't use union, but I don't think we use CamelCase
anywhere.


One way to get around the problem is to make the functions non-
members and rename them in the process:

  irange& irange_union (irange &, wide_int, wide_int);
  irange& irange_union (irange &, const irange &);
  irange& irange_union (irange &, const irange &, const irange &);

Making them friends of class irange will let them access private
members.  Returning irange& makes chaining multiple calls possible.

I think making interfaces thin and providing operations as non
member functions is also considered the preferable design these
days.

Martin


[PATCH] Fix a SPARC -mcbcond compare-and-branch out of range failure.

2017-05-23 Thread Sheldon Lobo
The compare-and-branch distance was calculated incorrectly. This occurred
because a -mflat sibcall returned an instruction count of 2 instead of 3.
Fix sparc.md to match the output_sibcall() code in sparc.c.

* config/sparc/sparc.md: Set the number of instructions correctly
for -mflat sibcalls, to match output_sibcall().
---
 gcc/ChangeLog |5 +
 gcc/config/sparc/sparc.md |3 ++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7ed62bc..5836ac1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-05-23  Sheldon Lobo  
+
+   * config/sparc/sparc.md: Set the number of instructions correctly
+   for -mflat sibcalls, to match output_sibcall().
+
 2017-05-18  Michael Meissner  
 
PR target/80510
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 29a8bcf..3f970f7 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -338,7 +338,8 @@
 (const_int 2)
 (const_int 1))
 (eq_attr "type" "sibcall")
-  (if_then_else (eq_attr "leaf_function" "true")
+  (if_then_else (ior (eq_attr "leaf_function" "true")
+ (eq_attr "flat" "true"))
 (if_then_else (eq_attr "empty_delay_slot" "true")
   (const_int 3)
   (const_int 2))
-- 
1.7.1



[PATCH] Fix PR middle-end/80823, ICE: verify_flow_info failed

2017-05-23 Thread Peter Bergner
The fix for PR80775 included a thinko bug that caused us to skip some
case label statements.  This leads to problems for test cases where we
have multiple case labels that point to the same unreachable block, but
are not mergeable since they don't have consecutive case values.
This leads to a problem, because we remove the unreachable block when
handling this first case label, but then we have a dangling reference
to that block with the skipped case label.  The fix is to remove the
unwanted increment, so that we handle the next case label and end up
removing it too.

This passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Is this ok for trunk?

Peter

gcc/
PR middle-end/80823
* tree-cfg.c (group_case_labels_stmt): Delete increment of "i";

gcc/testsuite/
PR middle-end/80823
* gcc.dg/pr80823.c: New test.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 248375)
+++ gcc/tree-cfg.c  (working copy)
@@ -1726,7 +1726,6 @@ group_case_labels_stmt (gswitch *stmt)
remove_edge_and_dominated_blocks (base_edge);
  gimple_switch_set_label (stmt, base_index, NULL_TREE);
  new_size--;
- i++;
}
 }
 
Index: gcc/testsuite/gcc.dg/pr80823.c
===
--- gcc/testsuite/gcc.dg/pr80823.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr80823.c  (working copy)
@@ -0,0 +1,23 @@
+/* PR middle-end/80823 ICE: verify_flow_info failed  */
+/* { dg-do compile }  */
+/* { dg-options "-O3" }  */
+
+int a, c;
+int b[1];
+static inline int
+fn1() {
+  switch (a)
+  case 0:
+  case 2:
+return 1;
+  return 0;
+}
+void fn2() {
+  int i;
+  for (;; ++i) {
+c = b[i];
+int d = !fn1();
+if (d)
+  __asm__("");
+  }
+}



Re: [PATCH GCC][3/6]Fix PR80815 by handling negative DR_STEPs in runtime alias check

2017-05-23 Thread Richard Sandiford
AIUI, the reason the old code mishandled negative steps was that the
associated segment lengths were stored as sizetype and so looked like
big unsigned values.  Those values therefore satisfied tree_fits_uhwi_p
even though they were semantically negative.  Is that right?

Assuming yes, and quoting the change as a context diff...

> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index a5f8c1c..f0799d9 100644
> *** a/gcc/tree-data-ref.c
> --- b/gcc/tree-data-ref.c
> ***
> *** 1259,1264 
> --- 1259,1273 
> != tree_int_cst_compare (DR_STEP (dr_a2->dr), size_zero_node))
>   continue;
>  
> +   bool neg_step
> + = (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node) < 0);
> + 
> +   /* DR_A1 and DR_A2 must have the same step if it's negative.  */
> +   if (neg_step
> +   && tree_int_cst_compare (DR_STEP (dr_a1->dr),
> +DR_STEP (dr_a2->dr)) != 0)
> + continue;
> +

[Why do they need to be the same step?]

> /* Make sure dr_a1 starts left of dr_a2.  */
> if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
>   std::swap (*dr_a1, *dr_a2);
> ***
> *** 1266,1325 
> bool do_remove = false;
> unsigned HOST_WIDE_INT diff
>   = (tree_to_shwi (DR_INIT (dr_a2->dr))
> !- tree_to_shwi (DR_INIT (dr_a1->dr)));
>  
> !   /* If the left segment does not extend beyond the start of the
> !  right segment the new segment length is that of the right
> !  plus the segment distance.  */
> !   if (tree_fits_uhwi_p (dr_a1->seg_len)
> !   && compare_tree_int (dr_a1->seg_len, diff) <= 0)
>   {
> !   dr_a1->seg_len = size_binop (PLUS_EXPR, dr_a2->seg_len,
> !size_int (diff));
> !   do_remove = true;
>   }
> !   /* Generally the new segment length is the maximum of the
> !  left segment size and the right segment size plus the distance.
> !  ???  We can also build tree MAX_EXPR here but it's not clear this
> !  is profitable.  */
> !   else if (tree_fits_uhwi_p (dr_a1->seg_len)
> !&& tree_fits_uhwi_p (dr_a2->seg_len))
> ! {
> !   unsigned HOST_WIDE_INT seg_len_a1 = tree_to_uhwi (dr_a1->seg_len);
> !   unsigned HOST_WIDE_INT seg_len_a2 = tree_to_uhwi (dr_a2->seg_len);
> !   dr_a1->seg_len = size_int (MAX (seg_len_a1, diff + seg_len_a2));
> !   do_remove = true;
> ! }
> !   /* Now we check if the following condition is satisfied:
>  
> !  DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>  
> !  where DIFF = DR_A2_INIT - DR_A1_INIT.  However,
> !  SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
> !  have to make a best estimation.  We can get the minimum value
> !  of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
> !  then either of the following two conditions can guarantee the
> !  one above:
>  
> !  1: DIFF <= MIN_SEG_LEN_B
> !  2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B  */
> !   else
>   {
> !   unsigned HOST_WIDE_INT min_seg_len_b
> ! = (tree_fits_uhwi_p (dr_b1->seg_len)
> !? tree_to_uhwi (dr_b1->seg_len)
> !: factor);
>  
> if (diff <= min_seg_len_b
> || (tree_fits_uhwi_p (dr_a1->seg_len)
> !   && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b))
>   {
> !   dr_a1->seg_len = size_binop (PLUS_EXPR,
> !dr_a2->seg_len, size_int (diff));
> do_remove = true;
>   }
>   }
>  
> if (do_remove)
>   {
> if (dump_enabled_p ())
> --- 1275,1375 
> bool do_remove = false;
> unsigned HOST_WIDE_INT diff
>   = (tree_to_shwi (DR_INIT (dr_a2->dr))
> !- tree_to_shwi (DR_INIT (dr_a1->dr)));
> !   tree new_seg_len;
> !   unsigned HOST_WIDE_INT min_seg_len_b;
>  
> !   if (tree_fits_uhwi_p (dr_b1->seg_len))
>   {
> !   min_seg_len_b = tree_to_uhwi (dr_b1->seg_len);
> !   if (tree_int_cst_sign_bit (dr_b1->seg_len))
> ! min_seg_len_b = 0 - min_seg_len_b;
>   }
> !   else
> ! min_seg_len_b = factor;

...I'm not sure how safe this or the later neg_step handling is
for 16-bit and 32-bit sizetypes.  It might be better to use wide_int
instead, so that all arithmetic and comparisons happen in the precision
of sizetype.

The situation seems very close to RTL in the sense that the segment
length has no inherent sign and we have to instead get the sign from
the DR_STEP.  Unless we can change that of course...

>  
> !   /* Now we try to merge alias check dr_a1 & dr_b and dr_a2 & dr_b.
>  
> !  Case 

Re: [PATCH] PR c++/80544 strip cv-quals from cast results

2017-05-23 Thread Jonathan Wakely

On 18/05/17 13:44 -0400, Nathan Sidwell wrote:

On 05/18/2017 01:40 PM, Jonathan Wakely wrote:


+  /* A prvalue of non-class type is cv-unqualified.  */
+  if (TREE_CODE (type) != REFERENCE_TYPE && !CLASS_TYPE_P (type))
+type = cv_unqualified (type);
+


References can't be CV qualified, so the REFERENCE_TYPE check seems 
superfluous?


True. I did it because that matches the semantics of the cast
according to the standard, but it isn't needed here. Is it worth
keeping anyway, to avoid a redundant call to cv_unqualified?

It also occurs to me that checking for !CLASS_TYPE_P (type) isn't
needed in build_const_cast_1 because you can't const_cast to a class
type, only reference or pointer types.




Re: [PATCH] PR c++/80544 strip cv-quals from cast results

2017-05-23 Thread Nathan Sidwell

On 05/23/2017 01:56 PM, Jonathan Wakely wrote:

On 18/05/17 13:44 -0400, Nathan Sidwell wrote:


References can't be CV qualified, so the REFERENCE_TYPE check seems 
superfluous?


True. I did it because that matches the semantics of the cast
according to the standard, but it isn't needed here. Is it worth
keeping anyway, to avoid a redundant call to cv_unqualified?


I don't think it's worth checking.


It also occurs to me that checking for !CLASS_TYPE_P (type) isn't
needed in build_const_cast_1 because you can't const_cast to a class
type, only reference or pointer types.


true.

nathan

--
Nathan Sidwell


Re: [PATCH] PR c++/80544 strip cv-quals from cast results

2017-05-23 Thread Jonathan Wakely

On 19/05/17 15:14 -0400, Jason Merrill wrote:

On Thu, Apr 27, 2017 at 12:59 PM, Jonathan Wakely  wrote:

I also tried to add a warning like EDG's (see the PR) but it gave a
false positive for direct-list-init of scoped enums (P0138R2, r240449)
because that code goes through build_c_cast to perform the conversion,
so looks like a cast to my new warning.


The enum init code could strip the cv-quals when calling build_c_cast
to avoid the warning.


Thanks, that works. I don't think this warning fits under any existing
option, so I'll add a new one. -Wuseless-cast-qual maybe? Or is that
too close to -Wuseless-cast and -Wcast-qual and would cause confusion?




Re: [PATCH] PR c++/80544 strip cv-quals from cast results

2017-05-23 Thread Jonathan Wakely

On 23/05/17 13:58 -0400, Nathan Sidwell wrote:

On 05/23/2017 01:56 PM, Jonathan Wakely wrote:

On 18/05/17 13:44 -0400, Nathan Sidwell wrote:


References can't be CV qualified, so the REFERENCE_TYPE check 
seems superfluous?


True. I did it because that matches the semantics of the cast
according to the standard, but it isn't needed here. Is it worth
keeping anyway, to avoid a redundant call to cv_unqualified?


I don't think it's worth checking.


Ah yes, cp_build_qualified_type_real returns early if there's nothing
to do. OK, I'll remove those checks. Thanks.





Re: [PATCH 2/5 v3] Vect peeling cost model

2017-05-23 Thread Richard Sandiford
Robin Dapp  writes:
> @@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info 
> **slot,
> && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
>   continue;
>  
> +  int save_misalignment;
>save_misalignment = DR_MISALIGNMENT (dr);
> -  vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
> -  vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
> -  &body_cost_vec);
> +  if (dr == dr0 && npeel == vf / 2)
> + SET_DR_MISALIGNMENT (dr, 0);
> +  else
> + vect_update_misalignment_for_peel (dr, dr0, npeel);
> +  vect_get_data_access_cost (dr, inside_cost, outside_cost,
> +  body_cost_vec);
>SET_DR_MISALIGNMENT (dr, save_misalignment);
>  }

Not sure I've understood the series TBH, but is the npeel == vf / 2
there specifically for the "unknown number of peels" case?  How do
we distinguish that from the case in which the number of peels is
known to be vf / 2 at compile time?  Or have I missed the point
completely? (probably yes, sorry!)

Thanks,
Richard



Re: MinGW compilation warnings in libiberty's waitpid.c

2017-05-23 Thread Eli Zaretskii
> From: DJ Delorie 
> Cc: gcc-patches@gcc.gnu.org, gdb-patc...@sourceware.org
> Date: Mon, 22 May 2017 15:38:40 -0400
> 
> Since (or "if") nobody will (should) use waitpid() on mingw anyway, and
> since libiberty really wants to include waitpid.o, how difficult would
> it be to use some #ifdefs to have waitpid() just return an error on
> mingw?  That at least gets past the mingw build problem.

Instead of making waitpid an always-failing stub on MinGW, wouldn't it
be better to make it work on MinGW?  Like this:

--- libiberty/waitpid.c~0   2016-08-01 18:50:21.0 +0300
+++ libiberty/waitpid.c 2017-05-23 21:19:34.302415000 +0300
@@ -23,6 +23,11 @@ does the return value.  The third argume
 #include 
 #endif
 
+#ifdef __MINGW32__
+#include 
+#define wait(s)  _cwait(s,pid,_WAIT_CHILD)
+#endif
+
 pid_t
 waitpid (pid_t pid, int *stat_loc, int options ATTRIBUTE_UNUSED)
 {


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Martin Sebor

On 05/23/2017 04:48 AM, Aldy Hernandez wrote:

Howdy!

For Andrew's upcoming on-demand range work I have created a range class
for use in his engine.  Currently, the range class is only for SSA
integers, but there is no reason why we couldn't adapt this for RTL or
non-integer types at a later time.

The class can live outside of his work, as can be demonstrated by the
attached patch.  With it, I was able to rewrite the post-VRP range
information to use this class and get rid of VR_ANTI_RANGE throughout
the compiler.  A VR_ANTI_RANGE of ~[5,10] becomes [-MIN,4][11,+MAX].

The internal structure of VRP is unchanged.  I have only changed the
outward facing structures.

A good example of how much cleaner using an actual range rather than
VR_ANTI_RANGEs is the change to get_size_range() in the attached patch.
We remove an assortment of contortions into:

+  /* Remove negative numbers from the range.  */
+  irange positives;
+  range_positives (&positives, exptype, allow_zero ? 0 : 1);
+  if (positives.Intersect (*ir))
+{
+  /* Remove the unknown parts of a multi-range.
+This will transform [5,10][20,MAX] into [5,10].  */
+  if (positives.num_ranges () > 1
+ && positives.upper_bound () == wide_int (TYPE_MAX_VALUE
(exptype)))
+   positives.remove_range (positives.num_ranges () - 1);
+
+  range[0] = wide_int_to_tree (exptype, positives.lower_bound ());
+  range[1] = wide_int_to_tree (exptype, positives.upper_bound ());
+}
+  else
+{
+  /* If removing the negative numbers didn't give us anything
+back, the entire range was negative. Leave things as they
+were, and let the caller sort it out.  */
+  range[0] = wide_int_to_tree (exptype, min);
+  range[1] = wide_int_to_tree (exptype, max);
 }

A few notes:

1. There are still two uses of VR_ANTI_RANGEs left after this patch:
ip-cp.c and ipa-prop.c.  I didn't look too hard at these two passes, as
they are using `struct value_range' which should only have been used
*before* VRP finishes.  I can work on this as a follow-up.

2. I have not included ChangeLog entries, as I would hate to write them,
and have the API or significant things change from under me as part of
the review.  I can certainly cobble the ChangeLog entries as soon as/if
we agree that this is an acceptable approach.

3. There are lots of unit tests to maintain sanity.  The cast ones in
particular will definitely need refinement as Andrew's work stresses the
class.

The attached patch passes bootstrap and tests.

I have also benchmarked an assortment of .ii files (from a previous
bootstrap) with no noticeable difference in -O2 compilation time.  So, I
would prefer to tackle any micro-optimizations either as a follow-up or
if it actually becomes noticeable--. Yeah, yeah, I know.  Wishful
thinking ;-).


Fire away!
Aldy

p.s. Please refer any comments as to why we need the class, or why we
need on-demand range information to Andrew :-P.


I'm pretty excited about this work (as you know :) so just a few
longish comments on/suggestions for the design of the class to
make it ever so slightly easier to work with.  Since YMMV, feel
free to disregard what you don't like.

+#define MAX_RANGES 6
+

This seems like an implementation detail that clients of the class
shouldn't know about or have access to.  Suggest to hide that from
the interface (e.g., by making a private member of irange).

+typedef class irange *irange_p;

FWIW, I find pointer typedefs more trouble than worth.  They
obscure the fact that they are pointers and cannot be made const
by adding the const qualifier.  E.g., this:

  void foo (const irange_p);

doesn't declare foo to take a pointer to a const range as one
might hope but rather a const pointer to a non-const range.

+enum irange_type { RANGE_PLAIN, RANGE_INVERT };

Consider making this a member and renaming it to something like
'kind' to avoid giving the impression that it's somehow a type
that's related to class irange.  Making it a member also avoids
the need to prefix the enumerators with RANGE_, and makes it
convenient to refer to PLAIN and INVERT using the dot and arrow
operators (in addition to irange::PLAIN).

+
+class GTY(()) irange
+{
+ private:
+  bool overflow;
+  size_t n;

Suggest to use a more descriptive name for the member.

+  void prepend (wide_int x, wide_int y);
+  void append (wide_int x, wide_int y);
+  void remove (unsigned i, unsigned j);
+  void canonicalize ();
+  /* This is stupid.  These two should be private, but the GTY
+ machinery can't look inside an irange.  */
+ public:
+  tree type;
+  wide_int bounds[MAX_RANGES];
+
+public:
+  irange () { type = NULL_TREE; n = 0; }
+  irange (tree t);

Should the range be implicitly convertible from any tree or would
it better if it required explicit conversion (i.e., should the ctor
be declared explicit)?  The usual rule of thumb is to err on the
side of safety to avoid accidentally creating objects of the class).

+  irange (tree t,

Re: MinGW compilation warnings in libiberty's waitpid.c

2017-05-23 Thread DJ Delorie

Eli Zaretskii  writes:
> Instead of making waitpid an always-failing stub on MinGW, wouldn't it
> be better to make it work on MinGW?  Like this:

That's up to you, if it's target-specific.  What about mingw64?

> --- libiberty/waitpid.c~0 2016-08-01 18:50:21.0 +0300
> +++ libiberty/waitpid.c   2017-05-23 21:19:34.302415000 +0300
> @@ -23,6 +23,11 @@ does the return value.  The third argume
>  #include 
>  #endif
>  
> +#ifdef __MINGW32__
> +#include 
> +#define wait(s)  _cwait(s,pid,_WAIT_CHILD)
> +#endif
> +
>  pid_t
>  waitpid (pid_t pid, int *stat_loc, int options ATTRIBUTE_UNUSED)
>  {


[C++ PATCH] namespace bindings

2017-05-23 Thread Nathan Sidwell
There's a twisty set of function calling to get at a namespace binding. 
This patch cleans that up.


nathan
--
Nathan Sidwell
2017-05-23  Nathan Sidwell  

	* name-lookup.c (find_namespace_binding): New.
	(pushdecl_maybe_friend_1): Use CP_DECL_CONTEXT.
	(set_identifier_type_value_with_scope): Use find_namespace_binding.
	(find_binding, cp_binding_level_find_binding_for_name,
	binding_for_name, namespace_binding_1): Delete.
	(push_overloaded_decl_1): Use CP_DECL_CONTEXT.
	(get_namespace_binding, set_namespace_binding,
	finish_namespace_using_decl, unqualified_namespace_lookup_1,
	qualified_lookup_using_namespace, lookup_type_scope_1,
	lookup_name_innermost_nonclass_level_1): Use find_namespace_binding.

Index: name-lookup.c
===
--- name-lookup.c	(revision 248377)
+++ name-lookup.c	(working copy)
@@ -48,7 +48,6 @@ struct scope_binding {
 };
 #define EMPTY_SCOPE_BINDING { NULL_TREE, NULL_TREE }
 
-static cxx_binding *binding_for_name (cp_binding_level *, tree);
 static tree push_overloaded_decl (tree, int, bool);
 static bool lookup_using_namespace (tree, struct scope_binding *, tree,
 tree, int);
@@ -62,6 +61,32 @@ static void consider_binding_level (tree
 static tree push_using_directive (tree);
 static void diagnose_name_conflict (tree, tree);
 
+/* Find the binding for NAME in namespace NS.  If CREATE_P is true,
+   make an empty binding if there wasn't one.  */
+
+static cxx_binding *
+find_namespace_binding (tree ns, tree name, bool create_p = false)
+{
+  cp_binding_level *level = NAMESPACE_LEVEL (ns);
+  cxx_binding *binding = IDENTIFIER_NAMESPACE_BINDINGS (name);
+
+  for (;binding; binding = binding->previous)
+if (binding->scope == level)
+  return binding;
+
+  if (create_p)
+{
+  binding = cxx_binding_make (NULL, NULL);
+  binding->previous = IDENTIFIER_NAMESPACE_BINDINGS (name);
+  binding->scope = level;
+  binding->is_local = false;
+  binding->value_is_inherited = false;
+  IDENTIFIER_NAMESPACE_BINDINGS (name) = binding;
+}
+
+  return binding;
+}
+
 /* Add DECL to the list of things declared in B.  */
 
 static void
@@ -1698,7 +1723,7 @@ pushdecl_maybe_friend_1 (tree x, bool is
   /* In case this decl was explicitly namespace-qualified, look it
 	 up in its namespace context.  */
   if (DECL_NAMESPACE_SCOPE_P (x) && namespace_bindings_p ())
-	t = get_namespace_binding (DECL_CONTEXT (x), name);
+	t = get_namespace_binding (CP_DECL_CONTEXT (x), name);
   else
 	t = lookup_name_innermost_nonclass_level (name);
 
@@ -2702,9 +2727,9 @@ set_identifier_type_value_with_scope (tr
 }
   else
 {
-  cxx_binding *binding =
-	binding_for_name (NAMESPACE_LEVEL (current_namespace), id);
-  gcc_assert (decl);
+  cxx_binding *binding
+	= find_namespace_binding (current_namespace, id, true);
+
   if (binding->value)
 	supplement_binding (binding, decl);
   else
@@ -2811,55 +2836,6 @@ make_lambda_name (void)
   return get_identifier (buf);
 }
 
-/* Return (from the stack of) the BINDING, if any, established at SCOPE.  */
-
-static inline cxx_binding *
-find_binding (cp_binding_level *scope, cxx_binding *binding)
-{
-  for (; binding != NULL; binding = binding->previous)
-if (binding->scope == scope)
-  return binding;
-
-  return (cxx_binding *)0;
-}
-
-/* Return the binding for NAME in SCOPE, if any.  Otherwise, return NULL.  */
-
-static inline cxx_binding *
-cp_binding_level_find_binding_for_name (cp_binding_level *scope, tree name)
-{
-  cxx_binding *b = IDENTIFIER_NAMESPACE_BINDINGS (name);
-  if (b)
-{
-  /* Fold-in case where NAME is used only once.  */
-  if (scope == b->scope && b->previous == NULL)
-	return b;
-  return find_binding (scope, b);
-}
-  return NULL;
-}
-
-/* Always returns a binding for name in scope.  If no binding is
-   found, make a new one.  */
-
-static cxx_binding *
-binding_for_name (cp_binding_level *scope, tree name)
-{
-  cxx_binding *result;
-
-  result = cp_binding_level_find_binding_for_name (scope, name);
-  if (result)
-return result;
-  /* Not found, make a new one.  */
-  result = cxx_binding_make (NULL, NULL);
-  result->previous = IDENTIFIER_NAMESPACE_BINDINGS (name);
-  result->scope = scope;
-  result->is_local = false;
-  result->value_is_inherited = false;
-  IDENTIFIER_NAMESPACE_BINDINGS (name) = result;
-  return result;
-}
-
 /* Insert another USING_DECL into the current binding level, returning
this declaration. If this is a redeclaration, do nothing, and
return NULL_TREE if this not in namespace scope (in namespace
@@ -2978,7 +2954,7 @@ push_overloaded_decl_1 (tree decl, int f
   int doing_global = (namespace_bindings_p () || !(flags & PUSH_LOCAL));
 
   if (doing_global)
-old = get_namespace_binding (DECL_CONTEXT (decl), name);
+old = get_namespace_binding (CP_DECL_CONTEXT (decl), name);
   else
 old = lookup_name_innermost_nonclass_level (name);
 
@@ -3965

Re: SSA range class and removal of VR_ANTI_RANGEs

2017-05-23 Thread Andrew MacLeod

On 05/23/2017 03:26 PM, Martin Sebor wrote:

On 05/23/2017 04:48 AM, Aldy Hernandez wrote:

Howdy!

+typedef class irange *irange_p;

FWIW, I find pointer typedefs more trouble than worth.  They
obscure the fact that they are pointers and cannot be made const
by adding the const qualifier.  E.g., this:

  void foo (const irange_p);

doesn't declare foo to take a pointer to a const range as one
might hope but rather a const pointer to a non-const range.



yeah, the trend is to expose pointers, like 'gimple *' does. extracting 
types from trees into 'ttype *' was doing the same thing.  we should be 
able to punt on irange_p.




+
+public:
+  irange () { type = NULL_TREE; n = 0; }
+  irange (tree t);

Should the range be implicitly convertible from any tree or would
it better if it required explicit conversion (i.e., should the ctor
be declared explicit)?  The usual rule of thumb is to err on the
side of safety to avoid accidentally creating objects of the class).


the original idea was to allow it to be created from either
  a) an ssa_name, which would return what global range we know about 
the name,

  b) a type , and set it to the range_for_type().
and it would gcc_assert fail on any other kind.

a) may or may not be a good idea or useful long term... it might be 
better to simply always go through a central ssa_name range server 
class, which doesnt quite exist yet.  That would subsume this.
b) some day (haha), this would be statically types as irange (ttype *) 
rather than tree for types.It could also be provided trough an 
external routine.. irange_for_type (tree type)


neither are probably needed


+  irange (tree t, wide_int lbound, wide_int ubound,
+  irange_type rt = RANGE_PLAIN);
+  irange (const irange &r);
+
+  void set_range (tree t);
+  void set_range (tree t, wide_int lbound, wide_int ubound,
+  irange_type rt = RANGE_PLAIN);
+
+  bool overflow_p () { return overflow && !TYPE_OVERFLOW_WRAPS (type);

Suggest to declare this function const.


I figured this would come up.  We ought to properly constify the entire 
class where appropriate. the more code that gets written using 
un-constified code, the harder it is to go back an constify it.




PS For the names, have you considered using the bitwise operators?
I.e., operator| for union, operator& for intersection, and operator~
for not (aka invert).



I'd be more tempted to follow the bitmap model and ( how you mentioned 
earlier I think) have friend routines for range_intersect(), 
range_invert(), range_union().  especially since we want 2 variations. 
one which takes a single parameter and works into 'this',  and another 
variation that builds a new range from 2 parametered ranges .


I feel overloading those operators would make the code a little less 
readable?  at least to me.


Andrew



[PATCH, i386]: Remove moves using pextr insn

2017-05-23 Thread Uros Bizjak
Hello!

Implementing SImode and DImode moves using pextr insn doesn't bring
anything, since the latency of pextr is slightly worse than movd for
targets without efficient inter-unit moves.

2017-05-23  Uros Bizjak  

* config/i386/i386.md (*movdi_internal): Remove SSE4
alternative 18 (?r, *v).  Update insn attributes.
(*movsi_internal): Remove SSE4 alternative 13 (?r, *v).
Update insn attributes.
(*zero_extendsidi2): Remove SSE4 alternative (?r, *x).
Update insn attributes.
* config/i386/sse.md (vec_extract_0): Remove SSE4
alternative 1 (r, v). Remove isa attribute.
* config/i386/i386.c (dimode_scalar_chain::make_vector_copies):
Always move value through stack for !TARGET_INTER_UNIT_MOVES_TO_VEC
and !TARGET_INTER_UNIT_MOVES_TO_VEC targets.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be committed to gcc-7 branch after a couple of days.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 248369)
+++ config/i386/i386.c  (working copy)
@@ -3881,6 +3881,15 @@ dimode_scalar_chain::make_vector_copies (unsigned
 
emit_insn (gen_zero_extendsidi2 (vreg, tmp));
  }
+   else if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
+ {
+   rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
+   emit_move_insn (adjust_address (tmp, SImode, 0),
+   gen_rtx_SUBREG (SImode, reg, 0));
+   emit_move_insn (adjust_address (tmp, SImode, 4),
+   gen_rtx_SUBREG (SImode, reg, 4));
+   emit_move_insn (vreg, tmp);
+ }
else if (TARGET_SSE4_1)
  {
emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
@@ -3891,7 +3900,7 @@ dimode_scalar_chain::make_vector_copies (unsigned
  gen_rtx_SUBREG (SImode, reg, 4),
  GEN_INT (2)));
  }
-   else if (TARGET_INTER_UNIT_MOVES_TO_VEC)
+   else
  {
rtx tmp = gen_reg_rtx (DImode);
emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
@@ -3905,15 +3914,6 @@ dimode_scalar_chain::make_vector_copies (unsigned
gen_rtx_SUBREG (V4SImode, vreg, 0),
gen_rtx_SUBREG (V4SImode, tmp, 0)));
  }
-   else
- {
-   rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
-   emit_move_insn (adjust_address (tmp, SImode, 0),
-   gen_rtx_SUBREG (SImode, reg, 0));
-   emit_move_insn (adjust_address (tmp, SImode, 4),
-   gen_rtx_SUBREG (SImode, reg, 4));
-   emit_move_insn (vreg, tmp);
- }
rtx_insn *seq = get_insns ();
end_sequence ();
rtx_insn *insn = DF_REF_INSN (ref);
@@ -3987,8 +3987,17 @@ dimode_scalar_chain::convert_reg (unsigned regno)
   if (scalar_copy)
{
  start_sequence ();
- if (TARGET_SSE4_1)
+ if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
{
+ rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
+ emit_move_insn (tmp, reg);
+ emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
+ adjust_address (tmp, SImode, 0));
+ emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
+ adjust_address (tmp, SImode, 4));
+   }
+ else if (TARGET_SSE4_1)
+   {
  rtx tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
  emit_insn
(gen_rtx_SET
@@ -4003,7 +4012,7 @@ dimode_scalar_chain::convert_reg (unsigned regno)
  gen_rtx_VEC_SELECT (SImode,
  gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
}
- else if (TARGET_INTER_UNIT_MOVES_FROM_VEC)
+ else
{
  rtx vcopy = gen_reg_rtx (V2DImode);
  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
@@ -4014,15 +4023,6 @@ dimode_scalar_chain::convert_reg (unsigned regno)
  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
  gen_rtx_SUBREG (SImode, vcopy, 0));
}
- else
-   {
- rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
- emit_move_insn (tmp, reg);
- emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
- adjust_address (tmp, SImode, 0));
- emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
- adjust_address (tmp, SImode, 4));
-   }
  rtx_insn *seq = get_insns ();
  end_sequence ();
  emit_conversion_insns (seq, insn);
Index: config/i386/i386.md
===
--- config/i38

Re: Default std::vector default and move constructor

2017-05-23 Thread François Dumont

Gentle reminder, ok to commit ?

François

On 19/05/2017 21:39, François Dumont wrote:

Hi

On 15/05/2017 21:31, Marc Glisse wrote:
The __fill_bvector part of the fill overload for vector could 
do with some improvements as well. Looping is unnecessary, one just 
needs to produce the right mask and and or or with it, that shouldn't 
take more than 4 instructions or so.



I have implemented this idear so I would like to amend my patch 
proposal with this additional optimization.


Tested under Linux x86_64 normal mode.

Ok to commit ?

François






C++ PATCH to add __integer_pack built-in for std::make_integer_sequence (c++/80396)

2017-05-23 Thread Jason Merrill
make_integer_sequence currently uses template metaprogramming to
generate a sequence of integers, which is fairly expensive at compile
time: PR 80396 asked for some built-in to speed it up.  This patch
introduces __integer_pack as a magic function taking a single integer
argument; a call to this function as the pattern of a pack expansion
produces a list of integers which can be used as arguments to a
template (or a function, but that's not the goal).

Rather than introduce a new keyword, I decided to use a FUNCTION_DECL,
which works fine.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit a5681fc17f5d79e443062d7fc652d45ba81cda0f
Author: Jason Merrill 
Date:   Fri May 12 07:45:02 2017 -0400

PR c++/80396 - built-in for make_integer_sequence.

* pt.c (builtin_pack_fn_p, builtin_pack_call_p)
(expand_integer_pack, expand_builtin_pack_call): New.
(find_parameter_packs_r): Check builtin_pack_call_p.
(check_for_bare_parameter_packs): Handle it.
(tsubst_pack_expansion): Call expand_builtin_pack_call.
(declare_integer_pack): New.
(init_template_processing): Call it.
* decl2.c (mark_used): Check builtin_pack_fn_p.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 0064222..3f9bb6b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6372,6 +6372,7 @@ extern bool always_instantiate_p  (tree);
 extern void maybe_instantiate_noexcept (tree);
 extern tree instantiate_decl   (tree, bool, bool);
 extern int comp_template_parms (const_tree, const_tree);
+extern bool builtin_pack_fn_p  (tree);
 extern bool uses_parameter_packs(tree);
 extern bool template_parameter_pack_p   (const_tree);
 extern bool function_parameter_pack_p  (const_tree);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 7247b0f..85310e0 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -5109,6 +5109,13 @@ mark_used (tree decl, tsubst_flags_t complain)
   if (!require_deduced_type (decl, complain))
 return false;
 
+  if (builtin_pack_fn_p (decl))
+{
+  error ("use of built-in parameter pack %qD outside of a template",
+DECL_NAME (decl));
+  return false;
+}
+
   /* If we don't need a value, then we don't need to synthesize DECL.  */
   if (cp_unevaluated_operand || in_discarded_stmt)
 return true;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cdde7a0..e11c601 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15096,7 +15096,7 @@ cp_parser_template_parameter (cp_parser* parser, bool 
*is_non_type,
   *is_parameter_pack = false;
   /* Peek at the next token.  */
   token = cp_lexer_peek_token (parser->lexer);
-  /* If it is `class' or `template', we have a type-parameter.  */
+  /* If it is `template', we have a type-parameter.  */
   if (token->keyword == RID_TEMPLATE)
 return cp_parser_type_parameter (parser, is_parameter_pack);
   /* If it is `class' or `typename' we do not know yet whether it is a
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f9980fe..93874cd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -215,6 +215,7 @@ static tree instantiate_alias_template (tree, tree, 
tsubst_flags_t);
 static bool complex_alias_template_p (const_tree tmpl);
 static tree tsubst_attributes (tree, tree, tsubst_flags_t, tree);
 static tree canonicalize_expr_argument (tree, tsubst_flags_t);
+static tree make_argument_pack (tree);
 
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
@@ -3414,6 +3415,101 @@ get_template_argument_pack_elems (const_tree t)
   return ARGUMENT_PACK_ARGS (t);
 }
 
+/* True iff FN is a function representing a built-in variadic parameter
+   pack.  */
+
+bool
+builtin_pack_fn_p (tree fn)
+{
+  if (!fn
+  || TREE_CODE (fn) != FUNCTION_DECL
+  || !DECL_IS_BUILTIN (fn))
+return false;
+
+  if (strcmp (IDENTIFIER_POINTER (DECL_NAME (fn)), "__integer_pack") == 0)
+return true;
+
+  return false;
+}
+
+/* True iff CALL is a call to a function representing a built-in variadic
+   parameter pack.  */
+
+static bool
+builtin_pack_call_p (tree call)
+{
+  if (TREE_CODE (call) != CALL_EXPR)
+return false;
+  return builtin_pack_fn_p (CALL_EXPR_FN (call));
+}
+
+/* Return a TREE_VEC for the expansion of __integer_pack(HI).  */
+
+static tree
+expand_integer_pack (tree call, tree args, tsubst_flags_t complain,
+tree in_decl)
+{
+  tree ohi = CALL_EXPR_ARG (call, 0);
+  tree hi = tsubst_copy_and_build (ohi, args, complain, in_decl,
+  false/*fn*/, true/*int_cst*/);
+
+  if (value_dependent_expression_p (hi))
+{
+  if (hi != ohi)
+   {
+ call = copy_node (call);
+ CALL_EXPR_ARG (call, 0) = hi;
+   }
+  tree ex = make_pack_expansion (call);
+  tree vec = make_tree_vec (1);
+  TREE_VEC_ELT (vec

Re: [PATCH] PR c++/80544 strip cv-quals from cast results

2017-05-23 Thread Jason Merrill
On Tue, May 23, 2017 at 2:00 PM, Jonathan Wakely  wrote:
> On 19/05/17 15:14 -0400, Jason Merrill wrote:
>>
>> On Thu, Apr 27, 2017 at 12:59 PM, Jonathan Wakely 
>> wrote:
>>>
>>> I also tried to add a warning like EDG's (see the PR) but it gave a
>>> false positive for direct-list-init of scoped enums (P0138R2, r240449)
>>> because that code goes through build_c_cast to perform the conversion,
>>> so looks like a cast to my new warning.
>>
>>
>> The enum init code could strip the cv-quals when calling build_c_cast
>> to avoid the warning.
>
>
> Thanks, that works. I don't think this warning fits under any existing
> option, so I'll add a new one. -Wuseless-cast-qual maybe? Or is that
> too close to -Wuseless-cast and -Wcast-qual and would cause confusion?

-Wcast-rvalue-qual ?

Jason


[C++ PATCH] pushdecl

2017-05-23 Thread Nathan Sidwell
This patch reimplements the innards of pushdecl, which deals with local 
and namespace scope pushing.


I managed to collapse or remove a lot of special casing, which I think 
greatly improves the readability of this piece of code.  We're still not 
onto the performance improvements though.


nathan
--
Nathan Sidwell
2017-05-23  Nathan Sidwell  

	* cp-tree.h (PUSH_GLOBAL, PUSH_LOCAL, PUSH_USING): Delete.
	* name-lookup.c (create_local_binding): New.
	(update_binding): New.
	(pushdecl_maybe_friend_1): Rename to ...
	(do_pushdecl): ... this.  Reimplement.
	(pushdecl): Adjust.
	(push_overloaded_decl_1, push_overloaded_decl): Delete.

Index: cp-tree.h
===
--- cp-tree.h	(revision 248377)
+++ cp-tree.h	(working copy)
@@ -5312,14 +5312,6 @@ enum overload_flags { NO_SPECIAL = 0, DT
    will be identical to
    COMPARE_STRICT.  */
 
-/* Used with push_overloaded_decl.  */
-#define PUSH_GLOBAL	 0  /* Push the DECL into namespace scope,
-   regardless of the current scope.  */
-#define PUSH_LOCAL	 1  /* Push the DECL into the current
-   scope.  */
-#define PUSH_USING	 2  /* We are pushing this DECL as the
-   result of a using declaration.  */
-
 /* Used with start function.  */
 #define SF_DEFAULT	 0  /* No flags.  */
 #define SF_PRE_PARSED	 1  /* The function declaration has
Index: name-lookup.c
===
--- name-lookup.c	(revision 248382)
+++ name-lookup.c	(working copy)
@@ -48,7 +48,6 @@ struct scope_binding {
 };
 #define EMPTY_SCOPE_BINDING { NULL_TREE, NULL_TREE }
 
-static tree push_overloaded_decl (tree, int, bool);
 static bool lookup_using_namespace (tree, struct scope_binding *, tree,
 tree, int);
 static bool qualified_lookup_using_namespace (tree, tree,
@@ -61,6 +60,23 @@ static void consider_binding_level (tree
 static tree push_using_directive (tree);
 static void diagnose_name_conflict (tree, tree);
 
+/* Create a local binding level for NAME.  */
+
+static cxx_binding *
+create_local_binding (cp_binding_level *level, tree name)
+{
+  cxx_binding *binding = cxx_binding_make (NULL, NULL);
+
+  INHERITED_VALUE_BINDING_P (binding) = false;
+  LOCAL_BINDING_P (binding) = true;
+  binding->scope = level;
+  binding->previous = IDENTIFIER_BINDING (name);
+
+  IDENTIFIER_BINDING (name) = binding;
+  
+  return binding;
+}
+
 /* Find the binding for NAME in namespace NS.  If CREATE_P is true,
make an empty binding if there wasn't one.  */
 
@@ -1281,6 +1297,173 @@ matching_fn_p (tree one, tree two)
   return true;
 }
 
+/* Push DECL into nonclass LEVEL BINDING.  OLD is the current
+   binding value (possibly with anticipated builtins stripped).
+   Diagnose conflicts and return updated decl.  */
+
+static tree
+update_binding (cp_binding_level *level, cxx_binding *binding,
+		tree old, tree decl, bool is_friend)
+{
+  tree to_val = decl;
+  tree to_type = NULL_TREE;
+
+  gcc_assert (level->kind != sk_class);
+  if (old == error_mark_node)
+old = NULL_TREE;
+
+  if (old && TREE_CODE (old) == TYPE_DECL && DECL_ARTIFICIAL (old))
+{
+  /* Slide the tdef out of the way.  We'll undo this below, if
+	 we're pushing a matching tdef.  */
+  to_type = old;
+  old = NULL_TREE;
+}
+
+  if (DECL_DECLARES_FUNCTION_P (decl))
+{
+  if (!old)
+	;
+  else if (OVL_P (old))
+	{
+	  for (ovl_iterator iter (old); iter; ++iter)
+	{
+	  tree fn = *iter;
+
+	  if (iter.using_p () && matching_fn_p (fn, decl))
+		{
+		  /* If a function declaration in namespace scope or
+		 block scope has the same name and the same
+		 parameter-type- list (8.3.5) as a function
+		 introduced by a using-declaration, and the
+		 declarations do not declare the same function,
+		 the program is ill-formed.  [namespace.udecl]/14 */
+		  if (tree match = duplicate_decls (decl, fn, is_friend))
+		return match;
+		  else
+		/* FIXME: To preserve existing error behavior, we
+		   still push the decl.  This might change.  */
+		diagnose_name_conflict (decl, fn);
+		}
+	}
+	}
+  else
+	goto conflict;
+
+  to_val = ovl_insert (decl, old);
+}
+  else if (to_type && TREE_CODE (decl) == TYPE_DECL)
+{
+  /* We thought we wanted to slide an artificial typedef out of
+	 the way, to make way for another typedef.  That's not always
+	 what we want to do.  */
+  if (!DECL_ARTIFICIAL (decl))
+	; /* Slide.  */
+  else if (same_type_p (TREE_TYPE (to_type), TREE_TYPE (decl)))
+	/* Two artificial decls to same type.  Do nothing.  */
+	return to_type;
+  else
+	goto conflict;
+}
+  else if (!old)
+;
+  else if (TREE_CODE (decl) == TYPE_DECL && DECL_ARTIFICIAL (decl))
+{
+  /* Slide DECL into the type slot.  */
+  to_type = decl;
+  to_val = old;
+}
+  else if (TREE_CODE (old) != TREE_CODE (decl))
+/* Different kinds of decls conflict.  */
+goto con

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-23 Thread Daniel Santos

On 05/22/2017 01:32 PM, Bernd Edlinger wrote:

On 05/19/17 05:17, Daniel Santos wrote:


No, I'm not at all comfortable with you making so many seemingly
unnecessary changes to this code.  (Although I wish I got this much
feedback during my RFCs! :)  I can accept the changes to
is/count_stub_managed_reg (with some caveats), but I do not at all see
the rationale for changing m_stub_names to a static and adding another
dimension for the object instance -- from an OO standpoint, that's just
bad design.  Can you please share your rationale for that?


Hmm, sorry about that ...
I just thought it would be nice to avoid the const-cast here.


Well remember const-correctness isn't about an object's internal 
(bitwise) state, but it's externally visible (logical) state.  So a 
const member function need not avoid altering it's internal state if the 
externally visible state remains unchanged, such as when caching some 
result or lazy initing.  I have tended to prefer using const_cast for 
this, isolating its use to a single const accessor function (or if () 
block) to leave less room for the data members to be accidentally 
altered in another const member function.  But mutable is generally 
preferred over const_cast, which opens up the danger of accidentally 
modifying an object's logical state (especially by a subsequent edit), 
so using mutable is probably a better practice anyway.


However, ...


This moved the m_stub_names from all 4 instances to one static
array s_stub_names.  But looking at it again, I think the extra
dimension is not even necessary, because all instances share the
same data, so removing that extra dimension again will be fine.


You are correct!  And I see that you're new patch has already changed 
get_stub_name to a static member function, so great!



Incidentally, half of the space in that array is wasted and can be
trimmed since a given instance of xlogue_layout either supports hard
frame pointers or doesn't, I just never got around to splitting that
apart.  (The first three enum xlogue_stub values are for without HFP and
the last three for with.)  Also, if we wanted to further reduce the
memory footprint of xlogue_layout objects, the offset field of struct
reginfo could be changed to int, and if we really wanted to go nuts then
16 bits would work for both of its fields.

So for is/count_stub_managed_reg(s), you are obviously much more
familiar with gcc, its passes and the i386 backend, but my knowledge
level makes me very uncomfortable having the result of
xlogue_layout::is_stub_managed_reg() determined in a way that has the
(apparent) potential to differ from from the state at the time the last
call to ix86_compute_frame_layout() was made; I just don't understand

I fund it technically difficult to add a HARD_REG_SET to
struct machine_function, and tried to avoid the extra overhead of
calling ix86_save_reg so often, which you also felt uncomfortable with.

So, if you look at compute_stub_managed_regs I first saw that the
first loop can never terminate thru the "return 0", because the
registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed
to be clobbered.  Then I saw that the bits in stub_managed_regs
are always added in the same sequence, thus the result depends only
on the number call_ms2sysv_extra_regs and hfp so everything is already
available in struct machine_function.


Thanks
Bernd.


Yes, I agree with how you have refactored compute_stub_managed_regs 
given your rationale of not adding another header dependency to i386.h.  
It is only the overall scheme of calculating this outside of 
ix86_compute_frame_layout that I cannot validate due to my not having a 
good understanding of what can and cannot change in between the time 
that ix86_compute_frame_layout is last called and the last call to 
is_stub_managed_regs().


As Uros said, my patch set touches a "delicate part of the compiler, 
where lots of code-paths cross each other (and we have had quite some 
hard-to-fix bugs in this area)" 
(https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01924.html).  I wrote it 
the way I did with my understanding of what was safe to do and your 
alterations move it's functionality outside of that understanding.  So 
if you say that this won't break it, then I will have to trust you (and 
the testsuite) on that.


On that note, the tests are undergoing some change and bug fixes and I'm 
planning on adding more tests to validate non-breakage with various 
other stack frame-related options and probably additional tests (and 
test options) triggered by GCC_TEST_RUN_EXPENSIVE or some such.


Daniel





RE: [PATCH] [Aarch64] Variable shift count truncation issues

2017-05-23 Thread Michael Collison
I updated the patch to resolve the big endian issues as recommended by Richard. 
I also changed all uses of 'can_create_pseudo_p' to '!reload_completed' to make 
the intent clearer.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
New pattern.
(*aarch64_reg_3_neg_mask2): New pattern.
(*aarch64_reg_3_minus_mask): New pattern.
(*aarch64__reg_di3_mask2): New pattern.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
of shift when the shift amount is masked with constant equal to
the size of the mode.
* config/aarch64/predicates.md (subreg_lowpart_operator): New
predicate.


2016-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* gcc.target/aarch64/var_shift_mask_1.c: New test.
-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Tuesday, May 23, 2017 7:25 AM
To: Wilco Dijkstra 
Cc: Michael Collison ; GCC Patches 
; nd 
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Wilco Dijkstra  writes:
> Richard Sandiford wrote:
>
>> Insn patterns shouldn't check can_create_pseudo_p, because there's no 
>> guarantee that the associated split happens before RA.  In this case 
>> it should be safe to reuse operand 0 after RA if you change it to:
>
> The goal is to only create and split this pattern before register allocation.
> It's a transient pattern, combine creates it, and it is split immediately 
> after.
>
> Maybe !reload_completed is clearer as that is what several other 
> define_insn_and_split patterns do?

But the concept of a transient pattern doesn't really exist.  We shouldn't rely 
for correctness on a split being applied before RA.

If all we want to do is match and split something that combine generates, it 
would be better to do it as a pure define_split.

Thanks,
Richard


pr5546v3.patch
Description: pr5546v3.patch


Re: [patch] build xz (instead of bz2) compressed tarballs and diffs

2017-05-23 Thread Matthias Klose
On 15.05.2017 12:04, Markus Trippelsdorf wrote:
> On 2017.05.15 at 16:24 +0200, Jakub Jelinek wrote:
>> On Mon, May 15, 2017 at 04:13:44PM +0200, Markus Trippelsdorf wrote:
>>> On 2017.05.15 at 14:02 +, Joseph Myers wrote:
 The xz manpage warns against blindly using -9 (for which --best is a 
 deprecated alias) because of the implications for memory requirements for 
 decompressing.  If there's a reason it's considered appropriate here, I 
 think it needs an explanatory comment.
>>>
>>> I think it is unacceptable, because it would increase memory usage when
>>> decompressing over 20x compared to bz2 (and over 100x while compressing).
>>
>> The memory using during compressing isn't that interesting as long as it
>> isn't prohibitive for sourceware or the machines RMs use.
>> For the decompression, I guess it matters what is actually the memory needed
>> for decompression the -9 gcc tarball, and compare that to minimal memory
>> requirements to compile (not bootstrap) the compiler using typical system
>> compilers.  If compilation of gcc takes more memory than the decompression,
>> then it should be fine, why would anyone try to decompress gcc not to build
>> it afterwards?
> 
> Ok, it doesn't really matter. With gcc-7.1 tarball:
> 
> size: 533084160 (uncompressed)
> 
> -9:
>  xz -d gcc.tar.xz
> 4.71user 0.26system 0:04.97elapsed 100%CPU (0avgtext+0avgdata 
> 67804maxresident)k
> size: 60806928
> 
> -6 (default):
>  xz -d gcc.tar.xz
> 4.88user 0.28system 0:05.17elapsed 99%CPU (0avgtext+0avgdata 
> 10324maxresident)k
> size: 65059664
> 
> So -9 is actually just fine.

ok, updated the script to use xz --best by default. trunk and the gcc-7-branch.

Matthias


Re: [patch] build xz (instead of bz2) compressed tarballs and diffs

2017-05-23 Thread Matthias Klose
On 18.05.2017 03:34, Richard Biener wrote:
> On Mon, May 15, 2017 at 3:11 AM, Matthias Klose  wrote:
>> As discussed on IRC with Jakub and Richard here are is a small patch which
>> builds xz compressed tarballs and diff files.
>>
>> Tested with
>>
>>   maintainer-scripts/gcc_release \
>> -s snap:trunk -p  diffs sources tarfiles
>>   maintainer-scripts/gcc_release \
>> -s snap:trunk -p  diffs sources tarfiles
>>
>> and checked that the new tarball and diff files are compressed using xz.
>>
>> Ok for the trunk and the gcc-7-branch?
> 
> Ok.  The version on trunk can get the bz2 old-tar support removed after the 
> next
> snapshot generation I  think.  Likewise the branch version after 7.2
> was released.

Looks like the copy of the script on gcc.gnu.org affects all active branches.
the May 23 GCC 5 snapshot was created successfully.  Is this acceptable? If yes,
then the patch should probably go to the 5 and 6 branches as well.

Please copy the script again to enable the xz --best compression.

Matthias


Go patch committed: detect failure to set init priority

2017-05-23 Thread Ian Lance Taylor
This patch to the Go frontend detects a failure to set the init
priority for a package.  This would have caught the bug fixed in the
patch in https://golang.org/cl/43610 aka
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01416.html .
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 248248)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-369e1efe19adfc5393d2235992327f39360e0554
+ec49c69df1df4d62f3751fcd7e930d6508d67bf2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 248081)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -689,7 +689,13 @@ Gogo::init_imports(std::vectorimported_init_fns_.begin();
p != this->imported_init_fns_.end();
++p)
-v.push_back(*p);
+{
+  if ((*p)->priority() < 0)
+   go_error_at(Linemap::unknown_location(),
+   "internal error: failed to set init priority for %s",
+   (*p)->package_name().c_str());
+  v.push_back(*p);
+}
   std::sort(v.begin(), v.end(), priority_compare);
 
   // We build calls to the init functions, which take no arguments.


Re: RFA: PATCH to add id_strcmp helper function

2017-05-23 Thread Jason Merrill
On Thu, May 18, 2017 at 11:03 PM, Martin Sebor  wrote:
> On 05/18/2017 08:30 PM, Jason Merrill wrote:
>>
>> I got tired of writing strcmp (IDENTIFIER_POINTER and decided to wrap
>> it in an inline function.  I decided to use "id_strcmp" instead of
>> just overloading strcmp, but I don't feel strongly about that choice.
>>
>> The second patch changes all existing uses of that pattern to use the
>> new function.
>>
>> OK for trunk?
>
>
> Since all the uses are of the form !id_strcmp(), would taking
> a step further and introducing a bool id_equal() be going too
> far?
>
> Besides being (arguably) easier to read, it would get around
> the question of whether it should be !id_strcmp() or
> id_strcmp == 0, or perhaps even 0 == id_strcmp().

Makes sense.

OK for trunk?
commit ca96d8533f808e8b8ebcd35ee699a90dcc2bf8cf
Author: Jason Merrill 
Date:   Thu May 18 15:23:27 2017 -0400

* tree.h (id_equal): New.

* dwarf2out.c, hsa-gen.c, ipa-devirt.c, omp-expand.c,
omp-simd-clone.c, read-rtl-function.c, tree-chkp.c, tree.c: Use it
instead of strcmp of IDENTIFIER_POINTER.
c-family/
* c-ada-spec.c, c-pragma.c: Use it.
cp/
* cp-tree.h, decl2.c, mangle.c, parser.c, pt.c, semantics.c: Use it.
* error.c: Use is_this_parameter.

diff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index 18c5ccf..6cf298a 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -1799,7 +1799,7 @@ is_char_array (tree t)
 
   tmp = TREE_TYPE (tmp);
   return num_dim == 1 && TREE_CODE (tmp) == INTEGER_TYPE
-&& !strcmp (IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (tmp))), "char");
+&& id_equal (DECL_NAME (TYPE_NAME (tmp)), "char");
 }
 
 /* Dump in BUFFER an array type T in Ada syntax.  Assume that the "type"
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index bc36626..48b02b8 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -514,7 +514,7 @@ handle_pragma_redefine_extname (cpp_reader * ARG_UNUSED 
(dummy))
  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME 
(decl));
  name = targetm.strip_name_encoding (name);
 
- if (strcmp (name, IDENTIFIER_POINTER (newname)))
+ if (!id_equal (newname, name))
warning (OPT_Wpragmas, "#pragma redefine_extname ignored due to 
"
 "conflict with previous rename");
}
@@ -587,7 +587,7 @@ maybe_apply_renaming_pragma (tree decl, tree asmname)
if (DECL_NAME (decl) == p->oldname)
  {
/* Only warn if there is a conflict.  */
-   if (strcmp (IDENTIFIER_POINTER (p->newname), oldname))
+   if (!id_equal (p->newname, oldname))
  warning (OPT_Wpragmas, "#pragma redefine_extname ignored due to "
   "conflict with previous rename");
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 98ef023..7c46cad 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2960,7 +2960,7 @@ struct GTY(()) lang_decl {
template function.  */
 #define DECL_PRETTY_FUNCTION_P(NODE) \
   (DECL_NAME (NODE) \
-   && !strcmp (IDENTIFIER_POINTER (DECL_NAME (NODE)), "__PRETTY_FUNCTION__"))
+   && id_equal (DECL_NAME (NODE), "__PRETTY_FUNCTION__"))
 
 /* Nonzero if the variable was declared to be thread-local.
We need a special C++ version of this test because the middle-end
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 85310e0..29883da 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -827,7 +827,7 @@ grokfield (const cp_declarator *declarator,
}
 
   if (IDENTIFIER_POINTER (name)[0] == '_'
- && ! strcmp (IDENTIFIER_POINTER (name), "_vptr"))
+ && id_equal (name, "_vptr"))
error ("member %qD conflicts with virtual function table field name",
   value);
 }
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 1ae25bb..b65cee4 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -2149,8 +2149,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
flags | TFF_EXPR_IN_PARENS);
pp_cxx_dot (pp);
  }
-   else if (TREE_CODE (ob) != PARM_DECL
-|| strcmp (IDENTIFIER_POINTER (DECL_NAME (ob)), "this"))
+   else if (!is_this_parameter (ob))
  {
dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS);
pp_cxx_arrow (pp);
@@ -2231,9 +2230,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
if (INDIRECT_REF_P (ob))
  {
ob = TREE_OPERAND (ob, 0);
-   if (TREE_CODE (ob) != PARM_DECL
-   || (DECL_NAME (ob)
-   && strcmp (IDENTIFIER_POINTER (DECL_NAME (ob)), "this")))
+   if (!is_this_parameter (ob))
  {
dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS);
if (TREE_CODE (TREE_TYPE (ob)) == REFERENCE_TYPE)
diff --git a/gcc

C++ PATCH to -Wunused with C++17 structured bindings

2017-05-23 Thread Jason Merrill
Someone on IRC complained that there was no way to suppress -Wunused
on structured bindings.  It seemed to me that the way the feature
works, we shouldn't warn about the bindings individually; users need
to give each of the subobjects a name even if they're only interested
in using one of them.

So this patch switches to tracking whether the underlying aggregate
object as a whole is used; using one of the bindings will avoid any
warning.

This doesn't apply to tuple structured bindings, since in that case
the bindings are actual variables rather than aliases to subobjects.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit a10b737bee6f269c6d6cf2a668d03fb322e1c45e
Author: Jason Merrill 
Date:   Thu May 11 13:30:24 2017 -0400

-Wunused and C++17 structured bindings

* decl.c (poplevel): Don't warn about unused structured bindings,
only real variables.
* error.c (dump_simple_decl): Handle structured bindings.
* expr.c (mark_exp_read): Look through DECL_VALUE_EXPR.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5877f37..afd47bb 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -656,7 +656,10 @@ poplevel (int keep, int reverse, int functionbody)
if (VAR_P (decl)
&& (! TREE_USED (decl) || !DECL_READ_P (decl))
&& ! DECL_IN_SYSTEM_HEADER (decl)
-   && DECL_NAME (decl) && ! DECL_ARTIFICIAL (decl)
+   /* For structured bindings, consider only real variables, not
+  subobjects.  */
+   && (DECL_DECOMPOSITION_P (decl) ? !DECL_VALUE_EXPR (decl)
+   : (DECL_NAME (decl) && !DECL_ARTIFICIAL (decl)))
&& type != error_mark_node
&& (!CLASS_TYPE_P (type)
|| !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type)
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index b65cee4..66a4b60 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -992,6 +992,8 @@ dump_simple_decl (cxx_pretty_printer *pp, tree t, tree 
type, int flags)
   else
dump_decl (pp, DECL_NAME (t), flags);
 }
+  else if (DECL_DECOMPOSITION_P (t))
+pp_string (pp, M_(""));
   else
 pp_string (pp, M_(""));
   if (flags & TFF_DECL_SPECIFIERS)
diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
index 77af54e..75e99e5 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -133,6 +133,9 @@ mark_exp_read (tree exp)
   switch (TREE_CODE (exp))
 {
 case VAR_DECL:
+  if (DECL_VALUE_EXPR (exp))
+   mark_exp_read (DECL_VALUE_EXPR (exp));
+  gcc_fallthrough ();
 case PARM_DECL:
   DECL_READ_P (exp) = 1;
   break;
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp29.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp29.C
new file mode 100644
index 000..daf07a0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp29.C
@@ -0,0 +1,26 @@
+// { dg-options "-std=c++17 -Wunused" }
+
+#include 
+
+struct A { int i,j,k; };
+
+A f();
+
+int z;
+
+int main()
+{
+  {
+auto [i,j,k] = f();// { dg-warning "unused" }
+  }
+  {
+auto [i,j,k] = f();
+z = i;
+  }
+  {
+auto [i,j] = std::tuple{1,2}; // { dg-warning "unused" }
+  }
+  // No parallel second test, because in this case i and j are variables rather
+  // than mere bindings, so there isn't a link between them and using i will
+  // not prevent a warning about unused j.
+}


Re: MinGW compilation warnings in libiberty's waitpid.c

2017-05-23 Thread Eli Zaretskii
> From: DJ Delorie 
> Cc: gcc-patches@gcc.gnu.org, gdb-patc...@sourceware.org
> Date: Tue, 23 May 2017 15:37:32 -0400
> 
> 
> Eli Zaretskii  writes:
> > Instead of making waitpid an always-failing stub on MinGW, wouldn't it
> > be better to make it work on MinGW?  Like this:
> 
> That's up to you, if it's target-specific.

Then I prefer this solution.

> What about mingw64?

It will be covered as well, because it defines __MINGW32__, and
because _cwait is in the MS runtime, so available to MinGW64, too.

Is there anything else I need to do about this part to get it solved
in the upstream repository?

Thanks.


Re: MinGW compilation warnings in libiberty's waitpid.c

2017-05-23 Thread DJ Delorie
Eli Zaretskii  writes:
>> What about mingw64?
>
> It will be covered as well, because it defines __MINGW32__,

Ah, OK.

> Is there anything else I need to do about this part to get it solved
> in the upstream repository?

A ChangeLog entry would be nice, so I don't have to write one ;-)


Re: PR80806

2017-05-23 Thread Martin Sebor

I attach an updated patch that actually bootstraps and (with
one minor exception) passes regression tests.  It's a C front
end-only proof of concept for now.  The complete patch will
include attributes read-only and read-write and support C++
of course.  I think the optimization bits can be done in
a subsequent pass.

Martin

On 05/23/2017 09:58 AM, Martin Sebor wrote:

On 05/18/2017 12:55 PM, Prathamesh Kulkarni wrote:

Hi,
The attached patch tries to fix PR80806 by warning when a variable is
set using memset (and friends) but not used. I chose to warn in dse
pass since dse would detect if the variable passed as 1st argument is
a dead store. Does this approach look OK ?


Detecting -Wunused-but-set-variable in the optimizer means that
the warning will not be issued without optimization.  It also
means that the warning will trigger in cases where the variable
is used conditionally and the condition is subject to constant
propagation.  For instance:

  void sink (void*);

  void test (int i)
  {
  char buf[10];   // -Wunused-but-set-variable
  memset (buf, 0, sizeof(buf));

  if (i)
sink (buf);
  }

  void f (void)
  {
  test (0);
  }

I suspect this would be considered a false positive by most users.
In my view, it would be more in line with the design of the warning
to enhance the front end to detect this case, and it would avoid
these issues.

I have a patch that does that.  Rather than checking the finite
set of known built-in functions like memset that are known not
to read the referenced object, I took the approach of adding
a new  function attribute (I call it write-only) and avoiding
setting the DECL_READ_P flag for DECLs that are passed to
function arguments decorated with the attribute.  That makes
it possible to issue the warning even if the variable is passed
to ordinary (non-built-in) functions like getline(), and should
open up optimization opportunities beyond built-ins.  The only
wrinkle is that the front end sets DECL_READ_P even for uses that
aren't reads such as a sizeof expression, so while an otherwise
unused buf is diagnosed given a call to memset(buf, 0, 10), it
isn't diagnosed if a call is made to memset(buf, 0, sizeof buf).
I am yet to see what impact not setting DECL_READ_P would have
when the decl is used without being evaluated.  (In any event,
setting DECL_READ_P on a use that doesn't involve reading the
DECL doesn't seem right.)

I attach what I have so far in case you would like to check it
out.  I think you have more experience with DSE than me so I'd
be interested in your thoughts on making use of the attribute
for optimization.  (Another couple attributes I'm considering
to complement write-only is read-only and read-write, also
with the hope of improving both warnings and code generation.
Ideas on those would be welcome as well.)



There were following fallouts observed during bootstrap build:

* double-int.c (div_and_round_double):
Warning emitted 'den' set but not used for following call to memset:
memset (den, 0, sizeof den);

I assume the warning is correct since there's immediately call to:
encode (den, lden, hden);

and encode overwrites all the contents of den.
Should the above call to memset be removed from the source ?


IIUC, this seems to be a more involved example of the simple one
above.  AFAICS, the buffer is subsequently read in the function,
but only conditionally.

That said, since encode() overwrites the whole buffer right after
it has been cleared by memset, I would think that a warning pointing
that out could be helpful (although I'm not sure -Wunused is the
right warning to issue).



* tree-streamer.c (streamer_check_handled_ts_structures)
The function defines a local array bool handled_p[LAST_TS_ENUM];
and the warning is emitted for:
memset (&handled_p, 0, sizeof (handled_p));

That's because the function then initializes each element of the array
handled_p to true
making the memset call redundant.
I am not sure if warning for the above case is a good idea ? The call
to memset() seems deliberate, to initialize all elements to 0, and
later assert checks if all the elements were explicitly set to true.


Right.  Warning on this function doesn't seem right since
the variable is subsequently used in the test loop.



* value-prof.c (free_hist):
Warns for the call to memset:

static int
free_hist (void **slot, void *data ATTRIBUTE_UNUSED)
{
  histogram_value hist = *(histogram_value *) slot;
  free (hist->hvalue.counters);
  if (flag_checking)
memset (hist, 0xab, sizeof (*hist));
  free (hist);
  return 1;
}

Assuming flag_checking was true, the call to memset would be dead
anyway since it would be immediately freed ? Um, I don't understand
the purpose of memset in the above function.


I'm guessing it's an effort to invalidate the memory to detect
subsequent accesses to its contents.  Warning on such code would
be valuable because it's a common misconception that a memset
can be used to wipe sensitive data (e.g., passwords) 

Re: [PATCH] [i386] Recompute the frame layout less often

2017-05-23 Thread Daniel Santos

On 05/23/2017 09:31 AM, Bernd Edlinger wrote:

Hi,

this is the latest version of my patch.

As already said, it attempts to compute
the frame layout only when relevant data have
changed.

Apologies for doing more clean-up on Daniel's
patch than absolutely necessary, but ...

Bootstrap and reg-tested successfully on
x86_64-pc-linux-gnu with unix\{,-m32\}.
Is it OK for trunk?


Thanks
Bernd.


OK with me.

Thanks,
Daniel


  1   2   >