Re: [PATCH], Define __FP_FAST_FMAF128 on PowerPC ISA 3.0

2017-09-27 Thread Joseph Myers
On Wed, 27 Sep 2017, Michael Meissner wrote:

> The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
> for PowerPC code when we have the IEEE 128-bit floating point hardware
> instructions enabled.

It's not a standard macro.  TS 18661-3 has FP_FAST_FMAF128 as an optional 
math.h macro (but glibc doesn't define it anywhere at present).

> This patch does this in the PowerPC backend.  As I look at the whole issue, at
> some point we should do this more in the machine independent portion of the
> compiler.  I have some initial patches to do this in the c-family files, but 
> at
> the present time, the patches are not complete, and I need to think about it
> more.

I think a machine-independent definition (for _FloatN / _FloatNx types in 
general) should go along with machine-independent fmafN / fmafNx built-in 
functions; when the built-in function is machine-specific, it's natural 
for the macro to be as well.

But in any case, the new macro should be documented in cpp.texi alongside 
the existing __FP_FAST_FMA* macros (probably in the generic 
__FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Enhance PHI processing in VN

2017-09-27 Thread David Edelsohn
On Wed, Sep 27, 2017 at 6:58 PM, Richard Sandiford
 wrote:
> David Edelsohn  writes:
>> On Fri, Sep 15, 2017 at 2:53 AM, Richard Biener  wrote:
>>> On Thu, 14 Sep 2017, David Edelsohn wrote:
>>>
 * tree-ssa-sccvn.c (visit_phi): Merge undefined values similar
 to VN_TOP.

 This seems to have regressed

 FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
 "Read tp_first_run: 0" 2
 FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
 "Read tp_first_run: 2" 1
 FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
 "Read tp_first_run: 3" 1
>>>
>>> Hmm, I don't see these FAILs.  Looking at the testcase there are
>>> no undefined uses so I wonder how the patch could have any effect.
>>>
>>> Can you re-check and open a bugreport?
>>
>> It disappeared again.  A different failure appeared and disappeared a
>> few weeks ago.  Something in the testsuite infrastructure appears to
>> not be stable, at least on AIX.  Sorry for the incorrect report.
>
> Perhaps this is unrelated, but when doing the "has this patch
> changed assembly on these targets?" testing, I noticed that AIX
> had differences like:
>
> --- old/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s
> +++ new/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s
> @@ -4,21 +4,21 @@
> .csect ..text.startup[PR],2
> .align 2
> .align 4
> -   .globl 
> _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401
> -   .globl 
> ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401
> -   .csect 
> _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401[DS]
> -_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401:
> -   .long 
> ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401,
>  TOC[tc0], 0
> +   .globl 
> _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9
> +   .globl 
> ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9
> +   .csect 
> _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9[DS]
> +_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9:
> +   .long 
> ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9,
>  TOC[tc0], 0
>
> even though -frandom-seed is forced to the same value for both runs.
>
> Is this behaviour deliberate?  I thought runs from a few weeks ago
> had stable names, but maybe I just misremember.

AIX does not have init/fini sections, so it is built at link time as a
C file by collect2-ld.  I suspect that collect2-ld doesn't use
-frandom-seed to build its temporary file.

- David


[committed] jit: implement gcc_jit_function_get_address

2017-09-27 Thread David Malcolm
On Fri, 2017-09-22 at 11:21 +0200, Bartosz Szreder wrote:
> Hello David,
> 
> > > 1. The documentation doesn't mention existence of
> > > gcc_jit_context_new_function_ptr_type() as a mechanism of
> > > handling
> > > function pointers, yet contains
> > > gcc_jit_context_new_call_through_ptr().
> > 
> > [...]
> > 
> > It's just missing documentation.  I'm working on fixing it.
> > Have a look at gcc/testsuite/jit.dg/test-calling-function-ptr.c
> 
> Thanks, this is very helpful!
> 
> > > - What is the proper way of obtaining a function pointer to be
> > > passed
> > > to gcc_jit_context_new_call_through_ptr()? There doesn't seem to
> > > be
> > > any counterpart to gcc_jit_lvalue_get_address() for functions. As
> > > the
> > > name suggests, gcc_jit_lvalue_get_address() works on an L-value
> > > and
> > > gcc_jit_function type isn't an ancestor of gcc_jit_lvalue in the
> > > internal type system, therefore upcasting is impossible.
> > 
> > [...]
> > 
> > If it's a function that's being compiled as part of the same
> > gcc_jit_context, then I think you've identified a weakness in the
> > current API.  As you say, one fix would be to make gcc_jit_function
> > be
> > a subclass of gcc_jit_lvalue (and add casting functions); I don't
> > yet
> > know how easy that would be to implement.
> 
> Yes, this is the exact use case I'm thinking of.
> 
> From an user's point of view, I believe making gcc_jit_function a
> subclass of gcc_jit_lvalue makes for an awkward design. Maybe
> something like this would be easier to implement without upending the
> current hierarchy:
> 
> gcc_jit_rvalue *gcc_jit_context_function_get_address(gcc_jit_context
> *ctxt, gcc_jit_function *fun);
> 
> Thanks,
> Bartosz Szreder

Thanks.

I ended up adding the following:

  extern gcc_jit_rvalue *
  gcc_jit_function_get_address (gcc_jit_function *fn,
gcc_jit_location *loc);

FWIW I initially attempted to make functions be a subclass of rvalue
internally (within jit-recording.*, but ran into difficulties with
function::write_reproducer, which was having to do double-duty:
printing the body of the function AND printing the name of the function);
doing it all as a new API of functions ended up being simpler.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
takes jit.sum from 9789 to 9889 PASS results.

Committed to trunk as r253244.

gcc/jit/ChangeLog:
* docs/cp/topics/expressions.rst (Function pointers): New section.
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_9): New tag.
* docs/topics/expressions.rst (Function pointers): New section.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-common.h (class gcc::jit::recording::function_pointer): New
forward decl.
* jit-playback.c (gcc::jit::playback::function::get_address): New
method.
* jit-playback.h (gcc::jit::playback::function::get_address): New
method decl.
* jit-recording.c: Within namespace gcc::jit::recording...
(function::function): Initialize new field "m_fn_ptr_type".
(function::get_address): New method.
(function_pointer::replay_into): New method.
(function_pointer::visit_children): New method.
(function_pointer::make_debug_string): New method.
(function_pointer::write_reproducer): New method.
* jit-recording.h: Within namespace gcc::jit::recording...
(function::get_address): New method.
(function): Add field "m_fn_ptr_type".
(class function_pointer): New subclass of rvalue.
* libgccjit++.h (gccjit::function::get_address): New method.
* libgccjit.c (gcc_jit_function_get_address): New function.
* libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_function_get_address): New
macro.
(gcc_jit_function_get_address): New API entrypoint.
* libgccjit.map (LIBGCCJIT_ABI_9): New tag.

gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Add
test-returning-function-ptr.c.
* jit.dg/test-returning-function-ptr.c: New test case.
---
 gcc/jit/docs/cp/topics/expressions.rst |   9 ++
 gcc/jit/docs/topics/compatibility.rst  |   7 +
 gcc/jit/docs/topics/expressions.rst|  17 +++
 gcc/jit/jit-common.h   |   1 +
 gcc/jit/jit-playback.c |  14 ++
 gcc/jit/jit-playback.h |   3 +
 gcc/jit/jit-recording.c|  77 ++-
 gcc/jit/jit-recording.h|  29 +
 gcc/jit/libgccjit++.h  |   9 ++
 gcc/jit/libgccjit.c|  20 +++
 gcc/jit/libgccjit.h|  15 +++
 gcc/jit/libgccjit.map  |   5 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h   |  10 ++
 gcc/testsuite/jit.dg/test-returning-function-ptr.c | 143 +
 14 files changed, 

Re: [PATCH][mingw] Enable colorized diagnostics

2017-09-27 Thread JonY
On 09/27/2017 08:54 PM, Liu Hao wrote:
> On 2017/9/28 4:09, Joseph Myers wrote:
>> On Thu, 28 Sep 2017, Liu Hao wrote:
>>
>>> Colorized diagnostics used to be disabled for MinGW targets (on which
>>> the macro `_WIN32` is defined), and this patch enables it.
>>
>> I'd hope this is all to do with MinGW host, and nothing to do with the
>> target.
>>
> Oh you are right. Since I build native compilers, host == target here.
> But strictly speaking the patch applies to the host, not the target.
> 

Does it make sense to use a global lock in mingw_ansi_fputs?



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Enhance PHI processing in VN

2017-09-27 Thread Richard Sandiford
David Edelsohn  writes:
> On Fri, Sep 15, 2017 at 2:53 AM, Richard Biener  wrote:
>> On Thu, 14 Sep 2017, David Edelsohn wrote:
>>
>>> * tree-ssa-sccvn.c (visit_phi): Merge undefined values similar
>>> to VN_TOP.
>>>
>>> This seems to have regressed
>>>
>>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>>> "Read tp_first_run: 0" 2
>>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>>> "Read tp_first_run: 2" 1
>>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>>> "Read tp_first_run: 3" 1
>>
>> Hmm, I don't see these FAILs.  Looking at the testcase there are
>> no undefined uses so I wonder how the patch could have any effect.
>>
>> Can you re-check and open a bugreport?
>
> It disappeared again.  A different failure appeared and disappeared a
> few weeks ago.  Something in the testsuite infrastructure appears to
> not be stable, at least on AIX.  Sorry for the incorrect report.

Perhaps this is unrelated, but when doing the "has this patch
changed assembly on these targets?" testing, I noticed that AIX
had differences like:

--- old/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s
+++ new/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s
@@ -4,21 +4,21 @@
.csect ..text.startup[PR],2
.align 2
.align 4
-   .globl 
_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401
-   .globl 
._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401
-   .csect 
_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401[DS]
-_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401:
-   .long 
._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401,
 TOC[tc0], 0
+   .globl 
_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9
+   .globl 
._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9
+   .csect 
_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9[DS]
+_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9:
+   .long 
._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9,
 TOC[tc0], 0

even though -frandom-seed is forced to the same value for both runs.

Is this behaviour deliberate?  I thought runs from a few weeks ago
had stable names, but maybe I just misremember.

Thanks,
Richard


[PATCH], Define __FP_FAST_FMAF128 on PowerPC ISA 3.0

2017-09-27 Thread Michael Meissner
The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
for PowerPC code when we have the IEEE 128-bit floating point hardware
instructions enabled.

This patch does this in the PowerPC backend.  As I look at the whole issue, at
some point we should do this more in the machine independent portion of the
compiler.  I have some initial patches to do this in the c-family files, but at
the present time, the patches are not complete, and I need to think about it
more.

So, I would like to check in this patch now, and if we come up with a machine
independent version, we can back out this particular patch.  I have done a full
bootstrap and regression test, there were no regressions, and the new test case
does run correctly.  Can I check this into the trunk?

[gcc]
2017-09-27  Michael Meissner  

* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
__FP_FAST_FMAF128 on ISA 3.0.

[gcc/testsuite]
2017-09-27  Michael Meissner  

* gcc.target/powerpc/float128-fma3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 253236)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -585,7 +585,10 @@ rs6000_target_modify_macros (bool define
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
-rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
+{
+  rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
+  rs6000_define_or_undefine_macro (define_p, "__FP_FAST_FMAF128");
+}
 
   /* options from the builtin masks.  */
   /* Note that RS6000_BTM_PAIRED is enabled only if
Index: gcc/testsuite/gcc.target/powerpc/float128-fma3.c
===
--- gcc/testsuite/gcc.target/powerpc/float128-fma3.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/float128-fma3.c(working copy)
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-vector -O2" } */
+
+/* Make sure the appropriate FMA fast macros are defined.  */
+
+#ifdef __FP_FAST_FMAF
+float
+do_fmaf (float a, float b, float c)
+{
+  return __builtin_fmaf (a, b, c);
+}
+#endif
+
+#ifdef __FP_FAST_FMA
+double
+do_fma (double a, double b, double c)
+{
+  return __builtin_fma (a, b, c);
+}
+#endif
+
+#ifdef __FP_FAST_FMAF128
+_Float128
+do_fmaf128 (_Float128 a, _Float128 b, _Float128 c)
+{
+  return __builtin_fmaf128 (a, b, c);
+}
+#endif
+
+/* { dg-final { scan-assembler {\mfmadds\M|\mxsmadd.sp\M}  } } */
+/* { dg-final { scan-assembler {\mfmadd\M|\mxsmadd.dp\M}   } } */
+/* { dg-final { scan-assembler {\mxsmaddqp\M}  } } */


Re: [PATCH][mingw] Enable colorized diagnostics

2017-09-27 Thread Liu Hao

On 2017/9/28 4:09, Joseph Myers wrote:

On Thu, 28 Sep 2017, Liu Hao wrote:


Colorized diagnostics used to be disabled for MinGW targets (on which
the macro `_WIN32` is defined), and this patch enables it.


I'd hope this is all to do with MinGW host, and nothing to do with the
target.

Oh you are right. Since I build native compilers, host == target here. 
But strictly speaking the patch applies to the host, not the target.


--
Best regards,
LH_Mouse



Make tests failing with version namespace UNSUPPORTED

2017-09-27 Thread François Dumont

Hi

    I would like to propose to add a new dg-require-normal-namespace 
attribute to make several tests failing when version namespace is active 
UNSUPPORTED. It is like dg-require-normal-mode but also consider when 
version namespace is being used.


    I still need to complete execution of all tests with version 
namespace but I think that all tests will be ok then.


    I have also updated the code used for dg-require-normal-mode to 
include c++config.h in case users are changing this file to activate any 
mode.


    * testsuite/lib/libstdc++.exp ([check_v3_target_normal_namespace]): 
New.

    * testsuite/lib/dg-options.exp ([dg-require-normal-namespace]): New,
    use latter.
    * testsuite/23_containers/headers/bitset/synopsis.cc: Replace
    dg-require-normal-mode with latter.
    * testsuite/23_containers/headers/deque/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/forward_list/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/list/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/map/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/set/synopsis.cc: Likewise.
    * testsuite/23_containers/headers/vector/synopsis.cc: Likewise.
    * testsuite/23_containers/map/modifiers/erase/abi_tag.cc: Likewise.
    * testsuite/23_containers/multimap/modifiers/erase/abi_tag.cc: 
Likewise.
    * testsuite/23_containers/multiset/modifiers/erase/abi_tag.cc: 
Likewise.

    * testsuite/23_containers/set/modifiers/erase/abi_tag.cc: Likewise.

    Ok to commit ?

François


diff --git a/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc
index 6ef085a..8b3967c 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc
index aa2b787..0de29c2 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc
index b1792ce..21e79c4 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile { target c++11 } }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2008-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc
index ab22b9f..e8c61aa 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc
index f4a0826..d74fca1 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc
index e50a044..58e94bc 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc
index c127b5d..47cbe16 100644
--- a/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc
+++ b/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-require-normal-mode "" }
+// { dg-require-normal-namespace "" }
 
 // Copyright (C) 2007-2017 Free Software Foundation, Inc.
 //
diff --git 

Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-09-27 Thread Steve Ellcey
Ping.

Steve Ellcey
sell...@cavium.com

On Thu, 2017-08-31 at 10:24 -0700, Steve Ellcey wrote:
> On Tue, 2017-08-29 at 12:25 +0100, Szabolcs Nagy wrote:
> > 
> >  
> > in glibc the hwcap is not used, because it has accesses to
> > cached dispatch info, but in libatomic using the hwcap
> > argument is the right way.
> Here is an updated version of the patch to allow aarch64 to use
> ifuncs
> in libatomic.
> 
> The main difference from the last patch is that the library does not
> access the hwcap value directly but accesses it through the ifunc
> resolver argument.  That means that we no longer need the
> init_cpu_revision static constructor to set a flag that the resolver
> checks, instead the resolver just does a comparision of its incoming
> argument with HWCAP_ATOMICS.
> 
> This did mean I had to change the prototype for the resolver
> functions
> in libatomic_i.h to have an argument, which is the way glibc calls
> them.  One complication of this is that the type of the argument can
> differ between platforms and ABIs so I added code to configure.tgt to
> set the type.  I used uint64_t for aarch64 and 'long unsigned int'
> for everything else.  That is not correct for all platforms but at 
> this point no other platforms access the argument so it should not
> matter.  If and when platforms do need to access it they can change
> the type if necessary.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-08-31  Steve Ellcey  
> 
>   * Makefile.am (ARCH_AARCH64_LINUX_LSE): Add IFUNC_OPTIONS and
>   libatomic_la_LIBADD.
>   * config/linux/aarch64/host-config.h: New file.
>   * configure.ac (HWCAP_TYPE): Define.
>   (AC_CHECK_HEADERS): Check for sys/auxv.h.
>   (AC_CHECK_FUNCS): Check for getauxval.
>   (ARCH_AARCH64_LINUX_LSE): New conditional for IFUNC builds.
>   * configure.tgt (aarch64): Set AARCH and try_ifunc.
>   (aarch64*-*-linux*) Update config_path.
>   (aarch64*-*-linux*) Set HWCAP_TYPE.
>   * libatomic_i.h (GEN_SELECTOR): Add "HWCAP_TYPE hwcap"
> argument.
>   * Makefile.in: Regenerate.
>   * auto-config.h.in: Regenerate.
>   * configure: Regenerate.
> 


Make tests less istreambuf_iterator implementation dependent

2017-09-27 Thread François Dumont

Hi

    I just committed attached patch as trivial.

    Those tests were highly istreambuf_iterator implementation, it is 
the result of the call to money_get<>::get which is pointing immediately 
beyond the last character recognized to quote Standard words.


2017-09-27  François Dumont  

    * testsuite/22_locale/money_get/get/char/22131.cc: Make test less
    istreambuf_iterator implementation dependent.
    * testsuite/22_locale/money_get/get/wchar_t/22131.cc: Likewise.

François

diff --git a/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc b/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc
index 6f363ef..5a05817 100644
--- a/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc
+++ b/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc
@@ -67,7 +67,7 @@ void test01()
   fmt2.imbue(loc);
   InIt ibeg2(fmt2);
   err2 = ios_base::goodbit;
-  mg.get(ibeg2, iend2, intl, fmt2, err2, val2);
+  ibeg2 = mg.get(ibeg2, iend2, intl, fmt2, err2, val2);
   VERIFY( err2 == ios_base::failbit );
   VERIFY( *ibeg2 == '#' );
   VERIFY( val2 == "" );
diff --git a/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc b/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc
index 3830cc3..0626e3c 100644
--- a/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc
+++ b/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc
@@ -67,7 +67,7 @@ void test01()
   fmt2.imbue(loc);
   InIt ibeg2(fmt2);
   err2 = ios_base::goodbit;
-  mg.get(ibeg2, iend2, intl, fmt2, err2, val2);
+  ibeg2 = mg.get(ibeg2, iend2, intl, fmt2, err2, val2);
   VERIFY( err2 == ios_base::failbit );
   VERIFY( *ibeg2 == L'#' );
   VERIFY( val2 == L"" );


Re: [PATCH][mingw] Enable colorized diagnostics

2017-09-27 Thread Joseph Myers
On Thu, 28 Sep 2017, Liu Hao wrote:

> Colorized diagnostics used to be disabled for MinGW targets (on which 
> the macro `_WIN32` is defined), and this patch enables it.

I'd hope this is all to do with MinGW host, and nothing to do with the 
target.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix fortran/81509

2017-09-27 Thread Steve Kargl
The attached patch fixes PR fortran/81509.

In short, F2008 now allows boz-literal-constants in IAND, IOR, IEOR,
DSHIFTL, DSHIFTR, and MERGE_BITS.  gfortran currently allows BOZ
argument, but she was not enforcing restrictions in F2008.  The
attach patch causes gfortran to conform to F2008.

As aside effect, the patch removes a questionable GNU Fortran
extension that allowed arguments to IAND, IOR, and IEOR to have
different kind type parameters.  The behavior of this extension
was not documented.

2017-09-27  Steven G. Kargl  

PR fortran/81509
* check.c: Rename function gfc_check_iand to gfc_check_iand_ieor_ior.
* check.c (boz_args_check): New function.  Check I and J not both BOZ.
(gfc_check_dshift,gfc_check_iand_ieor_ior, gfc_check_ishft,
 gfc_check_and, gfc_check_merge_bits): Use it.
* check.c (gfc_check_iand_ieor_ior): Force conversion of BOZ to kind
type of other agrument.  Remove silly GNU extension.
(gfc_check_ieor, gfc_check_ior): Delete now unused functions.
* intrinsic.c (add_functions): Use gfc_check_iand_ieor_ior. Wrap long
line.
* intrinsic.h: Rename gfc_check_iand to gfc_check_iand_ieor_ior.
Delete prototype for bool gfc_check_ieor and gfc_check_ior
* intrinsic.texi: Update documentation for boz-literal-constant.

2017-09-27  Steven G. Kargl  

PR fortran/81509
* gfortran.dg/graphite/id-26.f03: Fix non-conforming use of IAND.
* gfortran.dg/pr81509_1.f90: New test.
* gfortran.dg/pr81509_2.f90: New test.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c	(revision 253236)
+++ gcc/fortran/check.c	(working copy)
@@ -2101,6 +2101,21 @@ gfc_check_dprod (gfc_expr *x, gfc_expr *y)
 }
 
 
+static bool
+boz_args_check(gfc_expr *i, gfc_expr *j)
+{
+  if (i->is_boz && j->is_boz)
+{
+  gfc_error ("Arguments of %qs at %L and %L cannot both be BOZ "
+		 "literal constants", gfc_current_intrinsic, >where,
+		 >where);
+  return false;
+
+}
+  return true;
+}
+
+
 bool
 gfc_check_dshift (gfc_expr *i, gfc_expr *j, gfc_expr *shift)
 {
@@ -2110,12 +2125,8 @@ gfc_check_dshift (gfc_expr *i, gfc_expr *j, gfc_expr *
   if (!type_check (j, 1, BT_INTEGER))
 return false;
 
-  if (i->is_boz && j->is_boz)
-{
-  gfc_error ("% at %L and %' at %L cannot both be BOZ literal "
-		   "constants", >where, >where);
-  return false;
-}
+  if (!boz_args_check (i, j))
+return false;
 
   if (!i->is_boz && !j->is_boz && !same_type_check (i, 0, j, 1))
 return false;
@@ -2361,7 +2372,7 @@ gfc_check_i (gfc_expr *i)
 
 
 bool
-gfc_check_iand (gfc_expr *i, gfc_expr *j)
+gfc_check_iand_ieor_ior (gfc_expr *i, gfc_expr *j)
 {
   if (!type_check (i, 0, BT_INTEGER))
 return false;
@@ -2369,10 +2380,16 @@ gfc_check_iand (gfc_expr *i, gfc_expr *j)
   if (!type_check (j, 1, BT_INTEGER))
 return false;
 
+  if (!boz_args_check (i, j))
+return false;
+
+  if (i->is_boz) i->ts.kind = j->ts.kind;
+  if (j->is_boz) j->ts.kind = i->ts.kind;
+
   if (i->ts.kind != j->ts.kind)
 {
-  if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L",
-			   >where))
+  gfc_error ("Arguments of %qs have different kind type parameters "
+		 "at %L", gfc_current_intrinsic, >where);
 	return false;
 }
 
@@ -2487,26 +2504,6 @@ gfc_check_idnint (gfc_expr *a)
 
 
 bool
-gfc_check_ieor (gfc_expr *i, gfc_expr *j)
-{
-  if (!type_check (i, 0, BT_INTEGER))
-return false;
-
-  if (!type_check (j, 1, BT_INTEGER))
-return false;
-
-  if (i->ts.kind != j->ts.kind)
-{
-  if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L",
-			   >where))
-	return false;
-}
-
-  return true;
-}
-
-
-bool
 gfc_check_index (gfc_expr *string, gfc_expr *substring, gfc_expr *back,
 		 gfc_expr *kind)
 {
@@ -2559,28 +2556,7 @@ gfc_check_intconv (gfc_expr *x)
   return true;
 }
 
-
 bool
-gfc_check_ior (gfc_expr *i, gfc_expr *j)
-{
-  if (!type_check (i, 0, BT_INTEGER))
-return false;
-
-  if (!type_check (j, 1, BT_INTEGER))
-return false;
-
-  if (i->ts.kind != j->ts.kind)
-{
-  if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L",
-			   >where))
-	return false;
-}
-
-  return true;
-}
-
-
-bool
 gfc_check_ishft (gfc_expr *i, gfc_expr *shift)
 {
   if (!type_check (i, 0, BT_INTEGER)
@@ -3364,6 +3340,12 @@ gfc_check_merge_bits (gfc_expr *i, gfc_expr *j, gfc_ex
   if (!type_check (j, 1, BT_INTEGER))
 return false;
 
+  if (!boz_args_check (i, j))
+return false;
+
+  if (i->is_boz) i->ts.kind = j->ts.kind;
+  if (j->is_boz) j->ts.kind = i->ts.kind;
+
   if (!type_check (mask, 2, BT_INTEGER))
 return false;
 
@@ -3373,6 +3355,8 @@ gfc_check_merge_bits (gfc_expr *i, gfc_expr *j, gfc_ex
   if 

[PATCH] Fix PR82337 (SLSR and abnormal PHIs)

2017-09-27 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337 reports a problem
with SLSR performing an invalid optimization across an abnormal PHI.
This is easy to avoid by ensuring that SSA names used in an abnormal
PHI never appear as a basis or as a PHI basis in the candidate table.
We won't optimize what we can't see...

I've cleaned up the test case in the bug report so that it compiles
without warnings and added it to the torture tests.  Bootstrapped and
tested on powerpc64le-linux-gnu with no regressions.  Is this ok for
trunk?  I would also like to backport it to all open releases after a
short wait.

Thanks,
Bill


2017-09-26  Bill Schmidt  

PR tree-optimization/82337
* gimple-ssa-strength-reduction.c (find_phi_def): Don't record a
phi definition if the PHI result appears in an abnormal PHI.
(find_basis_for_base_expr): Don't record a basis if the LHS of the
basis appears in an abnormal PHI.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 253232)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -488,7 +488,8 @@ find_phi_def (tree base)
 
   c = base_cand_from_table (base);
 
-  if (!c || c->kind != CAND_PHI)
+  if (!c || c->kind != CAND_PHI
+  || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (c->cand_stmt)))
 return 0;
 
   return c->cand_num;
@@ -557,6 +558,11 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
  gimple_bb (one_basis->cand_stmt)))
continue;
 
+  tree lhs = gimple_assign_lhs (one_basis->cand_stmt);
+  if (lhs && TREE_CODE (lhs) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+   continue;
+
   if (!basis || basis->cand_num < one_basis->cand_num)
basis = one_basis;
 }
Index: gcc/testsuite/gcc.c-torture/compile/pr82337.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr82337.c   (nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/pr82337.c   (working copy)
@@ -0,0 +1,25 @@
+/* PR82337: SLSR needs to prevent abnormal SSA names from
+   serving as a basis. */
+char *a, *b, *c;
+
+struct d {
+  short e;
+  char f[];
+};
+
+extern void j (void);
+
+void
+g() {
+  struct d *h;
+  char *i;
+  int d;
+  do {
+i = h->f + d;
+20 ? j() : 0;
+i = c;
+if (__builtin_setjmp (h))
+  b = h->f + d;
+d = (int)(*i);
+  } while (a);
+}



Re: [AArch64], patch] PR71727 fix -mstrict-align

2017-09-27 Thread James Greenhalgh
On Wed, Sep 27, 2017 at 06:25:56PM +0100, Christophe Lyon wrote:
> ping?

OK, thanks.

Reviewed-by: James Greenhalgh 

James

> 
> On 20 September 2017 at 15:17, Christophe Lyon
>  wrote:
> > Hi,
> >
> > On 11 September 2017 at 10:45, Andrew Pinski  wrote:
> >> On Tue, Jul 18, 2017 at 5:50 AM, Christophe Lyon
> >>  wrote:
> >>> Hello,
> >>>
> >>> I've received a complaint that GCC for AArch64 would generate
> >>> vectorized code relying on unaligned memory accesses even when using
> >>> -mstrict-align. This is a problem for code where such accesses lead to
> >>> memory faults.
> >>>
> >>> A previous patch (r24) introduced
> >>> aarch64_builtin_support_vector_misalignment, which rejects such
> >>> accesses when the element size is 64 bits, and accept them otherwise,
> >>> which I think it shouldn't. The testcase added at that time only used
> >>> 64 bits elements, and therefore didn't fully test the patch.
> >>>
> >>> The report I received is about vectorized accesses to an array of
> >>> unsigned chars, whose start address is not aligned on a 128 bits
> >>> boundary.
> >>>
> >>> The attached patch fixes the problem by making
> >>> aarch64_builtin_support_vector_misalignment always return false when
> >>> the misalignment is not known at compile time.
> >>>
> >>> I've also added a testcase, which tries to check if the array start
> >>> address alignment is checked (using %16, and-ing with #15), so that
> >>> loop peeling is performed *before* using vectorized accesses. Without
> >>> the patch, vectorized accesses are used at the beginning of the array,
> >>> and byte accesses are used for the remainder at the end, and there is
> >>> not such 'and wX,wX,15'.
> >>>
> >>> BTW, I'm not sure about the same hook for arm... it seems to me it has
> >>> a similar problem.
> >>>
> >>> OK?
> >>
> >> I would keep part of the comment:
> >> -  /* Misalignment factor is unknown at compile time but we know
> >> - it's word aligned.  */
> >>
> >> Something like:
> >> /* Misalignment factor is unknown at compile time. */
> >>
> >> Otherwise it is not obvious what -1 means.
> >>
> >> Other than I think this patch is good (I cannot approve though).
> >>
> >
> > Here is an updated patch, with the comment added as Andrew suggested.
> >
> > OK?
> >
> > Thanks,
> >
> > Christophe
> >
> >> Thanks,
> >> Andrew
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Christophe


[PATCH][mingw] Enable colorized diagnostics

2017-09-27 Thread Liu Hao
Hello,

(I don't have SVN write access. So please apply this patch if you think 
it is OK. I have got FSF's copyright assignment paper for GCC. I will 
send you a copy of it when required.)

Colorized diagnostics used to be disabled for MinGW targets (on which 
the macro `_WIN32` is defined), and this patch enables it.

At the moment, GCC colorizes diagnostics using ANSI escape codes. Unlike 
most other terminal devices or simulators, Windows consoles did not 
recognize ANSI escape codes until Windows 10 "Threshold 2" in 2016, and 
this capability is not enabled by default thus has to be turned on 
explicitly sometime.

In reality, Windows consoles do support 
coloring/highlighting/underlining characters, moving the cursor or 
filling (erasing) screen areas. However the individual text attributes 
must be set or unset using functions provided by the system. Here comes 
my solution:

Normally, GCC outputs the diagnostic messages, infused with 
miscellaneous formatting, using the standard `fputs()` function. For 
Windows consoles we use a new function, `mingw_ansi_fputs()` to output 
such messages. This function strips ANSI escape codes, interprets them, 
orchestrates the modes as requested, then outputs the text remaining. 
Unimplemented codes are silently ignored.

I have successfully bootstrapped GCC for `i686-w64-mingw32` and 
`x86_64-w64-mingw32` with this patch enabled. Automated tests are not an 
option because all this patch enables are visual effects that are not 
unreadable by scripts.

`check_GNU_style.sh` spits some false positives about the comments and a 
line of non-conforming comment in the original file, which I think can 
be ignored.

* * * * * * * * * *

KNOWN ISSUES

1. Unlike `fputs()`, `mingw_ansi_fputs()` is no longer atomic. When 
multiple processes are outputting text to the same console, messages 
could interleave with each other. There is no simple solution to this 
issue. The atomicity of stdio functions is mandatory according to ISO 
C11, which was not the case in C99. The `fputs()` function from 
MinGW-w64 suffers from the same problem, but the one from MSVCRT does not.

2. `mingw_ansi_fputs()` is eventually compiled into `libcommon.a`. 
Personally, I think it is a bad idea to put target-specific code in it, 
notwithstanding `#if ... #endif` guards that has been went over 
carefully. I could have implemented this function in a separated 
target-specific file, which, actually, results in only undefined 
references, since all target-specific object files precede `libcommon.a` 
in the final linker command line.

* * * * * * * * * *

2017-09-28  Liu Hao  

* gcc/pretty-print.c [_WIN32] (colorize_init): Remove.  Use
the generic version below instead.
(should_colorize): Recognize Windows consoles as terminals
for MinGW targets.
* gcc/pretty-print.c [__MINGW32__] (write_all): New function.
[__MINGW32__] (find_esc_head): Likewise.
[__MINGW32__] (find_esc_terminator): Likewise.
[__MINGW32__] (eat_esc_sequence): Likewise.
[__MINGW32__] (mingw_ansi_fputs): New function that handles
ANSI escape codes.
(pp_write_text_to_stream): Use mingw_ansi_fputs instead of fputs
for MinGW targets.



-- 
Best regards,
LH_Mouse
From 6a02e27b10a20ead725391804610bfca6d9c1e06 Mon Sep 17 00:00:00 2001
From: Liu Hao 
Date: Thu, 7 Sep 2017 12:50:07 +0800
Subject: [PATCH] Enable a native GCC to color diagnostic messages sent over
 Windows consoles.

---
 gcc/diagnostic-color.c |  28 ++-
 gcc/pretty-print.c | 664 +
 2 files changed, 682 insertions(+), 10 deletions(-)

diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c
index 6adb872146b..b8cf6f2c045 100644
--- a/gcc/diagnostic-color.c
+++ b/gcc/diagnostic-color.c
@@ -20,6 +20,10 @@
 #include "system.h"
 #include "diagnostic-color.h"
 
+#ifdef __MINGW32__
+#  include 
+#endif
+
 /* Select Graphic Rendition (SGR, "\33[...m") strings.  */
 /* Also Erase in Line (EL) to Right ("\33[K") by default.  */
 /*Why have EL to Right after SGR?
@@ -275,23 +279,28 @@ parse_gcc_colors (void)
   return true;
 }
 
-#if defined(_WIN32)
-bool
-colorize_init (diagnostic_color_rule_t)
-{
-  return false;
-}
-#else
-
 /* Return true if we should use color when in auto mode, false otherwise. */
 static bool
 should_colorize (void)
 {
+#ifdef __MINGW32__
+  /* For consistency reasons, one should check the handle returned by
+ _get_osfhandle(_fileno(stderr)) because the function
+ pp_write_text_to_stream() in pretty-print.c calls fputs() on
+ that stream.  However, the code below for non-Windows doesn't seem
+ to care about it either...  */
+  HANDLE h;
+  DWORD m;
+
+  h = GetStdHandle (STD_ERROR_HANDLE);
+  return (h != INVALID_HANDLE_VALUE) && (h != NULL)
+ && GetConsoleMode (h, );
+#else
   char const *t = getenv ("TERM");
   return t && strcmp (t, "dumb") != 0 

Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern

2017-09-27 Thread Sudi Das


Hi James

I have made the requested changes to the patch.


2017-09-27 Sudakshina Das  

* config/aarch64/aarch64-protos.h (enum simd_immediate_check): New 
check type
for aarch64_simd_valid_immediate.
(aarch64_output_simd_mov_immediate): Update prototype.
(aarch64_simd_valid_immediate): Update prototype.

* config/aarch64/aarch64-simd.md (orr3): modified pattern to add
support for ORR-immediate.
(and3): modified pattern to add support for BIC-immediate.

* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now 
checks
for valid immediate for BIC and ORR based on new enum argument.
(aarch64_output_simd_mov_immediate): Function now used to output 
BIC/ORR imm
as well based on new enum argument.
 
* config/aarch64/constraints.md (Do): New vector immediate constraint.
(Db): Likewise.

2017-09-27 Sudakshina Das  

* gcc.target/aarch64/bic_imm_1.c: New test.
* gcc.target/aarch64/orr_imm_1.c: Likewise.


Thanks
Sudi

  
From: James Greenhalgh 
Sent: Tuesday, September 26, 2017 8:04:38 PM
To: Sudi Das
Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
    
On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote:
> 
> Hi James
> 
> I put aarch64_output_simd_general_immediate looking at the similarities of
> the immediates for mov/mvni and orr/bic. The CHECK macro in
> aarch64_simd_valid_immediate both checks
> and converts the immediates in a manner that are needed for the instructions.
> 
> Having said that, I agree that maybe I could have refactored
> aarch64_output_simd_mov_immediate to do the work rather than creating a new
> functions to do similar things. I have done so in this patch.

Thanks, this looks much neater.

> I have also changed the names of the enum simd_immediate_check to be better
> indicative of what they are doing. 

Thanks, I'd tweak them to look more like the bitmasks you use them as, but
that is a small change for my personal preference.

> Lastly I have added more cases in the tests (according to all the possible
> CHECKs) and made them dg-do assemble (although I had to add --save-temps so
> that the scan-assembler would work). Do you think I should not put that
> option and rather create separate tests?

This is good - thanks.

I think clean up the enum definitions and this patch will be good.

> @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result
>    AARCH64_PARSE_INVALID_ARG  /* Invalid arch, tune, cpu arg.  */
>  };
>  
> +/* Enum to distinguish which type of check is to be done in
> +   aarch64_simd_valid_immediate.  This is used as a bitmask where
> +   AARCH64_CHECK_MOV has both bits set.  Thus AARCH64_CHECK_MOV will
> +   perform all checks.  Adding new types would require changes accordingly.  
> */
> +enum simd_immediate_check {
> +  AARCH64_CHECK_ORR  = 1,    /* Perform immediate checks for ORR.  */
> +  AARCH64_CHECK_BIC  = 2,    /* Perform immediate checks for BIC.  */
> +  AARCH64_CHECK_MOV  = 3 /* Perform all checks (used for MOVI/MNVI).  */

These are used in bit-mask style, so how about:

  AARCH64_CHECK_ORR = 1 << 0,
  AARCH64_CHECK_BIC = 1 << 1,
  AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC

Which is more self-documenting.

> @@ -13001,7 +13013,8 @@ aarch64_float_const_representable_p (rtx x)
>  char*
>  aarch64_output_simd_mov_immediate (rtx const_vector,
>   machine_mode mode,
> -    unsigned width)
> +    unsigned width,
> +    enum simd_immediate_check which)

This function is sorely missing a comment explaining the parameters - it
would be very helpful if you could add one as part of this patch.

Thanks,
James

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e67c2ed..5d7c5df 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,16 @@ enum aarch64_parse_opt_result
   AARCH64_PARSE_INVALID_ARG		/* Invalid arch, tune, cpu arg.  */
 };
 
+/* Enum to distinguish which type of check is to be done in
+   aarch64_simd_valid_immediate.  This is used as a bitmask where
+   AARCH64_CHECK_MOV has both bits set.  Thus AARCH64_CHECK_MOV will
+   perform all checks.  Adding new types would require changes accordingly.  */
+enum simd_immediate_check {
+  AARCH64_CHECK_ORR  = 1 << 0,
+  AARCH64_CHECK_BIC  = 1 << 1,
+  AARCH64_CHECK_MOV  = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC
+};
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -345,7 +355,8 @@ bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode);
 bool aarch64_offset_7bit_signed_scaled_p 

[PATCH, rs6000] Correct some Power9 scheduling info

2017-09-27 Thread Pat Haugen
The following patch corrects some Power9 resource requirements and
instruction latencies. Bootstrap/regtest on powerpc64le-linux with no
new regressions. Ok for trunk?

-Pat


2017-09-27  Pat Haugen  

* config/rs6000/power9.md (DU_C2_3_power9): Remove an incorrect
combination.
(power9-alu): Split out insert/shift types...
(power9-rot): ... to here. Correct dispatch resources.
(power9-cracked-alu): Correct dispatch resources.
(power9-mul): Likewise.
(power9-mul-compare): Likewise.
(power9-fp): Correct latency.
(power9-ddiv): Likewise.
(power9-vecfdiv): Likewise.
(power9-vecdiv): Likewise.
Index: gcc/config/rs6000/power9.md
===
--- gcc/config/rs6000/power9.md	(revision 252029)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -80,7 +80,6 @@ (define_reservation "DU_C2_power9" "x0_p
 ; 2-way cracked plus 3rd slot
 (define_reservation "DU_C2_3_power9" "x0_power9+x1_power9+xa0_power9|
   x1_power9+x2_power9+xa0_power9|
-  x1_power9+x2_power9+xb0_power9|
   x2_power9+x3_power9+xb0_power9")
 
 ; 3-way cracked (consumes whole decode/dispatch cycle)
@@ -243,21 +242,29 @@ (define_insn_reservation "power9-sync" 4
 
 ; Most ALU insns are simple 2 cycle, including record form
 (define_insn_reservation "power9-alu" 2
-  (and (ior (eq_attr "type" "add,exts,integer,logical,isel")
-	(and (eq_attr "type" "insert,shift")
-		 (eq_attr "dot" "no")))
+  (and (eq_attr "type" "add,exts,integer,logical,isel")
(eq_attr "cpu" "power9"))
   "DU_any_power9,VSU_power9")
 ; 5 cycle CR latency
 (define_bypass 5 "power9-alu"
 		 "power9-crlogical,power9-mfcr,power9-mfcrf")
 
+; Rotate/shift prevent use of third slot
+(define_insn_reservation "power9-rot" 2
+  (and (eq_attr "type" "insert,shift")
+   (eq_attr "dot" "no")
+   (eq_attr "cpu" "power9"))
+  "DU_slice_3_power9,VSU_power9")
+; 5 cycle CR latency
+(define_bypass 5 "power9-rot"
+		 "power9-crlogical,power9-mfcr,power9-mfcrf")
+
 ; Record form rotate/shift are cracked
 (define_insn_reservation "power9-cracked-alu" 2
   (and (eq_attr "type" "insert,shift")
(eq_attr "dot" "yes")
(eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 7 cycle CR latency
 (define_bypass 7 "power9-cracked-alu"
 		 "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -291,13 +298,13 @@ (define_insn_reservation "power9-mul" 5
   (and (eq_attr "type" "mul")
(eq_attr "dot" "no")
(eq_attr "cpu" "power9"))
-  "DU_any_power9,VSU_power9")
+  "DU_slice_3_power9,VSU_power9")
 
 (define_insn_reservation "power9-mul-compare" 5
   (and (eq_attr "type" "mul")
(eq_attr "dot" "yes")
(eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 10 cycle CR latency
 (define_bypass 10 "power9-mul-compare"
 		 "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -349,7 +356,7 @@ (define_insn_reservation "power9-fpsimpl
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
-(define_insn_reservation "power9-fp" 7
+(define_insn_reservation "power9-fp" 5
   (and (eq_attr "type" "fp,dmul")
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -366,7 +373,7 @@ (define_insn_reservation "power9-sdiv" 2
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
-(define_insn_reservation "power9-ddiv" 33
+(define_insn_reservation "power9-ddiv" 27
   (and (eq_attr "type" "ddiv")
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -419,12 +426,12 @@ (define_insn_reservation "power9-veccomp
(eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")
 
-(define_insn_reservation "power9-vecfdiv" 28
+(define_insn_reservation "power9-vecfdiv" 24
   (and (eq_attr "type" "vecfdiv")
(eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")
 
-(define_insn_reservation "power9-vecdiv" 32
+(define_insn_reservation "power9-vecdiv" 27
   (and (eq_attr "type" "vecdiv")
(eq_attr "size" "!128")
(eq_attr "cpu" "power9"))


Go patch committed: fix crash on struct that embeds pointer type

2017-09-27 Thread Ian Lance Taylor
This patch by Than McIntosh fixes a crash in the Go frontend that
incorrectly embeds a pointer type.  This fixes
https://golang.org/issue/22050.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 253231)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cdf1f58c7578980e1d1949680c7e404961b7c153
+11b7dae7de94215e92eb46e703cfecd76c0a3282
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 253025)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -5842,7 +5842,9 @@ Struct_type::do_verify()
   Type* t = p->type();
   if (p->is_anonymous())
{
- if (t->named_type() != NULL && t->points_to() != NULL)
+ if ((t->named_type() != NULL && t->points_to() != NULL)
+  || (t->named_type() == NULL && t->points_to() != NULL
+  && t->points_to()->points_to() != NULL))
{
  go_error_at(p->location(), "embedded type may not be a pointer");
  p->set_type(Type::make_error_type());
@@ -11848,6 +11850,12 @@ Type::bind_field_or_method(Gogo* gogo, c
  go_assert(expr->type()->struct_type() == st);
}
  ret = st->field_reference(expr, name, location);
+  if (ret == NULL)
+{
+  go_error_at(location, "type has no field %qs",
+  Gogo::message_name(name).c_str());
+  return Expression::make_error(location);
+}
}
   else if (it != NULL && it->find_method(name) != NULL)
ret = Expression::make_interface_field_reference(expr, name,


Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-27 Thread Sandra Loosemore

On 09/27/2017 03:05 AM, Rainer Orth wrote:

Hi Andreas,


On 09/27/2017 10:10 AM, Rainer Orth wrote:

Hi Andreas,


On 09/26/2017 02:26 PM, Rainer Orth wrote:

Hi Andreas,


diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 307c726..3acfd85 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
  @item vect_no_align
  Target does not support a vector alignment mechanism.

+@item vect_no_peel
+Target does not require any loop peeling for alignment purposes.
+
  @item vect_no_int_min_max
  Target does not support a vector min and max instruction on @code{int}.


please keep the items sorted alphabetically.


The items do not appear to be sorted alphabetically.


they should be.  Your patch makes the ordering even more random.

Patch to fix this preapproved ;-)

The items rather appear to be arranged by subject. Does it really make
sense do pull items like this
apart just to have it in alphabetical order?

@item vect_intfloat_cvt
Target supports conversion from @code{signed int} to @code{float}.

@item vect_uintfloat_cvt
Target supports conversion from @code{unsigned int} to @code{float}.

@item vect_floatint_cvt
Target supports conversion from @code{float} to @code{signed int}.

@item vect_floatuint_cvt
Target supports conversion from @code{float} to @code{unsigned int}.


I've added the no_peel item intentionally to the hw_misalign/no_align block.


granted, there are some attempts at that, but I find it hard to make my
way through that longish list.  The way it is, you have to skip through
the whole list beginning to end.  Texinfo seems to have no subsubsection
which would allow to make the sub-grouping explicit...

Let's hear what Sandra thinks.


U.  There is no common convention in the GCC documentation and other 
parts of the manual do deliberately diverge from alphabetization in 
places.  There's a perpetual tension between putting the most 
commonly-needed information first vs grouping things by related concepts 
vs alphabetize vs the tendency of people to insert new items at random 
places in an existing list regardless of how it's previously been 
organized.  :-(


Alphabetical lists are useful when you already know the name of the 
thing you are searching for, but almost everybody reads the 
documentation in a web browser or PDF viewer with a search feature 
nowadays so you can find the term no matter how the list is sorted.  So 
I'd say we shouldn't alphabetize as a matter of policy if there is some 
other organization that makes sense.


In this case, the section is already broken into multiple sublists by 
topic, most of the sublists are fairly short, and where there's some 
discernible sort order within the sublists, it seems to be grouping 
related things together rather than alphabetical.  So I wouldn't insist 
on alphabetizing this particular sublist either.


-Sandra



Re: [AArch64], patch] PR71727 fix -mstrict-align

2017-09-27 Thread Christophe Lyon
ping?

On 20 September 2017 at 15:17, Christophe Lyon
 wrote:
> Hi,
>
> On 11 September 2017 at 10:45, Andrew Pinski  wrote:
>> On Tue, Jul 18, 2017 at 5:50 AM, Christophe Lyon
>>  wrote:
>>> Hello,
>>>
>>> I've received a complaint that GCC for AArch64 would generate
>>> vectorized code relying on unaligned memory accesses even when using
>>> -mstrict-align. This is a problem for code where such accesses lead to
>>> memory faults.
>>>
>>> A previous patch (r24) introduced
>>> aarch64_builtin_support_vector_misalignment, which rejects such
>>> accesses when the element size is 64 bits, and accept them otherwise,
>>> which I think it shouldn't. The testcase added at that time only used
>>> 64 bits elements, and therefore didn't fully test the patch.
>>>
>>> The report I received is about vectorized accesses to an array of
>>> unsigned chars, whose start address is not aligned on a 128 bits
>>> boundary.
>>>
>>> The attached patch fixes the problem by making
>>> aarch64_builtin_support_vector_misalignment always return false when
>>> the misalignment is not known at compile time.
>>>
>>> I've also added a testcase, which tries to check if the array start
>>> address alignment is checked (using %16, and-ing with #15), so that
>>> loop peeling is performed *before* using vectorized accesses. Without
>>> the patch, vectorized accesses are used at the beginning of the array,
>>> and byte accesses are used for the remainder at the end, and there is
>>> not such 'and wX,wX,15'.
>>>
>>> BTW, I'm not sure about the same hook for arm... it seems to me it has
>>> a similar problem.
>>>
>>> OK?
>>
>> I would keep part of the comment:
>> -  /* Misalignment factor is unknown at compile time but we know
>> - it's word aligned.  */
>>
>> Something like:
>> /* Misalignment factor is unknown at compile time. */
>>
>> Otherwise it is not obvious what -1 means.
>>
>> Other than I think this patch is good (I cannot approve though).
>>
>
> Here is an updated patch, with the comment added as Andrew suggested.
>
> OK?
>
> Thanks,
>
> Christophe
>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks,
>>>
>>> Christophe


Re: 0005-Part-5.-Add-x86-CET-documentation

2017-09-27 Thread Joseph Myers
On Wed, 27 Sep 2017, Florian Weimer wrote:

> This is part of the ABI GCC implements, so it has to be documented somewhere,
> and not just as part of the GCC source code.
> 
> CET is not properly described in the ABI supplement and I don't think this
> will change, so detailed documentation in the GCC manual is very much
> desirable.

Isn't this a matter to take up further in the thread HJ started on the ABI 
mailing lists, or a new such thread (possibly e.g. sending pull requests 
that build further on his wording, or propose alternative wording, to 
clarify them things left unclear there, with a goal of getting it clearly 
defined in the master sources for x86_64 and x86)?  Clearly the best 
result would be proper documentation in the ABI and the GCC manual 
cross-referencing the relevant ABI documents.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: 0005-Part-5.-Add-x86-CET-documentation

2017-09-27 Thread Sandra Loosemore

On 09/27/2017 02:52 AM, Florian Weimer wrote:

On 09/27/2017 05:40 AM, Sandra Loosemore wrote:


+@emph{x86 implementation:} when @option{-fcf-protection} option is
+specified the compiler inserts an ENDBR instruction at function's
+prologue if the function's type does not have the @code{nocf_check}
+attribute and addresses to which indirect control-flow transfer can
+happen.  The instruction triggers the HW check if a control-flow
+transfer to the address of ENDBR instruction is valid.


Implementation details like this should be comments in the code, not
included in the user-facing documentation.


This is part of the ABI GCC implements, so it has to be documented
somewhere, and not just as part of the GCC source code.

CET is not properly described in the ABI supplement and I don't think
this will change, so detailed documentation in the GCC manual is very
much desirable.



Not if you're a documentation maintainer.  :-(  Generally speaking, 
user-facing manuals like the GCC manual should document user-visible GCC 
features, not internal implementation details.  Especially the 
target-independent parts of the manual are not the right place to 
discuss target-specific code generation patterns or conventions that 
should be in the ABI supplement or some other non-GCC documentation.


I don't have so much objection to expanding the discussion of the 
target-specific -mcet option in the x86 options section, as long as the 
documentation is there because it helps people *use* the feature and not 
to explain things that are only interesting to compiler implementors.


-Sandra



Re: correct attribute ifunc C++ type safety (PR 82301)

2017-09-27 Thread Martin Sebor

Hi Nathan,

This patch is a tweak for the C++ part of the type safety
enhancement to attribute ifunc committed in r253041.  It touches
the C++ ifunc tests you added some years ago and I recently broke
with the initial commits of the feature.  The notable difference
between r253041 and this update (other than correcting the type
expected to be returned by the resolver for a member function)
is issuing the incompatibility warning under -Wincompatible-
pointer-types rather than -Wattributes.  When you and/or Jason
have a minute, can you please review it?

https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01603.html

Thanks
Martin

On 09/24/2017 07:03 PM, Martin Sebor wrote:

r253041 enhanced type checking for alias and ifunc attributes to
detect declarations of incompatible aliases, or ifunc resolvers
that return pointers to functions of an incompatible type.  More
extensive testing exposed a bug in the implementation of the ifunc
attribute handling in C++ where the checker expected the ifunc
resolver to return a pointer to a member function when the
implementation actually expects it return a pointer to a non-
member function.

In a discussion of the test suite failures, Jakub also suggested
to break the enhanced warning out of -Wattributes and issue it
under a different option.

The attached patch corrects the C++ problem and moves the warning
under -Wincompatible-pointer-types.  Since this is a C-only option,
the patch also enables for it C++.  Since the option is enabled by
default, the patch further requires -Wextra to issue the warning
for ifunc resolvers returning void*.  However, the patched checker
diagnoses other incompatibilities without it.

Martin




RE: 0005-Part-5.-Add-x86-CET-documentation

2017-09-27 Thread Tsimbalist, Igor V
Updated version #3.

> -Original Message-
> From: Sandra Loosemore [mailto:san...@codesourcery.com]
> Sent: Wednesday, September 27, 2017 5:41 AM
> To: Tsimbalist, Igor V ; Uros Bizjak
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0005-Part-5.-Add-x86-CET-documentation
> 
> On 09/26/2017 07:47 AM, Tsimbalist, Igor V wrote:
> > Here is a new version of the patch.
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index
> > a374890..a900ed1 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -5655,6 +5655,13 @@ compiled with the
> > @option{-fcf-protection=branch} option.  The  compiler assumes that
> > the function's address is a valid target for a  control-flow transfer.
> >
> > +@emph{x86 implementation:} when @option{-fcf-protection} option is
> > +specified the compiler inserts an ENDBR instruction at function's
> > +prologue if the function's type does not have the @code{nocf_check}
> > +attribute and addresses to which indirect control-flow transfer can
> > +happen.  The instruction triggers the HW check if a control-flow
> > +transfer to the address of ENDBR instruction is valid.
> 
> Implementation details like this should be comments in the code, not
> included in the user-facing documentation.
> 
> > @@ -5662,7 +5669,8 @@ not be instrumented when compiled with the
> that
> > the function's address from the pointer is a valid target for  a
> > control-flow transfer.  A direct function call through a function
> > name is assumed to be a safe call thus direct calls are not
> > -instrumented by the compiler.
> > +instrumented by the compiler.  For @emph{x86 implementation} the
> > +compiler inserts a NOTRACK prefix before an indirect call instruction.
> 
> Likewise here.

For this comment and above could you please let me know what is the right place
To move the description? Also I enclosed ENDBR and NOTRACK in @code{} and
wrote it in lower case.

> > @@ -21217,6 +21225,25 @@ void __builtin_ia32_wrpkru (unsigned int)
> > unsigned int __builtin_ia32_rdpkru ()  @end smallexample
> >
> > +The following built-in functions are available when @option{-mcet} is
> used.
> > +They are used to support Intel Control-flow Enforcment Technology (CET).
> > +Each built-in function generate a machine instruction that is part of
> > +the
> 
> s/generate a/generates the/

Fixed.

> > @@ -11378,6 +11379,20 @@ You can also use the @code{nocf_check}
> > attribute to identify  which functions and calls should be skipped
> > from instrumentation  (@pxref{Function Attributes}).
> >
> > +Currently x86 GNU/Linux target provides an implementation based on
> 
> s/x86/the x86/

Fixed.

> > +Intel Control-flow Enforcement Technology (CET), thus @option{-mcet}
> 
> s/@option/the @option/

Fixed.

> > +option is required to enable this feature.
> 
> I think you should put a cross-reference to the x86 options node here, and
> move all the following x86-specific discussion to that section.

Put cross-reference.

> > In order to get an
> > +application to be CET compatible the x86 implementation requires all
> > +object files have to be compiled with @option{-fcf-protection} option
> > +and all linked in libraries have to be CET compatible.
> 
> I'm having difficulty parsing this.  What does "CET compatible" mean?
> Is this an ABI compatibility issue, so that all objects linked into the 
> executable
> have to be compiled with the (same?) @option{-fcf-protection} option if any
> of them do?  Or do you just lose checking on code in uninstrumented
> objects?

I re-wrote the paragraph and removed "compatibility topic".

> > +Instrumentation for x86 is controlled by target specific options
> 
> hyphenate target-specific here

Fixed.

> > +@option{-mcet}, @option{-mibt} and @option{-mshstk}. The compiler
> > +also provides a number of built-in functions for fine-grained control
> > +of CET-based implementation.  See @xref{x86 Built-in Functions}, for
> > +more information.
> > +
> >  @item -fstack-protector
> >  @opindex fstack-protector
> >  Emit extra code to check for buffer overflows, such as stack smashing
> > @@ -25755,15 +25770,19 @@ preferred alignment to @option{-
> mpreferred-stack-boundary=2}.
> >  @need 200
> >  @itemx -mclzero
> >  @opindex mclzero
> > +@need 200
> >  @itemx -mpku
> >  @opindex mpku
> > +@need 200
> > +@itemx -mcet
> > +@opindex mcet
> >  These switches enable the use of instructions in the MMX, SSE,  SSE2,
> > SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER,
> AVX512CD,
> > SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP,
> LWP,
> > ABM,  AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA AVX512VBMI, BMI,
> BMI2,
> > FXSR, -XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, 3DNow!@: or
> enhanced 3DNow!@:
> > -extended instruction sets.  Each has a corresponding @option{-mno-}
> > option -to disable use of these instructions.
> > +XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, IBT, SHSTK,
> > +3DNow!@: or 

Re: [C++ PATCH] C++2A P0386R1 - default member initializers for bit-fields

2017-09-27 Thread Nathan Sidwell

Jakub,

The following patch implements P0386R1 - NSDMIs for bit-fields.
While working on that, I've discovered our parser mishandles attributes
on bitfields, already C++11 says:
identifier[opt] attribute-specifier-seq[opt] : constant-expression
in the grammar, but we actually parsed
identifier[opt] : constant-expression attribute-specifier-seq[opt]


I'm sorry for my tardiness.  It think the patch would be better broken 
apart:
	1) fix the parsing bug you found and move to (ab)using 
DECL_BIT_FIELD_REPRESENTATIVE


2) the new c++2a feature

Is that feasible?

WRT your questions

 1, I think a default strict arg on cp_parser_constant_expression is 
the way to go.  That makes it clear that its default behaviour is 
'sloppy', saving someone the detective work you've done.


 2.  I'm all for a pedwarn, Jason is often more conservative than me 
though.


 3.  D_B_F_R is quite probably fine -- you'll know better than me, 
having poked at it


 4. They are not valid on unamed bitfields.  You may have spotted an 
inconsistency in the spec.


nathan

--
Nathan Sidwell


Re: [PATCH][GRAPHITE] More TLC

2017-09-27 Thread Richard Biener
On Wed, 27 Sep 2017, Richard Biener wrote:

> On Wed, 27 Sep 2017, Richard Biener wrote:
> 
> > On Tue, 26 Sep 2017, Sebastian Pop wrote:
> > 
> > > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener  wrote:
> > > 
> > > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > > >
> > > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener 
> > > > wrote:
> > > > >
> > > > > >
> > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > > > when reworking canonicalize_loop_closed_ssa.
> > > > > >
> > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to 
> > > > > > trunk.
> > > > > >
> > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > > >
> > > > > >  61 loop nests "optimized"
> > > > > >  45 loop nest transforms cancelled because of code generation issues
> > > > > >  21 loop nest optimizations timed out the 35 ISL "operations" we
> > > > allow
> > > > > >
> > > > > > I say "optimized" because the usual transform I've seen is static
> > > > tiling
> > > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > > There's no way to automagically figure what kind of transform ISL 
> > > > > > did
> > > > > >
> > > > >
> > > > > Here is how to automate (without magic) the detection
> > > > > of the transform that isl did.
> > > > >
> > > > > The problem solved by isl is the minimization of strides
> > > > > in memory, and to do this, we need to tell the isl scheduler
> > > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > > (RAR + validity) maps.  The proximity does include the
> > > > > read after read, as the isl scheduler needs to minimize
> > > > > strides between consecutive reads.
> > 
> > Ah, so I now see why we do not perform interchange on trivial cases like
> > 
> > double A[1024][1024], B[1024][1024];
> > 
> > void foo(void)
> > {
> >   for (int i = 0; i < 1024; ++i)
> > for (int j = 0; j < 1024; ++j)
> >   A[j][i] = B[j][i];
> > }
> > 
> > which is probably because
> > 
> >   /* FIXME: proximity should not be validity.  */
> >   isl_union_map *proximity = isl_union_map_copy (validity);
> > 
> > falls apart when there is _no_ dependence?
> > 
> > I can trick GRAPHITE into performing the interchange for
> > 
> > double A[1024][1024], B[1024][1024];
> > 
> > void foo(void)
> > {
> >   for (int i = 1; i < 1023; ++i)
> > for (int j = 0; j < 1024; ++j)
> >   A[j][i] = B[j][i-1] + A[j][i+1];
> > }
> > 
> > because now there is a dependence.  Any idea on how to rewrite
> > scop_get_dependences to avoid "simplifying"?  I suppose the
> > validity constraints _do_ also specify kind-of a proximity
> > we just may not prune / optimize them in the same way as
> > dependences?
> 
> Another thing I notice is that we don't handle the multi-dimensional
> accesses the fortran frontend produces:
> 
> (gdb) p debug_data_reference (dr)
> #(Data Ref: 
> #  bb: 18 
> #  stmt: _43 = *a_141(D)[_42];
> #  ref: *a_141(D)[_42];
> #  base_object: *a_141(D);
> #  Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +, 
> stride.88_115}_5
> 
> ultimatively we fail here because we try to build a constraint for
> 
> {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5
> 
> which ends up computing isl_pw_aff_mul (A, stride.88_115) with
> A being the non-constant constraint generated for
> {(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
> a parameter.  ISL doesn't like that multiplication as the result
> isn't affine (well - it is, we just have parameters in there).
> 
> I suppose ISL doesn't handle this form of accesses given the
> two "dimensions" in this scalarized form may overlap?  So we'd
> really need to turn those into references with different access
> functions (even if that's not 100% a valid semantic transformation
> as scalarization isn't reversible without extra information)?

Looks like even when hacking the Fortran FE to produce nested
ARRAY_REFs we run into the same issue for

(gdb) p debug_data_reference (dr)
#(Data Ref: 
#  bb: 17 
#  stmt: 
VIEW_CONVERT_EXPR(*y_117(D))[_24]{lb:
 
1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1 
sz: 8} = 0.0;
#  ref: 
VIEW_CONVERT_EXPR(*y_117(D))[_24]{lb:
 
1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1 
sz: 8};
#  base_object: 
VIEW_CONVERT_EXPR(*y_117(D));
#  Access function 0: {1, +, 1}_4
#  Access function 1: (integer(kind=8)) {(unsigned long) stride.88_92, +, 
(unsigned long) stride.88_92}_3;
#  Access function 2: (integer(kind=8)) {(unsigned 

libgo patch committed: Check Getsockname error return

2017-09-27 Thread Ian Lance Taylor
This patch by Tony Reix checks for errors from Getsockname in a couple
of places.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 253105)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e0c1f0b645b12a544b484c0f477f8fb6f5980550
+cdf1f58c7578980e1d1949680c7e404961b7c153
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/net/sock_posix.go
===
--- libgo/go/net/sock_posix.go  (revision 253025)
+++ libgo/go/net/sock_posix.go  (working copy)
@@ -182,7 +182,10 @@ func (fd *netFD) listenStream(laddr sock
if err := fd.init(); err != nil {
return err
}
-   lsa, _ := syscall.Getsockname(fd.pfd.Sysfd)
+   lsa, err := syscall.Getsockname(fd.pfd.Sysfd)
+   if err != nil {
+   return os.NewSyscallError("getsockname", err)
+   }
fd.setAddr(fd.addrFunc()(lsa), nil)
return nil
 }
@@ -221,7 +224,10 @@ func (fd *netFD) listenDatagram(laddr so
if err := fd.init(); err != nil {
return err
}
-   lsa, _ := syscall.Getsockname(fd.pfd.Sysfd)
+   lsa, err := syscall.Getsockname(fd.pfd.Sysfd)
+   if err != nil {
+   return os.NewSyscallError("getsockname", err)
+   }
fd.setAddr(fd.addrFunc()(lsa), nil)
return nil
 }


Re: [PATCH] Pretty-print GOACC_REDUCTION arguments

2017-09-27 Thread Thomas Schwinge
Hi!

On Mon, 25 Sep 2017 16:57:52 +0200, Tom de Vries  wrote:
> currently for a GOACC_REDUCTION internal fn call we print:
> ...
>sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0, 67, 0);
> ...
> 
> This patch adds a comment for some arguments explaining the meaning of 
> the argument:
> ...
>sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0 /*gang*/, 67 /*+*/, 0);
> ...

ACK to the general idea.

However, I note that for the "CODE" we currently *only* print the textual
variant ("SETUP") (just like we're also doing for a few other "IFN_*"),
whereas for "LEVEL" and "OP" you now print both.  Should these really be
different?  I think I actually do prefer the style you're using (print
both).  I would print the actual "GOMP_DIM_GANG" instead of "gang" etc.,
to make it easier to see where these values are coming from.

(I do see that for "CODE", we can easily use a suitable "DEF" macro with
the "IFN_GOACC_REDUCTION_CODES" define -- not currently possible with the
"GOMP_DIM_*" constants.  That can of course be addressed later,
separately, if a Reviewer agrees to the proposed patch generally.)

> OK for trunk, if testing is ok?

> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -765,6 +766,40 @@ dump_gimple_call_args (pretty_printer *buffer, gcall 
> *gs, dump_flags_t flags)
>if (i)
>   pp_string (buffer, ", ");
>dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false);
> +
> +  if (gimple_call_internal_p (gs))
> + switch (gimple_call_internal_fn (gs))
> +   {

Will need to add a "default" case:

[...]/source-gcc/gcc/gimple-pretty-print.c:771:9: warning: enumeration 
value 'IFN_MASK_LOAD' not handled in switch [-Wswitch]
  switch (gimple_call_internal_fn (gs))
 ^
[followed by many more]

> +   case IFN_GOACC_REDUCTION:
> + switch (i)
> +   {
> +   case 3:
> + switch (tree_to_uhwi (gimple_call_arg (gs, i)))

Something ;-) is wrong.  Running this on
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c, I run into:

during GIMPLE pass: omplower
dump file: reduction-1.c.006t.omplower

In file included from 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c:9:0:
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c: In 
function 'test_reductions':
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h:20:7: 
internal compiler error: in tree_to_uhwi, at tree.c:6606
   abort ();\
   ^~~~
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c:55:3: 
note: in expansion of macro 'check_reduction_op'
   check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw)
   ^~

(gdb) bt
#0  fancy_abort (file=0x1659d70 "[...]/source-gcc/gcc/tree.c", line=6606, 
function=0x165e7f8  
"tree_to_uhwi") at [...]/source-gcc/gcc/diagnostic.c:1487
#1  0x00edf053 in tree_to_uhwi (t=) at 
[...]/source-gcc/gcc/tree.c:6606
#2  0x0096ca63 in dump_gimple_call_args (buffer=0x7fffc6c0, 
gs=0x766ad210, flags=0) at [...]/source-gcc/gcc/gimple-pretty-print.c:777
#3  0x0096f7f0 in dump_gimple_call (buffer=0x7fffc6c0, 
gs=0x766ad210, spc=20, flags=0) at 
[...]/source-gcc/gcc/gimple-pretty-print.c:946
[...]

This seems to be the "LEVEL" being "-1" -- probably meaning "not yet
decided"?  (Haven't looked that up.)

> +   {
> +   case GOMP_DIM_GANG:
> + pp_string (buffer, " /*gang*/");
> + break;
> +   case GOMP_DIM_WORKER:
> + pp_string (buffer, " /*worker*/");
> + break;
> +   case GOMP_DIM_VECTOR:
> + pp_string (buffer, " /*vector*/");
> + break;
> +   default:
> + gcc_unreachable ();
> +   }
> + break;
> +   case 4:
> + {
> +   enum tree_code rcode
> + = (enum tree_code)tree_to_uhwi (gimple_call_arg (gs, i));
> +   pp_string (buffer, " /*");
> +   pp_string (buffer, op_symbol_code (rcode));
> +   pp_string (buffer, "*/");
> + }
> + break;

I take it, there is no canned function for printing the textual
representation of the tree_code "OP" ("case 4").

> +   }
> +   }
>  }
>  
>if (gimple_call_va_arg_pack_p (gs))


Grüße
 Thomas


[PATCH] For -Os change movabsq $(imm32 << shift), %rX[xip] to movl $imm2, %eX[xip]; shl $shift, %rX[xip] (PR target/82339)

2017-09-27 Thread Jakub Jelinek
Hi!

Doing a movl + shlq by constant seems to be 1 byte shorter
than movabsq, so this patch attempts to use the former form
unless flags is live.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Performance-wise, not really sure what is a win (on i7-5960X on the
testcase in the PR movl + shlq seems to be significantly faster, but e.g.
on i7-2600 it is the same) so not doing anything for speed yet.

2017-09-27  Jakub Jelinek  

PR target/82339
* config/i386/i386.md (*movdi_internal peephole2): New -Os peephole
for movabsq $(i32 << shift), r64.

--- gcc/config/i386/i386.md.jj  2017-09-21 09:26:42.0 +0200
+++ gcc/config/i386/i386.md 2017-09-27 10:24:01.520673889 +0200
@@ -2379,6 +2379,28 @@ (define_split
  gen_lowpart (SImode, operands[1]));
 })
 
+;; movabsq $0x001234567800, %rax is longer
+;; than movl $0x12345678, %eax; shlq $24, %rax.
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+   (match_operand:DI 1 "const_int_operand"))]
+  "TARGET_64BIT
+   && optimize_insn_for_size_p ()
+   && LEGACY_INT_REG_P (operands[0])
+   && !x86_64_immediate_operand (operands[1], DImode)
+   && !x86_64_zext_immediate_operand (operands[1], DImode)
+   && !((UINTVAL (operands[1]) >> ctz_hwi (UINTVAL (operands[1])))
+& ~(HOST_WIDE_INT) 0x)
+   && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 0) (ashift:DI (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  int shift = ctz_hwi (UINTVAL (operands[1]));
+  operands[1] = gen_int_mode (UINTVAL (operands[1]) >> shift, DImode);
+  operands[2] = gen_int_mode (shift, QImode);
+})
+
 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
 "=r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m ,?r ,?*Yi,*k,*k ,*rm")

Jakub


Re: [PATCH] Don't optimize away lhs from calls with addressable zero sized return type (PR c++/82159)

2017-09-27 Thread Richard Biener
On Wed, 27 Sep 2017, Jakub Jelinek wrote:

> Hi!
> 
> The expansion relies on lhs being kept for calls that return addressable
> types.  On the following testcase (which is a GNU extension, pedantically
> we error out on zero sized arrays) we return TREE_ADDRESSABLE
> zero_sized_type and optimize away the lhs which we need later on.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> during the bootstrap+regtests it only made a difference on the newly added
> testcase.  Ok for trunk?

Ok.

Thanks,
Richard.

> 2017-09-27  Jakub Jelinek  
> 
>   PR c++/82159
>   * gimplify.c (gimplify_modify_expr): Don't optimize away zero sized
>   lhs from calls if the lhs has addressable type.
> 
>   * g++.dg/opt/pr82159.C: New test.
> 
> --- gcc/gimplify.c.jj 2017-09-01 09:25:34.0 +0200
> +++ gcc/gimplify.c2017-09-26 13:03:11.614726601 +0200
> @@ -5479,7 +5479,12 @@ gimplify_modify_expr (tree *expr_p, gimp
>   side as statements and throw away the assignment.  Do this after
>   gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
>   types properly.  */
> -  if (zero_sized_type (TREE_TYPE (*from_p)) && !want_value)
> +  if (zero_sized_type (TREE_TYPE (*from_p))
> +  && !want_value
> +  /* Don't do this for calls that return addressable types, expand_call
> +  relies on those having a lhs.  */
> +  && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
> +&& TREE_CODE (*from_p) == CALL_EXPR))
>  {
>gimplify_stmt (from_p, pre_p);
>gimplify_stmt (to_p, pre_p);
> --- gcc/testsuite/g++.dg/opt/pr82159.C.jj 2017-09-26 13:04:08.711027279 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr82159.C2017-09-26 14:20:01.519361945 
> +0200
> @@ -0,0 +1,18 @@
> +// PR c++/82159
> +// { dg-do compile }
> +// { dg-options "" }
> +
> +template
> +struct S
> +{
> +  ~S () {}
> +  template S foo () { return S (); }
> +  unsigned char data[N];
> +};
> +
> +int
> +main ()
> +{
> +  S<16> d;
> +  S<0> t = d.foo<0> ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Don't optimize away lhs from calls with addressable zero sized return type (PR c++/82159)

2017-09-27 Thread Jakub Jelinek
Hi!

The expansion relies on lhs being kept for calls that return addressable
types.  On the following testcase (which is a GNU extension, pedantically
we error out on zero sized arrays) we return TREE_ADDRESSABLE
zero_sized_type and optimize away the lhs which we need later on.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
during the bootstrap+regtests it only made a difference on the newly added
testcase.  Ok for trunk?

2017-09-27  Jakub Jelinek  

PR c++/82159
* gimplify.c (gimplify_modify_expr): Don't optimize away zero sized
lhs from calls if the lhs has addressable type.

* g++.dg/opt/pr82159.C: New test.

--- gcc/gimplify.c.jj   2017-09-01 09:25:34.0 +0200
+++ gcc/gimplify.c  2017-09-26 13:03:11.614726601 +0200
@@ -5479,7 +5479,12 @@ gimplify_modify_expr (tree *expr_p, gimp
  side as statements and throw away the assignment.  Do this after
  gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
  types properly.  */
-  if (zero_sized_type (TREE_TYPE (*from_p)) && !want_value)
+  if (zero_sized_type (TREE_TYPE (*from_p))
+  && !want_value
+  /* Don't do this for calls that return addressable types, expand_call
+relies on those having a lhs.  */
+  && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
+  && TREE_CODE (*from_p) == CALL_EXPR))
 {
   gimplify_stmt (from_p, pre_p);
   gimplify_stmt (to_p, pre_p);
--- gcc/testsuite/g++.dg/opt/pr82159.C.jj   2017-09-26 13:04:08.711027279 
+0200
+++ gcc/testsuite/g++.dg/opt/pr82159.C  2017-09-26 14:20:01.519361945 +0200
@@ -0,0 +1,18 @@
+// PR c++/82159
+// { dg-do compile }
+// { dg-options "" }
+
+template
+struct S
+{
+  ~S () {}
+  template S foo () { return S (); }
+  unsigned char data[N];
+};
+
+int
+main ()
+{
+  S<16> d;
+  S<0> t = d.foo<0> ();
+}

Jakub


Re: [PATCH][GRAPHITE] More TLC

2017-09-27 Thread Richard Biener
On Wed, 27 Sep 2017, Richard Biener wrote:

> On Tue, 26 Sep 2017, Sebastian Pop wrote:
> 
> > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener  wrote:
> > 
> > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > >
> > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener 
> > > wrote:
> > > >
> > > > >
> > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > > when reworking canonicalize_loop_closed_ssa.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > > >
> > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > >
> > > > >  61 loop nests "optimized"
> > > > >  45 loop nest transforms cancelled because of code generation issues
> > > > >  21 loop nest optimizations timed out the 35 ISL "operations" we
> > > allow
> > > > >
> > > > > I say "optimized" because the usual transform I've seen is static
> > > tiling
> > > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > There's no way to automagically figure what kind of transform ISL did
> > > > >
> > > >
> > > > Here is how to automate (without magic) the detection
> > > > of the transform that isl did.
> > > >
> > > > The problem solved by isl is the minimization of strides
> > > > in memory, and to do this, we need to tell the isl scheduler
> > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > (RAR + validity) maps.  The proximity does include the
> > > > read after read, as the isl scheduler needs to minimize
> > > > strides between consecutive reads.
> 
> Ah, so I now see why we do not perform interchange on trivial cases like
> 
> double A[1024][1024], B[1024][1024];
> 
> void foo(void)
> {
>   for (int i = 0; i < 1024; ++i)
> for (int j = 0; j < 1024; ++j)
>   A[j][i] = B[j][i];
> }
> 
> which is probably because
> 
>   /* FIXME: proximity should not be validity.  */
>   isl_union_map *proximity = isl_union_map_copy (validity);
> 
> falls apart when there is _no_ dependence?
> 
> I can trick GRAPHITE into performing the interchange for
> 
> double A[1024][1024], B[1024][1024];
> 
> void foo(void)
> {
>   for (int i = 1; i < 1023; ++i)
> for (int j = 0; j < 1024; ++j)
>   A[j][i] = B[j][i-1] + A[j][i+1];
> }
> 
> because now there is a dependence.  Any idea on how to rewrite
> scop_get_dependences to avoid "simplifying"?  I suppose the
> validity constraints _do_ also specify kind-of a proximity
> we just may not prune / optimize them in the same way as
> dependences?

Another thing I notice is that we don't handle the multi-dimensional
accesses the fortran frontend produces:

(gdb) p debug_data_reference (dr)
#(Data Ref: 
#  bb: 18 
#  stmt: _43 = *a_141(D)[_42];
#  ref: *a_141(D)[_42];
#  base_object: *a_141(D);
#  Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +, 
stride.88_115}_5

ultimatively we fail here because we try to build a constraint for

{{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5

which ends up computing isl_pw_aff_mul (A, stride.88_115) with
A being the non-constant constraint generated for
{(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being
a parameter.  ISL doesn't like that multiplication as the result
isn't affine (well - it is, we just have parameters in there).

I suppose ISL doesn't handle this form of accesses given the
two "dimensions" in this scalarized form may overlap?  So we'd
really need to turn those into references with different access
functions (even if that's not 100% a valid semantic transformation
as scalarization isn't reversible without extra information)?

Thanks,
Richard.


[openacc, testsuite, committed] Fix libgomp.oacc-c-c++-common/parallel-reduction.c for non-nvidia devices

2017-09-27 Thread Tom de Vries

Hi,

this patch makes the test-case 
libgomp.oacc-c-c++-common/parallel-reduction.c work for non-nvidia devices.


Committed as obvious.

Thanks,
- Tom
Fix libgomp.oacc-c-c++-common/parallel-reduction.c for non-nvidia devices

2017-09-27  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c (main):
	Remove acc_device_nvidia references.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
index b2c60e5..077571f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -21,7 +21,7 @@ main ()
 }
   }
 
-  if (acc_get_device_type () != acc_device_nvidia)
+  if (acc_get_device_type () == acc_device_host)
 {
   if (s1 != 1)
 	abort ();
@@ -41,7 +41,7 @@ main ()
 s2 += N;
   }
 
-  if (acc_get_device_type () != acc_device_nvidia)
+  if (acc_get_device_type () == acc_device_host)
 {
   if (s1 != 1)
 	abort ();


RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute

2017-09-27 Thread Tsimbalist, Igor V
Updated version #4.

> -Original Message-
> From: Sandra Loosemore [mailto:san...@codesourcery.com]
> Sent: Wednesday, September 27, 2017 5:11 AM
> To: Tsimbalist, Igor V ; 'gcc-
> patc...@gcc.gnu.org' 
> Cc: Jeff Law 
> Subject: Re: 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> attribute
> 
> On 09/26/2017 07:45 AM, Tsimbalist, Igor V wrote:
> > Here is the updated version (version#3). All comments below are fixed.
> 
> This still needs more work.  Specific comments below:
> 
> > +The @code{nocf_check} attribute is applied to an object's type.
> > +In case of assignment of a function address or a function pointer to
> > +another pointer, the attribute is not carried over from the
> > +right-hand object's type, the type of left-hand object stays
> > +unchanged.  The
> 
> s/object's type,/object's type;/


Fixed.

> > @@ -11348,6 +11349,35 @@ is used to link a program, the GCC driver
> > automatically links  against @file{libmpxwrappers}.  See also @option{-
> static-libmpxwrappers}.
> >  Enabled by default.
> >
> > +@item -fcf-
> protection==@r{[}full@r{|}branch@r{|}return@r{|}none@r{]}
> > +@opindex fcf-protection
> > +Enable code instrumentation of control-flow transfers to increase
> > +program security by checking that target addresses of control-flow
> > +transfer instructions (such as indirect function call, function
> > +return, indirect jump) are valid.  This prevents diverting the
> > +control flow instructions from its original target address to a new
> > +undesigned
> 
> s/control flow instructions/control-flow instructions/
> 
> I'd rewrite the next sentence as
> 
> This prevents diverting the flow of control to an unexpected target.

I used your suggestion.

> > +target.  This is intended to protect against such threats as
> > +Return-oriented Programming (ROP), and similarly call/jmp-oriented
> > +programming (COP/JOP).
> > +
> > +Each compiler target, which is going to support the control-flow
> > +instrumentation, is supposed to have its own target specific
> > +implementation. For all targets where an implementation is absent the
> > +usage of @option{-fcf-protection} option causes an error message.
> 
> I would really prefer that you list the targets this works on here instead.

Another patch you are reviewing now (its name starts with 0005-Part-5)
has the statement you would like to put here. The important point here is
an error issuing. When I commit the first patch none of target platforms 
supports
the option and an error is printed when the option is specified. I removed the
first sentence but keep the second one:

For all targets, which do not support the @option{-fcf-protection}
option, the option usage results in an error message.

> > +The value @code{branch} tells the compiler to implement checking of
> > +validity of control-flow transfer at the point of indirect branch
> > +instructions, i.e. call/jmp instructions.  The value @code{return}
> > +implements checking of validity at the point of returning from a
> > +function.  The value @code{full} is an alias for specifying both
> > +@code{branch} and @code{return}. The value @code{none} turns off
> > +instrumentation.  This value may be used for future architectures
> > +where @option{-fcf-protection} option is switched on by default.
> 
> I don't think we need to document GCC's future behavior for future
> architectures (I'm always going around removing useless discussion from
> 20 years ago of possible extensions that never got implemented).  I assume
> that this is just provided for completeness and to override a previous -fcf-
> protection option on the command line.

Ok, removed the last sentence.

> > +You can also use the @code{nocf_check} attribute to identify which
> > +functions and calls should be skipped from instrumentation
> > +(@pxref{Function Attributes}).
> > +
> >  @item -fstack-protector
> >  @opindex fstack-protector
> >  Emit extra code to check for buffer overflows, such as stack smashing
> > diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi index
> > 12355c2..b4fc5f3 100644
> > --- a/gcc/doc/rtl.texi
> > +++ b/gcc/doc/rtl.texi
> > @@ -4040,6 +4040,22 @@ is used in place of the actual insn pattern.
> > This is done in cases where  the pattern is either complex or misleading.
> >  @end table
> >
> > +The note @code{REG_CALL_NOCF_CHECK} is used in conjunction with
> the
> > +@option{-fcf-protection=branch} option.  The note is set if a
> > +@code{nocf_check} attribute is specified for a function type or a
> > +pointer to function type.  The note is stored in the @code{REG_NOTES}
> > +field of an insn.
> > +
> > +@table @code
> > +@findex REG_CALL_NOCF_CHECK
> > +@item REG_CALL_NOCF_CHECK
> > +A user has a control through the @code{nocf_check} attribute to
> > +identify
> 
> S/A user has a control/Users have control/

Fixed.

> > +which call to a function should be skipped from control-flow
> > +instrumentation
> 
> 

[PATCH][GRAPHITE] Make --param loop-block-tile-size=0 disable tiling

2017-09-27 Thread Richard Biener

Currently ISL aborts on this special value and for debugging (and 
tuning?) it's nice to avoid all the clutter introduced by tiling.

Committed as obvious.

Richard.

2017-09-27  Richard Biener  

* graphite-optimize-isl.c (get_schedule_for_node_st): Allow
--param loop-block-tile-size=0 to disable tiling.

Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c (revision 253226)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -64,7 +64,10 @@ get_schedule_for_node_st (__isl_take isl
   if (type != isl_schedule_node_leaf)
 return node;
 
-  if (dims <= 1 || !isl_schedule_node_band_get_permutable (node))
+  long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE);
+  if (dims <= 1
+  || tile_size == 0
+  || !isl_schedule_node_band_get_permutable (node))
 {
   if (dump_file && dump_flags)
fprintf (dump_file, "not tiled\n");
@@ -74,7 +77,6 @@ get_schedule_for_node_st (__isl_take isl
   /* Tile loops.  */
   space = isl_schedule_node_band_get_space (node);
   isl_multi_val *sizes = isl_multi_val_zero (space);
-  long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE);
   isl_ctx *ctx = isl_schedule_node_get_ctx (node);
 
   for (unsigned i = 0; i < dims; i++)


Re: [PATCH][GRAPHITE] More TLC

2017-09-27 Thread Richard Biener
On Tue, 26 Sep 2017, Sebastian Pop wrote:

> On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener  wrote:
> 
> > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> >
> > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener 
> > wrote:
> > >
> > > >
> > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > when reworking canonicalize_loop_closed_ssa.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > >
> > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > -Ofast -march=haswell -floop-nest-optimize are
> > > >
> > > >  61 loop nests "optimized"
> > > >  45 loop nest transforms cancelled because of code generation issues
> > > >  21 loop nest optimizations timed out the 35 ISL "operations" we
> > allow
> > > >
> > > > I say "optimized" because the usual transform I've seen is static
> > tiling
> > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > There's no way to automagically figure what kind of transform ISL did
> > > >
> > >
> > > Here is how to automate (without magic) the detection
> > > of the transform that isl did.
> > >
> > > The problem solved by isl is the minimization of strides
> > > in memory, and to do this, we need to tell the isl scheduler
> > > the validity dependence graph, in graphite-optimize-isl.c
> > > see the validity (RAW, WAR, WAW) and the proximity
> > > (RAR + validity) maps.  The proximity does include the
> > > read after read, as the isl scheduler needs to minimize
> > > strides between consecutive reads.

Ah, so I now see why we do not perform interchange on trivial cases like

double A[1024][1024], B[1024][1024];

void foo(void)
{
  for (int i = 0; i < 1024; ++i)
for (int j = 0; j < 1024; ++j)
  A[j][i] = B[j][i];
}

which is probably because

  /* FIXME: proximity should not be validity.  */
  isl_union_map *proximity = isl_union_map_copy (validity);

falls apart when there is _no_ dependence?

I can trick GRAPHITE into performing the interchange for

double A[1024][1024], B[1024][1024];

void foo(void)
{
  for (int i = 1; i < 1023; ++i)
for (int j = 0; j < 1024; ++j)
  A[j][i] = B[j][i-1] + A[j][i+1];
}

because now there is a dependence.  Any idea on how to rewrite
scop_get_dependences to avoid "simplifying"?  I suppose the
validity constraints _do_ also specify kind-of a proximity
we just may not prune / optimize them in the same way as
dependences?

Richard.



RE: 0005-Part-5.-Add-x86-CET-documentation

2017-09-27 Thread Tsimbalist, Igor V
> -Original Message-
> From: Florian Weimer [mailto:fwei...@redhat.com]
> Sent: Wednesday, September 27, 2017 10:52 AM
> To: Sandra Loosemore ; Tsimbalist, Igor V
> ; Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0005-Part-5.-Add-x86-CET-documentation
> 
> On 09/27/2017 05:40 AM, Sandra Loosemore wrote:
> >>
> >> +@emph{x86 implementation:} when @option{-fcf-protection} option is
> >> +specified the compiler inserts an ENDBR instruction at function's
> >> +prologue if the function's type does not have the @code{nocf_check}
> >> +attribute and addresses to which indirect control-flow transfer can
> >> +happen.  The instruction triggers the HW check if a control-flow
> >> +transfer to the address of ENDBR instruction is valid.
> >
> > Implementation details like this should be comments in the code, not
> > included in the user-facing documentation.
> 
> This is part of the ABI GCC implements, so it has to be documented
> somewhere, and not just as part of the GCC source code.

A question for both Sandra and Florian - What is your suggestion where the text 
should go?

> CET is not properly described in the ABI supplement and I don't think this 
> will
> change, so detailed documentation in the GCC manual is very much
> desirable.
> 
> That being said, the implementation notes above need some clarification.
>   It's not clear to me what the conditions are under which the ENDBR
> instruction is emitted (and we probably should use @code{endbr} in the
> manual), what it is trying to achieve, and how the x86 calling convention
> changes.  I assume it is somehow related to what we call internally “the 
> suffix

We are diving into implementation details but it's simple enough.

- endbr is generated for every function, which does not have nocf_check 
attribute.
   Optimization can be done later to exclude functions, whose address was not 
taken.
- there is no change in calling convention

Thanks,
Igor

> problem”: without control flow integrity, an attacker might skip over
> precondition/hardening checks, directly to the critical changes we want to
> protect, executing only the suffix of a function (hence the name).
> 
> Thanks,
> Florian


[PATCH][GRAPHITE] Allow --param graphite-max-arrays-per-scop=0

2017-09-27 Thread Richard Biener

The following is to allow making --param graphite-max-arrays-per-scop
unbounded.  That's a little tricky because the bound is used when
computing "alias-sets" for scalar constraints.  There's an easy way
out though as we know the maximum alias-set assigned in the SCOP,
we only have to remember it.  The advantage (if it matters at all)
is that we avoid a constraint coefficient gap between that last
used alias-set and the former PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested.  Will apply after testing finished.

Richard.

2017-09-27  Richard Biener  

* graphite.h (scop::max_alias_set): New member.
* graphite-scop-detection.c: Remove references to non-existing
--param in comments.
(build_alias_sets): Record the maximum alias set used for drs.
(build_scops): Support zero as unlimited for
--param graphite-max-arrays-per-scop.
* graphite-sese-to-poly.c (add_scalar_version_numbers): Remove
and inline into ...
(build_poly_sr_1): ... here.  Compute alias set based on the
maximum alias set used for drs rather than
PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253226)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -389,10 +389,7 @@ public:
 
   void remove_intersecting_scops (sese_l s1);
 
-  /* Return true when a statement in SCOP cannot be represented by Graphite.
- The assumptions are that L1 dominates L2, and SCOP->entry dominates L1.
- Limit the number of bbs between adjacent loops to
- PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS.  */
+  /* Return true when a statement in SCOP cannot be represented by Graphite.  
*/
 
   bool harmful_loop_in_region (sese_l scop) const;
 
@@ -760,10 +757,7 @@ scop_detection::add_scop (sese_l s)
   DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, 
s));
 }
 
-/* Return true when a statement in SCOP cannot be represented by Graphite.
-   The assumptions are that L1 dominates L2, and SCOP->entry dominates L1.
-   Limit the number of bbs between adjacent loops to
-   PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS.  */
+/* Return true when a statement in SCOP cannot be represented by Graphite.  */
 
 bool
 scop_detection::harmful_loop_in_region (sese_l scop) const
@@ -1531,7 +1525,8 @@ build_alias_set (scop_p scop)
   for (i = 0; i < num_vertices; i++)
 all_vertices[i] = i;
 
-  graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL);
+  scop->max_alias_set
+= graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1;
   free (all_vertices);
 
   for (i = 0; i < g->n_vertices; i++)
@@ -1755,7 +1750,8 @@ build_scops (vec *scops)
}
 
   unsigned max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP);
-  if (scop->drs.length () >= max_arrays)
+  if (max_arrays > 0
+ && scop->drs.length () >= max_arrays)
{
  DEBUG_PRINT (dp << "[scop-detection-fail] too many data references: "
   << scop->drs.length ()
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 253225)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -491,25 +491,6 @@ pdr_add_alias_set (isl_map *acc, dr_info
   return isl_map_add_constraint (acc, c);
 }
 
-/* Add a constrain to the ACCESSES polyhedron for the alias set of
-   data reference DR.  ACCESSP_NB_DIMS is the dimension of the
-   ACCESSES polyhedron, DOM_NB_DIMS is the dimension of the iteration
-   domain.  */
-
-static isl_map *
-add_scalar_version_numbers (isl_map *acc, tree var)
-{
-  isl_constraint *c = isl_equality_alloc
-  (isl_local_space_from_space (isl_map_get_space (acc)));
-  int max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP);
-  /* Each scalar variables has a unique alias set number starting from
- max_arrays.  */
-  c = isl_constraint_set_constant_si (c, -max_arrays - SSA_NAME_VERSION (var));
-  c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1);
-
-  return isl_map_add_constraint (acc, c);
-}
-
 /* Assign the affine expression INDEX to the output dimension POS of
MAP and return the result.  */
 
@@ -684,13 +665,21 @@ static void
 build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
-  int max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP);
+  scop_p scop = PBB_SCOP (pbb);
   /* Each scalar variables has a unique alias set number starting from
- max_arrays.  */
+ the maximum alias set assigned to a dr.  */
+  int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
-   

[PATCH][GRAPHITE] Remove another small quadraticness

2017-09-27 Thread Richard Biener

Turns out loop_nest recorded in scop-info isn't really necessary as
we can simply process parameters in loop bounds during the gather_bbs
walk where we encounter each loop (identified by its header) once.

This avoids the linear search in record_loop_in_sese.

Bootstrap / regtest running on x86_64-unknown-linux-gnu, will apply.

Richard.

2017-09-27  Richard Biener  

* graphite-scop-detection.c (find_scop_parameters): Move
loop bound handling ...
(gather_bbs::before_dom_children): ... here, avoiding the need
to build scop_info->loop_nest.
(record_loop_in_sese): Remove.
* sese.h (sese_info_t::loop_nest): Remove.
* sese.c (new_sese_info): Do not allocate loop_nest.
(free_sese_info): Do not free loop_nest.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253226)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1330,7 +1324,7 @@ find_params_in_bb (sese_info_p region, g
 }
 }
 
-/* Record the parameters used in the SCOP.  A variable is a parameter
+/* Record the parameters used in the SCOP BBs.  A variable is a parameter
in a scop if it does not vary during the execution of that scop.  */
 
 static void
@@ -1338,19 +1332,8 @@ find_scop_parameters (scop_p scop)
 {
   unsigned i;
   sese_info_p region = scop->scop_info;
-  struct loop *loop;
 
-  /* Find the parameters used in the loop bounds.  */
-  FOR_EACH_VEC_ELT (region->loop_nest, i, loop)
-{
-  tree nb_iters = number_of_latch_executions (loop);
-
-  if (!chrec_contains_symbols (nb_iters))
-   continue;
-
-  nb_iters = scalar_evolution_in_region (region->region, loop, nb_iters);
-  scan_tree_for_params (region, nb_iters);
-}
+  /* Parameters used in loop bounds are processed during gather_bbs.  */
 
   /* Find the parameters used in data accesses.  */
   poly_bb_p pbb;
@@ -1560,28 +1544,6 @@ gather_bbs::gather_bbs (cdi_direction di
 {
 }
 
-/* Record in execution order the loops fully contained in the region.  */
-
-static void
-record_loop_in_sese (basic_block bb, sese_info_p region)
-{
-  loop_p father = bb->loop_father;
-  if (loop_in_sese_p (father, region->region))
-{
-  bool found = false;
-  loop_p loop0;
-  int j;
-  FOR_EACH_VEC_ELT (region->loop_nest, j, loop0)
-   if (father == loop0)
- {
-   found = true;
-   break;
- }
-  if (!found)
-   region->loop_nest.safe_push (father);
-}
-}
-
 /* Call-back for dom_walk executed before visiting the dominated
blocks.  */
 
@@ -1592,7 +1554,20 @@ gather_bbs::before_dom_children (basic_b
   if (!bb_in_sese_p (bb, region->region))
 return dom_walker::STOP;
 
-  record_loop_in_sese (bb, region);
+  /* For loops fully contained in the region record parameters in the
+ loop bounds.  */
+  loop_p loop = bb->loop_father;
+  if (loop->header == bb
+  && loop_in_sese_p (loop, region->region))
+{
+  tree nb_iters = number_of_latch_executions (loop);
+  if (chrec_contains_symbols (nb_iters))
+   {
+ nb_iters = scalar_evolution_in_region (region->region,
+loop, nb_iters);
+ scan_tree_for_params (region, nb_iters);
+   }
+}
 
   gcond *stmt = single_pred_cond_non_loop_exit (bb);
 
Index: gcc/sese.c
===
--- gcc/sese.c  (revision 253226)
+++ gcc/sese.c  (working copy)
@@ -179,7 +179,6 @@ new_sese_info (edge entry, edge exit)
 
   region->region.entry = entry;
   region->region.exit = exit;
-  region->loop_nest.create (3);
   region->params.create (3);
   region->rename_map = new rename_map_t;
   region->parameter_rename_map = new parameter_rename_map_t;
@@ -197,7 +196,6 @@ void
 free_sese_info (sese_info_p region)
 {
   region->params.release ();
-  region->loop_nest.release ();
 
   for (rename_map_t::iterator it = region->rename_map->begin ();
it != region->rename_map->end (); ++it)
Index: gcc/sese.h
===
--- gcc/sese.h  (revision 253226)
+++ gcc/sese.h  (working copy)
@@ -94,9 +94,6 @@ typedef struct sese_info_t
   /* Parameters to be renamed.  */
   parameter_rename_map_t *parameter_rename_map;
 
-  /* Loops completely contained in this SESE.  */
-  vec loop_nest;
-
   /* Basic blocks contained in this SESE.  */
   vec bbs;
 


[PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk

2017-09-27 Thread Richard Biener

This removes another quadraticness from SCOP detection, gather_bbs
domwalk.  This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.

With this I'm now confident to remove the 
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006.  I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).

Statistics I have gathered a few patches before for SPEC CPU 2006:

1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets

which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).

I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.

Richard.

2017-09-27  Richard Biener  

* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
(graphite-max-nb-scop-params): Document special value zero.
* domwalk.h (dom_walker::STOP): New symbolical constant.
(dom_walker::dom_walker): Add optional parameter for bb to
RPO mapping.
(dom_walker::~dom_walker): Declare.
(dom_walker::before_dom_children): Document STOP return value.
(dom_walker::m_user_bb_to_rpo): New member.
(dom_walker::m_bb_to_rpo): Likewise.
* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
mapping here if not provided by the user.
(dom_walker::~dom_walker): Free bb to RPO mapping if not
provided by the user.
(dom_walker::STOP): Define.
(dom_walker::walk): Do not compute bb to RPO mapping here.
Support STOP return value from before_dom_children to stop
walking.
* graphite-optimize-isl.c (optimize_isl): If the schedule
is the same still generate code if -fgraphite-identity
or -floop-parallelize-all are given.
* graphite-scop-detection.c: Include cfganal.h.
(gather_bbs::gather_bbs): Get and pass through bb to RPO
mapping.
(gather_bbs::before_dom_children): Return STOP for BBs
not in the region.
(build_scops): Compute bb to RPO mapping and pass it to
the domwalk.  Treat --param graphite-max-nb-scop-params=0
as not limiting the number of params.
* graphite.c (graphite_initialize): Remove limit on the
number of basic-blocks in a function.
* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
default value of 10.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 253224)
+++ gcc/doc/invoke.texi (working copy)
@@ -10512,13 +10512,9 @@ sequence pairs.  This option only applie
 @item graphite-max-nb-scop-params
 To avoid exponential effects in the Graphite loop transforms, the
 number of parameters in a Static Control Part (SCoP) is bounded.  The
-default value is 10 parameters.  A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded.  The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound.  A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
 
 @item loop-block-tile-size
 Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===
--- gcc/domwalk.c   (revision 253224)
+++ gcc/domwalk.c   (working copy)
@@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in
If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
EDGE_EXECUTABLE on every edge in the CFG. */
 dom_walker::dom_walker (cdi_direction direction,
-   bool skip_unreachable_blocks)
+   bool skip_unreachable_blocks,
+   int *bb_index_to_rpo)
   : m_dom_direction (direction),
 m_skip_unreachable_blocks (skip_unreachable_blocks),
-m_unreachable_dom (NULL)
+m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+m_unreachable_dom (NULL),
+m_bb_to_rpo (bb_index_to_rpo)
 {
+  /* Compute the basic-block index to RPO mapping if not provided by
+ the user.  */
+  if (! m_bb_to_rpo && direction 

Re: [AArch64] PR71307: Define union class of POINTER+FP

2017-09-27 Thread Richard Earnshaw (lists)
On 18/09/17 17:39, Richard Sandiford wrote:
> ALL_REGS doesn't function as a union class of POINTER_REGS and FP_REGS
> since it includes the CC register as well.  REGNO_REG_CLASS (CC_REGNUM)
> is NO_REGS, but of course NO_REGS rightly doesn't include CC_REGNUM.
> 
> Adding a union class for POINTER+FP allows the RA to use it as the
> preferred or alternative class of a pseudo.  It also works as a
> union class of GENERAL+FP for modes that aren't allowed in SP.
> 
> This is also needed for the SVE port, which adds predicate registers
> to the mix.
> 
> The combination of r252033 and this patch fixes PR71307.  Tested on
> aarch64-linux-gnu.  Also tested on SPEC2k6, where there were no
> differences outside the (mostly low) noise.  OK to install?
> 
> The main potential disadvantage I can see is that the -fsched-pressure
> code isn't very good at handling union classes: it generally just updates
> one pressure class for each pseudo.  I haven't found any specific examples
> of that causing problems though.
> 
> Thanks,
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   PR target/71307
>   * config/aarch64/aarch64.h (POINTER_AND_FP_REGS): New reg class.
>   (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
>   * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
>   POINTER_AND_FP_REGS.
> 
> gcc/testsuite/
>   PR target/71307
>   * gcc.target/aarch64/vect_copy_lane_1.c: Remove XFAIL.

OK.

R.

> 
> Index: gcc/config/aarch64/aarch64.h
> ===
> --- gcc/config/aarch64/aarch64.h  2017-09-15 14:47:33.167333414 +0100
> +++ gcc/config/aarch64/aarch64.h  2017-09-18 17:31:34.720209011 +0100
> @@ -452,6 +452,7 @@ enum reg_class
>POINTER_REGS,
>FP_LO_REGS,
>FP_REGS,
> +  POINTER_AND_FP_REGS,
>ALL_REGS,
>LIM_REG_CLASSES/* Last */
>  };
> @@ -467,6 +468,7 @@ #define REG_CLASS_NAMES   \
>"POINTER_REGS",\
>"FP_LO_REGS",  \
>"FP_REGS", \
> +  "POINTER_AND_FP_REGS", \
>"ALL_REGS" \
>  }
>  
> @@ -479,6 +481,7 @@ #define REG_CLASS_CONTENTS
> \
>{ 0x, 0x, 0x0003 },/* POINTER_REGS */  \
>{ 0x, 0x, 0x },   /* FP_LO_REGS  */\
>{ 0x, 0x, 0x },   /* FP_REGS  */   
> \
> +  { 0x, 0x, 0x0003 },/* POINTER_AND_FP_REGS */\
>{ 0x, 0x, 0x0007 } /* ALL_REGS */  \
>  }
>  
> Index: gcc/config/aarch64/aarch64.c
> ===
> --- gcc/config/aarch64/aarch64.c  2017-09-18 14:58:24.012256423 +0100
> +++ gcc/config/aarch64/aarch64.c  2017-09-18 17:31:34.720209011 +0100
> @@ -6009,6 +6009,7 @@ aarch64_class_max_nregs (reg_class_t reg
>  case POINTER_REGS:
>  case GENERAL_REGS:
>  case ALL_REGS:
> +case POINTER_AND_FP_REGS:
>  case FP_REGS:
>  case FP_LO_REGS:
>return
> Index: gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c   2016-11-22 
> 21:16:00.0 +
> +++ gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c   2017-09-18 
> 17:31:34.720209011 +0100
> @@ -45,8 +45,7 @@ BUILD_TEST (uint32x2_t,  uint32x4_t,  ,
>  BUILD_TEST (float64x1_t, float64x2_t, , q, f64, 0, 1)
>  BUILD_TEST (int64x1_t,  int64x2_t,, q, s64, 0, 1)
>  BUILD_TEST (uint64x1_t, uint64x2_t,   , q, u64, 0, 1)
> -/* XFAIL due to PR 71307.  */
> -/* { dg-final { scan-assembler-times "dup\\td0, v1.d\\\[1\\\]" 3 { xfail 
> *-*-* } } } */
> +/* { dg-final { scan-assembler-times "dup\\td0, v1.d\\\[1\\\]" 3 } } */
>  
>  /* vcopyq_lane.  */
>  BUILD_TEST (poly8x16_t, poly8x8_t, q, , p8, 15, 7)
> 



Re: [PATCH] [testsuite, ARM] Backport to GCC 7 branch

2017-09-27 Thread Richard Earnshaw (lists)
On 21/09/17 09:32, Christophe Lyon wrote:
> Hi,
> 
> Can I backport my patch r249639 (Add -mfloat-abi=hard to arm_neon_ok)
> to the gcc-7 branch ?
> It fixes a few false failures.
> 
> It applies cleanly to current trunk, and we have had it in our
> linaro-7-branch for a while.
> 

OK.

R.

> Thanks.
> 
> Christophe
> 
> 
> backport-r249639.chlog.txt
> 
> 
>   gcc/
>   Backport from trunk r249639.
>   2017-06-26  Christophe Lyon  
> 
>   * doc/sourcebuild.texi (ARM-specific attributes): Document new
>   arm_neon_ok_no_float_abi effective target.
> 
>   gcc/testsuite/
>   Backport from trunk r249639.
>   2017-06-26  Christophe Lyon  
> 
>   * lib/target-supports.exp
>   (check_effective_target_arm_neon_ok_nocache): Add flags with
>   -mfloat-abi=hard. Include arm_neon.h.
>   (check_effective_target_arm_neon_ok_no_float_abi_nocache): New.
>   (check_effective_target_arm_neon_ok_no_float_abi): New.
>   * gcc.target/arm/lto/pr65837_0.c: Require
>   arm_neon_ok_no_float_abi. Add -mfpu=neon to dg-lto-options.
>   * gcc.target/arm/lto/pr65837-attr_0.c: Require
>   arm_neon_ok_no_float_abi. Remove dg-suppress-ld-options.
> 
> 
> backport-r249639.patch.txt
> 
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 84d9a22..c7bb4b7 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1570,6 +1570,12 @@
>  ARM Target supports @code{-mfpu=neon -mfloat-abi=softfp} or compatible
>  options.  Some multilibs may be incompatible with these options.
>  
> +@item arm_neon_ok_no_float_abi
> +@anchor{arm_neon_ok_no_float_abi}
> +ARM Target supports NEON with @code{-mfpu=neon}, but without any
> +-mfloat-abi= option.  Some multilibs may be incompatible with this
> +option.
> +
>  @item arm_neonv2_ok
>  @anchor{arm_neonv2_ok}
>  ARM Target supports @code{-mfpu=neon-vfpv4 -mfloat-abi=softfp} or compatible
> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c 
> b/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c
> index ebc5f44..f00480b 100644
> --- a/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c
> +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c
> @@ -1,6 +1,7 @@
>  /* { dg-lto-do run } */
>  /* { dg-require-effective-target arm_neon_hw } */
> -/* { dg-lto-options {{-flto}} } */
> +/* { dg-require-effective-target arm_neon_ok_no_float_abi } */
> +/* { dg-lto-options {{-flto -mfpu=neon}} } */
>  
>  #include "arm_neon.h"
>  
> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c 
> b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
> index 6b2def9..5d7cea7 100644
> --- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
> +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
> @@ -1,7 +1,7 @@
>  /* { dg-lto-do run } */
>  /* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-require-effective-target arm_neon_ok_no_float_abi } */
>  /* { dg-lto-options {{-flto -mfpu=neon}} } */
> -/* { dg-suppress-ld-options {-mfpu=neon} } */
>  
>  #include "arm_neon.h"
>  
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 57caec7..d20e7d3 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3428,8 +3428,9 @@
>  global et_arm_neon_flags
>  set et_arm_neon_flags ""
>  if { [check_effective_target_arm32] } {
> - foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon 
> -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -march=armv7-a"} {
> + foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon 
> -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -march=armv7-a" 
> "-mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard 
> -march=armv7-a"} {
>   if { [check_no_compiler_messages_nocache arm_neon_ok object {
> + #include 
>   int dummy;
>   #ifndef __ARM_NEON__
>   #error not NEON
> @@ -3454,6 +3455,38 @@
>   check_effective_target_arm_neon_ok_nocache]
>  }
>  
> +# Return 1 if this is an ARM target supporting -mfpu=neon without any
> +# -mfloat-abi= option.  Useful in tests where add_options is not
> +# supported (such as lto tests).
> +
> +proc check_effective_target_arm_neon_ok_no_float_abi_nocache { } {
> +if { [check_effective_target_arm32] } {
> + foreach flags {"-mfpu=neon"} {
> + if { [check_no_compiler_messages_nocache arm_neon_ok_no_float_abi 
> object {
> + #include 
> + int dummy;
> + #ifndef __ARM_NEON__
> + #error not NEON
> + #endif
> + /* Avoid the case where a test adds -mfpu=neon, but the 
> toolchain is
> +configured for -mcpu=arm926ej-s, for example.  */
> + #if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
> + #error Architecture does not support NEON.
> + #endif
> +  

C++ patch ping

2017-09-27 Thread Jakub Jelinek
Hi!

I'd like to ping 2 C++2A patches:
  http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01235.html
P0683R1 - default member initializers for bit-fields

  http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01237.html
P0704R1 - fixing const-qualified pointers to members

Thanks

Jakub


Re: [GCC][PATCH][testsuite][mid-end] Fix failing slp test on aarch64 and arm.

2017-09-27 Thread Richard Earnshaw (lists)
On 26/09/17 14:12, Tamar Christina wrote:
> Hi All,
> 
> The slp vectorization test currently fails on AArch32 and AArch64
> due to it not taking into account that we do have 128 bit vectors in
> NEON. This means that two of the loops get vectorized instead of just 1.
> 
> So update the conditions to include a check for neon.
> 
> Regtested on aarch64-none-elf.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar.
> 
> gcc/testsuite/
> 2017-09-26  Tamar Christina  
> 
>   * gcc.dg/vect/slp-perm-9.c: Add arm_neon_ok checks.
> 
> 

It would be better to fix this generically by adding a vect_sizes_16B_8B
rule.

R.

> 8221-diff.patch
> 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c 
> b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> index 
> 4d9c11dcc476a8023b3eaac2ae76cc01bd0db182..816c4b31be80dc6ab77bda838f77357e2157ffb9
>  100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> @@ -54,8 +54,8 @@ int main (int argc, const char* argv[])
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target 
> { { vect_perm } && { vect_sizes_32B_16B } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> { { {! vect_perm } || {! vect_sizes_32B_16B } } && {! arm_neon_ok} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target 
> { { { vect_perm } && { vect_sizes_32B_16B } } || arm_neon_ok } } } } */
>  /* { dg-final { scan-tree-dump-times "permutation requires at least three 
> vectors" 1 "vect" { target vect_perm_short } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { { vect_perm } && { vect_sizes_32B_16B } } } } } */
> 



Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #7 of 8

2017-09-27 Thread Segher Boessenkool
On Tue, Sep 26, 2017 at 06:37:17PM -0400, Michael Meissner wrote:
> On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > >   * config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > >   Adjust code to eliminate needing to do the shift right 32-bits
> > >   operation after XSCVDPSPN.
> > 
> > After staring at this way too long...  Looks correct.  What a monster :-)
> > 
> > Okay for trunk.  Thanks!
> 
> Thanks for taking the time to verify it.
> 
> Yeah, it is a monster to get right.  It would be nice to put this off to a
> separate MD pass, instead of abusing peephole2's.

Or maybe we can teach LRA to help us out here.  LRA dearly needs some
target hooks for the target to tell it what code sequences are good
for the target.


Segher


Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-27 Thread Rainer Orth
Hi Andreas,

> On 09/27/2017 10:10 AM, Rainer Orth wrote:
>> Hi Andreas,
>> 
>>> On 09/26/2017 02:26 PM, Rainer Orth wrote:
 Hi Andreas,

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 307c726..3acfd85 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
>  @item vect_no_align
>  Target does not support a vector alignment mechanism.
>
> +@item vect_no_peel
> +Target does not require any loop peeling for alignment purposes.
> +
>  @item vect_no_int_min_max
>  Target does not support a vector min and max instruction on @code{int}.

 please keep the items sorted alphabetically.
>>>
>>> The items do not appear to be sorted alphabetically.
>> 
>> they should be.  Your patch makes the ordering even more random.
>> 
>> Patch to fix this preapproved ;-)
> The items rather appear to be arranged by subject. Does it really make
> sense do pull items like this
> apart just to have it in alphabetical order?
>
> @item vect_intfloat_cvt
> Target supports conversion from @code{signed int} to @code{float}.
>
> @item vect_uintfloat_cvt
> Target supports conversion from @code{unsigned int} to @code{float}.
>
> @item vect_floatint_cvt
> Target supports conversion from @code{float} to @code{signed int}.
>
> @item vect_floatuint_cvt
> Target supports conversion from @code{float} to @code{unsigned int}.
>
>
> I've added the no_peel item intentionally to the hw_misalign/no_align block.

granted, there are some attempts at that, but I find it hard to make my
way through that longish list.  The way it is, you have to skip through
the whole list beginning to end.  Texinfo seems to have no subsubsection
which would allow to make the sub-grouping explicit...

Let's hear what Sandra thinks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: 0005-Part-5.-Add-x86-CET-documentation

2017-09-27 Thread Florian Weimer

On 09/27/2017 05:40 AM, Sandra Loosemore wrote:


+@emph{x86 implementation:} when @option{-fcf-protection} option is
+specified the compiler inserts an ENDBR instruction at function's
+prologue if the function's type does not have the @code{nocf_check}
+attribute and addresses to which indirect control-flow transfer can
+happen.  The instruction triggers the HW check if a control-flow
+transfer to the address of ENDBR instruction is valid.


Implementation details like this should be comments in the code, not 
included in the user-facing documentation.


This is part of the ABI GCC implements, so it has to be documented 
somewhere, and not just as part of the GCC source code.


CET is not properly described in the ABI supplement and I don't think 
this will change, so detailed documentation in the GCC manual is very 
much desirable.


That being said, the implementation notes above need some clarification. 
 It's not clear to me what the conditions are under which the ENDBR 
instruction is emitted (and we probably should use @code{endbr} in the 
manual), what it is trying to achieve, and how the x86 calling 
convention changes.  I assume it is somehow related to what we call 
internally “the suffix problem”: without control flow integrity, an 
attacker might skip over precondition/hardening checks, directly to the 
critical changes we want to protect, executing only the suffix of a 
function (hence the name).


Thanks,
Florian


Re: [patch, fortran] Warn about out-of-bounds access with DO subscripts

2017-09-27 Thread Thomas Schwinge
Hi!

On Tue, 26 Sep 2017 18:51:52 +0200, Thomas Koenig  wrote:
> > On Mon, 25 Sep 2017 18:50:49 +0200, Thomas Koenig  
> > wrote:
> >> Thanks for the review, committed as r253156.
> >>
> >> Now, on to some other bugs...
> > 
> > No, back to this one please.  ;-)
> 
> OK, if you insist :-)

;-)


> > And the following gets highlighted, too:
> > 
> >  FAIL: compiler driver --help=fortran option(s): "^ +-.*[^:.]$" absent 
> > from output: "  -Wdo-subscript  Warn about possibly incorrect 
> > subscripts in do loops"
> >  FAIL: compiler driver --help=warnings option(s): "^ +-.*[^:.]$" absent 
> > from output: "  -Wdo-subscript  Warn about possibly incorrect 
> > subscripts in do loops"
> 
> This I don't understand.

Hmm, don't you see that in your test logs?

> Was there anything wrong with my
> change to fortran/lang.opt?

Easy enough to fix, so as obvious, committed to trunk in r253225:

commit a7717725d0bed402378b085bcda8fa3fe337fae9
Author: tschwinge 
Date:   Wed Sep 27 08:35:05 2017 +

Placate gcc.misc-tests/help.exp regarding -Wdo-subscript

gcc/fortran/
* lang.opt : End help text with a period.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253225 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog | 4 
 gcc/fortran/lang.opt  | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog
index d0eef84..c698014 100644
--- gcc/fortran/ChangeLog
+++ gcc/fortran/ChangeLog
@@ -1,3 +1,7 @@
+2017-09-27  Thomas Schwinge  
+
+   * lang.opt : End help text with a period.
+
 2017-09-26  Thomas Koenig  
 
* frontend-passes.c (do_subscript): Don't do anything
diff --git gcc/fortran/lang.opt gcc/fortran/lang.opt
index 37ed4a3..88f6af5 100644
--- gcc/fortran/lang.opt
+++ gcc/fortran/lang.opt
@@ -239,7 +239,7 @@ Warn about most implicit conversions.
 
 Wdo-subscript
 Fortran Var(warn_do_subscript) Warning LangEnabledBy(Fortran,Wextra)
-Warn about possibly incorrect subscripts in do loops
+Warn about possibly incorrect subscripts in do loops.
 
 Wextra
 Fortran Warning


Grüße
 Thomas


Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-27 Thread Andreas Krebbel
On 09/27/2017 10:10 AM, Rainer Orth wrote:
> Hi Andreas,
> 
>> On 09/26/2017 02:26 PM, Rainer Orth wrote:
>>> Hi Andreas,
>>>
 diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
 index 307c726..3acfd85 100644
 --- a/gcc/doc/sourcebuild.texi
 +++ b/gcc/doc/sourcebuild.texi
 @@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
  @item vect_no_align
  Target does not support a vector alignment mechanism.

 +@item vect_no_peel
 +Target does not require any loop peeling for alignment purposes.
 +
  @item vect_no_int_min_max
  Target does not support a vector min and max instruction on @code{int}.
>>>
>>> please keep the items sorted alphabetically.
>>
>> The items do not appear to be sorted alphabetically.
> 
> they should be.  Your patch makes the ordering even more random.
> 
> Patch to fix this preapproved ;-)
The items rather appear to be arranged by subject. Does it really make sense do 
pull items like this
apart just to have it in alphabetical order?

@item vect_intfloat_cvt
Target supports conversion from @code{signed int} to @code{float}.

@item vect_uintfloat_cvt
Target supports conversion from @code{unsigned int} to @code{float}.

@item vect_floatint_cvt
Target supports conversion from @code{float} to @code{signed int}.

@item vect_floatuint_cvt
Target supports conversion from @code{float} to @code{unsigned int}.


I've added the no_peel item intentionally to the hw_misalign/no_align block.

-Andreas-



Re: [PR middle-end/82319] Fix ICE in pattern

2017-09-27 Thread Andrew Pinski
On Wed, Sep 27, 2017 at 12:56 AM, Richard Biener  wrote:
> On Tue, 26 Sep 2017, Andrew Pinski wrote:
>
>> On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribov  wrote:
>> > Hi all,
>> >
>> > This patch fixes a trivial ICE in recent pattern.  Bootstrapped and
>> > regtested on x86_64.
>> >
>> > Ok to commit?
>
> Ok.
>
>> >+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst);
>>
>> The GCC coding style says no space between the ! and the expression.
>
> Does it?  I thought it says the opposite.

Yes, see https://gcc.gnu.org/codingconventions.html#Expressions .

For   Use..instead of
logical not !x   ! x
bitwise complement  ~x ~ x
unary minus  -x  - x
cast   (foo) x   (foo)x
pointer dereference  *x   * x


Thanks,
Andrew

>
> Richard.
>
>> Note for clarity I would put () around !real_isnan (cst) though.
>> Other than that I don't see anything wrong with the patch (I cannot
>> approve the patch though).
>>
>> Thanks,
>> Andrew
>>
>> >
>> > -Y
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-27 Thread Rainer Orth
Hi Andreas,

> On 09/26/2017 02:26 PM, Rainer Orth wrote:
>> Hi Andreas,
>> 
>>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>>> index 307c726..3acfd85 100644
>>> --- a/gcc/doc/sourcebuild.texi
>>> +++ b/gcc/doc/sourcebuild.texi
>>> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
>>>  @item vect_no_align
>>>  Target does not support a vector alignment mechanism.
>>>
>>> +@item vect_no_peel
>>> +Target does not require any loop peeling for alignment purposes.
>>> +
>>>  @item vect_no_int_min_max
>>>  Target does not support a vector min and max instruction on @code{int}.
>> 
>> please keep the items sorted alphabetically.
>
> The items do not appear to be sorted alphabetically.

they should be.  Your patch makes the ordering even more random.

Patch to fix this preapproved ;-)

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PR middle-end/82319] Fix ICE in pattern

2017-09-27 Thread Richard Biener
On Tue, 26 Sep 2017, Andrew Pinski wrote:

> On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribov  wrote:
> > Hi all,
> >
> > This patch fixes a trivial ICE in recent pattern.  Bootstrapped and
> > regtested on x86_64.
> >
> > Ok to commit?

Ok.

> >+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst);
> 
> The GCC coding style says no space between the ! and the expression.

Does it?  I thought it says the opposite.

Richard.

> Note for clarity I would put () around !real_isnan (cst) though.
> Other than that I don't see anything wrong with the patch (I cannot
> approve the patch though).
> 
> Thanks,
> Andrew
> 
> >
> > -Y
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][GRAPHITE] Simplify SCOP detection

2017-09-27 Thread Richard Biener
On Tue, 26 Sep 2017, Sebastian Pop wrote:

> On Tue, Sep 26, 2017 at 7:03 AM, Richard Biener  wrote:
> 
> >
> > The following is the result of me trying to understand SCOP detection
> > and the validity checks spread around the machinery.  It removes several
> > quadraticnesses by folding validity checks into
> > scop_detection::harmful_loop_in_region where we already walk over all
> > BBs in the region and process individual found loops.
> >
> > It also rewrites build_scop_depth/build_scop_breadth into something
> > I can undestand.
> >
> > Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
> > for all langs is happy, so is SPEC CPU 2006 testing where the statistics
> > agree before/after the patch).
> >
> > I'll apply this after the bootstrap finished.
> >
> 
> Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"?

I do "-O2 -g -floop-nest-optimize" but I guess -fgraphite-identity
should catch more issues?  Hmm, maybe -floop-nest-optimize and
-fgraphite-identity should be combinable

Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c (revision 253203)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
print_schedule_ast (dump_file, scop->original_schedule, scop);
   isl_schedule_free (scop->transformed_schedule);
   scop->transformed_schedule = isl_schedule_copy 
(scop->original_schedule);
-  return false;
+  return flag_graphite_identity || flag_loop_parallelize_all;
 }
 
   return true;

I'll test/commit the above.

> 
> > Richard.
> >
> > 2017-09-26  Richard Biener  
> >
> > * graphite-scop-detection.c (scop_detection::build_scop_depth):
> > Rewrite,
> > fold in ...
> > (scop_detection::build_scop_breadth): ... this.  Removed.
> > (scop_detection::loop_is_valid_in_scop): Fold into single caller.
> > (scop_detection::harmful_stmt_in_bb): Likewise.
> > (scop_detection::graphite_can_represent_stmt): Likewise.
> > (scop_detection::loop_body_is_valid_scop): Likewise.  Remove
> > recursion.
> > (scop_detection::can_represent_loop): Remove recursion, fold in
> > ...
> > (scop_detection::can_represent_loop_1): ... this.  Removed.
> > (scop_detection::harmful_loop_in_region): Simplify after inlining
> > the above and remove more quadraticness.
> > (build_scops): Adjust.
> > * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
> > quadraticness.
> >
> >
> This goes in the right direction: it cuts down compilation time.
> As it is not a trivial change, I need some time to understand how
> the scop detection works with this change.

The only functional change should be that the SESE composition now
works top-down instead of working its way bottom-up.  It's not clear
whether we do more or less work that way but at least the function
is now readable -- I think we can structure it bottom-up as well
without doing the confusing two-function way (which I believe
did quite some duplicate work but I never was sure...).

Richard.


Re: [PATCH 4/5] New target check: vect_nopeel - v2

2017-09-27 Thread Andreas Krebbel
On 09/26/2017 02:26 PM, Rainer Orth wrote:
> Hi Andreas,
> 
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 307c726..3acfd85 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access.
>>  @item vect_no_align
>>  Target does not support a vector alignment mechanism.
>>
>> +@item vect_no_peel
>> +Target does not require any loop peeling for alignment purposes.
>> +
>>  @item vect_no_int_min_max
>>  Target does not support a vector min and max instruction on @code{int}.
> 
> please keep the items sorted alphabetically.

The items do not appear to be sorted alphabetically.

-Andreas-



Re: [PR middle-end/82319] Fix ICE in pattern

2017-09-27 Thread Andrew Pinski
On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribov  wrote:
> Hi all,
>
> This patch fixes a trivial ICE in recent pattern.  Bootstrapped and
> regtested on x86_64.
>
> Ok to commit?

>+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst);

The GCC coding style says no space between the ! and the expression.
Note for clarity I would put () around !real_isnan (cst) though.
Other than that I don't see anything wrong with the patch (I cannot
approve the patch though).

Thanks,
Andrew

>
> -Y