Re: [PATCH], Define __FP_FAST_FMAF128 on PowerPC ISA 3.0
On Wed, 27 Sep 2017, Michael Meissner wrote: > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) > for PowerPC code when we have the IEEE 128-bit floating point hardware > instructions enabled. It's not a standard macro. TS 18661-3 has FP_FAST_FMAF128 as an optional math.h macro (but glibc doesn't define it anywhere at present). > This patch does this in the PowerPC backend. As I look at the whole issue, at > some point we should do this more in the machine independent portion of the > compiler. I have some initial patches to do this in the c-family files, but > at > the present time, the patches are not complete, and I need to think about it > more. I think a machine-independent definition (for _FloatN / _FloatNx types in general) should go along with machine-independent fmafN / fmafNx built-in functions; when the built-in function is machine-specific, it's natural for the macro to be as well. But in any case, the new macro should be documented in cpp.texi alongside the existing __FP_FAST_FMA* macros (probably in the generic __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Enhance PHI processing in VN
On Wed, Sep 27, 2017 at 6:58 PM, Richard Sandifordwrote: > David Edelsohn writes: >> On Fri, Sep 15, 2017 at 2:53 AM, Richard Biener wrote: >>> On Thu, 14 Sep 2017, David Edelsohn wrote: >>> * tree-ssa-sccvn.c (visit_phi): Merge undefined values similar to VN_TOP. This seems to have regressed FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 0" 2 FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1 FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 3" 1 >>> >>> Hmm, I don't see these FAILs. Looking at the testcase there are >>> no undefined uses so I wonder how the patch could have any effect. >>> >>> Can you re-check and open a bugreport? >> >> It disappeared again. A different failure appeared and disappeared a >> few weeks ago. Something in the testsuite infrastructure appears to >> not be stable, at least on AIX. Sorry for the incorrect report. > > Perhaps this is unrelated, but when doing the "has this patch > changed assembly on these targets?" testing, I noticed that AIX > had differences like: > > --- old/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s > +++ new/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s > @@ -4,21 +4,21 @@ > .csect ..text.startup[PR],2 > .align 2 > .align 4 > - .globl > _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401 > - .globl > ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401 > - .csect > _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401[DS] > -_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401: > - .long > ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401, > TOC[tc0], 0 > + .globl > _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9 > + .globl > ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9 > + .csect > _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9[DS] > +_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9: > + .long > ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9, > TOC[tc0], 0 > > even though -frandom-seed is forced to the same value for both runs. > > Is this behaviour deliberate? I thought runs from a few weeks ago > had stable names, but maybe I just misremember. AIX does not have init/fini sections, so it is built at link time as a C file by collect2-ld. I suspect that collect2-ld doesn't use -frandom-seed to build its temporary file. - David
[committed] jit: implement gcc_jit_function_get_address
On Fri, 2017-09-22 at 11:21 +0200, Bartosz Szreder wrote: > Hello David, > > > > 1. The documentation doesn't mention existence of > > > gcc_jit_context_new_function_ptr_type() as a mechanism of > > > handling > > > function pointers, yet contains > > > gcc_jit_context_new_call_through_ptr(). > > > > [...] > > > > It's just missing documentation. I'm working on fixing it. > > Have a look at gcc/testsuite/jit.dg/test-calling-function-ptr.c > > Thanks, this is very helpful! > > > > - What is the proper way of obtaining a function pointer to be > > > passed > > > to gcc_jit_context_new_call_through_ptr()? There doesn't seem to > > > be > > > any counterpart to gcc_jit_lvalue_get_address() for functions. As > > > the > > > name suggests, gcc_jit_lvalue_get_address() works on an L-value > > > and > > > gcc_jit_function type isn't an ancestor of gcc_jit_lvalue in the > > > internal type system, therefore upcasting is impossible. > > > > [...] > > > > If it's a function that's being compiled as part of the same > > gcc_jit_context, then I think you've identified a weakness in the > > current API. As you say, one fix would be to make gcc_jit_function > > be > > a subclass of gcc_jit_lvalue (and add casting functions); I don't > > yet > > know how easy that would be to implement. > > Yes, this is the exact use case I'm thinking of. > > From an user's point of view, I believe making gcc_jit_function a > subclass of gcc_jit_lvalue makes for an awkward design. Maybe > something like this would be easier to implement without upending the > current hierarchy: > > gcc_jit_rvalue *gcc_jit_context_function_get_address(gcc_jit_context > *ctxt, gcc_jit_function *fun); > > Thanks, > Bartosz Szreder Thanks. I ended up adding the following: extern gcc_jit_rvalue * gcc_jit_function_get_address (gcc_jit_function *fn, gcc_jit_location *loc); FWIW I initially attempted to make functions be a subclass of rvalue internally (within jit-recording.*, but ran into difficulties with function::write_reproducer, which was having to do double-duty: printing the body of the function AND printing the name of the function); doing it all as a new API of functions ended up being simpler. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; takes jit.sum from 9789 to 9889 PASS results. Committed to trunk as r253244. gcc/jit/ChangeLog: * docs/cp/topics/expressions.rst (Function pointers): New section. * docs/topics/compatibility.rst (LIBGCCJIT_ABI_9): New tag. * docs/topics/expressions.rst (Function pointers): New section. * docs/_build/texinfo/libgccjit.texi: Regenerate. * jit-common.h (class gcc::jit::recording::function_pointer): New forward decl. * jit-playback.c (gcc::jit::playback::function::get_address): New method. * jit-playback.h (gcc::jit::playback::function::get_address): New method decl. * jit-recording.c: Within namespace gcc::jit::recording... (function::function): Initialize new field "m_fn_ptr_type". (function::get_address): New method. (function_pointer::replay_into): New method. (function_pointer::visit_children): New method. (function_pointer::make_debug_string): New method. (function_pointer::write_reproducer): New method. * jit-recording.h: Within namespace gcc::jit::recording... (function::get_address): New method. (function): Add field "m_fn_ptr_type". (class function_pointer): New subclass of rvalue. * libgccjit++.h (gccjit::function::get_address): New method. * libgccjit.c (gcc_jit_function_get_address): New function. * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_function_get_address): New macro. (gcc_jit_function_get_address): New API entrypoint. * libgccjit.map (LIBGCCJIT_ABI_9): New tag. gcc/testsuite/ChangeLog: * jit.dg/all-non-failing-tests.h: Add test-returning-function-ptr.c. * jit.dg/test-returning-function-ptr.c: New test case. --- gcc/jit/docs/cp/topics/expressions.rst | 9 ++ gcc/jit/docs/topics/compatibility.rst | 7 + gcc/jit/docs/topics/expressions.rst| 17 +++ gcc/jit/jit-common.h | 1 + gcc/jit/jit-playback.c | 14 ++ gcc/jit/jit-playback.h | 3 + gcc/jit/jit-recording.c| 77 ++- gcc/jit/jit-recording.h| 29 + gcc/jit/libgccjit++.h | 9 ++ gcc/jit/libgccjit.c| 20 +++ gcc/jit/libgccjit.h| 15 +++ gcc/jit/libgccjit.map | 5 + gcc/testsuite/jit.dg/all-non-failing-tests.h | 10 ++ gcc/testsuite/jit.dg/test-returning-function-ptr.c | 143 + 14 files changed,
Re: [PATCH][mingw] Enable colorized diagnostics
On 09/27/2017 08:54 PM, Liu Hao wrote: > On 2017/9/28 4:09, Joseph Myers wrote: >> On Thu, 28 Sep 2017, Liu Hao wrote: >> >>> Colorized diagnostics used to be disabled for MinGW targets (on which >>> the macro `_WIN32` is defined), and this patch enables it. >> >> I'd hope this is all to do with MinGW host, and nothing to do with the >> target. >> > Oh you are right. Since I build native compilers, host == target here. > But strictly speaking the patch applies to the host, not the target. > Does it make sense to use a global lock in mingw_ansi_fputs? signature.asc Description: OpenPGP digital signature
Re: [PATCH] Enhance PHI processing in VN
David Edelsohnwrites: > On Fri, Sep 15, 2017 at 2:53 AM, Richard Biener wrote: >> On Thu, 14 Sep 2017, David Edelsohn wrote: >> >>> * tree-ssa-sccvn.c (visit_phi): Merge undefined values similar >>> to VN_TOP. >>> >>> This seems to have regressed >>> >>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile >>> "Read tp_first_run: 0" 2 >>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile >>> "Read tp_first_run: 2" 1 >>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile >>> "Read tp_first_run: 3" 1 >> >> Hmm, I don't see these FAILs. Looking at the testcase there are >> no undefined uses so I wonder how the patch could have any effect. >> >> Can you re-check and open a bugreport? > > It disappeared again. A different failure appeared and disappeared a > few weeks ago. Something in the testsuite infrastructure appears to > not be stable, at least on AIX. Sorry for the incorrect report. Perhaps this is unrelated, but when doing the "has this patch changed assembly on these targets?" testing, I noticed that AIX had differences like: --- old/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s +++ new/powerpc-ibm-aix7.0/test/-O3/g++.dg/init/constant1.s @@ -4,21 +4,21 @@ .csect ..text.startup[PR],2 .align 2 .align 4 - .globl _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401 - .globl ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401 - .csect _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401[DS] -_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401: - .long ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0xbb0d20e181e3a401, TOC[tc0], 0 + .globl _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9 + .globl ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9 + .csect _GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9[DS] +_GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9: + .long ._GLOBAL__I_65535_0_.._.._.._testsuite_g__.dg_._init_constant1.C__0x5610f7ec143966c9, TOC[tc0], 0 even though -frandom-seed is forced to the same value for both runs. Is this behaviour deliberate? I thought runs from a few weeks ago had stable names, but maybe I just misremember. Thanks, Richard
[PATCH], Define __FP_FAST_FMAF128 on PowerPC ISA 3.0
The glibc team has requested we define the standard macro (__FP_FAST_FMAF128) for PowerPC code when we have the IEEE 128-bit floating point hardware instructions enabled. This patch does this in the PowerPC backend. As I look at the whole issue, at some point we should do this more in the machine independent portion of the compiler. I have some initial patches to do this in the c-family files, but at the present time, the patches are not complete, and I need to think about it more. So, I would like to check in this patch now, and if we come up with a machine independent version, we can back out this particular patch. I have done a full bootstrap and regression test, there were no regressions, and the new test case does run correctly. Can I check this into the trunk? [gcc] 2017-09-27 Michael Meissner* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define __FP_FAST_FMAF128 on ISA 3.0. [gcc/testsuite] 2017-09-27 Michael Meissner * gcc.target/powerpc/float128-fma3.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 253236) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -585,7 +585,10 @@ rs6000_target_modify_macros (bool define /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or via the target attribute/pragma. */ if ((flags & OPTION_MASK_FLOAT128_HW) != 0) -rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__"); +{ + rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__"); + rs6000_define_or_undefine_macro (define_p, "__FP_FAST_FMAF128"); +} /* options from the builtin masks. */ /* Note that RS6000_BTM_PAIRED is enabled only if Index: gcc/testsuite/gcc.target/powerpc/float128-fma3.c === --- gcc/testsuite/gcc.target/powerpc/float128-fma3.c(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/float128-fma3.c(working copy) @@ -0,0 +1,33 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mpower9-vector -O2" } */ + +/* Make sure the appropriate FMA fast macros are defined. */ + +#ifdef __FP_FAST_FMAF +float +do_fmaf (float a, float b, float c) +{ + return __builtin_fmaf (a, b, c); +} +#endif + +#ifdef __FP_FAST_FMA +double +do_fma (double a, double b, double c) +{ + return __builtin_fma (a, b, c); +} +#endif + +#ifdef __FP_FAST_FMAF128 +_Float128 +do_fmaf128 (_Float128 a, _Float128 b, _Float128 c) +{ + return __builtin_fmaf128 (a, b, c); +} +#endif + +/* { dg-final { scan-assembler {\mfmadds\M|\mxsmadd.sp\M} } } */ +/* { dg-final { scan-assembler {\mfmadd\M|\mxsmadd.dp\M} } } */ +/* { dg-final { scan-assembler {\mxsmaddqp\M} } } */
Re: [PATCH][mingw] Enable colorized diagnostics
On 2017/9/28 4:09, Joseph Myers wrote: On Thu, 28 Sep 2017, Liu Hao wrote: Colorized diagnostics used to be disabled for MinGW targets (on which the macro `_WIN32` is defined), and this patch enables it. I'd hope this is all to do with MinGW host, and nothing to do with the target. Oh you are right. Since I build native compilers, host == target here. But strictly speaking the patch applies to the host, not the target. -- Best regards, LH_Mouse
Make tests failing with version namespace UNSUPPORTED
Hi I would like to propose to add a new dg-require-normal-namespace attribute to make several tests failing when version namespace is active UNSUPPORTED. It is like dg-require-normal-mode but also consider when version namespace is being used. I still need to complete execution of all tests with version namespace but I think that all tests will be ok then. I have also updated the code used for dg-require-normal-mode to include c++config.h in case users are changing this file to activate any mode. * testsuite/lib/libstdc++.exp ([check_v3_target_normal_namespace]): New. * testsuite/lib/dg-options.exp ([dg-require-normal-namespace]): New, use latter. * testsuite/23_containers/headers/bitset/synopsis.cc: Replace dg-require-normal-mode with latter. * testsuite/23_containers/headers/deque/synopsis.cc: Likewise. * testsuite/23_containers/headers/forward_list/synopsis.cc: Likewise. * testsuite/23_containers/headers/list/synopsis.cc: Likewise. * testsuite/23_containers/headers/map/synopsis.cc: Likewise. * testsuite/23_containers/headers/set/synopsis.cc: Likewise. * testsuite/23_containers/headers/vector/synopsis.cc: Likewise. * testsuite/23_containers/map/modifiers/erase/abi_tag.cc: Likewise. * testsuite/23_containers/multimap/modifiers/erase/abi_tag.cc: Likewise. * testsuite/23_containers/multiset/modifiers/erase/abi_tag.cc: Likewise. * testsuite/23_containers/set/modifiers/erase/abi_tag.cc: Likewise. Ok to commit ? François diff --git a/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc index 6ef085a..8b3967c 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/bitset/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc index aa2b787..0de29c2 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/deque/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc index b1792ce..21e79c4 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/forward_list/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile { target c++11 } } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2008-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc index ab22b9f..e8c61aa 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/list/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc index f4a0826..d74fca1 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/map/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc index e50a044..58e94bc 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/set/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc b/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc index c127b5d..47cbe16 100644 --- a/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc +++ b/libstdc++-v3/testsuite/23_containers/headers/vector/synopsis.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-require-normal-mode "" } +// { dg-require-normal-namespace "" } // Copyright (C) 2007-2017 Free Software Foundation, Inc. // diff --git
Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64
Ping. Steve Ellcey sell...@cavium.com On Thu, 2017-08-31 at 10:24 -0700, Steve Ellcey wrote: > On Tue, 2017-08-29 at 12:25 +0100, Szabolcs Nagy wrote: > > > > > > in glibc the hwcap is not used, because it has accesses to > > cached dispatch info, but in libatomic using the hwcap > > argument is the right way. > Here is an updated version of the patch to allow aarch64 to use > ifuncs > in libatomic. > > The main difference from the last patch is that the library does not > access the hwcap value directly but accesses it through the ifunc > resolver argument. That means that we no longer need the > init_cpu_revision static constructor to set a flag that the resolver > checks, instead the resolver just does a comparision of its incoming > argument with HWCAP_ATOMICS. > > This did mean I had to change the prototype for the resolver > functions > in libatomic_i.h to have an argument, which is the way glibc calls > them. One complication of this is that the type of the argument can > differ between platforms and ABIs so I added code to configure.tgt to > set the type. I used uint64_t for aarch64 and 'long unsigned int' > for everything else. That is not correct for all platforms but at > this point no other platforms access the argument so it should not > matter. If and when platforms do need to access it they can change > the type if necessary. > > Steve Ellcey > sell...@cavium.com > > > 2017-08-31 Steve Ellcey> > * Makefile.am (ARCH_AARCH64_LINUX_LSE): Add IFUNC_OPTIONS and > libatomic_la_LIBADD. > * config/linux/aarch64/host-config.h: New file. > * configure.ac (HWCAP_TYPE): Define. > (AC_CHECK_HEADERS): Check for sys/auxv.h. > (AC_CHECK_FUNCS): Check for getauxval. > (ARCH_AARCH64_LINUX_LSE): New conditional for IFUNC builds. > * configure.tgt (aarch64): Set AARCH and try_ifunc. > (aarch64*-*-linux*) Update config_path. > (aarch64*-*-linux*) Set HWCAP_TYPE. > * libatomic_i.h (GEN_SELECTOR): Add "HWCAP_TYPE hwcap" > argument. > * Makefile.in: Regenerate. > * auto-config.h.in: Regenerate. > * configure: Regenerate. >
Make tests less istreambuf_iterator implementation dependent
Hi I just committed attached patch as trivial. Those tests were highly istreambuf_iterator implementation, it is the result of the call to money_get<>::get which is pointing immediately beyond the last character recognized to quote Standard words. 2017-09-27 François Dumont* testsuite/22_locale/money_get/get/char/22131.cc: Make test less istreambuf_iterator implementation dependent. * testsuite/22_locale/money_get/get/wchar_t/22131.cc: Likewise. François diff --git a/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc b/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc index 6f363ef..5a05817 100644 --- a/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc +++ b/libstdc++-v3/testsuite/22_locale/money_get/get/char/22131.cc @@ -67,7 +67,7 @@ void test01() fmt2.imbue(loc); InIt ibeg2(fmt2); err2 = ios_base::goodbit; - mg.get(ibeg2, iend2, intl, fmt2, err2, val2); + ibeg2 = mg.get(ibeg2, iend2, intl, fmt2, err2, val2); VERIFY( err2 == ios_base::failbit ); VERIFY( *ibeg2 == '#' ); VERIFY( val2 == "" ); diff --git a/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc b/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc index 3830cc3..0626e3c 100644 --- a/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc +++ b/libstdc++-v3/testsuite/22_locale/money_get/get/wchar_t/22131.cc @@ -67,7 +67,7 @@ void test01() fmt2.imbue(loc); InIt ibeg2(fmt2); err2 = ios_base::goodbit; - mg.get(ibeg2, iend2, intl, fmt2, err2, val2); + ibeg2 = mg.get(ibeg2, iend2, intl, fmt2, err2, val2); VERIFY( err2 == ios_base::failbit ); VERIFY( *ibeg2 == L'#' ); VERIFY( val2 == L"" );
Re: [PATCH][mingw] Enable colorized diagnostics
On Thu, 28 Sep 2017, Liu Hao wrote: > Colorized diagnostics used to be disabled for MinGW targets (on which > the macro `_WIN32` is defined), and this patch enables it. I'd hope this is all to do with MinGW host, and nothing to do with the target. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Fix fortran/81509
The attached patch fixes PR fortran/81509. In short, F2008 now allows boz-literal-constants in IAND, IOR, IEOR, DSHIFTL, DSHIFTR, and MERGE_BITS. gfortran currently allows BOZ argument, but she was not enforcing restrictions in F2008. The attach patch causes gfortran to conform to F2008. As aside effect, the patch removes a questionable GNU Fortran extension that allowed arguments to IAND, IOR, and IEOR to have different kind type parameters. The behavior of this extension was not documented. 2017-09-27 Steven G. KarglPR fortran/81509 * check.c: Rename function gfc_check_iand to gfc_check_iand_ieor_ior. * check.c (boz_args_check): New function. Check I and J not both BOZ. (gfc_check_dshift,gfc_check_iand_ieor_ior, gfc_check_ishft, gfc_check_and, gfc_check_merge_bits): Use it. * check.c (gfc_check_iand_ieor_ior): Force conversion of BOZ to kind type of other agrument. Remove silly GNU extension. (gfc_check_ieor, gfc_check_ior): Delete now unused functions. * intrinsic.c (add_functions): Use gfc_check_iand_ieor_ior. Wrap long line. * intrinsic.h: Rename gfc_check_iand to gfc_check_iand_ieor_ior. Delete prototype for bool gfc_check_ieor and gfc_check_ior * intrinsic.texi: Update documentation for boz-literal-constant. 2017-09-27 Steven G. Kargl PR fortran/81509 * gfortran.dg/graphite/id-26.f03: Fix non-conforming use of IAND. * gfortran.dg/pr81509_1.f90: New test. * gfortran.dg/pr81509_2.f90: New test. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow Index: gcc/fortran/check.c === --- gcc/fortran/check.c (revision 253236) +++ gcc/fortran/check.c (working copy) @@ -2101,6 +2101,21 @@ gfc_check_dprod (gfc_expr *x, gfc_expr *y) } +static bool +boz_args_check(gfc_expr *i, gfc_expr *j) +{ + if (i->is_boz && j->is_boz) +{ + gfc_error ("Arguments of %qs at %L and %L cannot both be BOZ " + "literal constants", gfc_current_intrinsic, >where, + >where); + return false; + +} + return true; +} + + bool gfc_check_dshift (gfc_expr *i, gfc_expr *j, gfc_expr *shift) { @@ -2110,12 +2125,8 @@ gfc_check_dshift (gfc_expr *i, gfc_expr *j, gfc_expr * if (!type_check (j, 1, BT_INTEGER)) return false; - if (i->is_boz && j->is_boz) -{ - gfc_error ("% at %L and % ' at %L cannot both be BOZ literal " - "constants", >where, >where); - return false; -} + if (!boz_args_check (i, j)) +return false; if (!i->is_boz && !j->is_boz && !same_type_check (i, 0, j, 1)) return false; @@ -2361,7 +2372,7 @@ gfc_check_i (gfc_expr *i) bool -gfc_check_iand (gfc_expr *i, gfc_expr *j) +gfc_check_iand_ieor_ior (gfc_expr *i, gfc_expr *j) { if (!type_check (i, 0, BT_INTEGER)) return false; @@ -2369,10 +2380,16 @@ gfc_check_iand (gfc_expr *i, gfc_expr *j) if (!type_check (j, 1, BT_INTEGER)) return false; + if (!boz_args_check (i, j)) +return false; + + if (i->is_boz) i->ts.kind = j->ts.kind; + if (j->is_boz) j->ts.kind = i->ts.kind; + if (i->ts.kind != j->ts.kind) { - if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L", - >where)) + gfc_error ("Arguments of %qs have different kind type parameters " + "at %L", gfc_current_intrinsic, >where); return false; } @@ -2487,26 +2504,6 @@ gfc_check_idnint (gfc_expr *a) bool -gfc_check_ieor (gfc_expr *i, gfc_expr *j) -{ - if (!type_check (i, 0, BT_INTEGER)) -return false; - - if (!type_check (j, 1, BT_INTEGER)) -return false; - - if (i->ts.kind != j->ts.kind) -{ - if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L", - >where)) - return false; -} - - return true; -} - - -bool gfc_check_index (gfc_expr *string, gfc_expr *substring, gfc_expr *back, gfc_expr *kind) { @@ -2559,28 +2556,7 @@ gfc_check_intconv (gfc_expr *x) return true; } - bool -gfc_check_ior (gfc_expr *i, gfc_expr *j) -{ - if (!type_check (i, 0, BT_INTEGER)) -return false; - - if (!type_check (j, 1, BT_INTEGER)) -return false; - - if (i->ts.kind != j->ts.kind) -{ - if (!gfc_notify_std (GFC_STD_GNU, "Different type kinds at %L", - >where)) - return false; -} - - return true; -} - - -bool gfc_check_ishft (gfc_expr *i, gfc_expr *shift) { if (!type_check (i, 0, BT_INTEGER) @@ -3364,6 +3340,12 @@ gfc_check_merge_bits (gfc_expr *i, gfc_expr *j, gfc_ex if (!type_check (j, 1, BT_INTEGER)) return false; + if (!boz_args_check (i, j)) +return false; + + if (i->is_boz) i->ts.kind = j->ts.kind; + if (j->is_boz) j->ts.kind = i->ts.kind; + if (!type_check (mask, 2, BT_INTEGER)) return false; @@ -3373,6 +3355,8 @@ gfc_check_merge_bits (gfc_expr *i, gfc_expr *j, gfc_ex if
[PATCH] Fix PR82337 (SLSR and abnormal PHIs)
Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337 reports a problem with SLSR performing an invalid optimization across an abnormal PHI. This is easy to avoid by ensuring that SSA names used in an abnormal PHI never appear as a basis or as a PHI basis in the candidate table. We won't optimize what we can't see... I've cleaned up the test case in the bug report so that it compiles without warnings and added it to the torture tests. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? I would also like to backport it to all open releases after a short wait. Thanks, Bill 2017-09-26 Bill SchmidtPR tree-optimization/82337 * gimple-ssa-strength-reduction.c (find_phi_def): Don't record a phi definition if the PHI result appears in an abnormal PHI. (find_basis_for_base_expr): Don't record a basis if the LHS of the basis appears in an abnormal PHI. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 253232) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -488,7 +488,8 @@ find_phi_def (tree base) c = base_cand_from_table (base); - if (!c || c->kind != CAND_PHI) + if (!c || c->kind != CAND_PHI + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (c->cand_stmt))) return 0; return c->cand_num; @@ -557,6 +558,11 @@ find_basis_for_base_expr (slsr_cand_t c, tree base gimple_bb (one_basis->cand_stmt))) continue; + tree lhs = gimple_assign_lhs (one_basis->cand_stmt); + if (lhs && TREE_CODE (lhs) == SSA_NAME + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) + continue; + if (!basis || basis->cand_num < one_basis->cand_num) basis = one_basis; } Index: gcc/testsuite/gcc.c-torture/compile/pr82337.c === --- gcc/testsuite/gcc.c-torture/compile/pr82337.c (nonexistent) +++ gcc/testsuite/gcc.c-torture/compile/pr82337.c (working copy) @@ -0,0 +1,25 @@ +/* PR82337: SLSR needs to prevent abnormal SSA names from + serving as a basis. */ +char *a, *b, *c; + +struct d { + short e; + char f[]; +}; + +extern void j (void); + +void +g() { + struct d *h; + char *i; + int d; + do { +i = h->f + d; +20 ? j() : 0; +i = c; +if (__builtin_setjmp (h)) + b = h->f + d; +d = (int)(*i); + } while (a); +}
Re: [AArch64], patch] PR71727 fix -mstrict-align
On Wed, Sep 27, 2017 at 06:25:56PM +0100, Christophe Lyon wrote: > ping? OK, thanks. Reviewed-by: James GreenhalghJames > > On 20 September 2017 at 15:17, Christophe Lyon > wrote: > > Hi, > > > > On 11 September 2017 at 10:45, Andrew Pinski wrote: > >> On Tue, Jul 18, 2017 at 5:50 AM, Christophe Lyon > >> wrote: > >>> Hello, > >>> > >>> I've received a complaint that GCC for AArch64 would generate > >>> vectorized code relying on unaligned memory accesses even when using > >>> -mstrict-align. This is a problem for code where such accesses lead to > >>> memory faults. > >>> > >>> A previous patch (r24) introduced > >>> aarch64_builtin_support_vector_misalignment, which rejects such > >>> accesses when the element size is 64 bits, and accept them otherwise, > >>> which I think it shouldn't. The testcase added at that time only used > >>> 64 bits elements, and therefore didn't fully test the patch. > >>> > >>> The report I received is about vectorized accesses to an array of > >>> unsigned chars, whose start address is not aligned on a 128 bits > >>> boundary. > >>> > >>> The attached patch fixes the problem by making > >>> aarch64_builtin_support_vector_misalignment always return false when > >>> the misalignment is not known at compile time. > >>> > >>> I've also added a testcase, which tries to check if the array start > >>> address alignment is checked (using %16, and-ing with #15), so that > >>> loop peeling is performed *before* using vectorized accesses. Without > >>> the patch, vectorized accesses are used at the beginning of the array, > >>> and byte accesses are used for the remainder at the end, and there is > >>> not such 'and wX,wX,15'. > >>> > >>> BTW, I'm not sure about the same hook for arm... it seems to me it has > >>> a similar problem. > >>> > >>> OK? > >> > >> I would keep part of the comment: > >> - /* Misalignment factor is unknown at compile time but we know > >> - it's word aligned. */ > >> > >> Something like: > >> /* Misalignment factor is unknown at compile time. */ > >> > >> Otherwise it is not obvious what -1 means. > >> > >> Other than I think this patch is good (I cannot approve though). > >> > > > > Here is an updated patch, with the comment added as Andrew suggested. > > > > OK? > > > > Thanks, > > > > Christophe > > > >> Thanks, > >> Andrew > >> > >>> > >>> Thanks, > >>> > >>> Christophe
[PATCH][mingw] Enable colorized diagnostics
Hello, (I don't have SVN write access. So please apply this patch if you think it is OK. I have got FSF's copyright assignment paper for GCC. I will send you a copy of it when required.) Colorized diagnostics used to be disabled for MinGW targets (on which the macro `_WIN32` is defined), and this patch enables it. At the moment, GCC colorizes diagnostics using ANSI escape codes. Unlike most other terminal devices or simulators, Windows consoles did not recognize ANSI escape codes until Windows 10 "Threshold 2" in 2016, and this capability is not enabled by default thus has to be turned on explicitly sometime. In reality, Windows consoles do support coloring/highlighting/underlining characters, moving the cursor or filling (erasing) screen areas. However the individual text attributes must be set or unset using functions provided by the system. Here comes my solution: Normally, GCC outputs the diagnostic messages, infused with miscellaneous formatting, using the standard `fputs()` function. For Windows consoles we use a new function, `mingw_ansi_fputs()` to output such messages. This function strips ANSI escape codes, interprets them, orchestrates the modes as requested, then outputs the text remaining. Unimplemented codes are silently ignored. I have successfully bootstrapped GCC for `i686-w64-mingw32` and `x86_64-w64-mingw32` with this patch enabled. Automated tests are not an option because all this patch enables are visual effects that are not unreadable by scripts. `check_GNU_style.sh` spits some false positives about the comments and a line of non-conforming comment in the original file, which I think can be ignored. * * * * * * * * * * KNOWN ISSUES 1. Unlike `fputs()`, `mingw_ansi_fputs()` is no longer atomic. When multiple processes are outputting text to the same console, messages could interleave with each other. There is no simple solution to this issue. The atomicity of stdio functions is mandatory according to ISO C11, which was not the case in C99. The `fputs()` function from MinGW-w64 suffers from the same problem, but the one from MSVCRT does not. 2. `mingw_ansi_fputs()` is eventually compiled into `libcommon.a`. Personally, I think it is a bad idea to put target-specific code in it, notwithstanding `#if ... #endif` guards that has been went over carefully. I could have implemented this function in a separated target-specific file, which, actually, results in only undefined references, since all target-specific object files precede `libcommon.a` in the final linker command line. * * * * * * * * * * 2017-09-28 Liu Hao* gcc/pretty-print.c [_WIN32] (colorize_init): Remove. Use the generic version below instead. (should_colorize): Recognize Windows consoles as terminals for MinGW targets. * gcc/pretty-print.c [__MINGW32__] (write_all): New function. [__MINGW32__] (find_esc_head): Likewise. [__MINGW32__] (find_esc_terminator): Likewise. [__MINGW32__] (eat_esc_sequence): Likewise. [__MINGW32__] (mingw_ansi_fputs): New function that handles ANSI escape codes. (pp_write_text_to_stream): Use mingw_ansi_fputs instead of fputs for MinGW targets. -- Best regards, LH_Mouse From 6a02e27b10a20ead725391804610bfca6d9c1e06 Mon Sep 17 00:00:00 2001 From: Liu Hao Date: Thu, 7 Sep 2017 12:50:07 +0800 Subject: [PATCH] Enable a native GCC to color diagnostic messages sent over Windows consoles. --- gcc/diagnostic-color.c | 28 ++- gcc/pretty-print.c | 664 + 2 files changed, 682 insertions(+), 10 deletions(-) diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c index 6adb872146b..b8cf6f2c045 100644 --- a/gcc/diagnostic-color.c +++ b/gcc/diagnostic-color.c @@ -20,6 +20,10 @@ #include "system.h" #include "diagnostic-color.h" +#ifdef __MINGW32__ +# include +#endif + /* Select Graphic Rendition (SGR, "\33[...m") strings. */ /* Also Erase in Line (EL) to Right ("\33[K") by default. */ /*Why have EL to Right after SGR? @@ -275,23 +279,28 @@ parse_gcc_colors (void) return true; } -#if defined(_WIN32) -bool -colorize_init (diagnostic_color_rule_t) -{ - return false; -} -#else - /* Return true if we should use color when in auto mode, false otherwise. */ static bool should_colorize (void) { +#ifdef __MINGW32__ + /* For consistency reasons, one should check the handle returned by + _get_osfhandle(_fileno(stderr)) because the function + pp_write_text_to_stream() in pretty-print.c calls fputs() on + that stream. However, the code below for non-Windows doesn't seem + to care about it either... */ + HANDLE h; + DWORD m; + + h = GetStdHandle (STD_ERROR_HANDLE); + return (h != INVALID_HANDLE_VALUE) && (h != NULL) + && GetConsoleMode (h, ); +#else char const *t = getenv ("TERM"); return t && strcmp (t, "dumb") != 0
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi James I have made the requested changes to the patch. 2017-09-27 Sudakshina Das* config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_mov_immediate): Update prototype. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_mov_immediate): Function now used to output BIC/ORR imm as well based on new enum argument. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db): Likewise. 2017-09-27 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. Thanks Sudi From: James Greenhalgh Sent: Tuesday, September 26, 2017 8:04:38 PM To: Sudi Das Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote: > > Hi James > > I put aarch64_output_simd_general_immediate looking at the similarities of > the immediates for mov/mvni and orr/bic. The CHECK macro in > aarch64_simd_valid_immediate both checks > and converts the immediates in a manner that are needed for the instructions. > > Having said that, I agree that maybe I could have refactored > aarch64_output_simd_mov_immediate to do the work rather than creating a new > functions to do similar things. I have done so in this patch. Thanks, this looks much neater. > I have also changed the names of the enum simd_immediate_check to be better > indicative of what they are doing. Thanks, I'd tweak them to look more like the bitmasks you use them as, but that is a small change for my personal preference. > Lastly I have added more cases in the tests (according to all the possible > CHECKs) and made them dg-do assemble (although I had to add --save-temps so > that the scan-assembler would work). Do you think I should not put that > option and rather create separate tests? This is good - thanks. I think clean up the enum definitions and this patch will be good. > @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where > + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will > + perform all checks. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + AARCH64_CHECK_ORR = 1, /* Perform immediate checks for ORR. */ > + AARCH64_CHECK_BIC = 2, /* Perform immediate checks for BIC. */ > + AARCH64_CHECK_MOV = 3 /* Perform all checks (used for MOVI/MNVI). */ These are used in bit-mask style, so how about: AARCH64_CHECK_ORR = 1 << 0, AARCH64_CHECK_BIC = 1 << 1, AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC Which is more self-documenting. > @@ -13001,7 +13013,8 @@ aarch64_float_const_representable_p (rtx x) > char* > aarch64_output_simd_mov_immediate (rtx const_vector, > machine_mode mode, > - unsigned width) > + unsigned width, > + enum simd_immediate_check which) This function is sorely missing a comment explaining the parameters - it would be very helpful if you could add one as part of this patch. Thanks, James diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index e67c2ed..5d7c5df 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ }; +/* Enum to distinguish which type of check is to be done in + aarch64_simd_valid_immediate. This is used as a bitmask where + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will + perform all checks. Adding new types would require changes accordingly. */ +enum simd_immediate_check { + AARCH64_CHECK_ORR = 1 << 0, + AARCH64_CHECK_BIC = 1 << 1, + AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC +}; + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -345,7 +355,8 @@ bool aarch64_mov_operand_p (rtx, machine_mode); rtx aarch64_reverse_mask (machine_mode); bool aarch64_offset_7bit_signed_scaled_p
[PATCH, rs6000] Correct some Power9 scheduling info
The following patch corrects some Power9 resource requirements and instruction latencies. Bootstrap/regtest on powerpc64le-linux with no new regressions. Ok for trunk? -Pat 2017-09-27 Pat Haugen* config/rs6000/power9.md (DU_C2_3_power9): Remove an incorrect combination. (power9-alu): Split out insert/shift types... (power9-rot): ... to here. Correct dispatch resources. (power9-cracked-alu): Correct dispatch resources. (power9-mul): Likewise. (power9-mul-compare): Likewise. (power9-fp): Correct latency. (power9-ddiv): Likewise. (power9-vecfdiv): Likewise. (power9-vecdiv): Likewise. Index: gcc/config/rs6000/power9.md === --- gcc/config/rs6000/power9.md (revision 252029) +++ gcc/config/rs6000/power9.md (working copy) @@ -80,7 +80,6 @@ (define_reservation "DU_C2_power9" "x0_p ; 2-way cracked plus 3rd slot (define_reservation "DU_C2_3_power9" "x0_power9+x1_power9+xa0_power9| x1_power9+x2_power9+xa0_power9| - x1_power9+x2_power9+xb0_power9| x2_power9+x3_power9+xb0_power9") ; 3-way cracked (consumes whole decode/dispatch cycle) @@ -243,21 +242,29 @@ (define_insn_reservation "power9-sync" 4 ; Most ALU insns are simple 2 cycle, including record form (define_insn_reservation "power9-alu" 2 - (and (ior (eq_attr "type" "add,exts,integer,logical,isel") - (and (eq_attr "type" "insert,shift") - (eq_attr "dot" "no"))) + (and (eq_attr "type" "add,exts,integer,logical,isel") (eq_attr "cpu" "power9")) "DU_any_power9,VSU_power9") ; 5 cycle CR latency (define_bypass 5 "power9-alu" "power9-crlogical,power9-mfcr,power9-mfcrf") +; Rotate/shift prevent use of third slot +(define_insn_reservation "power9-rot" 2 + (and (eq_attr "type" "insert,shift") + (eq_attr "dot" "no") + (eq_attr "cpu" "power9")) + "DU_slice_3_power9,VSU_power9") +; 5 cycle CR latency +(define_bypass 5 "power9-rot" + "power9-crlogical,power9-mfcr,power9-mfcrf") + ; Record form rotate/shift are cracked (define_insn_reservation "power9-cracked-alu" 2 (and (eq_attr "type" "insert,shift") (eq_attr "dot" "yes") (eq_attr "cpu" "power9")) - "DU_C2_power9,VSU_power9") + "DU_C2_3_power9,VSU_power9") ; 7 cycle CR latency (define_bypass 7 "power9-cracked-alu" "power9-crlogical,power9-mfcr,power9-mfcrf") @@ -291,13 +298,13 @@ (define_insn_reservation "power9-mul" 5 (and (eq_attr "type" "mul") (eq_attr "dot" "no") (eq_attr "cpu" "power9")) - "DU_any_power9,VSU_power9") + "DU_slice_3_power9,VSU_power9") (define_insn_reservation "power9-mul-compare" 5 (and (eq_attr "type" "mul") (eq_attr "dot" "yes") (eq_attr "cpu" "power9")) - "DU_C2_power9,VSU_power9") + "DU_C2_3_power9,VSU_power9") ; 10 cycle CR latency (define_bypass 10 "power9-mul-compare" "power9-crlogical,power9-mfcr,power9-mfcrf") @@ -349,7 +356,7 @@ (define_insn_reservation "power9-fpsimpl (eq_attr "cpu" "power9")) "DU_slice_3_power9,VSU_power9") -(define_insn_reservation "power9-fp" 7 +(define_insn_reservation "power9-fp" 5 (and (eq_attr "type" "fp,dmul") (eq_attr "cpu" "power9")) "DU_slice_3_power9,VSU_power9") @@ -366,7 +373,7 @@ (define_insn_reservation "power9-sdiv" 2 (eq_attr "cpu" "power9")) "DU_slice_3_power9,VSU_power9") -(define_insn_reservation "power9-ddiv" 33 +(define_insn_reservation "power9-ddiv" 27 (and (eq_attr "type" "ddiv") (eq_attr "cpu" "power9")) "DU_slice_3_power9,VSU_power9") @@ -419,12 +426,12 @@ (define_insn_reservation "power9-veccomp (eq_attr "cpu" "power9")) "DU_super_power9,VSU_super_power9") -(define_insn_reservation "power9-vecfdiv" 28 +(define_insn_reservation "power9-vecfdiv" 24 (and (eq_attr "type" "vecfdiv") (eq_attr "cpu" "power9")) "DU_super_power9,VSU_super_power9") -(define_insn_reservation "power9-vecdiv" 32 +(define_insn_reservation "power9-vecdiv" 27 (and (eq_attr "type" "vecdiv") (eq_attr "size" "!128") (eq_attr "cpu" "power9"))
Go patch committed: fix crash on struct that embeds pointer type
This patch by Than McIntosh fixes a crash in the Go frontend that incorrectly embeds a pointer type. This fixes https://golang.org/issue/22050. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 253231) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -cdf1f58c7578980e1d1949680c7e404961b7c153 +11b7dae7de94215e92eb46e703cfecd76c0a3282 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 253025) +++ gcc/go/gofrontend/types.cc (working copy) @@ -5842,7 +5842,9 @@ Struct_type::do_verify() Type* t = p->type(); if (p->is_anonymous()) { - if (t->named_type() != NULL && t->points_to() != NULL) + if ((t->named_type() != NULL && t->points_to() != NULL) + || (t->named_type() == NULL && t->points_to() != NULL + && t->points_to()->points_to() != NULL)) { go_error_at(p->location(), "embedded type may not be a pointer"); p->set_type(Type::make_error_type()); @@ -11848,6 +11850,12 @@ Type::bind_field_or_method(Gogo* gogo, c go_assert(expr->type()->struct_type() == st); } ret = st->field_reference(expr, name, location); + if (ret == NULL) +{ + go_error_at(location, "type has no field %qs", + Gogo::message_name(name).c_str()); + return Expression::make_error(location); +} } else if (it != NULL && it->find_method(name) != NULL) ret = Expression::make_interface_field_reference(expr, name,
Re: [PATCH 4/5] New target check: vect_nopeel - v2
On 09/27/2017 03:05 AM, Rainer Orth wrote: Hi Andreas, On 09/27/2017 10:10 AM, Rainer Orth wrote: Hi Andreas, On 09/26/2017 02:26 PM, Rainer Orth wrote: Hi Andreas, diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 307c726..3acfd85 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1398,6 +1398,9 @@ Target supports a vector misalign access. @item vect_no_align Target does not support a vector alignment mechanism. +@item vect_no_peel +Target does not require any loop peeling for alignment purposes. + @item vect_no_int_min_max Target does not support a vector min and max instruction on @code{int}. please keep the items sorted alphabetically. The items do not appear to be sorted alphabetically. they should be. Your patch makes the ordering even more random. Patch to fix this preapproved ;-) The items rather appear to be arranged by subject. Does it really make sense do pull items like this apart just to have it in alphabetical order? @item vect_intfloat_cvt Target supports conversion from @code{signed int} to @code{float}. @item vect_uintfloat_cvt Target supports conversion from @code{unsigned int} to @code{float}. @item vect_floatint_cvt Target supports conversion from @code{float} to @code{signed int}. @item vect_floatuint_cvt Target supports conversion from @code{float} to @code{unsigned int}. I've added the no_peel item intentionally to the hw_misalign/no_align block. granted, there are some attempts at that, but I find it hard to make my way through that longish list. The way it is, you have to skip through the whole list beginning to end. Texinfo seems to have no subsubsection which would allow to make the sub-grouping explicit... Let's hear what Sandra thinks. U. There is no common convention in the GCC documentation and other parts of the manual do deliberately diverge from alphabetization in places. There's a perpetual tension between putting the most commonly-needed information first vs grouping things by related concepts vs alphabetize vs the tendency of people to insert new items at random places in an existing list regardless of how it's previously been organized. :-( Alphabetical lists are useful when you already know the name of the thing you are searching for, but almost everybody reads the documentation in a web browser or PDF viewer with a search feature nowadays so you can find the term no matter how the list is sorted. So I'd say we shouldn't alphabetize as a matter of policy if there is some other organization that makes sense. In this case, the section is already broken into multiple sublists by topic, most of the sublists are fairly short, and where there's some discernible sort order within the sublists, it seems to be grouping related things together rather than alphabetical. So I wouldn't insist on alphabetizing this particular sublist either. -Sandra
Re: [AArch64], patch] PR71727 fix -mstrict-align
ping? On 20 September 2017 at 15:17, Christophe Lyonwrote: > Hi, > > On 11 September 2017 at 10:45, Andrew Pinski wrote: >> On Tue, Jul 18, 2017 at 5:50 AM, Christophe Lyon >> wrote: >>> Hello, >>> >>> I've received a complaint that GCC for AArch64 would generate >>> vectorized code relying on unaligned memory accesses even when using >>> -mstrict-align. This is a problem for code where such accesses lead to >>> memory faults. >>> >>> A previous patch (r24) introduced >>> aarch64_builtin_support_vector_misalignment, which rejects such >>> accesses when the element size is 64 bits, and accept them otherwise, >>> which I think it shouldn't. The testcase added at that time only used >>> 64 bits elements, and therefore didn't fully test the patch. >>> >>> The report I received is about vectorized accesses to an array of >>> unsigned chars, whose start address is not aligned on a 128 bits >>> boundary. >>> >>> The attached patch fixes the problem by making >>> aarch64_builtin_support_vector_misalignment always return false when >>> the misalignment is not known at compile time. >>> >>> I've also added a testcase, which tries to check if the array start >>> address alignment is checked (using %16, and-ing with #15), so that >>> loop peeling is performed *before* using vectorized accesses. Without >>> the patch, vectorized accesses are used at the beginning of the array, >>> and byte accesses are used for the remainder at the end, and there is >>> not such 'and wX,wX,15'. >>> >>> BTW, I'm not sure about the same hook for arm... it seems to me it has >>> a similar problem. >>> >>> OK? >> >> I would keep part of the comment: >> - /* Misalignment factor is unknown at compile time but we know >> - it's word aligned. */ >> >> Something like: >> /* Misalignment factor is unknown at compile time. */ >> >> Otherwise it is not obvious what -1 means. >> >> Other than I think this patch is good (I cannot approve though). >> > > Here is an updated patch, with the comment added as Andrew suggested. > > OK? > > Thanks, > > Christophe > >> Thanks, >> Andrew >> >>> >>> Thanks, >>> >>> Christophe
Re: 0005-Part-5.-Add-x86-CET-documentation
On Wed, 27 Sep 2017, Florian Weimer wrote: > This is part of the ABI GCC implements, so it has to be documented somewhere, > and not just as part of the GCC source code. > > CET is not properly described in the ABI supplement and I don't think this > will change, so detailed documentation in the GCC manual is very much > desirable. Isn't this a matter to take up further in the thread HJ started on the ABI mailing lists, or a new such thread (possibly e.g. sending pull requests that build further on his wording, or propose alternative wording, to clarify them things left unclear there, with a goal of getting it clearly defined in the master sources for x86_64 and x86)? Clearly the best result would be proper documentation in the ABI and the GCC manual cross-referencing the relevant ABI documents. -- Joseph S. Myers jos...@codesourcery.com
Re: 0005-Part-5.-Add-x86-CET-documentation
On 09/27/2017 02:52 AM, Florian Weimer wrote: On 09/27/2017 05:40 AM, Sandra Loosemore wrote: +@emph{x86 implementation:} when @option{-fcf-protection} option is +specified the compiler inserts an ENDBR instruction at function's +prologue if the function's type does not have the @code{nocf_check} +attribute and addresses to which indirect control-flow transfer can +happen. The instruction triggers the HW check if a control-flow +transfer to the address of ENDBR instruction is valid. Implementation details like this should be comments in the code, not included in the user-facing documentation. This is part of the ABI GCC implements, so it has to be documented somewhere, and not just as part of the GCC source code. CET is not properly described in the ABI supplement and I don't think this will change, so detailed documentation in the GCC manual is very much desirable. Not if you're a documentation maintainer. :-( Generally speaking, user-facing manuals like the GCC manual should document user-visible GCC features, not internal implementation details. Especially the target-independent parts of the manual are not the right place to discuss target-specific code generation patterns or conventions that should be in the ABI supplement or some other non-GCC documentation. I don't have so much objection to expanding the discussion of the target-specific -mcet option in the x86 options section, as long as the documentation is there because it helps people *use* the feature and not to explain things that are only interesting to compiler implementors. -Sandra
Re: correct attribute ifunc C++ type safety (PR 82301)
Hi Nathan, This patch is a tweak for the C++ part of the type safety enhancement to attribute ifunc committed in r253041. It touches the C++ ifunc tests you added some years ago and I recently broke with the initial commits of the feature. The notable difference between r253041 and this update (other than correcting the type expected to be returned by the resolver for a member function) is issuing the incompatibility warning under -Wincompatible- pointer-types rather than -Wattributes. When you and/or Jason have a minute, can you please review it? https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01603.html Thanks Martin On 09/24/2017 07:03 PM, Martin Sebor wrote: r253041 enhanced type checking for alias and ifunc attributes to detect declarations of incompatible aliases, or ifunc resolvers that return pointers to functions of an incompatible type. More extensive testing exposed a bug in the implementation of the ifunc attribute handling in C++ where the checker expected the ifunc resolver to return a pointer to a member function when the implementation actually expects it return a pointer to a non- member function. In a discussion of the test suite failures, Jakub also suggested to break the enhanced warning out of -Wattributes and issue it under a different option. The attached patch corrects the C++ problem and moves the warning under -Wincompatible-pointer-types. Since this is a C-only option, the patch also enables for it C++. Since the option is enabled by default, the patch further requires -Wextra to issue the warning for ifunc resolvers returning void*. However, the patched checker diagnoses other incompatibilities without it. Martin
RE: 0005-Part-5.-Add-x86-CET-documentation
Updated version #3. > -Original Message- > From: Sandra Loosemore [mailto:san...@codesourcery.com] > Sent: Wednesday, September 27, 2017 5:41 AM > To: Tsimbalist, Igor V; Uros Bizjak > > Cc: gcc-patches@gcc.gnu.org > Subject: Re: 0005-Part-5.-Add-x86-CET-documentation > > On 09/26/2017 07:47 AM, Tsimbalist, Igor V wrote: > > Here is a new version of the patch. > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index > > a374890..a900ed1 100644 > > --- a/gcc/doc/extend.texi > > +++ b/gcc/doc/extend.texi > > @@ -5655,6 +5655,13 @@ compiled with the > > @option{-fcf-protection=branch} option. The compiler assumes that > > the function's address is a valid target for a control-flow transfer. > > > > +@emph{x86 implementation:} when @option{-fcf-protection} option is > > +specified the compiler inserts an ENDBR instruction at function's > > +prologue if the function's type does not have the @code{nocf_check} > > +attribute and addresses to which indirect control-flow transfer can > > +happen. The instruction triggers the HW check if a control-flow > > +transfer to the address of ENDBR instruction is valid. > > Implementation details like this should be comments in the code, not > included in the user-facing documentation. > > > @@ -5662,7 +5669,8 @@ not be instrumented when compiled with the > that > > the function's address from the pointer is a valid target for a > > control-flow transfer. A direct function call through a function > > name is assumed to be a safe call thus direct calls are not > > -instrumented by the compiler. > > +instrumented by the compiler. For @emph{x86 implementation} the > > +compiler inserts a NOTRACK prefix before an indirect call instruction. > > Likewise here. For this comment and above could you please let me know what is the right place To move the description? Also I enclosed ENDBR and NOTRACK in @code{} and wrote it in lower case. > > @@ -21217,6 +21225,25 @@ void __builtin_ia32_wrpkru (unsigned int) > > unsigned int __builtin_ia32_rdpkru () @end smallexample > > > > +The following built-in functions are available when @option{-mcet} is > used. > > +They are used to support Intel Control-flow Enforcment Technology (CET). > > +Each built-in function generate a machine instruction that is part of > > +the > > s/generate a/generates the/ Fixed. > > @@ -11378,6 +11379,20 @@ You can also use the @code{nocf_check} > > attribute to identify which functions and calls should be skipped > > from instrumentation (@pxref{Function Attributes}). > > > > +Currently x86 GNU/Linux target provides an implementation based on > > s/x86/the x86/ Fixed. > > +Intel Control-flow Enforcement Technology (CET), thus @option{-mcet} > > s/@option/the @option/ Fixed. > > +option is required to enable this feature. > > I think you should put a cross-reference to the x86 options node here, and > move all the following x86-specific discussion to that section. Put cross-reference. > > In order to get an > > +application to be CET compatible the x86 implementation requires all > > +object files have to be compiled with @option{-fcf-protection} option > > +and all linked in libraries have to be CET compatible. > > I'm having difficulty parsing this. What does "CET compatible" mean? > Is this an ABI compatibility issue, so that all objects linked into the > executable > have to be compiled with the (same?) @option{-fcf-protection} option if any > of them do? Or do you just lose checking on code in uninstrumented > objects? I re-wrote the paragraph and removed "compatibility topic". > > +Instrumentation for x86 is controlled by target specific options > > hyphenate target-specific here Fixed. > > +@option{-mcet}, @option{-mibt} and @option{-mshstk}. The compiler > > +also provides a number of built-in functions for fine-grained control > > +of CET-based implementation. See @xref{x86 Built-in Functions}, for > > +more information. > > + > > @item -fstack-protector > > @opindex fstack-protector > > Emit extra code to check for buffer overflows, such as stack smashing > > @@ -25755,15 +25770,19 @@ preferred alignment to @option{- > mpreferred-stack-boundary=2}. > > @need 200 > > @itemx -mclzero > > @opindex mclzero > > +@need 200 > > @itemx -mpku > > @opindex mpku > > +@need 200 > > +@itemx -mcet > > +@opindex mcet > > These switches enable the use of instructions in the MMX, SSE, SSE2, > > SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, > AVX512CD, > > SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, > LWP, > > ABM, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA AVX512VBMI, BMI, > BMI2, > > FXSR, -XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, 3DNow!@: or > enhanced 3DNow!@: > > -extended instruction sets. Each has a corresponding @option{-mno-} > > option -to disable use of these instructions. > > +XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX, PKU, IBT, SHSTK, > > +3DNow!@: or
Re: [C++ PATCH] C++2A P0386R1 - default member initializers for bit-fields
Jakub, The following patch implements P0386R1 - NSDMIs for bit-fields. While working on that, I've discovered our parser mishandles attributes on bitfields, already C++11 says: identifier[opt] attribute-specifier-seq[opt] : constant-expression in the grammar, but we actually parsed identifier[opt] : constant-expression attribute-specifier-seq[opt] I'm sorry for my tardiness. It think the patch would be better broken apart: 1) fix the parsing bug you found and move to (ab)using DECL_BIT_FIELD_REPRESENTATIVE 2) the new c++2a feature Is that feasible? WRT your questions 1, I think a default strict arg on cp_parser_constant_expression is the way to go. That makes it clear that its default behaviour is 'sloppy', saving someone the detective work you've done. 2. I'm all for a pedwarn, Jason is often more conservative than me though. 3. D_B_F_R is quite probably fine -- you'll know better than me, having poked at it 4. They are not valid on unamed bitfields. You may have spotted an inconsistency in the spec. nathan -- Nathan Sidwell
Re: [PATCH][GRAPHITE] More TLC
On Wed, 27 Sep 2017, Richard Biener wrote: > On Wed, 27 Sep 2017, Richard Biener wrote: > > > On Tue, 26 Sep 2017, Sebastian Pop wrote: > > > > > On Mon, Sep 25, 2017 at 8:12 AM, Richard Bienerwrote: > > > > > > > On Fri, 22 Sep 2017, Sebastian Pop wrote: > > > > > > > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener > > > > wrote: > > > > > > > > > > > > > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal > > > > > > TLC. It also adds a testcase I reduced from a stupid mistake I made > > > > > > when reworking canonicalize_loop_closed_ssa. > > > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to > > > > > > trunk. > > > > > > > > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with > > > > > > -Ofast -march=haswell -floop-nest-optimize are > > > > > > > > > > > > 61 loop nests "optimized" > > > > > > 45 loop nest transforms cancelled because of code generation issues > > > > > > 21 loop nest optimizations timed out the 35 ISL "operations" we > > > > allow > > > > > > > > > > > > I say "optimized" because the usual transform I've seen is static > > > > tiling > > > > > > as enforced by GRAPHITE according to --param loop-block-tile-size. > > > > > > There's no way to automagically figure what kind of transform ISL > > > > > > did > > > > > > > > > > > > > > > > Here is how to automate (without magic) the detection > > > > > of the transform that isl did. > > > > > > > > > > The problem solved by isl is the minimization of strides > > > > > in memory, and to do this, we need to tell the isl scheduler > > > > > the validity dependence graph, in graphite-optimize-isl.c > > > > > see the validity (RAW, WAR, WAW) and the proximity > > > > > (RAR + validity) maps. The proximity does include the > > > > > read after read, as the isl scheduler needs to minimize > > > > > strides between consecutive reads. > > > > Ah, so I now see why we do not perform interchange on trivial cases like > > > > double A[1024][1024], B[1024][1024]; > > > > void foo(void) > > { > > for (int i = 0; i < 1024; ++i) > > for (int j = 0; j < 1024; ++j) > > A[j][i] = B[j][i]; > > } > > > > which is probably because > > > > /* FIXME: proximity should not be validity. */ > > isl_union_map *proximity = isl_union_map_copy (validity); > > > > falls apart when there is _no_ dependence? > > > > I can trick GRAPHITE into performing the interchange for > > > > double A[1024][1024], B[1024][1024]; > > > > void foo(void) > > { > > for (int i = 1; i < 1023; ++i) > > for (int j = 0; j < 1024; ++j) > > A[j][i] = B[j][i-1] + A[j][i+1]; > > } > > > > because now there is a dependence. Any idea on how to rewrite > > scop_get_dependences to avoid "simplifying"? I suppose the > > validity constraints _do_ also specify kind-of a proximity > > we just may not prune / optimize them in the same way as > > dependences? > > Another thing I notice is that we don't handle the multi-dimensional > accesses the fortran frontend produces: > > (gdb) p debug_data_reference (dr) > #(Data Ref: > # bb: 18 > # stmt: _43 = *a_141(D)[_42]; > # ref: *a_141(D)[_42]; > # base_object: *a_141(D); > # Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +, > stride.88_115}_5 > > ultimatively we fail here because we try to build a constraint for > > {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5 > > which ends up computing isl_pw_aff_mul (A, stride.88_115) with > A being the non-constant constraint generated for > {(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being > a parameter. ISL doesn't like that multiplication as the result > isn't affine (well - it is, we just have parameters in there). > > I suppose ISL doesn't handle this form of accesses given the > two "dimensions" in this scalarized form may overlap? So we'd > really need to turn those into references with different access > functions (even if that's not 100% a valid semantic transformation > as scalarization isn't reversible without extra information)? Looks like even when hacking the Fortran FE to produce nested ARRAY_REFs we run into the same issue for (gdb) p debug_data_reference (dr) #(Data Ref: # bb: 17 # stmt: VIEW_CONVERT_EXPR (*y_117(D))[_24]{lb: 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1 sz: 8} = 0.0; # ref: VIEW_CONVERT_EXPR (*y_117(D))[_24]{lb: 1 sz: _20 * 8}[_26]{lb: 1 sz: _21 * 8}[_28]{lb: 1 sz: _22 * 8}[_29]{lb: 1 sz: 8}; # base_object: VIEW_CONVERT_EXPR (*y_117(D)); # Access function 0: {1, +, 1}_4 # Access function 1: (integer(kind=8)) {(unsigned long) stride.88_92, +, (unsigned long) stride.88_92}_3; # Access function 2: (integer(kind=8)) {(unsigned
libgo patch committed: Check Getsockname error return
This patch by Tony Reix checks for errors from Getsockname in a couple of places. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 253105) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -e0c1f0b645b12a544b484c0f477f8fb6f5980550 +cdf1f58c7578980e1d1949680c7e404961b7c153 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/net/sock_posix.go === --- libgo/go/net/sock_posix.go (revision 253025) +++ libgo/go/net/sock_posix.go (working copy) @@ -182,7 +182,10 @@ func (fd *netFD) listenStream(laddr sock if err := fd.init(); err != nil { return err } - lsa, _ := syscall.Getsockname(fd.pfd.Sysfd) + lsa, err := syscall.Getsockname(fd.pfd.Sysfd) + if err != nil { + return os.NewSyscallError("getsockname", err) + } fd.setAddr(fd.addrFunc()(lsa), nil) return nil } @@ -221,7 +224,10 @@ func (fd *netFD) listenDatagram(laddr so if err := fd.init(); err != nil { return err } - lsa, _ := syscall.Getsockname(fd.pfd.Sysfd) + lsa, err := syscall.Getsockname(fd.pfd.Sysfd) + if err != nil { + return os.NewSyscallError("getsockname", err) + } fd.setAddr(fd.addrFunc()(lsa), nil) return nil }
Re: [PATCH] Pretty-print GOACC_REDUCTION arguments
Hi! On Mon, 25 Sep 2017 16:57:52 +0200, Tom de Vrieswrote: > currently for a GOACC_REDUCTION internal fn call we print: > ... >sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0, 67, 0); > ... > > This patch adds a comment for some arguments explaining the meaning of > the argument: > ... >sum_5 = GOACC_REDUCTION (SETUP, _3, 0, 0 /*gang*/, 67 /*+*/, 0); > ... ACK to the general idea. However, I note that for the "CODE" we currently *only* print the textual variant ("SETUP") (just like we're also doing for a few other "IFN_*"), whereas for "LEVEL" and "OP" you now print both. Should these really be different? I think I actually do prefer the style you're using (print both). I would print the actual "GOMP_DIM_GANG" instead of "gang" etc., to make it easier to see where these values are coming from. (I do see that for "CODE", we can easily use a suitable "DEF" macro with the "IFN_GOACC_REDUCTION_CODES" define -- not currently possible with the "GOMP_DIM_*" constants. That can of course be addressed later, separately, if a Reviewer agrees to the proposed patch generally.) > OK for trunk, if testing is ok? > --- a/gcc/gimple-pretty-print.c > +++ b/gcc/gimple-pretty-print.c > @@ -765,6 +766,40 @@ dump_gimple_call_args (pretty_printer *buffer, gcall > *gs, dump_flags_t flags) >if (i) > pp_string (buffer, ", "); >dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false); > + > + if (gimple_call_internal_p (gs)) > + switch (gimple_call_internal_fn (gs)) > + { Will need to add a "default" case: [...]/source-gcc/gcc/gimple-pretty-print.c:771:9: warning: enumeration value 'IFN_MASK_LOAD' not handled in switch [-Wswitch] switch (gimple_call_internal_fn (gs)) ^ [followed by many more] > + case IFN_GOACC_REDUCTION: > + switch (i) > + { > + case 3: > + switch (tree_to_uhwi (gimple_call_arg (gs, i))) Something ;-) is wrong. Running this on libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c, I run into: during GIMPLE pass: omplower dump file: reduction-1.c.006t.omplower In file included from source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c:9:0: source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c: In function 'test_reductions': source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h:20:7: internal compiler error: in tree_to_uhwi, at tree.c:6606 abort ();\ ^~~~ source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c:55:3: note: in expansion of macro 'check_reduction_op' check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw) ^~ (gdb) bt #0 fancy_abort (file=0x1659d70 "[...]/source-gcc/gcc/tree.c", line=6606, function=0x165e7f8 "tree_to_uhwi") at [...]/source-gcc/gcc/diagnostic.c:1487 #1 0x00edf053 in tree_to_uhwi (t=) at [...]/source-gcc/gcc/tree.c:6606 #2 0x0096ca63 in dump_gimple_call_args (buffer=0x7fffc6c0, gs=0x766ad210, flags=0) at [...]/source-gcc/gcc/gimple-pretty-print.c:777 #3 0x0096f7f0 in dump_gimple_call (buffer=0x7fffc6c0, gs=0x766ad210, spc=20, flags=0) at [...]/source-gcc/gcc/gimple-pretty-print.c:946 [...] This seems to be the "LEVEL" being "-1" -- probably meaning "not yet decided"? (Haven't looked that up.) > + { > + case GOMP_DIM_GANG: > + pp_string (buffer, " /*gang*/"); > + break; > + case GOMP_DIM_WORKER: > + pp_string (buffer, " /*worker*/"); > + break; > + case GOMP_DIM_VECTOR: > + pp_string (buffer, " /*vector*/"); > + break; > + default: > + gcc_unreachable (); > + } > + break; > + case 4: > + { > + enum tree_code rcode > + = (enum tree_code)tree_to_uhwi (gimple_call_arg (gs, i)); > + pp_string (buffer, " /*"); > + pp_string (buffer, op_symbol_code (rcode)); > + pp_string (buffer, "*/"); > + } > + break; I take it, there is no canned function for printing the textual representation of the tree_code "OP" ("case 4"). > + } > + } > } > >if (gimple_call_va_arg_pack_p (gs)) Grüße Thomas
[PATCH] For -Os change movabsq $(imm32 << shift), %rX[xip] to movl $imm2, %eX[xip]; shl $shift, %rX[xip] (PR target/82339)
Hi! Doing a movl + shlq by constant seems to be 1 byte shorter than movabsq, so this patch attempts to use the former form unless flags is live. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Performance-wise, not really sure what is a win (on i7-5960X on the testcase in the PR movl + shlq seems to be significantly faster, but e.g. on i7-2600 it is the same) so not doing anything for speed yet. 2017-09-27 Jakub JelinekPR target/82339 * config/i386/i386.md (*movdi_internal peephole2): New -Os peephole for movabsq $(i32 << shift), r64. --- gcc/config/i386/i386.md.jj 2017-09-21 09:26:42.0 +0200 +++ gcc/config/i386/i386.md 2017-09-27 10:24:01.520673889 +0200 @@ -2379,6 +2379,28 @@ (define_split gen_lowpart (SImode, operands[1])); }) +;; movabsq $0x001234567800, %rax is longer +;; than movl $0x12345678, %eax; shlq $24, %rax. +(define_peephole2 + [(set (match_operand:DI 0 "register_operand") + (match_operand:DI 1 "const_int_operand"))] + "TARGET_64BIT + && optimize_insn_for_size_p () + && LEGACY_INT_REG_P (operands[0]) + && !x86_64_immediate_operand (operands[1], DImode) + && !x86_64_zext_immediate_operand (operands[1], DImode) + && !((UINTVAL (operands[1]) >> ctz_hwi (UINTVAL (operands[1]))) +& ~(HOST_WIDE_INT) 0x) + && peep2_regno_dead_p (0, FLAGS_REG)" + [(set (match_dup 0) (match_dup 1)) + (parallel [(set (match_dup 0) (ashift:DI (match_dup 0) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] +{ + int shift = ctz_hwi (UINTVAL (operands[1])); + operands[1] = gen_int_mode (UINTVAL (operands[1]) >> shift, DImode); + operands[2] = gen_int_mode (shift, QImode); +}) + (define_insn "*movsi_internal" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m ,?r ,?*Yi,*k,*k ,*rm") Jakub
Re: [PATCH] Don't optimize away lhs from calls with addressable zero sized return type (PR c++/82159)
On Wed, 27 Sep 2017, Jakub Jelinek wrote: > Hi! > > The expansion relies on lhs being kept for calls that return addressable > types. On the following testcase (which is a GNU extension, pedantically > we error out on zero sized arrays) we return TREE_ADDRESSABLE > zero_sized_type and optimize away the lhs which we need later on. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, > during the bootstrap+regtests it only made a difference on the newly added > testcase. Ok for trunk? Ok. Thanks, Richard. > 2017-09-27 Jakub Jelinek> > PR c++/82159 > * gimplify.c (gimplify_modify_expr): Don't optimize away zero sized > lhs from calls if the lhs has addressable type. > > * g++.dg/opt/pr82159.C: New test. > > --- gcc/gimplify.c.jj 2017-09-01 09:25:34.0 +0200 > +++ gcc/gimplify.c2017-09-26 13:03:11.614726601 +0200 > @@ -5479,7 +5479,12 @@ gimplify_modify_expr (tree *expr_p, gimp > side as statements and throw away the assignment. Do this after > gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable > types properly. */ > - if (zero_sized_type (TREE_TYPE (*from_p)) && !want_value) > + if (zero_sized_type (TREE_TYPE (*from_p)) > + && !want_value > + /* Don't do this for calls that return addressable types, expand_call > + relies on those having a lhs. */ > + && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p)) > +&& TREE_CODE (*from_p) == CALL_EXPR)) > { >gimplify_stmt (from_p, pre_p); >gimplify_stmt (to_p, pre_p); > --- gcc/testsuite/g++.dg/opt/pr82159.C.jj 2017-09-26 13:04:08.711027279 > +0200 > +++ gcc/testsuite/g++.dg/opt/pr82159.C2017-09-26 14:20:01.519361945 > +0200 > @@ -0,0 +1,18 @@ > +// PR c++/82159 > +// { dg-do compile } > +// { dg-options "" } > + > +template > +struct S > +{ > + ~S () {} > + template S foo () { return S (); } > + unsigned char data[N]; > +}; > + > +int > +main () > +{ > + S<16> d; > + S<0> t = d.foo<0> (); > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH] Don't optimize away lhs from calls with addressable zero sized return type (PR c++/82159)
Hi! The expansion relies on lhs being kept for calls that return addressable types. On the following testcase (which is a GNU extension, pedantically we error out on zero sized arrays) we return TREE_ADDRESSABLE zero_sized_type and optimize away the lhs which we need later on. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, during the bootstrap+regtests it only made a difference on the newly added testcase. Ok for trunk? 2017-09-27 Jakub JelinekPR c++/82159 * gimplify.c (gimplify_modify_expr): Don't optimize away zero sized lhs from calls if the lhs has addressable type. * g++.dg/opt/pr82159.C: New test. --- gcc/gimplify.c.jj 2017-09-01 09:25:34.0 +0200 +++ gcc/gimplify.c 2017-09-26 13:03:11.614726601 +0200 @@ -5479,7 +5479,12 @@ gimplify_modify_expr (tree *expr_p, gimp side as statements and throw away the assignment. Do this after gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable types properly. */ - if (zero_sized_type (TREE_TYPE (*from_p)) && !want_value) + if (zero_sized_type (TREE_TYPE (*from_p)) + && !want_value + /* Don't do this for calls that return addressable types, expand_call +relies on those having a lhs. */ + && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p)) + && TREE_CODE (*from_p) == CALL_EXPR)) { gimplify_stmt (from_p, pre_p); gimplify_stmt (to_p, pre_p); --- gcc/testsuite/g++.dg/opt/pr82159.C.jj 2017-09-26 13:04:08.711027279 +0200 +++ gcc/testsuite/g++.dg/opt/pr82159.C 2017-09-26 14:20:01.519361945 +0200 @@ -0,0 +1,18 @@ +// PR c++/82159 +// { dg-do compile } +// { dg-options "" } + +template +struct S +{ + ~S () {} + template S foo () { return S (); } + unsigned char data[N]; +}; + +int +main () +{ + S<16> d; + S<0> t = d.foo<0> (); +} Jakub
Re: [PATCH][GRAPHITE] More TLC
On Wed, 27 Sep 2017, Richard Biener wrote: > On Tue, 26 Sep 2017, Sebastian Pop wrote: > > > On Mon, Sep 25, 2017 at 8:12 AM, Richard Bienerwrote: > > > > > On Fri, 22 Sep 2017, Sebastian Pop wrote: > > > > > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener > > > wrote: > > > > > > > > > > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal > > > > > TLC. It also adds a testcase I reduced from a stupid mistake I made > > > > > when reworking canonicalize_loop_closed_ssa. > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > > > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with > > > > > -Ofast -march=haswell -floop-nest-optimize are > > > > > > > > > > 61 loop nests "optimized" > > > > > 45 loop nest transforms cancelled because of code generation issues > > > > > 21 loop nest optimizations timed out the 35 ISL "operations" we > > > allow > > > > > > > > > > I say "optimized" because the usual transform I've seen is static > > > tiling > > > > > as enforced by GRAPHITE according to --param loop-block-tile-size. > > > > > There's no way to automagically figure what kind of transform ISL did > > > > > > > > > > > > > Here is how to automate (without magic) the detection > > > > of the transform that isl did. > > > > > > > > The problem solved by isl is the minimization of strides > > > > in memory, and to do this, we need to tell the isl scheduler > > > > the validity dependence graph, in graphite-optimize-isl.c > > > > see the validity (RAW, WAR, WAW) and the proximity > > > > (RAR + validity) maps. The proximity does include the > > > > read after read, as the isl scheduler needs to minimize > > > > strides between consecutive reads. > > Ah, so I now see why we do not perform interchange on trivial cases like > > double A[1024][1024], B[1024][1024]; > > void foo(void) > { > for (int i = 0; i < 1024; ++i) > for (int j = 0; j < 1024; ++j) > A[j][i] = B[j][i]; > } > > which is probably because > > /* FIXME: proximity should not be validity. */ > isl_union_map *proximity = isl_union_map_copy (validity); > > falls apart when there is _no_ dependence? > > I can trick GRAPHITE into performing the interchange for > > double A[1024][1024], B[1024][1024]; > > void foo(void) > { > for (int i = 1; i < 1023; ++i) > for (int j = 0; j < 1024; ++j) > A[j][i] = B[j][i-1] + A[j][i+1]; > } > > because now there is a dependence. Any idea on how to rewrite > scop_get_dependences to avoid "simplifying"? I suppose the > validity constraints _do_ also specify kind-of a proximity > we just may not prune / optimize them in the same way as > dependences? Another thing I notice is that we don't handle the multi-dimensional accesses the fortran frontend produces: (gdb) p debug_data_reference (dr) #(Data Ref: # bb: 18 # stmt: _43 = *a_141(D)[_42]; # ref: *a_141(D)[_42]; # base_object: *a_141(D); # Access function 0: {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5 ultimatively we fail here because we try to build a constraint for {{(_38 + stride.88_115) + 1, +, 1}_4, +, stride.88_115}_5 which ends up computing isl_pw_aff_mul (A, stride.88_115) with A being the non-constant constraint generated for {(_38 + stride.88_115) + 1, +, 1}_4 and stride.88_115 being a parameter. ISL doesn't like that multiplication as the result isn't affine (well - it is, we just have parameters in there). I suppose ISL doesn't handle this form of accesses given the two "dimensions" in this scalarized form may overlap? So we'd really need to turn those into references with different access functions (even if that's not 100% a valid semantic transformation as scalarization isn't reversible without extra information)? Thanks, Richard.
[openacc, testsuite, committed] Fix libgomp.oacc-c-c++-common/parallel-reduction.c for non-nvidia devices
Hi, this patch makes the test-case libgomp.oacc-c-c++-common/parallel-reduction.c work for non-nvidia devices. Committed as obvious. Thanks, - Tom Fix libgomp.oacc-c-c++-common/parallel-reduction.c for non-nvidia devices 2017-09-27 Tom de Vries* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c (main): Remove acc_device_nvidia references. --- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c index b2c60e5..077571f 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c @@ -21,7 +21,7 @@ main () } } - if (acc_get_device_type () != acc_device_nvidia) + if (acc_get_device_type () == acc_device_host) { if (s1 != 1) abort (); @@ -41,7 +41,7 @@ main () s2 += N; } - if (acc_get_device_type () != acc_device_nvidia) + if (acc_get_device_type () == acc_device_host) { if (s1 != 1) abort ();
RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute
Updated version #4. > -Original Message- > From: Sandra Loosemore [mailto:san...@codesourcery.com] > Sent: Wednesday, September 27, 2017 5:11 AM > To: Tsimbalist, Igor V; 'gcc- > patc...@gcc.gnu.org' > Cc: Jeff Law > Subject: Re: 0002-Part-2.-Document-finstrument-control-flow-and-notrack > attribute > > On 09/26/2017 07:45 AM, Tsimbalist, Igor V wrote: > > Here is the updated version (version#3). All comments below are fixed. > > This still needs more work. Specific comments below: > > > +The @code{nocf_check} attribute is applied to an object's type. > > +In case of assignment of a function address or a function pointer to > > +another pointer, the attribute is not carried over from the > > +right-hand object's type, the type of left-hand object stays > > +unchanged. The > > s/object's type,/object's type;/ Fixed. > > @@ -11348,6 +11349,35 @@ is used to link a program, the GCC driver > > automatically links against @file{libmpxwrappers}. See also @option{- > static-libmpxwrappers}. > > Enabled by default. > > > > +@item -fcf- > protection==@r{[}full@r{|}branch@r{|}return@r{|}none@r{]} > > +@opindex fcf-protection > > +Enable code instrumentation of control-flow transfers to increase > > +program security by checking that target addresses of control-flow > > +transfer instructions (such as indirect function call, function > > +return, indirect jump) are valid. This prevents diverting the > > +control flow instructions from its original target address to a new > > +undesigned > > s/control flow instructions/control-flow instructions/ > > I'd rewrite the next sentence as > > This prevents diverting the flow of control to an unexpected target. I used your suggestion. > > +target. This is intended to protect against such threats as > > +Return-oriented Programming (ROP), and similarly call/jmp-oriented > > +programming (COP/JOP). > > + > > +Each compiler target, which is going to support the control-flow > > +instrumentation, is supposed to have its own target specific > > +implementation. For all targets where an implementation is absent the > > +usage of @option{-fcf-protection} option causes an error message. > > I would really prefer that you list the targets this works on here instead. Another patch you are reviewing now (its name starts with 0005-Part-5) has the statement you would like to put here. The important point here is an error issuing. When I commit the first patch none of target platforms supports the option and an error is printed when the option is specified. I removed the first sentence but keep the second one: For all targets, which do not support the @option{-fcf-protection} option, the option usage results in an error message. > > +The value @code{branch} tells the compiler to implement checking of > > +validity of control-flow transfer at the point of indirect branch > > +instructions, i.e. call/jmp instructions. The value @code{return} > > +implements checking of validity at the point of returning from a > > +function. The value @code{full} is an alias for specifying both > > +@code{branch} and @code{return}. The value @code{none} turns off > > +instrumentation. This value may be used for future architectures > > +where @option{-fcf-protection} option is switched on by default. > > I don't think we need to document GCC's future behavior for future > architectures (I'm always going around removing useless discussion from > 20 years ago of possible extensions that never got implemented). I assume > that this is just provided for completeness and to override a previous -fcf- > protection option on the command line. Ok, removed the last sentence. > > +You can also use the @code{nocf_check} attribute to identify which > > +functions and calls should be skipped from instrumentation > > +(@pxref{Function Attributes}). > > + > > @item -fstack-protector > > @opindex fstack-protector > > Emit extra code to check for buffer overflows, such as stack smashing > > diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi index > > 12355c2..b4fc5f3 100644 > > --- a/gcc/doc/rtl.texi > > +++ b/gcc/doc/rtl.texi > > @@ -4040,6 +4040,22 @@ is used in place of the actual insn pattern. > > This is done in cases where the pattern is either complex or misleading. > > @end table > > > > +The note @code{REG_CALL_NOCF_CHECK} is used in conjunction with > the > > +@option{-fcf-protection=branch} option. The note is set if a > > +@code{nocf_check} attribute is specified for a function type or a > > +pointer to function type. The note is stored in the @code{REG_NOTES} > > +field of an insn. > > + > > +@table @code > > +@findex REG_CALL_NOCF_CHECK > > +@item REG_CALL_NOCF_CHECK > > +A user has a control through the @code{nocf_check} attribute to > > +identify > > S/A user has a control/Users have control/ Fixed. > > +which call to a function should be skipped from control-flow > > +instrumentation > >
[PATCH][GRAPHITE] Make --param loop-block-tile-size=0 disable tiling
Currently ISL aborts on this special value and for debugging (and tuning?) it's nice to avoid all the clutter introduced by tiling. Committed as obvious. Richard. 2017-09-27 Richard Biener* graphite-optimize-isl.c (get_schedule_for_node_st): Allow --param loop-block-tile-size=0 to disable tiling. Index: gcc/graphite-optimize-isl.c === --- gcc/graphite-optimize-isl.c (revision 253226) +++ gcc/graphite-optimize-isl.c (working copy) @@ -64,7 +64,10 @@ get_schedule_for_node_st (__isl_take isl if (type != isl_schedule_node_leaf) return node; - if (dims <= 1 || !isl_schedule_node_band_get_permutable (node)) + long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE); + if (dims <= 1 + || tile_size == 0 + || !isl_schedule_node_band_get_permutable (node)) { if (dump_file && dump_flags) fprintf (dump_file, "not tiled\n"); @@ -74,7 +77,6 @@ get_schedule_for_node_st (__isl_take isl /* Tile loops. */ space = isl_schedule_node_band_get_space (node); isl_multi_val *sizes = isl_multi_val_zero (space); - long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE); isl_ctx *ctx = isl_schedule_node_get_ctx (node); for (unsigned i = 0; i < dims; i++)
Re: [PATCH][GRAPHITE] More TLC
On Tue, 26 Sep 2017, Sebastian Pop wrote: > On Mon, Sep 25, 2017 at 8:12 AM, Richard Bienerwrote: > > > On Fri, 22 Sep 2017, Sebastian Pop wrote: > > > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener > > wrote: > > > > > > > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal > > > > TLC. It also adds a testcase I reduced from a stupid mistake I made > > > > when reworking canonicalize_loop_closed_ssa. > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with > > > > -Ofast -march=haswell -floop-nest-optimize are > > > > > > > > 61 loop nests "optimized" > > > > 45 loop nest transforms cancelled because of code generation issues > > > > 21 loop nest optimizations timed out the 35 ISL "operations" we > > allow > > > > > > > > I say "optimized" because the usual transform I've seen is static > > tiling > > > > as enforced by GRAPHITE according to --param loop-block-tile-size. > > > > There's no way to automagically figure what kind of transform ISL did > > > > > > > > > > Here is how to automate (without magic) the detection > > > of the transform that isl did. > > > > > > The problem solved by isl is the minimization of strides > > > in memory, and to do this, we need to tell the isl scheduler > > > the validity dependence graph, in graphite-optimize-isl.c > > > see the validity (RAW, WAR, WAW) and the proximity > > > (RAR + validity) maps. The proximity does include the > > > read after read, as the isl scheduler needs to minimize > > > strides between consecutive reads. Ah, so I now see why we do not perform interchange on trivial cases like double A[1024][1024], B[1024][1024]; void foo(void) { for (int i = 0; i < 1024; ++i) for (int j = 0; j < 1024; ++j) A[j][i] = B[j][i]; } which is probably because /* FIXME: proximity should not be validity. */ isl_union_map *proximity = isl_union_map_copy (validity); falls apart when there is _no_ dependence? I can trick GRAPHITE into performing the interchange for double A[1024][1024], B[1024][1024]; void foo(void) { for (int i = 1; i < 1023; ++i) for (int j = 0; j < 1024; ++j) A[j][i] = B[j][i-1] + A[j][i+1]; } because now there is a dependence. Any idea on how to rewrite scop_get_dependences to avoid "simplifying"? I suppose the validity constraints _do_ also specify kind-of a proximity we just may not prune / optimize them in the same way as dependences? Richard.
RE: 0005-Part-5.-Add-x86-CET-documentation
> -Original Message- > From: Florian Weimer [mailto:fwei...@redhat.com] > Sent: Wednesday, September 27, 2017 10:52 AM > To: Sandra Loosemore; Tsimbalist, Igor V > ; Uros Bizjak > Cc: gcc-patches@gcc.gnu.org > Subject: Re: 0005-Part-5.-Add-x86-CET-documentation > > On 09/27/2017 05:40 AM, Sandra Loosemore wrote: > >> > >> +@emph{x86 implementation:} when @option{-fcf-protection} option is > >> +specified the compiler inserts an ENDBR instruction at function's > >> +prologue if the function's type does not have the @code{nocf_check} > >> +attribute and addresses to which indirect control-flow transfer can > >> +happen. The instruction triggers the HW check if a control-flow > >> +transfer to the address of ENDBR instruction is valid. > > > > Implementation details like this should be comments in the code, not > > included in the user-facing documentation. > > This is part of the ABI GCC implements, so it has to be documented > somewhere, and not just as part of the GCC source code. A question for both Sandra and Florian - What is your suggestion where the text should go? > CET is not properly described in the ABI supplement and I don't think this > will > change, so detailed documentation in the GCC manual is very much > desirable. > > That being said, the implementation notes above need some clarification. > It's not clear to me what the conditions are under which the ENDBR > instruction is emitted (and we probably should use @code{endbr} in the > manual), what it is trying to achieve, and how the x86 calling convention > changes. I assume it is somehow related to what we call internally “the > suffix We are diving into implementation details but it's simple enough. - endbr is generated for every function, which does not have nocf_check attribute. Optimization can be done later to exclude functions, whose address was not taken. - there is no change in calling convention Thanks, Igor > problem”: without control flow integrity, an attacker might skip over > precondition/hardening checks, directly to the critical changes we want to > protect, executing only the suffix of a function (hence the name). > > Thanks, > Florian
[PATCH][GRAPHITE] Allow --param graphite-max-arrays-per-scop=0
The following is to allow making --param graphite-max-arrays-per-scop unbounded. That's a little tricky because the bound is used when computing "alias-sets" for scalar constraints. There's an easy way out though as we know the maximum alias-set assigned in the SCOP, we only have to remember it. The advantage (if it matters at all) is that we avoid a constraint coefficient gap between that last used alias-set and the former PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP. Bootstrap and regtest running on x86_64-unknown-linux-gnu, SPEC CPU 2006 tested. Will apply after testing finished. Richard. 2017-09-27 Richard Biener* graphite.h (scop::max_alias_set): New member. * graphite-scop-detection.c: Remove references to non-existing --param in comments. (build_alias_sets): Record the maximum alias set used for drs. (build_scops): Support zero as unlimited for --param graphite-max-arrays-per-scop. * graphite-sese-to-poly.c (add_scalar_version_numbers): Remove and inline into ... (build_poly_sr_1): ... here. Compute alias set based on the maximum alias set used for drs rather than PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 253226) +++ gcc/graphite-scop-detection.c (working copy) @@ -389,10 +389,7 @@ public: void remove_intersecting_scops (sese_l s1); - /* Return true when a statement in SCOP cannot be represented by Graphite. - The assumptions are that L1 dominates L2, and SCOP->entry dominates L1. - Limit the number of bbs between adjacent loops to - PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS. */ + /* Return true when a statement in SCOP cannot be represented by Graphite. */ bool harmful_loop_in_region (sese_l scop) const; @@ -760,10 +757,7 @@ scop_detection::add_scop (sese_l s) DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s)); } -/* Return true when a statement in SCOP cannot be represented by Graphite. - The assumptions are that L1 dominates L2, and SCOP->entry dominates L1. - Limit the number of bbs between adjacent loops to - PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS. */ +/* Return true when a statement in SCOP cannot be represented by Graphite. */ bool scop_detection::harmful_loop_in_region (sese_l scop) const @@ -1531,7 +1525,8 @@ build_alias_set (scop_p scop) for (i = 0; i < num_vertices; i++) all_vertices[i] = i; - graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL); + scop->max_alias_set += graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1; free (all_vertices); for (i = 0; i < g->n_vertices; i++) @@ -1755,7 +1750,8 @@ build_scops (vec *scops) } unsigned max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP); - if (scop->drs.length () >= max_arrays) + if (max_arrays > 0 + && scop->drs.length () >= max_arrays) { DEBUG_PRINT (dp << "[scop-detection-fail] too many data references: " << scop->drs.length () Index: gcc/graphite-sese-to-poly.c === --- gcc/graphite-sese-to-poly.c (revision 253225) +++ gcc/graphite-sese-to-poly.c (working copy) @@ -491,25 +491,6 @@ pdr_add_alias_set (isl_map *acc, dr_info return isl_map_add_constraint (acc, c); } -/* Add a constrain to the ACCESSES polyhedron for the alias set of - data reference DR. ACCESSP_NB_DIMS is the dimension of the - ACCESSES polyhedron, DOM_NB_DIMS is the dimension of the iteration - domain. */ - -static isl_map * -add_scalar_version_numbers (isl_map *acc, tree var) -{ - isl_constraint *c = isl_equality_alloc - (isl_local_space_from_space (isl_map_get_space (acc))); - int max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP); - /* Each scalar variables has a unique alias set number starting from - max_arrays. */ - c = isl_constraint_set_constant_si (c, -max_arrays - SSA_NAME_VERSION (var)); - c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1); - - return isl_map_add_constraint (acc, c); -} - /* Assign the affine expression INDEX to the output dimension POS of MAP and return the result. */ @@ -684,13 +665,21 @@ static void build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind, isl_map *acc, isl_set *subscript_sizes) { - int max_arrays = PARAM_VALUE (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP); + scop_p scop = PBB_SCOP (pbb); /* Each scalar variables has a unique alias set number starting from - max_arrays. */ + the maximum alias set assigned to a dr. */ + int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var); subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0, -
[PATCH][GRAPHITE] Remove another small quadraticness
Turns out loop_nest recorded in scop-info isn't really necessary as we can simply process parameters in loop bounds during the gather_bbs walk where we encounter each loop (identified by its header) once. This avoids the linear search in record_loop_in_sese. Bootstrap / regtest running on x86_64-unknown-linux-gnu, will apply. Richard. 2017-09-27 Richard Biener* graphite-scop-detection.c (find_scop_parameters): Move loop bound handling ... (gather_bbs::before_dom_children): ... here, avoiding the need to build scop_info->loop_nest. (record_loop_in_sese): Remove. * sese.h (sese_info_t::loop_nest): Remove. * sese.c (new_sese_info): Do not allocate loop_nest. (free_sese_info): Do not free loop_nest. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 253226) +++ gcc/graphite-scop-detection.c (working copy) @@ -1330,7 +1324,7 @@ find_params_in_bb (sese_info_p region, g } } -/* Record the parameters used in the SCOP. A variable is a parameter +/* Record the parameters used in the SCOP BBs. A variable is a parameter in a scop if it does not vary during the execution of that scop. */ static void @@ -1338,19 +1332,8 @@ find_scop_parameters (scop_p scop) { unsigned i; sese_info_p region = scop->scop_info; - struct loop *loop; - /* Find the parameters used in the loop bounds. */ - FOR_EACH_VEC_ELT (region->loop_nest, i, loop) -{ - tree nb_iters = number_of_latch_executions (loop); - - if (!chrec_contains_symbols (nb_iters)) - continue; - - nb_iters = scalar_evolution_in_region (region->region, loop, nb_iters); - scan_tree_for_params (region, nb_iters); -} + /* Parameters used in loop bounds are processed during gather_bbs. */ /* Find the parameters used in data accesses. */ poly_bb_p pbb; @@ -1560,28 +1544,6 @@ gather_bbs::gather_bbs (cdi_direction di { } -/* Record in execution order the loops fully contained in the region. */ - -static void -record_loop_in_sese (basic_block bb, sese_info_p region) -{ - loop_p father = bb->loop_father; - if (loop_in_sese_p (father, region->region)) -{ - bool found = false; - loop_p loop0; - int j; - FOR_EACH_VEC_ELT (region->loop_nest, j, loop0) - if (father == loop0) - { - found = true; - break; - } - if (!found) - region->loop_nest.safe_push (father); -} -} - /* Call-back for dom_walk executed before visiting the dominated blocks. */ @@ -1592,7 +1554,20 @@ gather_bbs::before_dom_children (basic_b if (!bb_in_sese_p (bb, region->region)) return dom_walker::STOP; - record_loop_in_sese (bb, region); + /* For loops fully contained in the region record parameters in the + loop bounds. */ + loop_p loop = bb->loop_father; + if (loop->header == bb + && loop_in_sese_p (loop, region->region)) +{ + tree nb_iters = number_of_latch_executions (loop); + if (chrec_contains_symbols (nb_iters)) + { + nb_iters = scalar_evolution_in_region (region->region, +loop, nb_iters); + scan_tree_for_params (region, nb_iters); + } +} gcond *stmt = single_pred_cond_non_loop_exit (bb); Index: gcc/sese.c === --- gcc/sese.c (revision 253226) +++ gcc/sese.c (working copy) @@ -179,7 +179,6 @@ new_sese_info (edge entry, edge exit) region->region.entry = entry; region->region.exit = exit; - region->loop_nest.create (3); region->params.create (3); region->rename_map = new rename_map_t; region->parameter_rename_map = new parameter_rename_map_t; @@ -197,7 +196,6 @@ void free_sese_info (sese_info_p region) { region->params.release (); - region->loop_nest.release (); for (rename_map_t::iterator it = region->rename_map->begin (); it != region->rename_map->end (); ++it) Index: gcc/sese.h === --- gcc/sese.h (revision 253226) +++ gcc/sese.h (working copy) @@ -94,9 +94,6 @@ typedef struct sese_info_t /* Parameters to be renamed. */ parameter_rename_map_t *parameter_rename_map; - /* Loops completely contained in this SESE. */ - vec loop_nest; - /* Basic blocks contained in this SESE. */ vec bbs;
[PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
This removes another quadraticness from SCOP detection, gather_bbs domwalk. This is done by enhancing domwalk to handle SEME regions via a special return value from before_dom_children. With this I'm now confident to remove the PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit. Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its documented default value which enables 90 more loos to be processed in SPEC CPU 2006. I've also made a value of zero magic in disabling the limit (a trick commonly used in GCC). Statistics I have gathered a few patches before for SPEC CPU 2006: 1255 multi-loop SESEs in SCOP processing max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8 max. drs per scop 869, 10 scops >= 100 max. pbbs per scop 36, 12 scops >= 10 919 SCOPs fail in build_alias_sets which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP is reasonable (if tuned to SPEC CPU 2006). I've also included the hunk that allows -fgraphite-identity to work ontop of -floop-nest-optimize and for -floop-nest-optimize -ftree-parallelize-all also make sure to code-gen loops that end up not transformed. Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006 tested, applied to trunk. Richard. 2017-09-27 Richard Biener* doc/invoke.texi (graphite-max-bbs-per-function): Remove. (graphite-max-nb-scop-params): Document special value zero. * domwalk.h (dom_walker::STOP): New symbolical constant. (dom_walker::dom_walker): Add optional parameter for bb to RPO mapping. (dom_walker::~dom_walker): Declare. (dom_walker::before_dom_children): Document STOP return value. (dom_walker::m_user_bb_to_rpo): New member. (dom_walker::m_bb_to_rpo): Likewise. * domwalk.c (dom_walker::dom_walker): Compute bb to RPO mapping here if not provided by the user. (dom_walker::~dom_walker): Free bb to RPO mapping if not provided by the user. (dom_walker::STOP): Define. (dom_walker::walk): Do not compute bb to RPO mapping here. Support STOP return value from before_dom_children to stop walking. * graphite-optimize-isl.c (optimize_isl): If the schedule is the same still generate code if -fgraphite-identity or -floop-parallelize-all are given. * graphite-scop-detection.c: Include cfganal.h. (gather_bbs::gather_bbs): Get and pass through bb to RPO mapping. (gather_bbs::before_dom_children): Return STOP for BBs not in the region. (build_scops): Compute bb to RPO mapping and pass it to the domwalk. Treat --param graphite-max-nb-scop-params=0 as not limiting the number of params. * graphite.c (graphite_initialize): Remove limit on the number of basic-blocks in a function. * params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove. (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented default value of 10. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 253224) +++ gcc/doc/invoke.texi (working copy) @@ -10512,13 +10512,9 @@ sequence pairs. This option only applie @item graphite-max-nb-scop-params To avoid exponential effects in the Graphite loop transforms, the number of parameters in a Static Control Part (SCoP) is bounded. The -default value is 10 parameters. A variable whose value is unknown at -compilation time and defined outside a SCoP is a parameter of the SCoP. - -@item graphite-max-bbs-per-function -To avoid exponential effects in the detection of SCoPs, the size of -the functions analyzed by Graphite is bounded. The default value is -100 basic blocks. +default value is 10 parameters, a value of zero can be used to lift +the bound. A variable whose value is unknown at compilation time and +defined outside a SCoP is a parameter of the SCoP. @item loop-block-tile-size Loop blocking or strip mining transforms, enabled with Index: gcc/domwalk.c === --- gcc/domwalk.c (revision 253224) +++ gcc/domwalk.c (working copy) @@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in If SKIP_UNREACHBLE_BLOCKS is true, then we need to set EDGE_EXECUTABLE on every edge in the CFG. */ dom_walker::dom_walker (cdi_direction direction, - bool skip_unreachable_blocks) + bool skip_unreachable_blocks, + int *bb_index_to_rpo) : m_dom_direction (direction), m_skip_unreachable_blocks (skip_unreachable_blocks), -m_unreachable_dom (NULL) +m_user_bb_to_rpo (bb_index_to_rpo != NULL), +m_unreachable_dom (NULL), +m_bb_to_rpo (bb_index_to_rpo) { + /* Compute the basic-block index to RPO mapping if not provided by + the user. */ + if (! m_bb_to_rpo && direction
Re: [AArch64] PR71307: Define union class of POINTER+FP
On 18/09/17 17:39, Richard Sandiford wrote: > ALL_REGS doesn't function as a union class of POINTER_REGS and FP_REGS > since it includes the CC register as well. REGNO_REG_CLASS (CC_REGNUM) > is NO_REGS, but of course NO_REGS rightly doesn't include CC_REGNUM. > > Adding a union class for POINTER+FP allows the RA to use it as the > preferred or alternative class of a pseudo. It also works as a > union class of GENERAL+FP for modes that aren't allowed in SP. > > This is also needed for the SVE port, which adds predicate registers > to the mix. > > The combination of r252033 and this patch fixes PR71307. Tested on > aarch64-linux-gnu. Also tested on SPEC2k6, where there were no > differences outside the (mostly low) noise. OK to install? > > The main potential disadvantage I can see is that the -fsched-pressure > code isn't very good at handling union classes: it generally just updates > one pressure class for each pseudo. I haven't found any specific examples > of that causing problems though. > > Thanks, > Richard > > > 2017-09-15 Richard Sandiford> Alan Hayward > David Sherwood > > gcc/ > PR target/71307 > * config/aarch64/aarch64.h (POINTER_AND_FP_REGS): New reg class. > (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly. > * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle > POINTER_AND_FP_REGS. > > gcc/testsuite/ > PR target/71307 > * gcc.target/aarch64/vect_copy_lane_1.c: Remove XFAIL. OK. R. > > Index: gcc/config/aarch64/aarch64.h > === > --- gcc/config/aarch64/aarch64.h 2017-09-15 14:47:33.167333414 +0100 > +++ gcc/config/aarch64/aarch64.h 2017-09-18 17:31:34.720209011 +0100 > @@ -452,6 +452,7 @@ enum reg_class >POINTER_REGS, >FP_LO_REGS, >FP_REGS, > + POINTER_AND_FP_REGS, >ALL_REGS, >LIM_REG_CLASSES/* Last */ > }; > @@ -467,6 +468,7 @@ #define REG_CLASS_NAMES \ >"POINTER_REGS",\ >"FP_LO_REGS", \ >"FP_REGS", \ > + "POINTER_AND_FP_REGS", \ >"ALL_REGS" \ > } > > @@ -479,6 +481,7 @@ #define REG_CLASS_CONTENTS > \ >{ 0x, 0x, 0x0003 },/* POINTER_REGS */ \ >{ 0x, 0x, 0x }, /* FP_LO_REGS */\ >{ 0x, 0x, 0x }, /* FP_REGS */ > \ > + { 0x, 0x, 0x0003 },/* POINTER_AND_FP_REGS */\ >{ 0x, 0x, 0x0007 } /* ALL_REGS */ \ > } > > Index: gcc/config/aarch64/aarch64.c > === > --- gcc/config/aarch64/aarch64.c 2017-09-18 14:58:24.012256423 +0100 > +++ gcc/config/aarch64/aarch64.c 2017-09-18 17:31:34.720209011 +0100 > @@ -6009,6 +6009,7 @@ aarch64_class_max_nregs (reg_class_t reg > case POINTER_REGS: > case GENERAL_REGS: > case ALL_REGS: > +case POINTER_AND_FP_REGS: > case FP_REGS: > case FP_LO_REGS: >return > Index: gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c > === > --- gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c 2016-11-22 > 21:16:00.0 + > +++ gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c 2017-09-18 > 17:31:34.720209011 +0100 > @@ -45,8 +45,7 @@ BUILD_TEST (uint32x2_t, uint32x4_t, , > BUILD_TEST (float64x1_t, float64x2_t, , q, f64, 0, 1) > BUILD_TEST (int64x1_t, int64x2_t,, q, s64, 0, 1) > BUILD_TEST (uint64x1_t, uint64x2_t, , q, u64, 0, 1) > -/* XFAIL due to PR 71307. */ > -/* { dg-final { scan-assembler-times "dup\\td0, v1.d\\\[1\\\]" 3 { xfail > *-*-* } } } */ > +/* { dg-final { scan-assembler-times "dup\\td0, v1.d\\\[1\\\]" 3 } } */ > > /* vcopyq_lane. */ > BUILD_TEST (poly8x16_t, poly8x8_t, q, , p8, 15, 7) >
Re: [PATCH] [testsuite, ARM] Backport to GCC 7 branch
On 21/09/17 09:32, Christophe Lyon wrote: > Hi, > > Can I backport my patch r249639 (Add -mfloat-abi=hard to arm_neon_ok) > to the gcc-7 branch ? > It fixes a few false failures. > > It applies cleanly to current trunk, and we have had it in our > linaro-7-branch for a while. > OK. R. > Thanks. > > Christophe > > > backport-r249639.chlog.txt > > > gcc/ > Backport from trunk r249639. > 2017-06-26 Christophe Lyon> > * doc/sourcebuild.texi (ARM-specific attributes): Document new > arm_neon_ok_no_float_abi effective target. > > gcc/testsuite/ > Backport from trunk r249639. > 2017-06-26 Christophe Lyon > > * lib/target-supports.exp > (check_effective_target_arm_neon_ok_nocache): Add flags with > -mfloat-abi=hard. Include arm_neon.h. > (check_effective_target_arm_neon_ok_no_float_abi_nocache): New. > (check_effective_target_arm_neon_ok_no_float_abi): New. > * gcc.target/arm/lto/pr65837_0.c: Require > arm_neon_ok_no_float_abi. Add -mfpu=neon to dg-lto-options. > * gcc.target/arm/lto/pr65837-attr_0.c: Require > arm_neon_ok_no_float_abi. Remove dg-suppress-ld-options. > > > backport-r249639.patch.txt > > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi > index 84d9a22..c7bb4b7 100644 > --- a/gcc/doc/sourcebuild.texi > +++ b/gcc/doc/sourcebuild.texi > @@ -1570,6 +1570,12 @@ > ARM Target supports @code{-mfpu=neon -mfloat-abi=softfp} or compatible > options. Some multilibs may be incompatible with these options. > > +@item arm_neon_ok_no_float_abi > +@anchor{arm_neon_ok_no_float_abi} > +ARM Target supports NEON with @code{-mfpu=neon}, but without any > +-mfloat-abi= option. Some multilibs may be incompatible with this > +option. > + > @item arm_neonv2_ok > @anchor{arm_neonv2_ok} > ARM Target supports @code{-mfpu=neon-vfpv4 -mfloat-abi=softfp} or compatible > diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c > b/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c > index ebc5f44..f00480b 100644 > --- a/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c > +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837-attr_0.c > @@ -1,6 +1,7 @@ > /* { dg-lto-do run } */ > /* { dg-require-effective-target arm_neon_hw } */ > -/* { dg-lto-options {{-flto}} } */ > +/* { dg-require-effective-target arm_neon_ok_no_float_abi } */ > +/* { dg-lto-options {{-flto -mfpu=neon}} } */ > > #include "arm_neon.h" > > diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > index 6b2def9..5d7cea7 100644 > --- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > @@ -1,7 +1,7 @@ > /* { dg-lto-do run } */ > /* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-require-effective-target arm_neon_ok_no_float_abi } */ > /* { dg-lto-options {{-flto -mfpu=neon}} } */ > -/* { dg-suppress-ld-options {-mfpu=neon} } */ > > #include "arm_neon.h" > > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 57caec7..d20e7d3 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -3428,8 +3428,9 @@ > global et_arm_neon_flags > set et_arm_neon_flags "" > if { [check_effective_target_arm32] } { > - foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon > -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -march=armv7-a"} { > + foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon > -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -march=armv7-a" > "-mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard > -march=armv7-a"} { > if { [check_no_compiler_messages_nocache arm_neon_ok object { > + #include > int dummy; > #ifndef __ARM_NEON__ > #error not NEON > @@ -3454,6 +3455,38 @@ > check_effective_target_arm_neon_ok_nocache] > } > > +# Return 1 if this is an ARM target supporting -mfpu=neon without any > +# -mfloat-abi= option. Useful in tests where add_options is not > +# supported (such as lto tests). > + > +proc check_effective_target_arm_neon_ok_no_float_abi_nocache { } { > +if { [check_effective_target_arm32] } { > + foreach flags {"-mfpu=neon"} { > + if { [check_no_compiler_messages_nocache arm_neon_ok_no_float_abi > object { > + #include > + int dummy; > + #ifndef __ARM_NEON__ > + #error not NEON > + #endif > + /* Avoid the case where a test adds -mfpu=neon, but the > toolchain is > +configured for -mcpu=arm926ej-s, for example. */ > + #if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' > + #error Architecture does not support NEON. > + #endif > +
C++ patch ping
Hi! I'd like to ping 2 C++2A patches: http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01235.html P0683R1 - default member initializers for bit-fields http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01237.html P0704R1 - fixing const-qualified pointers to members Thanks Jakub
Re: [GCC][PATCH][testsuite][mid-end] Fix failing slp test on aarch64 and arm.
On 26/09/17 14:12, Tamar Christina wrote: > Hi All, > > The slp vectorization test currently fails on AArch32 and AArch64 > due to it not taking into account that we do have 128 bit vectors in > NEON. This means that two of the loops get vectorized instead of just 1. > > So update the conditions to include a check for neon. > > Regtested on aarch64-none-elf. > > Ok for trunk? > > Thanks, > Tamar. > > gcc/testsuite/ > 2017-09-26 Tamar Christina> > * gcc.dg/vect/slp-perm-9.c: Add arm_neon_ok checks. > > It would be better to fix this generically by adding a vect_sizes_16B_8B rule. R. > 8221-diff.patch > > > diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c > b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c > index > 4d9c11dcc476a8023b3eaac2ae76cc01bd0db182..816c4b31be80dc6ab77bda838f77357e2157ffb9 > 100644 > --- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c > +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c > @@ -54,8 +54,8 @@ int main (int argc, const char* argv[]) >return 0; > } > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */ > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target > { { vect_perm } && { vect_sizes_32B_16B } } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > { { {! vect_perm } || {! vect_sizes_32B_16B } } && {! arm_neon_ok} } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target > { { { vect_perm } && { vect_sizes_32B_16B } } || arm_neon_ok } } } } */ > /* { dg-final { scan-tree-dump-times "permutation requires at least three > vectors" 1 "vect" { target vect_perm_short } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > { target { {! vect_perm } || {! vect_sizes_32B_16B } } } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { { vect_perm } && { vect_sizes_32B_16B } } } } } */ >
Re: [PATCH], Improve moving SFmode to GPR on PowerPC, #7 of 8
On Tue, Sep 26, 2017 at 06:37:17PM -0400, Michael Meissner wrote: > On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote: > > On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote: > > > * config/rs6000/vsx.md (peephole for optimizing move SF to GPR): > > > Adjust code to eliminate needing to do the shift right 32-bits > > > operation after XSCVDPSPN. > > > > After staring at this way too long... Looks correct. What a monster :-) > > > > Okay for trunk. Thanks! > > Thanks for taking the time to verify it. > > Yeah, it is a monster to get right. It would be nice to put this off to a > separate MD pass, instead of abusing peephole2's. Or maybe we can teach LRA to help us out here. LRA dearly needs some target hooks for the target to tell it what code sequences are good for the target. Segher
Re: [PATCH 4/5] New target check: vect_nopeel - v2
Hi Andreas, > On 09/27/2017 10:10 AM, Rainer Orth wrote: >> Hi Andreas, >> >>> On 09/26/2017 02:26 PM, Rainer Orth wrote: Hi Andreas, > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi > index 307c726..3acfd85 100644 > --- a/gcc/doc/sourcebuild.texi > +++ b/gcc/doc/sourcebuild.texi > @@ -1398,6 +1398,9 @@ Target supports a vector misalign access. > @item vect_no_align > Target does not support a vector alignment mechanism. > > +@item vect_no_peel > +Target does not require any loop peeling for alignment purposes. > + > @item vect_no_int_min_max > Target does not support a vector min and max instruction on @code{int}. please keep the items sorted alphabetically. >>> >>> The items do not appear to be sorted alphabetically. >> >> they should be. Your patch makes the ordering even more random. >> >> Patch to fix this preapproved ;-) > The items rather appear to be arranged by subject. Does it really make > sense do pull items like this > apart just to have it in alphabetical order? > > @item vect_intfloat_cvt > Target supports conversion from @code{signed int} to @code{float}. > > @item vect_uintfloat_cvt > Target supports conversion from @code{unsigned int} to @code{float}. > > @item vect_floatint_cvt > Target supports conversion from @code{float} to @code{signed int}. > > @item vect_floatuint_cvt > Target supports conversion from @code{float} to @code{unsigned int}. > > > I've added the no_peel item intentionally to the hw_misalign/no_align block. granted, there are some attempts at that, but I find it hard to make my way through that longish list. The way it is, you have to skip through the whole list beginning to end. Texinfo seems to have no subsubsection which would allow to make the sub-grouping explicit... Let's hear what Sandra thinks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: 0005-Part-5.-Add-x86-CET-documentation
On 09/27/2017 05:40 AM, Sandra Loosemore wrote: +@emph{x86 implementation:} when @option{-fcf-protection} option is +specified the compiler inserts an ENDBR instruction at function's +prologue if the function's type does not have the @code{nocf_check} +attribute and addresses to which indirect control-flow transfer can +happen. The instruction triggers the HW check if a control-flow +transfer to the address of ENDBR instruction is valid. Implementation details like this should be comments in the code, not included in the user-facing documentation. This is part of the ABI GCC implements, so it has to be documented somewhere, and not just as part of the GCC source code. CET is not properly described in the ABI supplement and I don't think this will change, so detailed documentation in the GCC manual is very much desirable. That being said, the implementation notes above need some clarification. It's not clear to me what the conditions are under which the ENDBR instruction is emitted (and we probably should use @code{endbr} in the manual), what it is trying to achieve, and how the x86 calling convention changes. I assume it is somehow related to what we call internally “the suffix problem”: without control flow integrity, an attacker might skip over precondition/hardening checks, directly to the critical changes we want to protect, executing only the suffix of a function (hence the name). Thanks, Florian
Re: [patch, fortran] Warn about out-of-bounds access with DO subscripts
Hi! On Tue, 26 Sep 2017 18:51:52 +0200, Thomas Koenigwrote: > > On Mon, 25 Sep 2017 18:50:49 +0200, Thomas Koenig > > wrote: > >> Thanks for the review, committed as r253156. > >> > >> Now, on to some other bugs... > > > > No, back to this one please. ;-) > > OK, if you insist :-) ;-) > > And the following gets highlighted, too: > > > > FAIL: compiler driver --help=fortran option(s): "^ +-.*[^:.]$" absent > > from output: " -Wdo-subscript Warn about possibly incorrect > > subscripts in do loops" > > FAIL: compiler driver --help=warnings option(s): "^ +-.*[^:.]$" absent > > from output: " -Wdo-subscript Warn about possibly incorrect > > subscripts in do loops" > > This I don't understand. Hmm, don't you see that in your test logs? > Was there anything wrong with my > change to fortran/lang.opt? Easy enough to fix, so as obvious, committed to trunk in r253225: commit a7717725d0bed402378b085bcda8fa3fe337fae9 Author: tschwinge Date: Wed Sep 27 08:35:05 2017 + Placate gcc.misc-tests/help.exp regarding -Wdo-subscript gcc/fortran/ * lang.opt : End help text with a period. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253225 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/fortran/ChangeLog | 4 gcc/fortran/lang.opt | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog index d0eef84..c698014 100644 --- gcc/fortran/ChangeLog +++ gcc/fortran/ChangeLog @@ -1,3 +1,7 @@ +2017-09-27 Thomas Schwinge + + * lang.opt : End help text with a period. + 2017-09-26 Thomas Koenig * frontend-passes.c (do_subscript): Don't do anything diff --git gcc/fortran/lang.opt gcc/fortran/lang.opt index 37ed4a3..88f6af5 100644 --- gcc/fortran/lang.opt +++ gcc/fortran/lang.opt @@ -239,7 +239,7 @@ Warn about most implicit conversions. Wdo-subscript Fortran Var(warn_do_subscript) Warning LangEnabledBy(Fortran,Wextra) -Warn about possibly incorrect subscripts in do loops +Warn about possibly incorrect subscripts in do loops. Wextra Fortran Warning Grüße Thomas
Re: [PATCH 4/5] New target check: vect_nopeel - v2
On 09/27/2017 10:10 AM, Rainer Orth wrote: > Hi Andreas, > >> On 09/26/2017 02:26 PM, Rainer Orth wrote: >>> Hi Andreas, >>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 307c726..3acfd85 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1398,6 +1398,9 @@ Target supports a vector misalign access. @item vect_no_align Target does not support a vector alignment mechanism. +@item vect_no_peel +Target does not require any loop peeling for alignment purposes. + @item vect_no_int_min_max Target does not support a vector min and max instruction on @code{int}. >>> >>> please keep the items sorted alphabetically. >> >> The items do not appear to be sorted alphabetically. > > they should be. Your patch makes the ordering even more random. > > Patch to fix this preapproved ;-) The items rather appear to be arranged by subject. Does it really make sense do pull items like this apart just to have it in alphabetical order? @item vect_intfloat_cvt Target supports conversion from @code{signed int} to @code{float}. @item vect_uintfloat_cvt Target supports conversion from @code{unsigned int} to @code{float}. @item vect_floatint_cvt Target supports conversion from @code{float} to @code{signed int}. @item vect_floatuint_cvt Target supports conversion from @code{float} to @code{unsigned int}. I've added the no_peel item intentionally to the hw_misalign/no_align block. -Andreas-
Re: [PR middle-end/82319] Fix ICE in pattern
On Wed, Sep 27, 2017 at 12:56 AM, Richard Bienerwrote: > On Tue, 26 Sep 2017, Andrew Pinski wrote: > >> On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribov wrote: >> > Hi all, >> > >> > This patch fixes a trivial ICE in recent pattern. Bootstrapped and >> > regtested on x86_64. >> > >> > Ok to commit? > > Ok. > >> >+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst); >> >> The GCC coding style says no space between the ! and the expression. > > Does it? I thought it says the opposite. Yes, see https://gcc.gnu.org/codingconventions.html#Expressions . For Use..instead of logical not !x ! x bitwise complement ~x ~ x unary minus -x - x cast (foo) x (foo)x pointer dereference *x * x Thanks, Andrew > > Richard. > >> Note for clarity I would put () around !real_isnan (cst) though. >> Other than that I don't see anything wrong with the patch (I cannot >> approve the patch though). >> >> Thanks, >> Andrew >> >> > >> > -Y >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)
Re: [PATCH 4/5] New target check: vect_nopeel - v2
Hi Andreas, > On 09/26/2017 02:26 PM, Rainer Orth wrote: >> Hi Andreas, >> >>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi >>> index 307c726..3acfd85 100644 >>> --- a/gcc/doc/sourcebuild.texi >>> +++ b/gcc/doc/sourcebuild.texi >>> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access. >>> @item vect_no_align >>> Target does not support a vector alignment mechanism. >>> >>> +@item vect_no_peel >>> +Target does not require any loop peeling for alignment purposes. >>> + >>> @item vect_no_int_min_max >>> Target does not support a vector min and max instruction on @code{int}. >> >> please keep the items sorted alphabetically. > > The items do not appear to be sorted alphabetically. they should be. Your patch makes the ordering even more random. Patch to fix this preapproved ;-) Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PR middle-end/82319] Fix ICE in pattern
On Tue, 26 Sep 2017, Andrew Pinski wrote: > On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribovwrote: > > Hi all, > > > > This patch fixes a trivial ICE in recent pattern. Bootstrapped and > > regtested on x86_64. > > > > Ok to commit? Ok. > >+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst); > > The GCC coding style says no space between the ! and the expression. Does it? I thought it says the opposite. Richard. > Note for clarity I would put () around !real_isnan (cst) though. > Other than that I don't see anything wrong with the patch (I cannot > approve the patch though). > > Thanks, > Andrew > > > > > -Y > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][GRAPHITE] Simplify SCOP detection
On Tue, 26 Sep 2017, Sebastian Pop wrote: > On Tue, Sep 26, 2017 at 7:03 AM, Richard Bienerwrote: > > > > > The following is the result of me trying to understand SCOP detection > > and the validity checks spread around the machinery. It removes several > > quadraticnesses by folding validity checks into > > scop_detection::harmful_loop_in_region where we already walk over all > > BBs in the region and process individual found loops. > > > > It also rewrites build_scop_depth/build_scop_breadth into something > > I can undestand. > > > > Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp > > for all langs is happy, so is SPEC CPU 2006 testing where the statistics > > agree before/after the patch). > > > > I'll apply this after the bootstrap finished. > > > > Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"? I do "-O2 -g -floop-nest-optimize" but I guess -fgraphite-identity should catch more issues? Hmm, maybe -floop-nest-optimize and -fgraphite-identity should be combinable Index: gcc/graphite-optimize-isl.c === --- gcc/graphite-optimize-isl.c (revision 253203) +++ gcc/graphite-optimize-isl.c (working copy) @@ -189,7 +189,7 @@ optimize_isl (scop_p scop) print_schedule_ast (dump_file, scop->original_schedule, scop); isl_schedule_free (scop->transformed_schedule); scop->transformed_schedule = isl_schedule_copy (scop->original_schedule); - return false; + return flag_graphite_identity || flag_loop_parallelize_all; } return true; I'll test/commit the above. > > > Richard. > > > > 2017-09-26 Richard Biener > > > > * graphite-scop-detection.c (scop_detection::build_scop_depth): > > Rewrite, > > fold in ... > > (scop_detection::build_scop_breadth): ... this. Removed. > > (scop_detection::loop_is_valid_in_scop): Fold into single caller. > > (scop_detection::harmful_stmt_in_bb): Likewise. > > (scop_detection::graphite_can_represent_stmt): Likewise. > > (scop_detection::loop_body_is_valid_scop): Likewise. Remove > > recursion. > > (scop_detection::can_represent_loop): Remove recursion, fold in > > ... > > (scop_detection::can_represent_loop_1): ... this. Removed. > > (scop_detection::harmful_loop_in_region): Simplify after inlining > > the above and remove more quadraticness. > > (build_scops): Adjust. > > * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless > > quadraticness. > > > > > This goes in the right direction: it cuts down compilation time. > As it is not a trivial change, I need some time to understand how > the scop detection works with this change. The only functional change should be that the SESE composition now works top-down instead of working its way bottom-up. It's not clear whether we do more or less work that way but at least the function is now readable -- I think we can structure it bottom-up as well without doing the confusing two-function way (which I believe did quite some duplicate work but I never was sure...). Richard.
Re: [PATCH 4/5] New target check: vect_nopeel - v2
On 09/26/2017 02:26 PM, Rainer Orth wrote: > Hi Andreas, > >> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi >> index 307c726..3acfd85 100644 >> --- a/gcc/doc/sourcebuild.texi >> +++ b/gcc/doc/sourcebuild.texi >> @@ -1398,6 +1398,9 @@ Target supports a vector misalign access. >> @item vect_no_align >> Target does not support a vector alignment mechanism. >> >> +@item vect_no_peel >> +Target does not require any loop peeling for alignment purposes. >> + >> @item vect_no_int_min_max >> Target does not support a vector min and max instruction on @code{int}. > > please keep the items sorted alphabetically. The items do not appear to be sorted alphabetically. -Andreas-
Re: [PR middle-end/82319] Fix ICE in pattern
On Tue, Sep 26, 2017 at 10:56 PM, Yuri Gribovwrote: > Hi all, > > This patch fixes a trivial ICE in recent pattern. Bootstrapped and > regtested on x86_64. > > Ok to commit? >+ bool cst_int_p = ! real_isnan (cst) && real_identical (, cst); The GCC coding style says no space between the ! and the expression. Note for clarity I would put () around !real_isnan (cst) though. Other than that I don't see anything wrong with the patch (I cannot approve the patch though). Thanks, Andrew > > -Y