Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)
On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote: >Hi! > >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization >breaks some package (Marek should know which), as it has 7ulp error. >Generally one should be prepared for some errors with -ffast-math. > >Though, in this case, if the target has c99 runtime and C is >a positive 0x1pNN it seems much better to use exp2 over exp, for >C being 2 pow (2, x) is optimized into exp2 (x) and even for other >values log2(C) will still be some positive or negative integer, so >in many cases there won't be any rounding errors in the multiplication. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. I wonder whether there are vectorized variants in libmvec? Richard. >Perhaps we should do something similar if C is a power of 10 (use exp10 >and log10). > >2018-02-10 Jakub Jelinek > > PR middle-end/84309 > * match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into > exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available. > > * gcc.dg/pr84309.c: New test. > >--- gcc/match.pd.jj2018-01-26 12:43:23.208922420 +0100 >+++ gcc/match.pd 2018-02-09 18:48:26.412021408 +0100 >@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >(logs (pows @0 @1)) >(mult @1 (logs @0 > >- /* pow(C,x) -> exp(log(C)*x) if C > 0. */ >+ /* pow(C,x) -> exp(log(C)*x) if C > 0, >+or if C is a positive power of 2, >+pow(C,x) -> exp2(log2(C)*x). */ > (for pows (POW) > exps (EXP) > logs (LOG) >+ exp2s (EXP2) >+ log2s (LOG2) > (simplify >(pows REAL_CST@0 @1) >-(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) >- && real_isfinite (TREE_REAL_CST_PTR (@0))) >- (exps (mult (logs @0) @1) >+ (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) >+ && real_isfinite (TREE_REAL_CST_PTR (@0))) >+(with { >+ const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); >+ bool use_exp2 = false; >+ if (targetm.libc_has_function (function_c99_misc) >+ && value->cl == rvc_normal) >+ { >+ REAL_VALUE_TYPE frac_rvt = *value; >+ SET_REAL_EXP (&frac_rvt, 1); >+ if (real_equal (&frac_rvt, &dconst1)) >+ use_exp2 = true; >+ } >+ } >+ (if (use_exp2) >+ (exp2s (mult (log2s @0) @1)) >+ (exps (mult (logs @0) @1))) > > (for sqrts (SQRT) > cbrts (CBRT) >--- gcc/testsuite/gcc.dg/pr84309.c.jj 2018-02-09 18:54:52.254787678 >+0100 >+++ gcc/testsuite/gcc.dg/pr84309.c 2018-02-09 18:59:02.343636178 +0100 >@@ -0,0 +1,14 @@ >+/* PR middle-end/84309 */ >+/* { dg-do run { target c99_runtime } } */ >+/* { dg-options "-O2 -ffast-math" } */ >+ >+int >+main () >+{ >+ unsigned long a = 1024; >+ unsigned long b = 16 * 1024; >+ unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + >__builtin_log2 (b)) / 2); >+ if (c != 4096) >+__builtin_abort (); >+ return 0; >+} > > Jakub
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On 02/09/2018 09:39 PM, Alexandre Oliva wrote: > On Feb 9, 2018, Alexandre Oliva wrote: > >> On Feb 9, 2018, Jeff Law wrote: >>> On 02/08/2018 08:53 PM, Alan Modra wrote: On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote: > Here's what I checked in, right after the LVU patch. > > [IEPM] Introduce inline entry point markers One of these two patches breaks ppc64le bootstrap with the assembler complaining "Error: view number mismatch" when compiling libdecnumber. >>> I've just passed along a similar failure (.i, .s and command line >>> options) to Alex for ppc64 (be) building glibc. > >> Thanks. So, I'm told there are more such issues, that non-asm insn >> length attrs can't be relied on at this time to be nonzero only when the >> actual length is not zero. > > I wonder... In the previously-posted patch, we still regard call insns > are advancing PC. I think that's a safe assumption, but... are there > any other kinds of patterns we could recognize that would certainly > generate an actual PC-changing insn? How about jump insns? How about > insns that are SETs, or PARALLELs containing at least one SET? Does > anyone see any risk in recognizing those when the length attr is, > conservatively or not, deemed unreliable? Any other paterns we could > recognize to that end? So given what I've seen in the ARM port, I don't think we can generally assume any insn advances the PC. Here's why. On the ARM there's a little state machine that's used to implement conditional execution. When we're in certain states the backend will generate no code for certain JUMP_INSNs and change the state. That's fine and dandy. THe problem is we can't actually tell outside the ARM backend when that's happened! You might think we could embed tests of the FSM within the length computation. But the state of the FSM is only valid during assembly output (e.g. final). But insn lengths are set up during branch shortening and if you query the length attribute you get value computed by branch shortening. The only way around this would be to do what I would consider some interface abuse and query insn_min_length (not to be confused with get_attr_min_length). Then we wouldn't get the cached version and we could do queries of the state machine on the fly in dwarf2out_var_location. But it seems rather icky and I haven't even been able to convince myself it's really safe. And while this is specific to the ARM and JUMP_INSNs, I can easily envision scenarios where other ports could use the same kind of little state machine for standard INSNs (PA to generate add,tr) or even CALL_INSNs (target dependent optimization of tail calls). You also have to worry about ports that try to optimize away nop-insns that snuck through the optimizers. I once worked on a port that couldn't encode a register self copy. So when one snuck through the optimizers, we had to deal with it in the output code and I know I've seen other instances where ports tried to compensate for nop-insns that snuck through. So in the end I don't think you can assume that any given insn advances the PC. The closest we have is the length attribute, but it has always supposed to have been conservatively correct for the purposes of branch shortening. ie, it can never return a length less than the actual length, but it is allowed to return a length longer than the actual length. Jeff
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Feb 9, 2018, Alexandre Oliva wrote: > On Feb 9, 2018, Jeff Law wrote: >> On 02/08/2018 08:53 PM, Alan Modra wrote: >>> On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote: Here's what I checked in, right after the LVU patch. [IEPM] Introduce inline entry point markers >>> >>> One of these two patches breaks ppc64le bootstrap with the assembler >>> complaining "Error: view number mismatch" when compiling >>> libdecnumber. >>> >> I've just passed along a similar failure (.i, .s and command line >> options) to Alex for ppc64 (be) building glibc. > Thanks. So, I'm told there are more such issues, that non-asm insn > length attrs can't be relied on at this time to be nonzero only when the > actual length is not zero. I wonder... In the previously-posted patch, we still regard call insns are advancing PC. I think that's a safe assumption, but... are there any other kinds of patterns we could recognize that would certainly generate an actual PC-changing insn? How about jump insns? How about insns that are SETs, or PARALLELs containing at least one SET? Does anyone see any risk in recognizing those when the length attr is, conservatively or not, deemed unreliable? Any other paterns we could recognize to that end? -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: PR84300, ICE in dwarf2cfi on ppc64le
On Fri, Feb 09, 2018 at 08:11:44AM -0600, Segher Boessenkool wrote: > On Fri, Feb 09, 2018 at 04:12:47PM +1030, Alan Modra wrote: > > ;; Use r0 to stop regrename twiddling with lr restore insns emitted > > ;; after the call to __morestack. > > (define_insn "split_stack_return" > > - [(unspec_volatile [(use (reg:SI 0))] UNSPECV_SPLIT_STACK_RETURN)] > > + [(unspec_volatile [(use (reg:SI 0)) (use (reg:SI LR_REGNO))] > > + UNSPECV_SPLIT_STACK_RETURN)] > > I'm not sure what a USE as input of an UNSPEC means -- it should work > without the USEs? Hmm, yes, plain [(reg:SI 0) (reg:SI LR_REGNO)] ought to work. A sniff test says it's OK but I'll do the whole bootstrap/regtest cycle before committing. I'm not sure why I put the USE there for r0, probably because I had the r0 dependency outside the unspec initially then decided it could replace (const_int 0) inside the unspec vector to save on useless RTL. I didn't go far enough in trimming the RTL.. -- Alan Modra Australia Development Lab, IBM
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Fri, 9 Feb 2018, Joseph Myers wrote: > I'm seeing regressions from my glibc bot for all of arm, mips, s390 and > sh. > > https://sourceware.org/ml/libc-testresults/2018-q1/msg00283.html > > arm and mips are "view number mismatch" building glibc and s390 is the > same error but building libgcc, so presumably those are the present issue. > sh is GCC segfaults building glibc, so it's less clearly associated with > this patch (but GCC mainline r257539 is bad and r257480 is good, in any > case). sh4 is: during RTL pass: final In file included from strtof_l.c:45: strtod_l.c: In function 'strtof_l_internal': strtod_l.c:1769:1: internal compiler error: Segmentation fault } ^ 0xb98e3f crash_signal /scratch/jmyers/glibc-bot/src/gcc/gcc/toplev.c:325 0x856b18 maybe_output_next_view /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:1726 0x856b18 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2714 0xef36bc print_slot /scratch/jmyers/glibc-bot/src/gcc/gcc/config/sh/sh.c:2557 0xef6520 output_far_jump(rtx_insn*, rtx_def*) /scratch/jmyers/glibc-bot/src/gcc/gcc/config/sh/sh.c:2617 0x857098 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:3101 0x856c26 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2806 0x858469 final_1 /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2088 0x8591d8 rest_of_handle_final /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:4637 0x8591d8 execute /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:4711 Since it's in maybe_output_next_view, I think it probably comes from these patches. > (It's possible more failures might appear on other architectures once the > bot builds the glibc testsuite, that's just failures from building GCC and > glibc.) hppa has a similar ICE building glibc tests, also in maybe_output_next_view, so probably also from these patches: during RTL pass: final test-tgmath2.c: In function 'test_fma_3': test-tgmath2.c:421:1: internal compiler error: Segmentation fault } ^ 0xb7525f crash_signal /scratch/jmyers/glibc-bot/src/gcc/gcc/toplev.c:325 0x835f14 maybe_output_next_view /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:1726 0x835f14 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2714 0xed7761 pa_output_call(rtx_insn*, rtx_def*, int) /scratch/jmyers/glibc-bot/src/gcc/gcc/config/pa/pa.c:8019 0x83642b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:3101 0x836026 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2806 0x837629 final_1 /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:2088 0x8389d6 rest_of_handle_final /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:4637 0x8389d6 execute /scratch/jmyers/glibc-bot/src/gcc/gcc/final.c:4711 -- Joseph S. Myers jos...@codesourcery.com
[Fwd: [PATCH][Bug target/84266] mmintrin.h intrinsic headers for PowerPC code fails on power9]
--- Begin Message --- This has a simple fix that I have tested on power8 and Seurer are tested on power9. While there may be a more elegent coding for the require casts, this is the simplest change, considering the current stage. 2018-02-09 Steven Munroe * config/rs6000/mmintrin.h (_mm_cmpeq_pi32 [_ARCH_PWR9]): Cast vec_cmpeq result to correct type. * config/rs6000/mmintrin.h (_mm_cmpgt_pi32 [_ARCH_PWR9]): Cast vec_cmpgt result to correct type. Index: gcc/config/rs6000/mmintrin.h === --- gcc/config/rs6000/mmintrin.h(revision 257533) +++ gcc/config/rs6000/mmintrin.h(working copy) @@ -854,7 +854,7 @@ a = (__vector signed int)vec_splats (__m1); b = (__vector signed int)vec_splats (__m2); - c = (__vector signed short)vec_cmpeq (a, b); + c = (__vector signed int)vec_cmpeq (a, b); return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0)); #else __m64_union m1, m2, res; @@ -883,7 +883,7 @@ a = (__vector signed int)vec_splats (__m1); b = (__vector signed int)vec_splats (__m2); - c = (__vector signed short)vec_cmpgt (a, b); + c = (__vector signed int)vec_cmpgt (a, b); return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0)); #else __m64_union m1, m2, res; ready to commit?--- End Message ---
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Feb 9, 2018, Jakub Jelinek wrote: > On Fri, Feb 09, 2018 at 07:01:25PM -0200, Alexandre Oliva wrote: >> So, as discussed on IRC, I'm trying to use a target hook to allow >> targets to indicate that their length attrs have been assessed for this >> purpose, and a param to make that overridable, but I'm having trouble >> initializing the param from the target hook. How does one do that? > Better in the default version of the target hook check the param > whether it should return true or false, and for analyzed targets > just use an always true (or false, depending on what the hook is) > as the hook. I want it to be overridable, so here's what I ended up with. Testing underway; ok to install if it succeeds? We need accurate length information to be able to optimize away locviews that would contain only zero-numbered views, and even to output correct view numbers. E.g., if we assume an insn has length nonzero, i.e., that it will advance PC, we conclude the view number after it is zero. If the assumption turns out to be false, the assembler view zero assert will catch the mistake, but only if we're using a view-capable assembler. Otherwise, view numbers will silently go out of sync between the compiler and the assembler, and thus the debug info consumer. Unfortunately, a number of targets seem to fail the essential property for this to work, namely, that get_min_attr_length() be zero for every insn that might have length zero. We thus introduce a target hook that indicates whether we can rely on the length attribute for this purpose, and a param that can be used to override it, for testing purposes or to work around errors. for gcc/ChangeLog * params.def (PARAM_USE_ATTR_LENGTH_IN_VIEW_COUNTING): New. * target.def (attr_length_reliable_for_view_count): New. * dwarf2out.c: Include params.h. (unreliable_attr_length_p): New. (dwarf2out_var_location): Use it. * doc/tm.texi.in (TARGET_ATTR_LENGTH_RELIABLE_FOR_VIEW_COUNT): New. * doc/tm.texi: Rebuilt. --- gcc/doc/tm.texi| 13 + gcc/doc/tm.texi.in |2 ++ gcc/dwarf2out.c| 28 +++- gcc/params.def |5 + gcc/target.def | 14 ++ 5 files changed, 61 insertions(+), 1 deletion(-) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index ddf48cb4b4d2..50fa0d387f32 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -9986,6 +9986,19 @@ following it should not be run. Usually true only for virtual assembler targets. @end deftypevr +@deftypevr {Target Hook} bool TARGET_ATTR_LENGTH_RELIABLE_FOR_VIEW_COUNT +TRUE if get_min_attr_length returns +zero for any non-asm insn that might have length zero, during final. +It's ok if it's conservative and always returns zero. +If, in addition to the essential property, when an insn is known to have +nonzero length, get_min_attr_length returns a positive number, that will +enable loclist optimizations. There is little point in making this TRUE +otherwise. Enabling this when the essential property is not met while +using GNU as 2.30 or newer may cause the assembler to detect the errors +and fail to assemble; other assemblers will silently let the errors +through, with incorrect view numbers in debug information. +@end deftypevr + @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2}) A C statement to issue assembly directives that create a difference @var{lab1} minus @var{lab2}, using an integer of the given @var{size}. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 0aab45f4992c..26b32db77f74 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -6929,6 +6929,8 @@ tables, and hence is desirable if it works. @hook TARGET_NO_REGISTER_ALLOCATION +@hook TARGET_ATTR_LENGTH_RELIABLE_FOR_VIEW_COUNT + @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2}) A C statement to issue assembly directives that create a difference @var{lab1} minus @var{lab2}, using an integer of the given @var{size}. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 749c7e3b9bbc..c48c117c8a16 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "params.h" static void dwarf2out_source_line (unsigned int, unsigned int, const char *, int, bool); @@ -26860,6 +26861,30 @@ dwarf2out_next_real_insn (rtx_insn *loc_note) return next_real; } +/* Return TRUE if we should NOT use ATTR_length to optimize locview + lists, and FALSE otherwise. + + If ATTR_length is always zero for insns that could have length + zero, we can use it, then we'll know when we're guaranteed to have + a PC change that implies a view reset. locview lists whose views + are all zero can be optimized out completely. + +
Re: RFA: Fix PR 68028: LTO error when compiling PowerPC binaries with single precision floating point
Hi! On Fri, Feb 09, 2018 at 09:37:28AM +, Nick Clifton wrote: > > I thought you were going to do a patch like the following, to make the > > e500 cores less special (they are not): > > Sorry - my bad. I defer to your patch then. Whatever it takes to get > the BZ fixed and off the books... :-) So does it work? :-) (I cannot reproduce the problem, there must be something in your configuration I'm overlooking). Segher [ Here it is again, with changelog this time: ] >From 44c3b661ef75e57415b10ca2cec5ac6427bbed8f Mon Sep 17 00:00:00 2001 Message-Id: <44c3b661ef75e57415b10ca2cec5ac6427bbed8f.1518109128.git.seg...@kernel.crashing.org> From: Segher Boessenkool Date: Thu, 8 Feb 2018 16:58:33 + Subject: [PATCH] Do not mess with rs6000_{single,double}_float (PR68028) For e500 family cores we do some questionable things with those flags, which does not work with LTO. So don't. 2018-02-10 Segher Boessenkool * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't handle rs6000_single_float and rs6000_double_float specially for e500 family CPUs. --- gcc/config/rs6000/rs6000.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 5adccd2..73de617 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4809,25 +4809,6 @@ rs6000_option_override_internal (bool global_init_p) if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags); - /* For the E500 family of cores, reset the single/double FP flags to let us - check that they remain constant across attributes or pragmas. */ - - switch (rs6000_cpu) -{ -case PROCESSOR_PPC8540: -case PROCESSOR_PPC8548: -case PROCESSOR_PPCE500MC: -case PROCESSOR_PPCE500MC64: -case PROCESSOR_PPCE5500: -case PROCESSOR_PPCE6500: - rs6000_single_float = 0; - rs6000_double_float = 0; - break; - -default: - break; -} - if (main_target_opt) { if (main_target_opt->x_rs6000_single_float != rs6000_single_float) -- 1.8.3.1
Re: [RFC PATCH] avoid applying attributes to explicit specializations (PR 83871)
On 02/09/2018 12:52 PM, Jason Merrill wrote: On 02/08/2018 04:52 PM, Martin Sebor wrote: I took me a while to find DECL_TEMPLATE_RESULT. Hopefully that's the right way to get the primary from a TEMPLATE_DECL. Yes. Attached is an updated patch. It hasn't gone through full testing yet but please let me know if you'd like me to make some changes. + const char* const whitelist[] = { +"error", "noreturn", "warning" + }; Why whitelist noreturn? I would expect to want that to be consistent. I expect noreturn to be used on a primary whose definition is provided but that's not meant to be used the way the API is otherwise expected to be. As in: template T [[noreturn]] foo () { throw "not implemented"; } template <> int foo(); // implemented elsewhere Marking that template as noreturn seems pointless, and possibly harmful; the deprecated, warning, or error attributes would be better for this situation. I meant either: template T __attribute__ ((noreturn)) foo () { throw "not implemented"; } template <> int foo(); // implemented elsewhere or (sigh) template [[noreturn]] T foo () { throw "not implemented"; } template <> int foo(); // implemented elsewhere It lets code like this int bar () { return foo(); } be diagnosed because it's likely a bug (as Clang does with -Wunreachable-code). It doesn't stop code like the following from compiling (which is good) but it instead lets them throw at runtime which is what foo's author wants. void bar () { foo(); } It's the same as having an "unimplemented" base virtual function throw an exception when it's called rather than making it pure and having calls to it abort. Declaring the base virtual function noreturn is useful for the same reason (and also diagnosed by Clang). I should remember to add the same warning in GCC 9. I actually had some misgivings about both warning and deprecated for the white-listing, but not for noreturn. My (only mild) concern is that both warning and deprecated functions can and likely will in some cases still be called, and so using them to suppress the warning runs the risk that their calls might be wrong and no one will notice. Warning cannot be suppressed so it seems unlikely to be ignored, but deprecated can be. So I wonder if the white-listing for deprecated should be conditional on -Wdeprecated being enabled. - /* Merge the type qualifiers. */ - if (TREE_READONLY (newdecl)) -TREE_READONLY (olddecl) = 1; if (TREE_THIS_VOLATILE (newdecl)) TREE_THIS_VOLATILE (olddecl) = 1; - if (TREE_NOTHROW (newdecl)) -TREE_NOTHROW (olddecl) = 1; + + if (merge_attr) +{ + /* Merge the type qualifiers. */ + if (TREE_READONLY (newdecl)) +TREE_READONLY (olddecl) = 1; +} + else +{ + /* Set the bits that correspond to the const function attributes. */ + TREE_READONLY (olddecl) = TREE_READONLY (newdecl); +} Let's limit the const/volatile handling here to non-functions, and handle the const/noreturn attributes for functions in the later hunk along with nothrow/malloc/pure. I had to keep the TREE_HAS_VOLATILE handling as is since it applies to functions too (has side-effects). Otherwise the attr-nothrow-2.C test fails. I try to avoid reformatting code that I'm not actually changing, to avoid noise in e.g. git blame. Okay. Attached is an updated patch. Besides the changes above I also used a plural "attributes" in the name of the option for consistency with other attribute options, expanded the documentation, and enhanced one of the tests. Bootstrapped on x86_64-linux (tests still running). Martin PR c++/83871 - wrong code for attribute const and pure on distinct template specializations PR c++/83503 - [8 Regression] bogus -Wattributes for const and pure on function template specialization gcc/ChangeLog: PR c++/83871 * gcc/doc/invoke.texi (-Wmissing-attribute): New option. gcc/c-family/ChangeLog: PR c++/83871 * c.opt (-Wmissing-attribute): New option. gcc/cp/ChangeLog: PR c++/83871 PR c++/83503 * cp-tree.h (warn_spec_missing_attributes): New function. ((check_explicit_specialization): Add an argument. Call the above function. * decl.c (duplicate_decls): Avoid applying primary function template's attributes to its explicit specializations. gcc/testsuite/ChangeLog: PR c++/83871 PR c++/83503 * g++.dg/ext/attr-const-pure.C: New test. * g++.dg/ext/attr-malloc.C: New test. * g++.dg/ext/attr-nonnull.C: New test. * g++.dg/ext/attr-nothrow.C: New test. * g++.dg/ext/attr-nothrow-2.C: New test. * g++.dg/ext/attr-returns-nonnull.C: New test. * g++.dg/Wmissing-attribute.C: New test. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 9c71726..1039274 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -781,6 +781,11 @@ Wtemplates C++ ObjC++ Var(warn_templates) Warning Warn on primary template declaration. +Wmissing-attributes +C ObjC C+
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Fri, Feb 09, 2018 at 07:01:25PM -0200, Alexandre Oliva wrote: > So, as discussed on IRC, I'm trying to use a target hook to allow > targets to indicate that their length attrs have been assessed for this > purpose, and a param to make that overridable, but I'm having trouble > initializing the param from the target hook. How does one do that? Better in the default version of the target hook check the param whether it should return true or false, and for analyzed targets just use an always true (or false, depending on what the hook is) as the hook. > By disabling it altogether, we won't get the assembler checks or > incorrect view numbers in debug info, but we will get plenty of all-zero > locview lists. Oh well... I guess at this point that's better than > wrong debug info or assembler failures. For the debugging of the target issues, you can also use a hack: in final.c for instructions that have minimum length longer than zero emit a label right before emitting the insn and after it: .Lhacke1: .if (.Lhacke1 - .Lhackb1) == 0 .error "zero length" .endif (or perhaps just: .if (. - .Lhack1) == 0 .error "zero length" .endif ), then make check various targets with that. Jakub
[PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)
Hi! Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization breaks some package (Marek should know which), as it has 7ulp error. Generally one should be prepared for some errors with -ffast-math. Though, in this case, if the target has c99 runtime and C is a positive 0x1pNN it seems much better to use exp2 over exp, for C being 2 pow (2, x) is optimized into exp2 (x) and even for other values log2(C) will still be some positive or negative integer, so in many cases there won't be any rounding errors in the multiplication. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Perhaps we should do something similar if C is a power of 10 (use exp10 and log10). 2018-02-10 Jakub Jelinek PR middle-end/84309 * match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available. * gcc.dg/pr84309.c: New test. --- gcc/match.pd.jj 2018-01-26 12:43:23.208922420 +0100 +++ gcc/match.pd2018-02-09 18:48:26.412021408 +0100 @@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (logs (pows @0 @1)) (mult @1 (logs @0 - /* pow(C,x) -> exp(log(C)*x) if C > 0. */ + /* pow(C,x) -> exp(log(C)*x) if C > 0, +or if C is a positive power of 2, +pow(C,x) -> exp2(log2(C)*x). */ (for pows (POW) exps (EXP) logs (LOG) + exp2s (EXP2) + log2s (LOG2) (simplify (pows REAL_CST@0 @1) -(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) -&& real_isfinite (TREE_REAL_CST_PTR (@0))) - (exps (mult (logs @0) @1) + (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) + && real_isfinite (TREE_REAL_CST_PTR (@0))) +(with { + const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); + bool use_exp2 = false; + if (targetm.libc_has_function (function_c99_misc) + && value->cl == rvc_normal) +{ + REAL_VALUE_TYPE frac_rvt = *value; + SET_REAL_EXP (&frac_rvt, 1); + if (real_equal (&frac_rvt, &dconst1)) +use_exp2 = true; +} + } + (if (use_exp2) + (exp2s (mult (log2s @0) @1)) + (exps (mult (logs @0) @1))) (for sqrts (SQRT) cbrts (CBRT) --- gcc/testsuite/gcc.dg/pr84309.c.jj 2018-02-09 18:54:52.254787678 +0100 +++ gcc/testsuite/gcc.dg/pr84309.c 2018-02-09 18:59:02.343636178 +0100 @@ -0,0 +1,14 @@ +/* PR middle-end/84309 */ +/* { dg-do run { target c99_runtime } } */ +/* { dg-options "-O2 -ffast-math" } */ + +int +main () +{ + unsigned long a = 1024; + unsigned long b = 16 * 1024; + unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2); + if (c != 4096) +__builtin_abort (); + return 0; +} Jakub
[committed] Better fix for the -fsanitize=undefined -fopemp -flto ICEs (PR sanitizer/83987)
Hi! Apparently the cp_free_lang_data fix wasn't sufficient, the DECL_OMP_PRIVATIZED_MEMBER vars can be copied to child functions but without the lang specific data so it doesn't trigger any longer, and even trying to free it in cp_free_lang_data if it satisfies omp_member_access_dummy_var doesn't work. So, this patch instead looks for these variables at the end of the omp lowering phase in methods, and removes them from GIMPLE_BIND bind vars as well as BLOCK_VARS in the block tree. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-02-10 Jakub Jelinek PR sanitizer/83987 * omp-low.c (maybe_remove_omp_member_access_dummy_vars, remove_member_access_dummy_vars): New functions. (lower_omp_for, lower_omp_taskreg, lower_omp_target, lower_omp_1, execute_lower_omp): Use them. * tree.c (cp_free_lang_data): Revert 2018-01-23 change. * g++.dg/ubsan/pr83987-2.C: New test. --- gcc/omp-low.c.jj2018-01-24 17:26:25.216357997 +0100 +++ gcc/omp-low.c 2018-02-09 15:26:25.653403123 +0100 @@ -3208,6 +3208,43 @@ scan_omp (gimple_seq *body_p, omp_contex /* Re-gimplification and code generation routines. */ +/* Remove omp_member_access_dummy_var variables from gimple_bind_vars + of BIND if in a method. */ + +static void +maybe_remove_omp_member_access_dummy_vars (gbind *bind) +{ + if (DECL_ARGUMENTS (current_function_decl) + && DECL_ARTIFICIAL (DECL_ARGUMENTS (current_function_decl)) + && (TREE_CODE (TREE_TYPE (DECL_ARGUMENTS (current_function_decl))) + == POINTER_TYPE)) +{ + tree vars = gimple_bind_vars (bind); + for (tree *pvar = &vars; *pvar; ) + if (omp_member_access_dummy_var (*pvar)) + *pvar = DECL_CHAIN (*pvar); + else + pvar = &DECL_CHAIN (*pvar); + gimple_bind_set_vars (bind, vars); +} +} + +/* Remove omp_member_access_dummy_var variables from BLOCK_VARS of + block and its subblocks. */ + +static void +remove_member_access_dummy_vars (tree block) +{ + for (tree *pvar = &BLOCK_VARS (block); *pvar; ) +if (omp_member_access_dummy_var (*pvar)) + *pvar = DECL_CHAIN (*pvar); +else + pvar = &DECL_CHAIN (*pvar); + + for (block = BLOCK_SUBBLOCKS (block); block; block = BLOCK_CHAIN (block)) +remove_member_access_dummy_vars (block); +} + /* If a context was created for STMT when it was scanned, return it. */ static omp_context * @@ -6961,6 +6998,7 @@ lower_omp_for (gimple_stmt_iterator *gsi pop_gimplify_context (new_stmt); gimple_bind_append_vars (new_stmt, ctx->block_vars); + maybe_remove_omp_member_access_dummy_vars (new_stmt); BLOCK_VARS (block) = gimple_bind_vars (new_stmt); if (BLOCK_VARS (block)) TREE_USED (block) = 1; @@ -7413,6 +7451,7 @@ lower_omp_taskreg (gimple_stmt_iterator /* Declare all the variables created by mapping and the variables declared in the scope of the parallel body. */ record_vars_into (ctx->block_vars, child_fn); + maybe_remove_omp_member_access_dummy_vars (par_bind); record_vars_into (gimple_bind_vars (par_bind), child_fn); if (ctx->record_type) @@ -7781,6 +7820,7 @@ lower_omp_target (gimple_stmt_iterator * /* Declare all the variables created by mapping and the variables declared in the scope of the target body. */ record_vars_into (ctx->block_vars, child_fn); + maybe_remove_omp_member_access_dummy_vars (tgt_bind); record_vars_into (gimple_bind_vars (tgt_bind), child_fn); } @@ -8772,6 +8812,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p break; case GIMPLE_BIND: lower_omp (gimple_bind_body_ptr (as_a (stmt)), ctx); + maybe_remove_omp_member_access_dummy_vars (as_a (stmt)); break; case GIMPLE_OMP_PARALLEL: case GIMPLE_OMP_TASK: @@ -8976,6 +9017,16 @@ execute_lower_omp (void) all_contexts = NULL; } BITMAP_FREE (task_shared_vars); + + /* If current function is a method, remove artificial dummy VAR_DECL created + for non-static data member privatization, they aren't needed for + debuginfo nor anything else, have been already replaced everywhere in the + IL and cause problems with LTO. */ + if (DECL_ARGUMENTS (current_function_decl) + && DECL_ARTIFICIAL (DECL_ARGUMENTS (current_function_decl)) + && (TREE_CODE (TREE_TYPE (DECL_ARGUMENTS (current_function_decl))) + == POINTER_TYPE)) +remove_member_access_dummy_vars (DECL_INITIAL (current_function_decl)); return 0; } --- gcc/cp/tree.c.jj2018-02-09 06:44:24.855812073 +0100 +++ gcc/cp/tree.c 2018-02-09 13:44:22.938670431 +0100 @@ -5273,16 +5273,6 @@ cp_free_lang_data (tree t) /* We do not need the leftover chaining of namespaces from the binding level. */ DECL_CHAIN (t) = NULL_TREE; - /* Set DECL_VALUE_EXPRs of OpenMP privatized member artificial - decls to error_mark_node. These are DECL_IGNORED_P and after - OpenMP lowering they
Re: [PATCH] Character length cleanup for Coarray Fortran library
On Fri, Feb 09, 2018 at 06:10:46PM +0200, Janne Blomqvist wrote: > Following the change to use size_t for Fortran character lengths (PR > 78534), this patch modifies the Coarray ABI in a similar way. The > single-image implementation that is included in libgfortran is > updated, but this needs corresponding work in the OpenCoarray library > as well for multi-image support. Damian, can you look into that? > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > Janne, I've scanned the patch and it appears to a fairly straight foward mechanical s/int/size_t (and similar for nodes). OK to commit (although you may want to wait for a response from Damain or Izaak Beekman). As this breaks an ABI and we've broken it for 8.0, I think this should go into trunk. You may need to ping Jakub or Richard for final approval. -- Steve
Re: [Patch, Fortran] PR 84273: Reject allocatable passed-object dummy argument (proc_ptr_47.f90)
On Fri, Feb 09, 2018 at 06:13:34PM +0100, Janus Weil wrote: > > the attached patch fixes some checking code for PASS arguments in > procedure-pointer components, which does not properly account for the > fact that the PASS argument needs to be polymorphic. > > [The reason for this issue is probably that PPCs were mostly > implemented before polymorphism was available. The corresponding > pass-arg checks for TBPs are ok.] > > The patch also fixes an invalid test case (which was detected thanks > to Neil Carlson). It regtests cleanly on x86_64-linux-gnu. Ok for > trunk? Janus, The patch looks ok to me. Trunk is in regression and doc fixes only mode, so you'll probably need to ping Jakub or Richard (ie., release engineer) for an ok. PS: Welcome back! We can definitely use your talent. -- Steve
Re: [PATCH rs6000] Fix for builtins-4-runnable.c testcase FAIL on power7/BE 32-bit
Hi Carl, On Fri, Feb 09, 2018 at 08:22:57AM -0800, Carl Love wrote: > The following patch contains fixes for the builtins4-runnable.c test in > 32-bit mode for the current GCC 8 branch. This is issue #290 in Bill > Schmidt's issues. The int128 and uint128 variable types are not > supported in 32-bit mode. The tests were moved to a new test file and > restricted to run only when int128 type is supported. > 2018-02-09 Carl Love > > * gcc.target/powerpc/builtins-4-runnable.c (main): Move > int128 and > uint128 tests to new testfile. > * gcc.target/powerpc/builtins-4-int128-runnable.c: New testfile > for > int128 and uint128 tests. > * gcc.target/powerpc/powerpc.exp: Add builtins-4-int128- > runnable.c to > list oftorture tests. Missing space (and the layout is messed up a bit, that's your mailer I guess -- the patch is wrapped as well, that's not good). Okay for trunk. Thanks! Segher
Re: [RFC][PATCH v2] PR preprocessor/83173: Additional check before decrementing highest_location
Hi David, Here is a new version of the linemap patch (see my earlier emails for an updated version of the test code). I spent several days poring over the line map code, and I think I understand it a little better now. I also discovered -fdump-internal-locations, and it was a big help. If I use this switch with the code from the test I posted with an initial offset of 0x6001, I get the following (I have removed some unnecessary fields for brevity): ORDINARY MAP: 21 source_location interval: 1610612805 <= loc < 1610612809 file: location-overflow-test-pr83173.c starting at line: 3 reason: 2 (LC_RENAME) location-overflow-test-pr83173.c: 3|loc:1610612805| { dg-do preprocess } location-overflow-test-pr83173.c: 4|loc:1610612806|*/ location-overflow-test-pr83173.c: 5|loc:1610612807| location-overflow-test-pr83173.c: 6|loc:1610612808|#include "location-overflow-test-pr83173.h" ORDINARY MAP: 22 source_location interval: 1610612809 <= loc < 1610612810 file: location-overflow-test-pr83173.h starting at line: 1 reason: 0 (LC_ENTER) location-overflow-test-pr83173.h: 1|loc:1610612809|#include "location-overflow-test-pr83173-1.h" ORDINARY MAP: 23 source_location interval: 1610612810 <= loc < 1610612812 file: location-overflow-test-pr83173-1.h starting at line: 1 reason: 0 (LC_ENTER) location-overflow-test-pr83173-1.h: 1|loc:1610612810|#pragma once location-overflow-test-pr83173-1.h: 2|loc:1610612811|#define LOCATION_OVERFLOW_TEST_PR83173_1_H ORDINARY MAP: 24 source_location interval: 1610612812 <= loc < 1610612812 file: location-overflow-test-pr83173.h starting at line: 2 reason: 1 (LC_LEAVE) ORDINARY MAP: 25 source_location interval: 1610612812 <= loc < 1610612814 file: location-overflow-test-pr83173-2.h starting at line: 1 reason: 0 (LC_ENTER) location-overflow-test-pr83173-2.h: 1|loc:1610612812|#pragma once location-overflow-test-pr83173-2.h: 2|loc:1610612813|#define LOCATION_OVERFLOW_TEST_PR83173_2_H ORDINARY MAP: 26 source_location interval: 1610612814 <= loc < 1610612815 file: location-overflow-test-pr83173.h starting at line: 2 reason: 1 (LC_LEAVE) location-overflow-test-pr83173.h: 2|loc:1610612814|#include "location-overflow-test-pr83173-2.h" ORDINARY MAP: 27 source_location interval: 1610612815 <= loc < 1610612823 file: location-overflow-test-pr83173.c starting at line: 7 reason: 1 (LC_LEAVE) ORDINARY MAP: 28 source_location interval: 1610612823 <= loc < 1610612825 file: location-overflow-test-pr83173.c starting at line: 15 reason: 2 (LC_RENAME) Notice that map 24 and 25 have the same start_location. This is because line_table->highest_location is being decremented when it should not be. This new patch uses a more robust method to check whether the source line that highest_location refers to is greater than loc (which is the source_location of the #include). Instead of comparing the source_location values directly, I use linemap_get_expansion_line to map the source location to a line number, and then compare the line numbers. This handles the case that loc is an adhoc location. I noticed that after updating the filenames in the test to be 'location-overflow-test-pr83173-1.h', the source locations passed into _cpp_stack_include were adhoc locations when I wasn't using the location offset plugin. This is because these filenames are now 34 characters long, which is longer than the 5 range bits that an ordinary source_location allows. I spent a lot of time comparing the -fdump-internal-locations output when this patch is in use to the unpatched version. The only differences occur when a #include is the last line in the file. For the case where the source location > LINE_MAP_MAX_SOURCE_LOCATION, the ordinary map 25 (see above) now starts at 1610612813 instead of 1610612812. For the case where the source location < LINE_MAP_MAX_LOCATION_WITH_COLS, the start location of that map has increased by 32 (which is effectively one column). For the case where the source location is between LINE_MAP_MAX_LOCATION_WITH_COLS and LINE_MAP_MAX_SOURCE_LOCATION (so that there are no range bits), the output was unchanged. I can post these location map dumps if you would like to see them. There are no regressions in the gcc test suite from this patch. The other addition in this version if the patch is a small improvement to the -fdump-internal-locations output to include the reason and included_from fields from the line_map_ordinary data structure. I aos fixed an issue with the output alignment. I found this useful when evaluating these changes. Thanks, Mike From e3f23f66ad18598fc21a9223b7f29572a14d1588 Mon Sep 17 00:00:00 2001 From: Mike Gulick Date: Fri, 1 Dec 2017 09:43:22 -0500 Subject: [PATCH v2] PR preprocessor/83173: Additional check before decrementing highest_location 2017-12-01 Mike Gulick PR preprocessor/83173 * libcpp/files.c (_cpp_stack_include): Check if line_table->highest_location is past current lin
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
I'm seeing regressions from my glibc bot for all of arm, mips, s390 and sh. https://sourceware.org/ml/libc-testresults/2018-q1/msg00283.html arm and mips are "view number mismatch" building glibc and s390 is the same error but building libgcc, so presumably those are the present issue. sh is GCC segfaults building glibc, so it's less clearly associated with this patch (but GCC mainline r257539 is bad and r257480 is good, in any case). (It's possible more failures might appear on other architectures once the bot builds the glibc testsuite, that's just failures from building GCC and glibc.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH rs6000] Fix for builtins-4-runnable.c testcase FAIL on power7/BE 32-bit
GCC maintainers: As pointed out, the dg arguments in new test file was missing the {target 128}. I updated the arguments to be { dg-do run { target { int128 && powerpc64*-*-* } } } Without the powerpc64*-*-* the test was still tried to compiled the test case in 32-bit mode on BE and failed. I reran the tests manually to verify the tests run as expected. On a Power 8 BE machine with GCC configured for -m32 and -m64 I ran: make -k check-gcc RUNTESTFLAGS="--target_board=unix'{-m32,-m64}' powerpc.exp=builtins-4-int128-runnable.c" > out The pertinent output was: Test Run By carll on Fri Feb 9 16:42:34 2018 Native configuration is powerpc64-unknown-linux-gnu === gcc tests === Schedule of variations: unix/-m32 unix/-m64 Running target unix/-m32 Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/carll/GCC/gcc-builtin4/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/carll/GCC/gcc-builtin4/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... === gcc Summary for unix/-m32 === # of unsupported tests 6 Running target unix/-m64 Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/carll/GCC/gcc-builtin4/gcc/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/carll/GCC/gcc-builtin4/gcc/testsuite/gcc.target/powerpc/powerpc.exp ... === gcc Summary for unix/-m64 === # of expected passes12 === gcc Summary === # of expected passes12 # of unsupported tests 6 /home/carll/GCC/build/gcc-builtin4/gcc/xgcc version 8.0.1 20180209 (experimental) (GCC) On a Power 8 LE machine I ran: make -k check-gcc RUNTESTFLAGS="powerpc.exp=builtins-4-int128-runnable.c" > out The pertinent output was: Test Run By carll on Fri Feb 9 15:56:29 2018
C++ PATCH for c++/81917, ICE with self-referential partial specialization
In this testcase we were managing to instantiate the same class twice, leading to an ICE. Fixed by setting TYPE_BEING_DEFINED before we consider what partial specialization to use. This also causes several existing testcases to error about incomplete types rather than excess template instantiation depth, which seems appropriate. Tested x86_64-pc-linux-gnu, applying to trunk. commit 46063f6cb79e4f37040a326a2da1378e0e98cb38 Author: Jason Merrill Date: Fri Feb 9 14:26:03 2018 -0500 PR c++/81917 - ICE with void_t and partial specialization. * pt.c (instantiate_class_template_1): Set TYPE_BEING_DEFINED before calling most_specialized_partial_spec. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 9c57709e7a7..281604594ad 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -10347,14 +10347,14 @@ instantiate_class_template_1 (tree type) templ = most_general_template (CLASSTYPE_TI_TEMPLATE (type)); gcc_assert (TREE_CODE (templ) == TEMPLATE_DECL); + /* Mark the type as in the process of being defined. */ + TYPE_BEING_DEFINED (type) = 1; + /* Determine what specialization of the original template to instantiate. */ t = most_specialized_partial_spec (type, tf_warning_or_error); if (t == error_mark_node) -{ - TYPE_BEING_DEFINED (type) = 1; - return error_mark_node; -} +return error_mark_node; else if (t) { /* This TYPE is actually an instantiation of a partial @@ -10379,16 +10379,16 @@ instantiate_class_template_1 (tree type) /* If the template we're instantiating is incomplete, then clearly there's nothing we can do. */ if (!COMPLETE_TYPE_P (pattern)) -return type; +{ + /* We can try again later. */ + TYPE_BEING_DEFINED (type) = 0; + return type; +} /* If we've recursively instantiated too many templates, stop. */ if (! push_tinst_level (type)) return type; - /* Now we're really doing the instantiation. Mark the type as in - the process of being defined. */ - TYPE_BEING_DEFINED (type) = 1; - /* We may be in the middle of deferred access check. Disable it now. */ push_deferring_access_checks (dk_no_deferred); diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-62.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-62.C new file mode 100644 index 000..6f1fa4584ac --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-62.C @@ -0,0 +1,22 @@ +// PR c++/81917 +// { dg-do compile { target c++11 } } + +template using a = void; +template struct b +{ + typedef int c; +}; +template class b>; +template ::c> class f; +template class g { }; +template class h +{ + class i; + typedef g> j; + class i + { +j k; // { dg-error "incomplete" } + }; +}; +h H; + diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-template2.C b/gcc/testsuite/g++.dg/cpp0x/initlist-template2.C index 40e3075c162..0df0d4e89c3 100644 --- a/gcc/testsuite/g++.dg/cpp0x/initlist-template2.C +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-template2.C @@ -1,6 +1,5 @@ // PR c++/71747 // { dg-do compile { target c++11 } } -// { dg-options -ftemplate-depth=20 } template < bool > struct A { @@ -14,10 +13,8 @@ template < bool > struct A template < bool, typename = int > struct F; template < bool X > // should be: struct F < X, typename A < A < X > {} () >::type > -struct F < X, typename A < F < X > {} () >::type > // { dg-error "" } +struct F < X, typename A < F < X > {} () >::type > { }; -F < true > f; - -// { dg-prune-output "compilation terminated" } +F < true > f; // { dg-error "" } diff --git a/gcc/testsuite/g++.dg/template/crash125.C b/gcc/testsuite/g++.dg/template/crash125.C index 448f74682d4..de41b99a927 100644 --- a/gcc/testsuite/g++.dg/template/crash125.C +++ b/gcc/testsuite/g++.dg/template/crash125.C @@ -13,6 +13,4 @@ struct TraitCheckImpl > { typedef void Complete; }; -Swappable s; // { dg-error "depth" } - -// { dg-prune-output "compilation terminated" } +Swappable s; // { dg-error "" } diff --git a/gcc/testsuite/g++.dg/template/pr51488.C b/gcc/testsuite/g++.dg/template/pr51488.C index 4979a22787c..794a6cfe067 100644 --- a/gcc/testsuite/g++.dg/template/pr51488.C +++ b/gcc/testsuite/g++.dg/template/pr51488.C @@ -2,6 +2,4 @@ template struct s; template struct s::a> {}; -s ca; // { dg-error "depth" } - -// { dg-prune-output "compilation terminated" } +s ca; // { dg-error "" } diff --git a/gcc/testsuite/g++.dg/template/pr55843.C b/gcc/testsuite/g++.dg/template/pr55843.C index 467dd827218..04079ed1175 100644 --- a/gcc/testsuite/g++.dg/template/pr55843.C +++ b/gcc/testsuite/g++.dg/template/pr55843.C @@ -1,5 +1,3 @@ -// { dg-options "-ftemplate-depth-8" } - template< typename T > struct type_wrapper { }; typedef char (&yes_tag)[2]; @@ -7,11 +5,11 @@ template struct if_c { }; template< typename T > struct has_type { struct gcc_3_2_wknd { -template< typename U > static yes_tag test( type_wrapper cons
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Feb 9, 2018, Jeff Law wrote: > On 02/08/2018 08:53 PM, Alan Modra wrote: >> On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote: >>> Here's what I checked in, right after the LVU patch. >>> >>> [IEPM] Introduce inline entry point markers >> >> One of these two patches breaks ppc64le bootstrap with the assembler >> complaining "Error: view number mismatch" when compiling >> libdecnumber. >> > I've just passed along a similar failure (.i, .s and command line > options) to Alex for ppc64 (be) building glibc. Thanks. So, I'm told there are more such issues, that non-asm insn length attrs can't be relied on at this time to be nonzero only when the actual length is not zero. When we fail that and regard a zero-length insn as nonzero and that's all we have between two subsequent views, the assembler (GNU as 2.30) will catch and report the error. With other assemblers, the incorrect view reset will go unnoticed and result in incorrect debug info. So, as discussed on IRC, I'm trying to use a target hook to allow targets to indicate that their length attrs have been assessed for this purpose, and a param to make that overridable, but I'm having trouble initializing the param from the target hook. How does one do that? Meanwhile, here's the (WIP) patch that introduces the param defaulting to no locview optimizations (which can only be performed at points in which the compiler knows there are PC changes); I'd like it to default to targetm.attr_length_reliable_for_view_count, but I couldn't figure out how to do so. Help? By disabling it altogether, we won't get the assembler checks or incorrect view numbers in debug info, but we will get plenty of all-zero locview lists. Oh well... I guess at this point that's better than wrong debug info or assembler failures. disable locview optimizations for now --- gcc/doc/tm.texi| 13 + gcc/doc/tm.texi.in |2 ++ gcc/dwarf2out.c|4 +++- gcc/opts.c |6 ++ gcc/params.def |5 + gcc/target.def | 13 + 6 files changed, 42 insertions(+), 1 deletion(-) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index ddf48cb4b4d2..50fa0d387f32 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -9986,6 +9986,19 @@ following it should not be run. Usually true only for virtual assembler targets. @end deftypevr +@deftypevr {Target Hook} bool TARGET_ATTR_LENGTH_RELIABLE_FOR_VIEW_COUNT +TRUE if get_min_attr_length returns +zero for any non-asm insn that might have length zero, during final. +It's ok if it's conservative and always returns zero. +If, in addition to the essential property, when an insn is known to have +nonzero length, get_min_attr_length returns a positive number, that will +enable loclist optimizations. There is little point in making this TRUE +otherwise. Enabling this when the essential property is not met while +using GNU as 2.30 or newer may cause the assembler to detect the errors +and fail to assemble; other assemblers will silently let the errors +through, with incorrect view numbers in debug information. +@end deftypevr + @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2}) A C statement to issue assembly directives that create a difference @var{lab1} minus @var{lab2}, using an integer of the given @var{size}. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 0aab45f4992c..26b32db77f74 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -6929,6 +6929,8 @@ tables, and hence is desirable if it works. @hook TARGET_NO_REGISTER_ALLOCATION +@hook TARGET_ATTR_LENGTH_RELIABLE_FOR_VIEW_COUNT + @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2}) A C statement to issue assembly directives that create a difference @var{lab1} minus @var{lab2}, using an integer of the given @var{size}. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 749c7e3b9bbc..6a6520a05e8b 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "file-prefix-map.h" /* remap_debug_filename() */ +#include "params.h" static void dwarf2out_source_line (unsigned int, unsigned int, const char *, int, bool); @@ -26926,7 +26927,8 @@ dwarf2out_var_location (rtx_insn *loc_note) || GET_CODE (loc_note) == ASM_INPUT || asm_noperands (loc_note) >= 0) ; - else if (get_attr_min_length (loc_note) > 0) + else if (PARAM_VALUE (PARAM_USE_ATTR_LENGTH_IN_VIEW_COUNTING) + && get_attr_min_length (loc_note) > 0) RESET_NEXT_VIEW (cur_line_info_table->view); return; diff --git a/gcc/opts.c b/gcc/opts.c index f2795f98bf44..d4b3065caad6 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -1039,6 +1039,12 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, if ((opts->x_flag_saniti
Re: [PATCH, rs6000] (v2) Update vsx-vector-6-le.c tests
Hi Will, On Fri, Feb 09, 2018 at 09:44:28AM -0600, Will Schmidt wrote: > Noted during review of test results on P9. Due to changes and improvements, > our codegen is different for this test on power9. > Modified the existing test to target P8 and added a P9 variant with updated > counts. > > (v2) Updated/added dg-requires and dg-skip-if stanzas pre review and re-check > of the test results. > > Sniff-tested good, now runs clean on P9 where it did not before. Runs clean on > P8LE, P8BE, P7. > OK for trunk? Yes, thanks! Segher > 2018-02-09 Will Schmidt > > * gcc.target/powerpc/vsx-vector-6-le.c: Update dg-* stanzas. > * gcc.target/powerpc/vsx-vector-6-le.p9.c: New.
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On February 9, 2018 7:07:45 PM GMT+01:00, Jakub Jelinek wrote: >On Fri, Feb 09, 2018 at 07:01:08PM +0100, Richard Biener wrote: >> >which indeed fixes the testcase and seems not to break asan.exp. >> >> Huh. Need to double check why that makes sense ;) > >I think it does, for both ASAN_CHECK and ASAN_MARK the pointer argument >is the second one, the first one is an integer argument with flags. >And ASAN_MARK, both poison and unpoison, works kind like a clobber on >the >referenced variable, before unpoison it is generally inaccessible and >after >poison too. Ah, indeed. Richard. > Jakub
Re: [RFC PATCH] avoid applying attributes to explicit specializations (PR 83871)
On 02/08/2018 04:52 PM, Martin Sebor wrote: I took me a while to find DECL_TEMPLATE_RESULT. Hopefully that's the right way to get the primary from a TEMPLATE_DECL. Yes. Attached is an updated patch. It hasn't gone through full testing yet but please let me know if you'd like me to make some changes. + const char* const whitelist[] = { + "error", "noreturn", "warning" + }; Why whitelist noreturn? I would expect to want that to be consistent. I expect noreturn to be used on a primary whose definition is provided but that's not meant to be used the way the API is otherwise expected to be. As in: template T [[noreturn]] foo () { throw "not implemented"; } template <> int foo(); // implemented elsewhere Marking that template as noreturn seems pointless, and possibly harmful; the deprecated, warning, or error attributes would be better for this situation. - /* Merge the type qualifiers. */ - if (TREE_READONLY (newdecl)) - TREE_READONLY (olddecl) = 1; if (TREE_THIS_VOLATILE (newdecl)) TREE_THIS_VOLATILE (olddecl) = 1; - if (TREE_NOTHROW (newdecl)) - TREE_NOTHROW (olddecl) = 1; + + if (merge_attr) + { + /* Merge the type qualifiers. */ + if (TREE_READONLY (newdecl)) + TREE_READONLY (olddecl) = 1; + } + else + { + /* Set the bits that correspond to the const function attributes. */ + TREE_READONLY (olddecl) = TREE_READONLY (newdecl); + } Let's limit the const/volatile handling here to non-functions, and handle the const/noreturn attributes for functions in the later hunk along with nothrow/malloc/pure. - /* Merge parameter attributes. */ + /* Merge or assign parameter attributes. */ tree oldarg, newarg; - for (oldarg = DECL_ARGUMENTS(olddecl), - newarg = DECL_ARGUMENTS(newdecl); - oldarg && newarg; - oldarg = DECL_CHAIN(oldarg), newarg = DECL_CHAIN(newarg)) { - DECL_ATTRIBUTES (newarg) - = (*targetm.merge_decl_attributes) (oldarg, newarg); - DECL_ATTRIBUTES (oldarg) = DECL_ATTRIBUTES (newarg); - } - + for (oldarg = DECL_ARGUMENTS (olddecl), +newarg = DECL_ARGUMENTS (newdecl); + oldarg && newarg; + oldarg = DECL_CHAIN (oldarg), newarg = DECL_CHAIN (newarg)) + { + DECL_ATTRIBUTES (newarg) + = (*targetm.merge_decl_attributes) (oldarg, newarg); + DECL_ATTRIBUTES (oldarg) = DECL_ATTRIBUTES (newarg); + } + I try to avoid reformatting code that I'm not actually changing, to avoid noise in e.g. git blame. Jason
Go patch committed: Track //go:nointerface in export data
In the Go frontend, the magic //go:nointerface comment, used for field tracking, was only implemented for conversions to interface types in the same package. This patch records it in the export data, so that it works as expected for types imported from a different package. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 257527) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -7e94bac5676afc8188677c98ecb263c78c1a7f8d +89105404f94005ffa8e2b08df78015dc9ac91362 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/gogo.cc === --- gcc/go/gofrontend/gogo.cc (revision 257527) +++ gcc/go/gofrontend/gogo.cc (working copy) @@ -5189,17 +5189,24 @@ Function::defer_stack(Location location) void Function::export_func(Export* exp, const std::string& name) const { - Function::export_func_with_type(exp, name, this->type_); + Function::export_func_with_type(exp, name, this->type_, + this->is_method() && this->nointerface()); } // Export a function with a type. void Function::export_func_with_type(Export* exp, const std::string& name, - const Function_type* fntype) + const Function_type* fntype, bool nointerface) { exp->write_c_string("func "); + if (nointerface) +{ + go_assert(fntype->is_method()); + exp->write_c_string("/*nointerface*/ "); +} + if (fntype->is_method()) { exp->write_c_string("("); @@ -5280,10 +5287,21 @@ Function::import_func(Import* imp, std:: Typed_identifier** preceiver, Typed_identifier_list** pparameters, Typed_identifier_list** presults, - bool* is_varargs) + bool* is_varargs, + bool* nointerface) { imp->require_c_string("func "); + *nointerface = false; + if (imp->match_c_string("/*")) +{ + imp->require_c_string("/*nointerface*/ "); + *nointerface = true; + + // Only a method can be nointerface. + go_assert(imp->peek_char() == '('); +} + *preceiver = NULL; if (imp->peek_char() == '(') { @@ -6213,6 +6231,32 @@ Bindings_snapshot::check_goto_defs(Locat // Class Function_declaration. +// Whether this declares a method. + +bool +Function_declaration::is_method() const +{ + return this->fntype_->is_method(); +} + +// Whether this method should not be included in the type descriptor. + +bool +Function_declaration::nointerface() const +{ + go_assert(this->is_method()); + return (this->pragmas_ & GOPRAGMA_NOINTERFACE) != 0; +} + +// Record that this method should not be included in the type +// descriptor. + +void +Function_declaration::set_nointerface() +{ + this->pragmas_ |= GOPRAGMA_NOINTERFACE; +} + // Return the function descriptor. Expression* Index: gcc/go/gofrontend/gogo.h === --- gcc/go/gofrontend/gogo.h(revision 257527) +++ gcc/go/gofrontend/gogo.h(working copy) @@ -1476,13 +1476,14 @@ class Function // Export a function with a type. static void export_func_with_type(Export*, const std::string& name, - const Function_type*); + const Function_type*, bool nointerface); // Import a function. static void import_func(Import*, std::string* pname, Typed_identifier** receiver, Typed_identifier_list** pparameters, - Typed_identifier_list** presults, bool* is_varargs); + Typed_identifier_list** presults, bool* is_varargs, + bool* nointerface); private: // Type for mapping from label names to Label objects. @@ -1607,6 +1608,10 @@ class Function_declaration location() const { return this->location_; } + // Return whether this function declaration is a method. + bool + is_method() const; + const std::string& asm_name() const { return this->asm_name_; } @@ -1628,6 +1633,16 @@ class Function_declaration this->pragmas_ = pragmas; } + // Whether this method should not be included in the type + // descriptor. + bool + nointerface() const; + + // Record that this method should not be included in the type + // descriptor. + void + set_nointerface(); + // Return an expression for the function descriptor, given the named // object for this function. This may only be called for functions // without a closure. This will be an immutable struct with one @@ -1652,7 +1667,10 @@ class Function_declaration // Export a function declaration. void export_func(Export* exp, const std::string& name) const - { Function
Re: [PATCH][C++] Fix PR84281
OK. On Fri, Feb 9, 2018 at 7:44 AM, Richard Biener wrote: > > The following patch optimizes the equal element case of > cxx_eval_vec_init_1 to use a RANGE_EXPR in the built CONSTRUCTOR > instead of repeating the same element over and over. This > cuts down memory use of the invalid testcase in the PR from > tens of GB to nothing and makes us rejct it instantanously > instead of by going OOM. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > Ok for trunk? > > Thanks, > Richard. > > 2018-02-09 Richard Biener > > PR c++/84281 > * constexpr.c (cxx_eval_vec_init_1): Use a RANGE_EXPR to compact > uniform constructors and delay allocating them fully. > > Index: gcc/cp/constexpr.c > === > --- gcc/cp/constexpr.c (revision 257491) > +++ gcc/cp/constexpr.c (working copy) > @@ -2885,7 +2885,6 @@ cxx_eval_vec_init_1 (const constexpr_ctx >unsigned HOST_WIDE_INT max = tree_to_uhwi (array_type_nelts_top (atype)); >verify_ctor_sanity (ctx, atype); >vec **p = &CONSTRUCTOR_ELTS (ctx->ctor); > - vec_alloc (*p, max + 1); >bool pre_init = false; >unsigned HOST_WIDE_INT i; > > @@ -2978,13 +2977,14 @@ cxx_eval_vec_init_1 (const constexpr_ctx > { > if (new_ctx.ctor != ctx->ctor) > eltinit = new_ctx.ctor; > - for (i = 1; i < max; ++i) > - { > - idx = build_int_cst (size_type_node, i); > - CONSTRUCTOR_APPEND_ELT (*p, idx, unshare_constructor (eltinit)); > - } > + tree range = build2 (RANGE_EXPR, size_type_node, > + build_int_cst (size_type_node, 1), > + build_int_cst (size_type_node, max - 1)); > + CONSTRUCTOR_APPEND_ELT (*p, range, unshare_constructor (eltinit)); > break; > } > + else > + vec_safe_reserve (*p, max + 1); > } > >if (!*non_constant_p)
C++ PATCH for c++/84296, ICE with qualified-id
In my patch to fix template/inherit4.C after 83714, I made all SCOPE_REFs instantiation-dependent if the current class has dependent bases. This was excessive; only a reference to a non-static member of an unknown base needs to be treated as instantiation-dependent. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6d409191cfcd993863e791672ae6de78444796b1 Author: Jason Merrill Date: Fri Feb 9 13:31:27 2018 -0500 PR c++/84296 - ICE with qualified-id in template. PR c++/83714 * pt.c (unknown_base_ref_p): New. (instantiation_dependent_scope_ref_p): Use it instead of any_dependent_bases_p. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index a9e47701527..9c57709e7a7 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -24012,6 +24012,30 @@ dependent_scope_p (tree scope) && !currently_open_class (scope)); } +/* T is a SCOPE_REF. Return whether it represents a non-static member of + an unknown base of 'this' (and is therefore instantiation-dependent). */ + +static bool +unknown_base_ref_p (tree t) +{ + if (!current_class_ptr) +return false; + + tree mem = TREE_OPERAND (t, 1); + if (shared_member_p (mem)) +return false; + + tree cur = current_nonlambda_class_type (); + if (!any_dependent_bases_p (cur)) +return false; + + tree ctx = TREE_OPERAND (t, 0); + if (DERIVED_FROM_P (ctx, cur)) +return false; + + return true; +} + /* T is a SCOPE_REF; return whether we need to consider it instantiation-dependent so that we can check access at instantiation time even though we know which member it resolves to. */ @@ -24021,9 +24045,7 @@ instantiation_dependent_scope_ref_p (tree t) { if (DECL_P (TREE_OPERAND (t, 1)) && CLASS_TYPE_P (TREE_OPERAND (t, 0)) - /* A dependent base could make a member inaccessible in the current -class. */ - && !any_dependent_bases_p () + && !unknown_base_ref_p (t) && accessible_in_template_p (TREE_OPERAND (t, 0), TREE_OPERAND (t, 1))) return false; diff --git a/gcc/testsuite/g++.dg/template/scope5.C b/gcc/testsuite/g++.dg/template/scope5.C new file mode 100644 index 000..629225cd556 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/scope5.C @@ -0,0 +1,66 @@ +// PR c++/84296 + +namespace b {} +namespace c { +using namespace b; +} +namespace b { +template struct e { static const int f = d; }; +} +template struct g; +template +struct g : h::template ab {}; +struct k { + template struct m { typedef typename g::n o; }; +}; +template struct ac; +struct r { + typedef ac p; +}; +template struct s : k { + template + struct ab : q::template t::template ab {}; +}; +struct ad { + typedef int u; +}; +template struct ae; +template struct ah { + typedef ae ai; + typedef typename ai::template w::o n; +}; +struct x { + template struct ab : ah {}; +}; +struct y { + struct z { +template struct t : x {}; + }; + struct aj : s {}; +}; +template struct ak { + typedef y::aj al; + typedef typename al::m::o o; +}; +struct am { + enum { an }; +}; +template struct ao {}; +template struct ap : af::aq {}; +template <> struct ae { + template struct w; + template struct w { +typedef typename as::p o; + }; +}; +enum { a = b::e<0>::f }; +template class au; +template struct ac : ao { typedef c::e aq; }; +template void ay(aw, i, ax) { + au::o>::f> > az(); +} +void v() { + ad a; + void az(); + ay(az, a, v); +}
patch to fix PR57193
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193 The patch introduces a new heuristic to change order of coloring. Allocnos conflicting with other allocnos preferring hard registers are colored first when other higher-level heuristics can not differ them. On x86_64 this new heuristic results in practically the same SPEC2000 performance and code size. The patch was successfully bootstrapped and tested on x86-64. Committed as rev. 257537. Index: ChangeLog === --- ChangeLog (revision 257536) +++ ChangeLog (working copy) @@ -1,3 +1,15 @@ +2018-02-09 Vladimir Makarov + + PR rtl-optimization/57193 + * ira-color.c (struct allocno_color_data): Add member + conflict_allocno_hard_prefs. + (update_conflict_allocno_hard_prefs): New. + (bucket_allocno_compare_func): Add a preference based on + conflict_allocno_hard_prefs. + (push_allocno_to_stack): Update conflict_allocno_hard_prefs. + (color_allocnos): Remove a dead code. Initiate + conflict_allocno_hard_prefs. Call update_costs_from_prefs. + 2018-02-09 Jakub Jelinek PR target/84226 Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 257536) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-02-09 Vladimir Makarov + + PR rtl-optimization/57193 + * gcc.target/i386/57193.c: New. + 2018-02-09 Jakub Jelinek PR target/84226 Index: ira-color.c === --- ira-color.c (revision 257157) +++ ira-color.c (working copy) @@ -112,6 +112,9 @@ struct allocno_color_data available for the allocno allocation. It is number of the profitable hard regs. */ int available_regs_num; + /* Sum of frequencies of hard register preferences of all + conflicting allocnos which are not the coloring stack yet. */ + int conflict_allocno_hard_prefs; /* Allocnos in a bucket (used in coloring) chained by the following two members. */ ira_allocno_t next_bucket_allocno; @@ -1435,6 +1438,36 @@ update_costs_from_copies (ira_allocno_t update_costs_from_allocno (allocno, hard_regno, 1, decr_p, record_p); } +/* Update conflict_allocno_hard_prefs of allocnos conflicting with + ALLOCNO. */ +static void +update_conflict_allocno_hard_prefs (ira_allocno_t allocno) +{ + int l, nr = ALLOCNO_NUM_OBJECTS (allocno); + + for (l = 0; l < nr; l++) +{ + ira_object_t conflict_obj, obj = ALLOCNO_OBJECT (allocno, l); + ira_object_conflict_iterator oci; + + FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) + { + ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj); + allocno_color_data_t conflict_data = ALLOCNO_COLOR_DATA (conflict_a); + ira_pref_t pref; + + if (!(hard_reg_set_intersect_p + (ALLOCNO_COLOR_DATA (allocno)->profitable_hard_regs, + conflict_data->profitable_hard_regs))) + continue; + for (pref = ALLOCNO_PREFS (allocno); + pref != NULL; + pref = pref->next_pref) + conflict_data->conflict_allocno_hard_prefs += pref->freq; + } +} +} + /* Restore costs of allocnos connected to ALLOCNO by copies as it was before updating costs of these allocnos from given allocno. This is a wise thing to do as if given allocno did not get an expected @@ -2223,7 +2256,7 @@ bucket_allocno_compare_func (const void { ira_allocno_t a1 = *(const ira_allocno_t *) v1p; ira_allocno_t a2 = *(const ira_allocno_t *) v2p; - int diff, freq1, freq2, a1_num, a2_num; + int diff, freq1, freq2, a1_num, a2_num, pref1, pref2; ira_allocno_t t1 = ALLOCNO_COLOR_DATA (a1)->first_thread_allocno; ira_allocno_t t2 = ALLOCNO_COLOR_DATA (a2)->first_thread_allocno; int cl1 = ALLOCNO_CLASS (a1), cl2 = ALLOCNO_CLASS (a2); @@ -2253,6 +2286,11 @@ bucket_allocno_compare_func (const void a2_num = ALLOCNO_COLOR_DATA (a2)->available_regs_num; if ((diff = a2_num - a1_num) != 0) return diff; + /* Push allocnos with minimal conflict_allocno_hard_prefs first. */ + pref1 = ALLOCNO_COLOR_DATA (a1)->conflict_allocno_hard_prefs; + pref2 = ALLOCNO_COLOR_DATA (a2)->conflict_allocno_hard_prefs; + if ((diff = pref1 - pref2) != 0) +return diff; return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1); } @@ -2339,7 +2377,8 @@ delete_allocno_from_bucket (ira_allocno_ /* Put allocno A onto the coloring stack without removing it from its bucket. Pushing allocno to the coloring stack can result in moving conflicting allocnos from the uncolorable bucket to the colorable - one. */ + one. Update conflict_allocno_hard_prefs of the conflicting + allocnos which are not on stack yet. */ static void push_allocno_to_stack (ira_allocno_t a) { @@ -2369,15 +2408,19 @@ push_allocno_to_stack (ira_allocno_t a) FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) { ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj); - + ira_pref_t p
Re: PR84239, Reimplement CET intrinsics for rdssp/incssp insn
On 02/09/2018 05:50 AM, Tsimbalist, Igor V wrote: Introduce a couple of new CET intrinsics for reading and updating a shadow stack pointer (_get_ssp and _inc_ssp), which are more user friendly. They replace the existing _rdssp[d|q] and _incssp[d|q] instrinsics. The _get_ssp intrinsic has more deterministic semantic: it returns a value of the shadow stack pointer if HW is CET capable and 0 otherwise. Ok for trunk? Just reviewing the documentation part: diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index cb9df97..9f25dd9 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12461,6 +12461,7 @@ instructions, but allow the compiler to schedule those calls. * TILEPro Built-in Functions:: * x86 Built-in Functions:: * x86 transactional memory intrinsics:: +* x86 control-flow protection intrinsics:: @end menu @node AArch64 Built-in Functions @@ -21772,13 +21773,17 @@ void __builtin_ia32_wrpkru (unsigned int) unsigned int __builtin_ia32_rdpkru () @end smallexample -The following built-in functions are available when @option{-mcet} is used. -They are used to support Intel Control-flow Enforcment Technology (CET). -Each built-in function generates the machine instruction that is part of the -function's name. +The following built-in functions are available when @option{-mcet} or +@option{-mshstk} option is used. They support shadow stack +machine instructions from Intel Control-flow Enforcment Technology (CET). s/Enforcment/Enforcement/ +Each built-in function generates the machine instruction that is part +of the function's name. These are the internal low level functions. s/low level/low-level/ +Normally the functions in @ref{x86 control-flow protection intrinsics} +should be used instead. + @smallexample -unsigned int __builtin_ia32_rdsspd (unsigned int) -unsigned long long __builtin_ia32_rdsspq (unsigned long long) +unsigned int __builtin_ia32_rdsspd (void) +unsigned long long __builtin_ia32_rdsspq (void) void __builtin_ia32_incsspd (unsigned int) void __builtin_ia32_incsspq (unsigned long long) void __builtin_ia32_saveprevssp(void); @@ -21885,6 +21890,51 @@ else Note that, in most cases, the transactional and non-transactional code must synchronize together to ensure consistency. +@node x86 control-flow protection intrinsics +@subsection x86 Control-Flow Protection Intrinsics + +@deftypefn {CET Function} {ret_type} _get_ssp (void) +The @code{ret_type} is @code{unsigned long long} for x86-64 platform +and @code{unsigned int} for x86 pltform. I'd prefer the sentence about the return type be placed after the description of what the function does. And please fix typos: s/x86-64 platform/64-bit targets/ s/x86 pltform/32-bit targets/ +Get the current value of shadow stack pointer if shadow stack support +from Intel CET is enabled in the HW or @code{0} otherwise. s/HW/hardware,/ +@end deftypefn + +@deftypefn {CET Function} void _inc_ssp (unsigned int) +Increment the current shadow stack pointer by the size specified by the +function argument. For security reason only unsigned byte value is used +from the argument. Therefore for the size greater than @code{255} the +function should be called several times. How about rephrasing the last two sentences: The argument is masked to a byte value for security reasons, so to increment by more than 255 bytes you must call the function multiple times. +@end deftypefn + +The shadow stack unwind code looks like: + +@smallexample +#include + +/* Unwind the shadow stack for EH. */ +#define _Unwind_Frames_Extra(x)\ + do \ +@{ \ + _Unwind_Word ssp = _get_ssp (); \ + if (ssp != 0)\ + @{ \ + _Unwind_Word tmp = (x); \ + while (tmp > 255) \ + @{ \ + _inc_ssp (tmp); \ + tmp -= 255; \ + @} \ + _inc_ssp (tmp); \ + @} \ +@} \ +while (0) +@end smallexample Tabs in Texinfo input don't work well. Please use spaces to format code environments. + +@noindent +This code runs unconditionally on all x86-64 processors and all x86 +processors that support multi-byte NOP instructions. s/x86-64 and all x86/32-bit and 64-bit/ + @node Target Format Checks @section Format Checks Specific to Particular Target Machines -Sandra
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On Fri, Feb 09, 2018 at 07:01:08PM +0100, Richard Biener wrote: > >which indeed fixes the testcase and seems not to break asan.exp. > > Huh. Need to double check why that makes sense ;) I think it does, for both ASAN_CHECK and ASAN_MARK the pointer argument is the second one, the first one is an integer argument with flags. And ASAN_MARK, both poison and unpoison, works kind like a clobber on the referenced variable, before unpoison it is generally inaccessible and after poison too. Jakub
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On Fri, Feb 09, 2018 at 05:40:09PM +0100, Richard Biener wrote: > On February 9, 2018 5:08:24 PM GMT+01:00, Paolo Bonzini > wrote: > >Hi all, > > > >in this PR, a dead reference to a function pointer cannot be optimized > >out by the compiler because some ASAN_MARK UNPOISON calls, which are > >placed before the store, cause the containing struct to escape. > >(Without -fsanitize=address, the dead code is eliminated by the first > >DSE pass). > > > >The fix, which works at least for this testcase, is to copy part of the > >sanopt dead code elimination pass early, so that the compiler can see > >fewer UNPOISON calls. I am not sure this is general enough, due to > >the very limited data flow analysis done by > >sanitize_asan_mark_unpoison. > >Another possibility which I considered but did not implement is to mark > >the UNPOISON calls so that they do not cause the parameter to escape. > > I'd do this, thus assign proper fnspec attributes to the asan functions. It already uses ".R.." "fn spec". Jakub
Re: [PATCH] Avoid BSWAP with floating point modes on rs6000 (PR target/84226)
Hi Jakub, On Fri, Feb 09, 2018 at 07:43:16AM +0100, Jakub Jelinek wrote: > BSWAP is documented as: > > @item (bswap:@var{m} @var{x}) > Represents the value @var{x} with the order of bytes reversed, carried out > in mode @var{m}, which must be a fixed-point machine mode. > The mode of @var{x} must be @var{m} or @code{VOIDmode}. > > The fixed-point is something used widely in rtl.texi and is very confusing > now that we have FIXED_POINT_TYPE floats, I assume it talks about integral > modes or, because it is also used with vector modes, about INTEGRAL_MODE_P > modes. My understanding is that bswap on a vector integral mode is meant to > be bswap of each element individually. Right. The documentation could use some improvement ;-) > The rs6000 backend uses bswap not just on scalar integral modes and > vector integral modes, but also on V4SF and V2DF, but ICEs in simplify-rtx.c > where we don't expect bswap to be used on SF/DFmode (vector bswap is handled > as bswap of each element). > > The following patch adjusts the rs6000 backend to use well defined bswaps > on corresponding integral modes instead (and also does what we've done in > i386 backend years ago, avoid subreg on the lhs because it breaks combine > attempts to optimize it). > Bootstrapped/regtested on powerpc64{,le}-linux, on powerpc64-linux including > -m64/-m32, ok for trunk? > > 2018-02-09 Jakub Jelinek > > PR target/84226 > * config/rs6000/vsx.md (p9_xxbrq_v16qi): Change input operand > constraint from =wa to wa. Avoid a subreg on the output operand, > instead use a pseudo and subreg it in a move. > (p9_xxbrd_): Changed to ... > (p9_xxbrd_v2di): ... this insn, without VSX_D iterator. > (p9_xxbrd_v2df): New expander. > (p9_xxbrw_): Changed to ... > (p9_xxbrw_v4si): ... this insn, without VSX_W iterator. > (p9_xxbrw_v4sf): New expander. > > * gcc.target/powerpc/pr84226.c: New test. > +(define_expand "p9_xxbrd_v2df" > + [(use (match_operand:V2DF 0 "vsx_register_operand" "=wa")) > + (use (match_operand:V2DF 1 "vsx_register_operand" "wa"))] > + "TARGET_P9_VECTOR" > +{ > + rtx op0 = gen_reg_rtx (V2DImode); > + rtx op1 = gen_lowpart (V2DImode, operands[1]); > + emit_insn (gen_p9_xxbrd_v2di (op0, op1)); > + emit_move_insn (operands[0], gen_lowpart (V2DFmode, op0)); > + DONE; > +}) Okay for trunk with or without such an improvement. Thanks! Segher
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On February 9, 2018 6:23:37 PM GMT+01:00, Paolo Bonzini wrote: >On 09/02/2018 17:40, Richard Biener wrote: >> On February 9, 2018 5:08:24 PM GMT+01:00, Paolo Bonzini > wrote: >>> Another possibility which I considered but did not implement is to >mark >>> the UNPOISON calls so that they do not cause the parameter to >escape. >> >> I'd do this, thus assign proper fnspec attributes to the asan >functions. > >Hmm, actually that might be as simple as fixing a typo: > >diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def >index 5970d0e..15d6151 100644 >--- a/gcc/internal-fn.def >+++ b/gcc/internal-fn.def >@@ -255,8 +255,8 @@ DEF_INTERNAL_FN (UBSAN_PTR, ECF_LEAF | ECF_NOTHROW, >".R.") > DEF_INTERNAL_FN (UBSAN_OBJECT_SIZE, ECF_LEAF | ECF_NOTHROW, NULL) > DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, ECF_NORETURN, NULL) >DEF_INTERNAL_FN (BUILTIN_EXPECT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, >NULL) >-DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, >".R...") >-DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, ".R..") >+DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, >"..R..") >+DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, "..W.") >DEF_INTERNAL_FN (ASAN_POISON, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, >NULL) >DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, >NULL) >DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, >NULL) > >which indeed fixes the testcase and seems not to break asan.exp. Huh. Need to double check why that makes sense ;) >'W' is needed to avoid breaking the pr78541.c testcase, and I think it >makes sense since ASAN_MARK is "writing" the state of the object >(in the test case FRE moves a dereference across a poisoning). > >I'll look at it next week. Someone maybe should take a look at ubsan >fnspecs too. > >Paolo
Re: [PR c/84293] Unexpected strict-alias warning
On Fri, 9 Feb 2018, Nathan Sidwell wrote: > Joseph, are the C bits ok? Yes. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] cortex-a57-fma-steering.c fixes (PR target/84272)
Hi! As mentioned in the PR, in the cortex-a57-fma-steering.c optimization pass we may use fma_node/fma_root_node data after they were deleted and leak leaf nodes because of an early continue; (the loop with to_process.safe_push (*child_iter); in the body will do nothing if list is empty, but freeing it is important). The following patch fixes it, Thomas as the pass author looked at the patch and provided feedback in the PR and Kyrill tested it on aarch64-linux. Ok for trunk? 2018-02-09 Jakub Jelinek PR target/84272 * config/aarch64/cortex-a57-fma-steering.c (fma_forest::merge_forest): Use ++iter rather than iter++ for std::list iterators. (func_fma_steering::dfs): Likewise. Don't delete nodes right away, defer deleting them until all nodes in the forest are processed. Do free even leaf nodes. Change to_process into auto_vec. * g++.dg/opt/pr84272.C: New test. --- gcc/config/aarch64/cortex-a57-fma-steering.c.jj 2018-02-09 06:44:24.990811997 +0100 +++ gcc/config/aarch64/cortex-a57-fma-steering.c2018-02-09 13:03:51.741447714 +0100 @@ -406,7 +406,7 @@ fma_forest::merge_forest (fma_forest *ot /* Update root nodes' pointer to forest. */ for (other_root_iter = other_roots->begin (); - other_root_iter != other_roots->end (); other_root_iter++) + other_root_iter != other_roots->end (); ++other_root_iter) (*other_root_iter)->set_forest (this); /* Remove other_forest from the list of forests and move its tree roots in @@ -847,14 +847,13 @@ func_fma_steering::dfs (void (*process_f void (*process_node) (fma_forest *, fma_node *), bool free) { - vec to_process; + auto_vec to_process; + auto_vec to_free; std::list::iterator forest_iter; - to_process.create (0); - /* For each forest. */ for (forest_iter = this->m_fma_forests.begin (); - forest_iter != this->m_fma_forests.end (); forest_iter++) + forest_iter != this->m_fma_forests.end (); ++forest_iter) { std::list::iterator root_iter; @@ -863,7 +862,7 @@ func_fma_steering::dfs (void (*process_f /* For each tree root in this forest. */ for (root_iter = (*forest_iter)->get_roots ()->begin (); - root_iter != (*forest_iter)->get_roots ()->end (); root_iter++) + root_iter != (*forest_iter)->get_roots ()->end (); ++root_iter) { if (process_root) process_root (*forest_iter, *root_iter); @@ -881,28 +880,30 @@ func_fma_steering::dfs (void (*process_f if (process_node) process_node (*forest_iter, node); - /* Absence of children might indicate an alternate root of a *chain*. -It's ok to skip it here as the chain will be renamed when -processing the canonical root for that chain. */ - if (node->get_children ()->empty ()) - continue; - for (child_iter = node->get_children ()->begin (); - child_iter != node->get_children ()->end (); child_iter++) + child_iter != node->get_children ()->end (); ++child_iter) to_process.safe_push (*child_iter); + + /* Defer freeing so that the process_node callback can access the +parent and children of the node being processed. */ if (free) + to_free.safe_push (node); + } + + if (free) + { + delete *forest_iter; + + while (!to_free.is_empty ()) { + fma_node *node = to_free.pop (); if (node->root_p ()) delete static_cast (node); else delete node; } } - if (free) - delete *forest_iter; } - - to_process.release (); } /* Build the dependency trees of FMUL and FMADD/FMSUB instructions. */ --- gcc/testsuite/g++.dg/opt/pr84272.C.jj 2018-02-09 12:59:28.215636324 +0100 +++ gcc/testsuite/g++.dg/opt/pr84272.C 2018-02-09 12:59:28.215636324 +0100 @@ -0,0 +1,23 @@ +// PR target/84272 +// { dg-do compile } +// { dg-options "-O2" } +// { dg-additional-options "-march=armv8-a -mtune=cortex-a57" { target aarch64-*-* } } + +struct A +{ + float b, c; + A (); + A (float, float, float); + float operator * (A) + { +float d = b * b + c * c; +return d; + } +}; + +void +foo () +{ + A g[1]; + A h (0, 0, h * g[2]); +} Jakub
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On 09/02/2018 17:40, Richard Biener wrote: > On February 9, 2018 5:08:24 PM GMT+01:00, Paolo Bonzini > wrote: >> Another possibility which I considered but did not implement is to mark >> the UNPOISON calls so that they do not cause the parameter to escape. > > I'd do this, thus assign proper fnspec attributes to the asan functions. Hmm, actually that might be as simple as fixing a typo: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 5970d0e..15d6151 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -255,8 +255,8 @@ DEF_INTERNAL_FN (UBSAN_PTR, ECF_LEAF | ECF_NOTHROW, ".R.") DEF_INTERNAL_FN (UBSAN_OBJECT_SIZE, ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, ECF_NORETURN, NULL) DEF_INTERNAL_FN (BUILTIN_EXPECT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) -DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, ".R...") -DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, ".R..") +DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, "..R..") +DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, "..W.") DEF_INTERNAL_FN (ASAN_POISON, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL) DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL) DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) which indeed fixes the testcase and seems not to break asan.exp. 'W' is needed to avoid breaking the pr78541.c testcase, and I think it makes sense since ASAN_MARK is "writing" the state of the object (in the test case FRE moves a dereference across a poisoning). I'll look at it next week. Someone maybe should take a look at ubsan fnspecs too. Paolo
Re: Please accept this commit for the trunk
Hi Mike, On Thu, Feb 08, 2018 at 06:06:05PM -0800, Mike Stump wrote: > On Feb 8, 2018, at 12:36 PM, Segher Boessenkool > wrote: > > > > On Wed, Feb 07, 2018 at 03:52:27PM -0800, Mike Stump wrote: > >> I dusted the pointed to patch off and check it in. Let us know how it > >> goes. > > > > I wanted to test this on the primary and secondary powerpc targets as > > well, but okay. > > I reviewed it, and it seemed to only trigger for darwin. Certainly doesn't > hurt to run a regression run and ensure that is the case. We'll find out if it regresses :-) > >> Does this resolve all of PR84113? If so, I can push the bug along. > > > > It makes bootstrap work. We don't know if it is correct otherwise. > > So, would be nice if someone could run a regression test. I'd do it by the > version just before the breakage, and then drop in the patch, and test again. > This minimizes all the other changes. Douglas has done a test (C family languages only it seems) at https://gcc.gnu.org/ml/gcc-testresults/2018-02/msg00374.html but it is hard to compare to previous results (not the same config, long ago, etc.) > >> What PR was the attachment url from? > > > > It is not from a PR, and it has never been sent to gcc-patches; it is > > from https://gcc.gnu.org/ml/gcc-testresults/2017-01/msg02971.html > > (attachment #2). > > Ah, that explains it. > > Sounds like 1, 3 and 4 also likely need to go it to make things nice. If > someone could regression test and let us know, that's likely the gating > factor. Yeah, it's not easy to accept those patches without proper regression testing, it's stage 4... But the patches probably are good. Segher
[Patch, Fortran] PR 84273: Reject allocatable passed-object dummy argument (proc_ptr_47.f90)
Hi all, the attached patch fixes some checking code for PASS arguments in procedure-pointer components, which does not properly account for the fact that the PASS argument needs to be polymorphic. [The reason for this issue is probably that PPCs were mostly implemented before polymorphism was available. The corresponding pass-arg checks for TBPs are ok.] The patch also fixes an invalid test case (which was detected thanks to Neil Carlson). It regtests cleanly on x86_64-linux-gnu. Ok for trunk? Cheers, Janus 2018-02-09 Janus Weil PR fortran/84273 * resolve.c (resolve_component): Fix checks of passed argument in procedure-pointer components. 2018-02-09 Janus Weil PR fortran/84273 * gfortran.dg/proc_ptr_47.f90: Fix invalid test case. * gfortran.dg/proc_ptr_comp_pass_4.f90: Fix and extend test case. Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (revision 257498) +++ gcc/fortran/resolve.c (working copy) @@ -13703,8 +13703,8 @@ resolve_component (gfc_component *c, gfc_symbol *s return false; } - /* Check for C453. */ - if (me_arg->attr.dimension) + /* Check for F03:C453. */ + if (CLASS_DATA (me_arg)->attr.dimension) { gfc_error ("Argument %qs of %qs with PASS(%s) at %L " "must be scalar", me_arg->name, c->name, me_arg->name, @@ -13713,7 +13713,7 @@ resolve_component (gfc_component *c, gfc_symbol *s return false; } - if (me_arg->attr.pointer) + if (CLASS_DATA (me_arg)->attr.class_pointer) { gfc_error ("Argument %qs of %qs with PASS(%s) at %L " "may not have the POINTER attribute", me_arg->name, @@ -13722,7 +13722,7 @@ resolve_component (gfc_component *c, gfc_symbol *s return false; } - if (me_arg->attr.allocatable) + if (CLASS_DATA (me_arg)->attr.allocatable) { gfc_error ("Argument %qs of %qs with PASS(%s) at %L " "may not be ALLOCATABLE", me_arg->name, c->name, Index: gcc/testsuite/gfortran.dg/proc_ptr_47.f90 === --- gcc/testsuite/gfortran.dg/proc_ptr_47.f90 (revision 257498) +++ gcc/testsuite/gfortran.dg/proc_ptr_47.f90 (working copy) @@ -21,13 +21,9 @@ contains function foo(A) -class(AA), allocatable :: A +class(AA) :: A type(AA) foo -if (.not.allocated (A)) then - allocate (A, source = AA (2, foo)) -endif - select type (A) type is (AA) foo = AA (3, foo) Index: gcc/testsuite/gfortran.dg/proc_ptr_comp_pass_4.f90 === --- gcc/testsuite/gfortran.dg/proc_ptr_comp_pass_4.f90 (revision 257498) +++ gcc/testsuite/gfortran.dg/proc_ptr_comp_pass_4.f90 (working copy) @@ -37,22 +37,23 @@ module m type :: t8 procedure(foo8), pass, pointer :: f8 ! { dg-error "must be of the derived type" } + procedure(foo9), pass, pointer :: f9 ! { dg-error "Non-polymorphic passed-object dummy argument" } end type contains subroutine foo1 (x1,y1) - type(t1) :: x1(:) + class(t1) :: x1(:) type(t1) :: y1 end subroutine subroutine foo2 (x2,y2) - type(t2),pointer :: x2 + class(t2),pointer :: x2 type(t2) :: y2 end subroutine subroutine foo3 (x3,y3) - type(t3),allocatable :: x3 + class(t3),allocatable :: x3 type(t3) :: y3 end subroutine @@ -69,4 +70,8 @@ contains integer :: i end function + subroutine foo9(x) + type(t8) :: x + end subroutine + end module m
Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
On 2/9/18 10:51 AM, Segher Boessenkool wrote: >> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */ >> +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */ This was a typo in the original test case I didn't notice. Looking closer, they meant one of these to be vclzd. I'll fix that, thanks. Peter
Re: [PATCH] Adjust PR84278 testcase
Richard Biener writes: > Got quoting of parens wrong. > > Tested on x86_64-unknown-linux-gnu, applied. > > Richard. > > 2018-02-09 Richard Biener > > PR tree-optimization/84278 > * gcc.target/i386/pr84278.c: Adjust regex. > > Index: gcc/testsuite/gcc.target/i386/pr84278.c > === > --- gcc/testsuite/gcc.target/i386/pr84278.c (revision 257491) > +++ gcc/testsuite/gcc.target/i386/pr84278.c (working copy) > @@ -15,4 +15,4 @@ void foo(void) > } > } > > -/* { dg-final { scan-assembler-not "\(%.sp\)" } } */ > +/* { dg-final { scan-assembler-not "\\\(%.sp\\\)" } } */ FWIW: /* { dg-final { scan-assembler-not {\(%.sp\)} } } */ works too and has saner escaping rules. Richard
Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
Hi! On Thu, Feb 08, 2018 at 12:28:05PM -0600, Peter Bergner wrote: > On 2/8/18 10:38 AM, Peter Bergner wrote: > > * gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled > > message. Fix test for running in 32-bit mode. > > As we talked about offline, here's a bigger change to builtins-1-be.c that > cleans up the test a little more, since we generate xxlor in more cases > than just the __builtin_vec_or() call, so this change adds the -dp option > and we match the pattern name to verify we are getting as many as we expect > from that and that alone. This also splits the xxland and xxlandc into > their own matches, which match the source test cases use of vec_and() and > vec_andc(). > +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */ Duplicate line. Fine otherwise, thanks! Segher
Re: [PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
On February 9, 2018 5:08:24 PM GMT+01:00, Paolo Bonzini wrote: >Hi all, > >in this PR, a dead reference to a function pointer cannot be optimized >out by the compiler because some ASAN_MARK UNPOISON calls, which are >placed before the store, cause the containing struct to escape. >(Without -fsanitize=address, the dead code is eliminated by the first >DSE pass). > >The fix, which works at least for this testcase, is to copy part of the >sanopt dead code elimination pass early, so that the compiler can see >fewer UNPOISON calls. I am not sure this is general enough, due to >the very limited data flow analysis done by >sanitize_asan_mark_unpoison. >Another possibility which I considered but did not implement is to mark >the UNPOISON calls so that they do not cause the parameter to escape. I'd do this, thus assign proper fnspec attributes to the asan functions. Richard. >Any suggestions on how to proceed here (or approval is welcome too :))? > >Paolo > >2018-02-09 Paolo Bonzini > > * passes.def: add pass_sandce before first DSE pass. > * sanopt.c (pass_data_sandce, pass_sandce, make_pass_sandce): New. > * tree-pass.h (make_pass_sandce): Declare it. > >testsuite: >2018-02-09 Paolo Bonzini > > PR sanitizer/84307 > * gcc.dg/asan/pr84307.c: New test. > >diff --git a/gcc/passes.def b/gcc/passes.def >index 9802f08..180df50 100644 >--- a/gcc/passes.def >+++ b/gcc/passes.def >@@ -244,6 +244,7 @@ along with GCC; see the file COPYING3. If not see >only examines PHIs to discover const/copy propagation >opportunities. */ > NEXT_PASS (pass_phi_only_cprop); >+ NEXT_PASS (pass_sandce); > NEXT_PASS (pass_dse); > NEXT_PASS (pass_reassoc, true /* insert_powi_p */); > NEXT_PASS (pass_dce); >diff --git a/gcc/sanopt.c b/gcc/sanopt.c >index cd94638..23b8e6e 100644 >--- a/gcc/sanopt.c >+++ b/gcc/sanopt.c >@@ -906,6 +906,32 @@ sanopt_optimize (function *fun, bool >*contains_asan_mark) > > namespace { > >+const pass_data pass_data_sandce = >+{ >+ GIMPLE_PASS, /* type */ >+ "sandce", /* name */ >+ OPTGROUP_NONE, /* optinfo_flags */ >+ TV_NONE, /* tv_id */ >+ ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */ >+ 0, /* properties_provided */ >+ 0, /* properties_destroyed */ >+ 0, /* todo_flags_start */ >+ TODO_update_ssa, /* todo_flags_finish */ >+}; >+ >+class pass_sandce : public gimple_opt_pass >+{ >+public: >+ pass_sandce (gcc::context *ctxt) >+: gimple_opt_pass (pass_data_sandce, ctxt) >+ {} >+ >+ /* opt_pass methods: */ >+ virtual bool gate (function *) { return flag_sanitize & >SANITIZE_ADDRESS; } >+ virtual unsigned int execute (function *); >+ >+}; // class pass_sanopt >+ > const pass_data pass_data_sanopt = > { > GIMPLE_PASS, /* type */ >@@ -1244,6 +1270,31 @@ sanitize_rewrite_addressable_params (function >*fun) > } > > unsigned int >+pass_sandce::execute (function *fun) >+{ >+ basic_block bb; >+ bool contains_asan_mark = false; >+ >+ /* Try to remove redundant unpoisoning. */ >+ gimple_stmt_iterator gsi; >+ FOR_EACH_BB_FN (bb, fun) >+for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >+ { >+ gimple *stmt = gsi_stmt (gsi); >+ if (gimple_call_internal_p (stmt, IFN_ASAN_MARK)) >+{ >+ contains_asan_mark = true; >+ break; >+} >+ } >+ >+ if (contains_asan_mark) >+sanitize_asan_mark_unpoison (); >+ >+ return 0; >+} >+ >+unsigned int > pass_sanopt::execute (function *fun) > { > basic_block bb; >@@ -1367,6 +1418,12 @@ pass_sanopt::execute (function *fun) > } // anon namespace > > gimple_opt_pass * >+make_pass_sandce (gcc::context *ctxt) >+{ >+ return new pass_sandce (ctxt); >+} >+ >+gimple_opt_pass * > make_pass_sanopt (gcc::context *ctxt) > { > return new pass_sanopt (ctxt); >diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h >index 93a6a99..d5eb055 100644 >--- a/gcc/tree-pass.h >+++ b/gcc/tree-pass.h >@@ -350,6 +350,7 @@ extern gimple_opt_pass *make_pass_tsan >(gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tsan_O0 (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_sancov (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_sancov_O0 (gcc::context *ctxt); >+extern gimple_opt_pass *make_pass_sandce (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_lower_cf (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_refactor_eh (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_lower_eh (gcc::context *ctxt); >diff --git a/gcc/testsuite/gcc.dg/asan/pr84307.c >b/gcc/testsuite/gcc.dg/asan/pr84307.c >new file mode 100644 >index 000..6e1a197 >--- /dev/null >+++ b/gcc/testsuite/gcc.dg/asan/pr84307.c >@@ -0,0 +1,21 @@ >+/* PR middle-end/83185 */ >+/* { dg-do link } */ >+/* { dg-options "-O1" } */ >+ >+struct f { >+ void (*func)(void); >+}; >+ >+extern void link_error(void); >+extern int printf(const char *f, ...); >+ >+static inline struct f *gimme_null(struct f *result) >+{ >+ return 0; >
Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
On 2/9/18 10:17 AM, Segher Boessenkool wrote: > On Thu, Feb 08, 2018 at 10:38:09AM -0600, Peter Bergner wrote: >> -/* { dg-final { scan-assembler-times "divd" 4 } } */ > >> +/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */ > >> +/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } >> } */ > > Why does divd change from 4 to 2 times? Is it changed to muls? No, it's because "divd" was matching both divd and divdu insns, so it was overcounting before. Now that we use \m and \M word delimiters, we don't. There are only 2 vec_div() and 2 vec_udiv() calls, so we should only expect 2 for both. > Okay for trunk. Thanks! Thanks, committed. Peter
Re: PATCH to fix bogus warning with -Wstringop-truncation -g (PR tree-optimization/84228)
On 02/09/2018 03:54 AM, Jakub Jelinek wrote: On Fri, Feb 09, 2018 at 11:40:29AM +0100, Richard Biener wrote: I.e., having to track all pointers to d between the call to strncpy and the assignment of the nul and make sure none of them ends up used in a string function. It didn't seem the additional complexity would have been worth the effort (or the likely false negatives). Well, I'd just walk immediate uses of the VDEF of the strncpy call, not of the pointer argument. There will be exactly _one_ possible store (gimple_vdef () is non-NULL) that you need to verify (with using the current matching logic). But it'll skip non-store statements for you. Well, it should also punt on the immediate uses of the VDEF that have NULL gimple_vdef and the alias oracle says that might alias with that, i.e. warn about say strncpy (p, ...); foo (p); p[whatever] = '\0'; where foo is pure, because it might read the unterminated string, but don't warn about strncpy (p, ...); x = *q; p[whatever] = '\0'; if q[0] can't alias with p. Or just warn if there are any immediate uses of the strncpy VDEF that have gimple_vdef NULL, or non-NULL and aren't the zero store you are looking for. Thank you both for the suggestions. I'll try to remember to revisit this in stage 1, if only to get more experience with this sort of thing (gimple_vdef etc.) Martin
[PATCH rs6000] Fix for builtins-4-runnable.c testcase FAIL on power7/BE 32-bit
GCC maintainers: The following patch contains fixes for the builtins4-runnable.c test in 32-bit mode for the current GCC 8 branch. This is issue #290 in Bill Schmidt's issues. The int128 and uint128 variable types are not supported in 32-bit mode. The tests were moved to a new test file and restricted to run only when int128 type is supported. The patch passes the regression testing on Power 8 BE enabled for 64- bit and 32-bit modes. Additionally, the patch passes regression testing on Power 8 LE. Let me know if the patch looks OK or not. Carl Love gcc/testsuite/ChangeLog: 2018-02-09 Carl Love * gcc.target/powerpc/builtins-4-runnable.c (main): Move int128 and uint128 tests to new testfile. * gcc.target/powerpc/builtins-4-int128-runnable.c: New testfile for int128 and uint128 tests. * gcc.target/powerpc/powerpc.exp: Add builtins-4-int128- runnable.c to list oftorture tests. --- .../powerpc/builtins-4-int128-runnable.c | 108 + .../gcc.target/powerpc/builtins-4-runnable.c | 84 --- - gcc/testsuite/gcc.target/powerpc/powerpc.exp | 1 + 3 files changed, 109 insertions(+), 84 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-4-int128- runnable.c diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128- runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128- runnable.c new file mode 100644 index 000..a6973f7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c @@ -0,0 +1,108 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-maltivec -mvsx" } */ + +#include +#include // vector + +#ifdef DEBUG +#include +#endif + +void abort (void); + +int main() { + int i; + __uint128_t data_u128[100]; + __int128_t data_128[100]; + + vector __int128_t vec_128_expected1, vec_128_result1; + vector __uint128_t vec_u128_expected1, vec_u128_result1; + signed long long zero = (signed long long) 0; + + for (i = 0; i < 100; i++) +{ + data_128[i] = i + 1280; + data_u128[i] = i + 1281; +} + + /* vec_xl() tests */ + + vec_128_expected1 = (vector __int128_t){1280}; + vec_128_result1 = vec_xl (zero, data_128); + + if (vec_128_expected1[0] != vec_128_result1[0]) +{ +#ifdef DEBUG + printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ", + vec_128_result1[0] >> 64, + vec_128_result1[0] & (__int128_t)0x); + printf("vec_128_expected1[0] = %lld %llu\n", + vec_128_expected1[0] >> 64, + vec_128_expected1[0] & (__int128_t)0x); +#else + abort (); +#endif +} + + vec_u128_result1 = vec_xl (zero, data_u128); + vec_u128_expected1 = (vector __uint128_t){1281}; + if (vec_u128_expected1[0] != vec_u128_result1[0]) +{ +#ifdef DEBUG + printf("Error: vec_xl(), vec_u128_result1[0] = %lld; ", + vec_u128_result1[0] >> 64, + vec_u128_result1[0] & (__int128_t)0x); + printf("vec_u128_expected1[0] = %lld\n", + vec_u128_expected1[0] >> 64, + vec_u128_expected1[0] & (__int128_t)0x); +#else + abort (); +#endif +} + + /* vec_xl_be() tests */ + + vec_128_result1 = vec_xl_be (zero, data_128); +#ifdef __BIG_ENDIAN__ + vec_128_expected1 = (vector __int128_t){ (__int128_t)1280 }; +#else + vec_128_expected1 = (vector __int128_t){ (__int128_t)1280 }; +#endif + + if (vec_128_expected1[0] != vec_128_result1[0]) +{ +#ifdef DEBUG + printf("Error: vec_xl_be(), vec_128_result1[0] = %llu %llu;", + vec_128_result1[0] >> 64, + vec_128_result1[0] & 0x); + printf(" vec_128_expected1[0] = %llu %llu\n", + vec_128_expected1[0] >> 64, + vec_128_expected1[0] & 0x); +#else + abort (); +#endif +} + +#ifdef __BIG_ENDIAN__ + vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1281 }; +#else + vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1281 }; +#endif + + vec_u128_result1 = vec_xl_be (zero, data_u128); + + if (vec_u128_expected1[0] != vec_u128_result1[0]) +{ +#ifdef DEBUG + printf("Error: vec_xl_be(), vec_u128_result1[0] = %llu %llu;", + vec_u128_result1[0] >> 64, + vec_u128_result1[0] & 0x); + printf(" vec_u128_expected1[0] = %llu %llu\n", + vec_u128_expected1[0] >> 64, + vec_u128_expected1[0] & 0x); +#else + abort (); +#endif +} +} diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c index de9b916..35884b5 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c @@ -27,9 +27,6
Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins
On Thu, Feb 08, 2018 at 10:38:09AM -0600, Peter Bergner wrote: > On 2/6/18 10:36 AM, Peter Bergner wrote: > > On 2/6/18 10:20 AM, David Edelsohn wrote: > >> Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR > >> iterator, as Segher suggested? > > > > Well it works _if_ we use the first patch that changes the gen_* > > patterns. If we go this route, I agree we should use the SDI > > iterator instead of GPR. > > Actually, my bad. While bootstrapping this on a BE system, we get an > error when we attempt a 64-bit multiply in 32-bit mode. In this case, > the gen_muldi3() pattern calls expand_mult(DImode, ...) and the automatic > expand machinery notices the gen_muldi3() now allows DImode in the > !TARGET_POWERPC64 case and then calls gen_muldi3() to emit the multiply > and we go into infinite recursion. We don't have that problem in the > div/udiv case, because we call out to the lib routines, so no recursion. > Given this, I think we should probably go with the patch that modifies > vsx.md and guards the calls to gen_{div,udiv,mul}di3() with a TARGET_POWERPC64 > test. Okay, let's go with this, at least for now. Thanks for investigating. > -/* { dg-final { scan-assembler-times "divd" 4 } } */ > +/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2 { target ilp32 } } > } */ Why does divd change from 4 to 2 times? Is it changed to muls? Okay for trunk. Thanks! Segher
Re: [PATCH, rs6000] PR84220 fix altivec_vec_sld and vec_sldw intrinsic definitions
On Thu, 2018-02-08 at 17:48 -0600, Segher Boessenkool wrote: > Hi! > > On Wed, Feb 07, 2018 at 09:14:59AM -0600, Will Schmidt wrote: > > Our VEC_SLD definitions were mistakenly allowing the third argument to be > > of an invalid type, triggering an ICE (on invalid code) later in the build > > process. This fixes those definitions. The nearby VEC_SLDW definitions > > have > > the same issue, those have been fixed as part of this patch too. > > Testcases have been added to ensure we generate the 'invalid intrinsic' > > message as is appropriate, instead of ICEing. > > Giving proper credit, this was found by Peter Bergner while working a > > different issue. :-) > > > > Sniff-tests passed on P8. Doing larger reg-test across power systems now. > > OK for trunk? > > And,.. do we want this one backported too? > > > diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c > > index a68be51..26f9990 100644 > > --- a/gcc/config/rs6000/rs6000-c.c > > +++ b/gcc/config/rs6000/rs6000-c.c > > @@ -3654,39 +3654,39 @@ const struct altivec_builtin_types > > altivec_overloaded_builtins[] = { > >{ ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_16QI, > > RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, > > RS6000_BTI_bool_V16QI }, > >{ ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_16QI, > > RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, > > RS6000_BTI_unsigned_V16QI }, > >{ ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VSLDOI_4SF, > > -RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, > > RS6000_BTI_NOT_OPAQUE }, > > +RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_INTSI }, > > It isn't clear to me what RS6000_BTI_NOT_OPAQUE means... rs6000-c.c says: > > /* For arguments after the last, we have RS6000_BTI_NOT_OPAQUE in >the opX fields. */ > > (whatever that means!), and the following code seems to allow anything in > such args? If you understand it, please update some comments somewhere? I do not... only got as far as figuring out it wasn't right for vec_sld*(). I'll poke around a bit to see what I can figure out. May need to punt to (???) to understand the intent.Mike?/Peter? A bit more context around that usage is: [rs6000-c.c: altivec_resolve_overloaded_builtin() ] ... /* For arguments after the last, we have RS6000_BTI_NOT_OPAQUE in the opX fields. */ for (; desc->code == fcode; desc++) { if ((desc->op1 == RS6000_BTI_NOT_OPAQUE || rs6000_builtin_type_compatible (types[0], desc->op1)) ... with that check repeated against types[1], desc->op2,.. etc. > >{ VSX_BUILTIN_VEC_XXPERMDI, VSX_BUILTIN_XXPERMDI_2DF, > XXPERMDI is the only other builtin that uses NOT_OPAQUE, does that suffer > from the same problem? If so, you can completely delete NOT_OPAQUE it > seems? Not sure. > So what is/was it for, that is what I wonder. > > Your patch looks fine if you can clear that up :-) Heh, Ok. > > > Segher >
[PATCH] Character length cleanup for Coarray Fortran library
Following the change to use size_t for Fortran character lengths (PR 78534), this patch modifies the Coarray ABI in a similar way. The single-image implementation that is included in libgfortran is updated, but this needs corresponding work in the OpenCoarray library as well for multi-image support. Damian, can you look into that? Regtested on x86_64-pc-linux-gnu, Ok for trunk? gcc/fortran/ChangeLog: 2018-02-09 Janne Blomqvist * gfortran.texi: Update Coarray API description. * trans-decl.c (gfc_build_builtin_function_decls): Use size_t for character lengths, int for exit codes. (generate_coarray_sym_init): Use size_t for character length. * trans-intrinsic.c (conv_co_collective): Likewise. * trans-stmt.c (gfc_trans_lock_unlock): Likewise. (gfc_trans_event_post_wait): Likewise. (gfc_trans_sync): Likewise. libgfortran/ChangeLog: 2018-02-09 Janne Blomqvist * caf/libcaf.h: Remove stdint.h include. (_gfortran_caf_register): Use size_t for character length. (_gfortran_caf_deregister): Likewise. (_gfortran_caf_sync_all): Likewise. (_gfortran_caf_sync_memory): Likewise. (_gfortran_caf_sync_images): Likewise. (_gfortran_caf_stop_numeric): Use int for exit code. (_gfortran_caf_stop_str): Use size_t for character length. (_gfortran_caf_error_stop_str): Likewise. (_gfortran_caf_error_stop): Use int for exit code. (_gfortran_caf_co_broadcast): Use size_t for character length. (_gfortran_caf_co_sum): Likewise. (_gfortran_caf_co_min): Likewise. (_gfortran_caf_co_max): Likewise. (_gfortran_caf_co_reduce): Likewise. (_gfortran_caf_lock): Likewise. (_gfortran_caf_unlock): Likewise. (_gfortran_caf_event_post): Likewise. (_gfortran_caf_event_wait): Likewise. * caf/mpi.c (_gfortran_caf_register): Update implementation to match prototype. (_gfortran_caf_deregister): Likewise. (_gfortran_caf_sync_all): Likewise. (_gfortran_caf_sync_images): Likewise. (_gfortran_caf_error_stop_str): Likewise. (_gfortran_caf_error_stop): Likewise. * caf/single.c (caf_internal_error): Likewise. (_gfortran_caf_register): Likewise. (_gfortran_caf_deregister): Likewise. (_gfortran_caf_sync_all): Likewise. (_gfortran_caf_sync_memory): Likewise. (_gfortran_caf_sync_images): Likewise. (_gfortran_caf_stop_numeric): Likewise. (_gfortran_caf_stop_str): Likewise. (_gfortran_caf_error_stop_str): Likewise. (_gfortran_caf_error_stop): Likewise. (_gfortran_caf_co_broadcast): Likewise. (_gfortran_caf_co_sum): Likewise. (_gfortran_caf_co_min): Likewise. (_gfortran_caf_co_max): Likewise. (_gfortran_caf_co_reduce): Likewise. (_gfortran_caf_event_post): Likewise. (_gfortran_caf_event_wait): Likewise. (_gfortran_caf_lock): Likewise. (_gfortran_caf_unlock): Likewise. --- gcc/fortran/gfortran.texi | 32 - gcc/fortran/trans-decl.c | 38 +++--- gcc/fortran/trans-intrinsic.c | 4 ++-- gcc/fortran/trans-stmt.c | 12 +- libgfortran/caf/libcaf.h | 37 ++--- libgfortran/caf/mpi.c | 28 +++--- libgfortran/caf/single.c | 55 ++- 7 files changed, 103 insertions(+), 103 deletions(-) diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 1124669..7c09bc4 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -4458,7 +4458,7 @@ NULL. @item @emph{Syntax}: @code{void caf_register (size_t size, caf_register_t type, caf_token_t *token, -gfc_descriptor_t *desc, int *stat, char *errmsg, int errmsg_len)} +gfc_descriptor_t *desc, int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4510,7 +4510,7 @@ during a call to @code{_gfortran_caf_register}. @item @emph{Syntax}: @code{void caf_deregister (caf_token_t *token, caf_deregister_t type, -int *stat, char *errmsg, int errmsg_len)} +int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4936,7 +4936,7 @@ which has already been locked by the same image is an error. @item @emph{Syntax}: @code{void _gfortran_caf_lock (caf_token_t token, size_t index, int image_index, -int *aquired_lock, int *stat, char *errmsg, int errmsg_len)} +int *aquired_lock, int *stat, char *errmsg, size_t errmsg_len)} @item @emph{Arguments}: @multitable @columnfractions .15 .70 @@ -4971,7 +4971,7 @@ which is unlocked or has been locked by a different image is an error. @item @emph{Syntax}: @code{void _gfortran_caf_unlock (caf_token_t token, size_t index, int image_index, -int *stat, char *errmsg, int
[PATCH] Improve dead code elimination with -fsanitize=address (PR84307)
Hi all, in this PR, a dead reference to a function pointer cannot be optimized out by the compiler because some ASAN_MARK UNPOISON calls, which are placed before the store, cause the containing struct to escape. (Without -fsanitize=address, the dead code is eliminated by the first DSE pass). The fix, which works at least for this testcase, is to copy part of the sanopt dead code elimination pass early, so that the compiler can see fewer UNPOISON calls. I am not sure this is general enough, due to the very limited data flow analysis done by sanitize_asan_mark_unpoison. Another possibility which I considered but did not implement is to mark the UNPOISON calls so that they do not cause the parameter to escape. Any suggestions on how to proceed here (or approval is welcome too :))? Paolo 2018-02-09 Paolo Bonzini * passes.def: add pass_sandce before first DSE pass. * sanopt.c (pass_data_sandce, pass_sandce, make_pass_sandce): New. * tree-pass.h (make_pass_sandce): Declare it. testsuite: 2018-02-09 Paolo Bonzini PR sanitizer/84307 * gcc.dg/asan/pr84307.c: New test. diff --git a/gcc/passes.def b/gcc/passes.def index 9802f08..180df50 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -244,6 +244,7 @@ along with GCC; see the file COPYING3. If not see only examines PHIs to discover const/copy propagation opportunities. */ NEXT_PASS (pass_phi_only_cprop); + NEXT_PASS (pass_sandce); NEXT_PASS (pass_dse); NEXT_PASS (pass_reassoc, true /* insert_powi_p */); NEXT_PASS (pass_dce); diff --git a/gcc/sanopt.c b/gcc/sanopt.c index cd94638..23b8e6e 100644 --- a/gcc/sanopt.c +++ b/gcc/sanopt.c @@ -906,6 +906,32 @@ sanopt_optimize (function *fun, bool *contains_asan_mark) namespace { +const pass_data pass_data_sandce = +{ + GIMPLE_PASS, /* type */ + "sandce", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa, /* todo_flags_finish */ +}; + +class pass_sandce : public gimple_opt_pass +{ +public: + pass_sandce (gcc::context *ctxt) +: gimple_opt_pass (pass_data_sandce, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return flag_sanitize & SANITIZE_ADDRESS; } + virtual unsigned int execute (function *); + +}; // class pass_sanopt + const pass_data pass_data_sanopt = { GIMPLE_PASS, /* type */ @@ -1244,6 +1270,31 @@ sanitize_rewrite_addressable_params (function *fun) } unsigned int +pass_sandce::execute (function *fun) +{ + basic_block bb; + bool contains_asan_mark = false; + + /* Try to remove redundant unpoisoning. */ + gimple_stmt_iterator gsi; + FOR_EACH_BB_FN (bb, fun) +for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (gimple_call_internal_p (stmt, IFN_ASAN_MARK)) + { + contains_asan_mark = true; + break; + } + } + + if (contains_asan_mark) +sanitize_asan_mark_unpoison (); + + return 0; +} + +unsigned int pass_sanopt::execute (function *fun) { basic_block bb; @@ -1367,6 +1418,12 @@ pass_sanopt::execute (function *fun) } // anon namespace gimple_opt_pass * +make_pass_sandce (gcc::context *ctxt) +{ + return new pass_sandce (ctxt); +} + +gimple_opt_pass * make_pass_sanopt (gcc::context *ctxt) { return new pass_sanopt (ctxt); diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 93a6a99..d5eb055 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -350,6 +350,7 @@ extern gimple_opt_pass *make_pass_tsan (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tsan_O0 (gcc::context *ctxt); extern gimple_opt_pass *make_pass_sancov (gcc::context *ctxt); extern gimple_opt_pass *make_pass_sancov_O0 (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_sandce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_cf (gcc::context *ctxt); extern gimple_opt_pass *make_pass_refactor_eh (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_eh (gcc::context *ctxt); diff --git a/gcc/testsuite/gcc.dg/asan/pr84307.c b/gcc/testsuite/gcc.dg/asan/pr84307.c new file mode 100644 index 000..6e1a197 --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/pr84307.c @@ -0,0 +1,21 @@ +/* PR middle-end/83185 */ +/* { dg-do link } */ +/* { dg-options "-O1" } */ + +struct f { + void (*func)(void); +}; + +extern void link_error(void); +extern int printf(const char *f, ...); + +static inline struct f *gimme_null(struct f *result) +{ + return 0; +} + +int main(int argc, char **argv) +{ + struct f *x = gimme_null(&(struct f) { .func = link_error }); + printf("%p", x); +}
Re: [PATCH, rs6000] Update vsx-vector-6-le.c tests for p9 target
On Wed, 2018-02-07 at 12:28 -0600, Segher Boessenkool wrote: > Hi! > > On Wed, Feb 07, 2018 at 11:16:12AM -0600, Will Schmidt wrote: > > Noted during review of test results on P9. Due to changes and > > improvements, > > our codegen is different for this test on power9. > > Modified the existing test to target P8, and added a P9 variant with updated > > counts. > > > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c > > b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c > > index ddb0089..7fe691b 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c > > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c > > @@ -1,11 +1,11 @@ > > /* { dg-do compile { target { powerpc64le-*-* && lp64 } } } */ > > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > /* { dg-require-effective-target powerpc_vsx_ok } */ > > -/* { dg-options "-mvsx -O2" } */ > > +/* { dg-options "-mvsx -O2 -mcpu=power8" } */ > > Why not -mcpu=power7? And you'll need > > /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power7" } } */ > > You could also do instead > > /* { dg-skip-if "this is not for p9" { powerpc_p9vector_ok } } */ > > or something like that; a bit neater. I did try that, it had the effect of disabling the test if the system was able to generate P9 code, which included my p8 environment. :-) Ended up specifying "do not overrides" for both tests. (v2) has been posted. > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.p9.c > > @@ -0,0 +1,32 @@ > > +/* { dg-do compile { target { powerpc64le-*-* && lp64 } } } */ > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > > +/* { dg-options "-mvsx -O2 -mcpu=power9" } */ > > This needs the "do not override -mcpu" thing as well. > > > Segher >
[PATCH, rs6000] (v2) Update vsx-vector-6-le.c tests
Hi, Noted during review of test results on P9. Due to changes and improvements, our codegen is different for this test on power9. Modified the existing test to target P8 and added a P9 variant with updated counts. (v2) Updated/added dg-requires and dg-skip-if stanzas pre review and re-check of the test results. Sniff-tested good, now runs clean on P9 where it did not before. Runs clean on P8LE, P8BE, P7. OK for trunk? Thanks, -Will [testsuite] 2018-02-09 Will Schmidt * gcc.target/powerpc/vsx-vector-6-le.c: Update dg-* stanzas. * gcc.target/powerpc/vsx-vector-6-le.p9.c: New. diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c index ddb0089..c3f795c 100644 --- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c @@ -1,11 +1,12 @@ /* { dg-do compile { target { powerpc64le-*-* && lp64 } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ -/* { dg-options "-mvsx -O2" } */ +/* { dg-options "-mvsx -O2 -mcpu=power8" } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ -/* Expected instruction counts for Little Endian */ +/* Expected instruction counts for Little Endian targeting Power8. */ /* { dg-final { scan-assembler-times "xvabsdp" 1 } } */ /* { dg-final { scan-assembler-times "xvadddp" 1 } } */ /* { dg-final { scan-assembler-times "xxlnor" 8 } } */ /* { dg-final { scan-assembler-times "xxlor" 30 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.p9.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.p9.c new file mode 100644 index 000..290d4b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.p9.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target { powerpc64le-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mvsx -O2 -mcpu=power9" } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ + +/* Expected instruction counts for Little Endian targeting Power9. */ + +/* { dg-final { scan-assembler-times "xvabsdp" 1 } } */ +/* { dg-final { scan-assembler-times "xvadddp" 1 } } */ +/* { dg-final { scan-assembler-times "xxlnor" 7 } } */ +/* { dg-final { scan-assembler-times "xxlor" 20 } } */ +/* { dg-final { scan-assembler-times "xvcmpeqdp" 5 } } */ +/* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */ +/* { dg-final { scan-assembler-times "xvcmpgedp" 8 } } */ +/* { dg-final { scan-assembler-times "xvrdpim" 1 } } */ +/* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */ +/* { dg-final { scan-assembler-times "xvmsubadp" 1 } } */ +/* { dg-final { scan-assembler-times "xvsubdp" 1 } } */ +/* { dg-final { scan-assembler-times "xvmaxdp" 1 } } */ +/* { dg-final { scan-assembler-times "xvmindp" 1 } } */ +/* { dg-final { scan-assembler-times "xvmuldp" 1 } } */ +/* { dg-final { scan-assembler-times "vperm" 1 } } */ +/* { dg-final { scan-assembler-times "xvrdpic" 1 } } */ +/* { dg-final { scan-assembler-times "xvsqrtdp" 1 } } */ +/* { dg-final { scan-assembler-times "xvrdpiz" 1 } } */ +/* { dg-final { scan-assembler-times "xvmsubasp" 1 } } */ +/* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */ +/* { dg-final { scan-assembler-times "vmsumshs" 1 } } */ +/* { dg-final { scan-assembler-times "xxland" 13 } } */ + +/* Source code for the test in vsx-vector-6.h */ +#include "vsx-vector-6.h"
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On 02/09/2018 03:34 AM, Alexandre Oliva wrote: > On Feb 9, 2018, Jeff Law wrote: > >> On 02/08/2018 08:53 PM, Alan Modra wrote: >>> On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote: Here's what I checked in, right after the LVU patch. [IEPM] Introduce inline entry point markers >>> >>> One of these two patches breaks ppc64le bootstrap with the assembler >>> complaining "Error: view number mismatch" when compiling >>> libdecnumber. >>> >> I've just passed along a similar failure (.i, .s and command line >> options) to Alex for ppc64 (be) building glibc. > > This fixes at least the testcase Jeff provided me with. I'm going ahead > and checking it in as obvious. I suppose we might need more of these, > on this and other ports, if they have been sloppy about zero-length > pseudo insns :-( > > Would you guys please let me know whether you still see a problem, if > you get a chance to respin? I was just about to crash in bed when I saw > your email. > > When I get back up, I'll build the latest binutils release on ppc64, > ppc64el and aarch64, and then bootstrap gcc with it. I should have done > that when I broadened my testing of the SFN+LVU+IEPM patchset to those > platforms, but I didn't realize I was failing to test them with an > assembler with view support, doh! Sorry about that. No need for the binutils+gcc bootstrapping test when you get up. Mine's already run. ppc, ppc64, ppc64le, aarch64 all covered. My tester does have half-dozen or so other failures overnight, but I haven't looked at them yet. If any look similar I'll first check if we're dealing with a zero length pseudo insn (I wouldn't be surprised if blockage insns are consistently wrong on that). jeff
gotools patch committed: Increase test timeout
I'm finding that on my laptop the cmd/go tests sometimes time out when the laptop is heavily loaded because I'm running all the tests in parallel. The cmd/go tests are really a whole testsuite, not a single test. So bumping the timeout. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 2018-02-09 Ian Lance Taylor * Makefile.am (GOTOOLS_TEST_TIMEOUT): Double value. Index: Makefile.am === --- Makefile.am (revision 257494) +++ Makefile.am (working copy) @@ -162,7 +162,7 @@ uninstall-local: GOTESTFLAGS = # Number of seconds before tests time out. -GOTOOLS_TEST_TIMEOUT = 240 +GOTOOLS_TEST_TIMEOUT = 480 # Run tests using the go tool, and frob the output to look like that # generated by DejaGNU. The main output of this is two files: Index: Makefile.in === --- Makefile.in (revision 257494) +++ Makefile.in (working copy) @@ -348,7 +348,7 @@ MOSTLYCLEANFILES = \ @NATIVE_TRUE@GOTESTFLAGS = # Number of seconds before tests time out. -@NATIVE_TRUE@GOTOOLS_TEST_TIMEOUT = 240 +@NATIVE_TRUE@GOTOOLS_TEST_TIMEOUT = 480 # CHECK_ENV sets up the environment to run the newly built go tool. # If you change this, change ECHO_ENV, below. @@ -635,8 +635,8 @@ distclean-generic: maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." -@NATIVE_FALSE@install-exec-local: @NATIVE_FALSE@uninstall-local: +@NATIVE_FALSE@install-exec-local: clean: clean-am clean-am: clean-binPROGRAMS clean-generic clean-noinstPROGRAMS \
Re: [PATCH][i386][3/3] PR target/84164: Make *cmpqi_ext_ patterns accept more zero_extract modes
Hi Uros, On 08/02/18 22:54, Uros Bizjak wrote: On Thu, Feb 8, 2018 at 6:11 PM, Kyrill Tkachov wrote: Hi all, This patch fixes some fallout in the i386 testsuite that occurs after the simplification in patch [1/3] [1]. The gcc.target/i386/extract-2.c FAILs because it expects to match: (set (reg:CC 17 flags) (compare:CC (subreg:QI (zero_extract:SI (reg:HI 98) (const_int 8 [0x8]) (const_int 8 [0x8])) 0) (const_int 4 [0x4]))) which is the *cmpqi_ext_2 pattern in i386.md but with the new simplification the combine/simplify-rtx machinery produces: (set (reg:CC 17 flags) (compare:CC (subreg:QI (zero_extract:HI (reg:HI 98) (const_int 8 [0x8]) (const_int 8 [0x8])) 0) (const_int 4 [0x4]))) Notice that the zero_extract now has HImode like the register source rather than SImode. The existing *cmpqi_ext_ patterns however explicitly demand an SImode on the zero_extract. I'm not overly familiar with the i386 port but I think that's too restrictive. The RTL documentation says: For (zero_extract:m loc size pos) "The mode m is the same as the mode that would be used for loc if it were a register." I'm not sure if that means that the mode of the zero_extract and the source register must always match (as is the case after patch [1/3]) but in any case it shouldn't matter semantically since we're taking a QImode subreg of the whole thing anyway. So the proposed solution in this patch is to allow HI, SI and DImode zero_extracts in these patterns as these are the modes that the ext_register_operand predicate accepts, so that the patterns can match the new form above. With this patch the aforementioned test passes again and bootstrap and testing on x86_64-unknown-linux-gnu shows no regressions. Is this ok for trunk if the first patch is accepted? Huh, there are many other zero-extract patterns besides cmpqi_ext_* with QImode subreg of SImode zero_extract in i386.md, used to access high QImode register of HImode pair. A quick grep shows these that have _ext_ in their name: (define_insn "*cmpqi_ext_1" (define_insn "*cmpqi_ext_2" (define_expand "cmpqi_ext_3" (define_insn "*cmpqi_ext_3" (define_insn "*cmpqi_ext_4" (define_insn "addqi_ext_1" (define_insn "*addqi_ext_2" (define_expand "testqi_ext_1_ccno" (define_insn "*testqi_ext_1" (define_insn "*testqi_ext_2" (define_insn_and_split "*testqi_ext_3" (define_insn "andqi_ext_1" (define_insn "*andqi_ext_1_cc" (define_insn "*andqi_ext_2" (define_insn "*qi_ext_1" (define_insn "*qi_ext_2" (define_expand "xorqi_ext_1_cc" (define_insn "*xorqi_ext_1_cc" There are also relevant splitters and peephole2 patterns. I see. Another approach I've looked at is removing the mode specifier from the zero_extract in these patterns. This means that they can be of any mode so they will match all of these modes without creating new patterns through iterators. That also works for the testcase and passes bootstrap and testing however there is the snag that the define_insns that don't start with a "*" are used to generate RTL through the gen_* mechanism and in that context the absence of a mode on the zero_extract would mean a VOIDmode zero_extract would be created, which I'm fairly sure is not good. So in my experiments I left those patterns alone (with an explicit SI on the zero_extract). IIRC, SImode zero_extract was enough to catch all high-register uses. There will be a pattern explosion if we want to handle all other integer modes here. However, I'm not a RTL expert, so someone will have to say what is the correct RTX form here. Jeff, Richard, could you please give us some guidance on this issue? Sorry for the trouble. Thanks, Kyrill Uros.
Re: PR84300, ICE in dwarf2cfi on ppc64le
On Fri, Feb 09, 2018 at 04:12:47PM +1030, Alan Modra wrote: > This PR is one of those with a really obvious cause, and fix. There's > nothing in the unspec rtl to say the insn needs lr! > Bootstrapped and regression tested powerpc64le-linux. OK? > > PR target/84300 > gcc/ > * config/rs6000/rs6000.md (split_stack_return): Use LR. > gcc/testsuite/ > * gcc.dg/pr84300.c: New. > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 33f0d95..287461f 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -13359,7 +13359,8 @@ (define_insn "load_split_stack_limit_si" > ;; Use r0 to stop regrename twiddling with lr restore insns emitted > ;; after the call to __morestack. > (define_insn "split_stack_return" > - [(unspec_volatile [(use (reg:SI 0))] UNSPECV_SPLIT_STACK_RETURN)] > + [(unspec_volatile [(use (reg:SI 0)) (use (reg:SI LR_REGNO))] > + UNSPECV_SPLIT_STACK_RETURN)] I'm not sure what a USE as input of an UNSPEC means -- it should work without the USEs? So please try without; otherwise okay of course. Thanks! Segher
Re: [PATCH][PR84213] DWARF: no location for non-definition DECLs with non-trivial DECL_VALUE_EXPR
On 02/09/2018 02:53 PM, Pierre-Marie de Rodat wrote: Pasto, or maybe I didn’t properly configure Vim on the machine I used. ;-) I’ll make sure these formatting issues are fixed before committing. Thanks! Committed! That was a Thunderbird pasto. I tried to update the PR. Hoping I did that well… -- Pierre-Marie de Rodat
[PATCH] Fix PR84037 some more
This improves SLP detection when swapping of operands is needed to match up stmts. Formerly we only considered swapping the not "matched" set of stmts but if we have a +- pair that might not have worked (also for other reasons). The following patch makes us instead see if we can eventually swap the operands in the set of matched stmts. This allows us to handle the + in the +- case as it happens in capacita. This doesn't get us to fully SLP the important loop but we now detect three out of four instances compared to just one before. This results in a speedup of 9.5% when using AVX2 ... if there were not the cost model rejecting the hybrid SLP vectorization now (like it does in the SSE case even without this patch). This means we're now using interleaving for this loop which improves runtime by "only" 7%. One of the reasons of the not profitable vectorization is the hybrid SLP which has shared stmts between the SLP instances and the interleaving instances - so tackling the last missed SLP group is next on my list. But using interleaving is also an improvement here. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2018-02-09 Richard Biener PR tree-optimization/84037 * tree-vect-slp.c (vect_build_slp_tree_2): Try swapping the matched stmts if we cannot swap the non-matched ones. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 257520) +++ gcc/tree-vect-slp.c (working copy) @@ -1308,37 +1308,65 @@ vect_build_slp_tree_2 (vec_info *vinfo, && nops == 2 && oprnds_info[1]->first_dt == vect_internal_def && is_gimple_assign (stmt) - && commutative_tree_code (gimple_assign_rhs_code (stmt)) - && ! two_operators /* Do so only if the number of not successful permutes was nor more than a cut-ff as re-trying the recursive match on possibly each level of the tree would expose exponential behavior. */ && *npermutes < 4) { - /* Verify if we can safely swap or if we committed to a specific -operand order already. */ - for (j = 0; j < group_size; ++j) - if (!matches[j] - && (swap[j] != 0 - || STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmts[j] - { - if (dump_enabled_p ()) - { - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"Build SLP failed: cannot swap operands " -"of shared stmt "); - dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, - stmts[j], 0); - } - goto fail; - } + /* See whether we can swap the matching or the non-matching +stmt operands. */ + bool swap_not_matching = true; + do + { + for (j = 0; j < group_size; ++j) + { + if (matches[j] != !swap_not_matching) + continue; + gimple *stmt = stmts[j]; + /* Verify if we can swap operands of this stmt. */ + if (!is_gimple_assign (stmt) + || !commutative_tree_code (gimple_assign_rhs_code (stmt))) + { + if (!swap_not_matching) + goto fail; + swap_not_matching = false; + break; + } + /* Verify if we can safely swap or if we committed to a +specific operand order already. +??? Instead of modifying GIMPLE stmts here we could +record whether we want to swap operands in the SLP +node and temporarily do that when processing it +(or wrap operand accessors in a helper). */ + else if (swap[j] != 0 + || STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt))) + { + if (!swap_not_matching) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "Build SLP failed: cannot swap " + "operands of shared stmt "); + dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, + TDF_SLIM, stmts[j], 0); + } + goto fail; + } + swap_not_matching = false; + break; + } + } + } + w
Re: [PATCH][PR84213] DWARF: no location for non-definition DECLs with non-trivial DECL_VALUE_EXPR
On 02/09/2018 02:47 PM, Richard Biener wrote: whitespace fixed here - vertical space missing before the comment and the first line of the comment (or cut&paste error with the patch?) Pasto, or maybe I didn’t properly configure Vim on the machine I used. ;-) I’ll make sure these formatting issues are fixed before committing. Thanks! -- Pierre-Marie de Rodat
Re: [PATCH][PR84213] DWARF: no location for non-definition DECLs with non-trivial DECL_VALUE_EXPR
On Fri, 9 Feb 2018, Pierre-Marie de Rodat wrote: > This patch restricts the set of cases in which we allow the generation of > location attributes for variables that are not defined in the current unit. > For such variables with complex DECL_VALUE_EXPR trees, generating a location > attribute can end up creating relocations to text symbols in the debug section > of LTO object files, which is not valid. > > Richard originally suggested to revert r248792 and then enhance > tree_add_const_value_attribute_for_decl to generate a DW_AT_location attribute > in the specific case of a DECL_VALUE_EXPR that is an > INDIRECT_REF(NOP_EXPR(INTEGER_CST)). I ended up with this patch because > changing a function called "tree_add_const_value_attribute*" to generate > instead a location attribute whereas there already exists a function called > "add_location_or_const_value_attribute" felt really wrong. ;-) Especially if > this would either re-implement part of the latter function in the former one. > > What I did instead was to make dwarf2out_late_global_decl call > add_location_or_const_value_attribute only when we know it's safe to do so > (i.e. when it won't generate relocations to text symbols). I have the feeling > that it is cleaner and from what I could see, it fixes the reported issue with > no regression. > > Bootstrapped and regtested on x86_64-linux. Ok to commit to trunk? Ok with ... > gcc/ > PR lto/84213 > * dwarf2out.c (is_trivial_indirect_ref): New function. > (dwarf2out_late_global_decl): Do not generate a location attribute for > variables that have a non-trivial DECL_VALUE_EXPR and that are not > defined in the current unit. > --- > gcc/dwarf2out.c | 31 +++ > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index 948b3cb..6538596 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -25716,6 +25716,23 @@ dwarf2out_early_global_decl (tree decl) >symtab->global_info_ready = save; > } > +/* Return whether EXPR is an expression with the following pattern: > + INDIRECT_REF (NOP_EXPR (INTEGER_CST)). */ > + whitespace fixed here - vertical space missing before the comment and the first line of the comment (or cut&paste error with the patch?) Thanks, Richard. > +static bool > +is_trivial_indirect_ref (tree expr) > +{ > + if (expr == NULL_TREE || TREE_CODE (expr) != INDIRECT_REF) > +return false; > + > + tree nop = TREE_OPERAND (expr, 0); > + if (nop == NULL_TREE || TREE_CODE (nop) != NOP_EXPR) > +return false; > + > + tree int_cst = TREE_OPERAND (nop, 0); > + return int_cst != NULL_TREE && TREE_CODE (int_cst) == INTEGER_CST; > +} > + > /* Output debug information for global decl DECL. Called from > toplev.c after compilation proper has finished. */ > @@ -25740,11 +25757,17 @@ dwarf2out_late_global_decl (tree decl) >if (die) > { > /* We get called via the symtab code invoking late_global_decl > - for symbols that are optimized out. Do not add locations > - for those, except if they have a DECL_VALUE_EXPR, in which case > - they are relevant for debuggers. */ > + for symbols that are optimized out. > + > + Do not add locations for those, except if they have a > + DECL_VALUE_EXPR, in which case they are relevant for debuggers. > + Still don't add a location if the DECL_VALUE_EXPR is not a > trivial > + INDIRECT_REF expression, as this could generate relocations to > + text symbols in LTO object files, which is invalid. */ > varpool_node *node = varpool_node::get (decl); > - if ((! node || ! node->definition) && ! DECL_HAS_VALUE_EXPR_P > (decl)) > + if ((! node || ! node->definition) > + && ! (DECL_HAS_VALUE_EXPR_P (decl) > + && is_trivial_indirect_ref (DECL_VALUE_EXPR (decl > tree_add_const_value_attribute_for_decl (die, decl); > else > add_location_or_const_value_attribute (die, decl, false); > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Reduce inline limits a bit to compensate changes in inlining metrics
Hi, this patch addresses the code size regression by reducing max-inline-insns-auto 40->30 and increasing inline-min-speedup 8->15. The main reason why we need retuning is following - inline-min-speedup works in a way that if expected runtime of caller+calee combo after inlining reduces by more than 8% the inliner is going to bypass inline-insns-auto (because it knows the inlining is benefical rather than just inlining in hope it will be). The decrease either happens because callee is very lightweight at average or because we can track it will optimize well. During GCC 8 development I have switched time estimates from int to sreal. Original estimates was capping time to about 1000 instructions and thus large function rarely saw speedup because that was based comparing caped numbers. With sreals we can now track benefits better - We made quite some progress on early optimizations making function bodies to appear smaller to inliner which in turn inlines more of them. This is reason why we want to decrease inline-min-speedup to gain some code size back. The code size estimate difference at beggining of inlning is about 6% to gcc 6 and about 12% to gcc 4.9. I have benchmarked patch on Haswell SPEC2000, SPEC2006, polyhedron and our C++ benchmarks. Here I found no off-noise changes on SPEC2000/2006. I know that reducing inline-insns-auto to 10 still produces no regressions and even improves facerec 6600->8000 but that seems bit of effect of good luck (it also depends on setting of branch predictor weights and needs to be analyzed independently). min-speedup can be increased to 30 without measurable effects as well. On C++ benchmark suite I know that cray degrades with min-speedup set to 30 (it needs value of 22). Also there is degradation with profile-generate on tramp3d. So overall I believe that for Haswell the reduction of inline limits is doing very consistent code size improvement without perofrmance tradeoffs. I also tested Itanium and here things are slightly more sensitive. The reduction of limits affects gzip 337->332 (-1.5%), vpr 1000->980 (-2%), crafty (925->935) (+2%) and vortex (1165->1180) (+1%). So overall it is specint2000 neutral. Reducing inline-isns-auto to 10 brings off noise overall degradation by -1% and 20 is in-between. specfp2000 reacts positively by improving applu 520->525 (+1%) and mgrid 391->397 (+1.3%). It would let me to reduct inline-isns-auto to 10 without any other regressions. C++ benchmarks does not show any off-noise changes. I have also did some limited testing on ppc and arm. They reacted more similarly to Haswell also showing no important changes for reducing the inlining limits. Now reducing inline limits triggers failure of testsuite/g++.dg/pr83239.C which tests that inlining happens. The reason why it does not happen is becuae ipa-fnsplit is trying to second guess if inliner will evnetually consider function for inlining and the test is out of date. I decided to hack around it for stage4 and will try to clean these things up next stage1. Bootstraped/regtested x86_64-linux. I know it is late in stage4, but would it be OK to for GCC 8? PR middle-end/83665 * params.def (inline-min-speedup): Increase from 8 to 15. (max-inline-insns-auto): Decrease from 40 to 30. * ipa-split.c (consider_split): Add some buffer for function to be considered inlining candidate. * invoke.texi (max-inline-insns-auto, inline-min-speedup): UPdate default values. Index: params.def === --- params.def (revision 257520) +++ params.def (working copy) @@ -52,13 +52,13 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCO DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, "inline-min-speedup", "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto.", - 8, 0, 0) + 15, 0, 0) /* The single function inlining limit. This is the maximum size of a function counted in internal gcc instructions (not in real machine instructions) that is eligible for inlining by the tree inliner. - The default value is 450. + The default value is 400. Only functions marked inline (or methods defined in the class definition for C++) are affected by this. There are more restrictions to inlining: If inlined functions @@ -77,11 +77,11 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE, that is applied to functions marked inlined (or defined in the class declaration in C++) given by the "max-inline-insns-single" parameter. - The default value is 40. */ + The default value is 30. */ DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO, "max-inline-insns-auto", "The maximum number of instructions when automatically inlining.", - 40, 0, 0) + 30, 0, 0) DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE, "max-inline-insns-recursi
[PATCH][PR84213] DWARF: no location for non-definition DECLs with non-trivial DECL_VALUE_EXPR
This patch restricts the set of cases in which we allow the generation of location attributes for variables that are not defined in the current unit. For such variables with complex DECL_VALUE_EXPR trees, generating a location attribute can end up creating relocations to text symbols in the debug section of LTO object files, which is not valid. Richard originally suggested to revert r248792 and then enhance tree_add_const_value_attribute_for_decl to generate a DW_AT_location attribute in the specific case of a DECL_VALUE_EXPR that is an INDIRECT_REF(NOP_EXPR(INTEGER_CST)). I ended up with this patch because changing a function called "tree_add_const_value_attribute*" to generate instead a location attribute whereas there already exists a function called "add_location_or_const_value_attribute" felt really wrong. ;-) Especially if this would either re-implement part of the latter function in the former one. What I did instead was to make dwarf2out_late_global_decl call add_location_or_const_value_attribute only when we know it's safe to do so (i.e. when it won't generate relocations to text symbols). I have the feeling that it is cleaner and from what I could see, it fixes the reported issue with no regression. Bootstrapped and regtested on x86_64-linux. Ok to commit to trunk? gcc/ PR lto/84213 * dwarf2out.c (is_trivial_indirect_ref): New function. (dwarf2out_late_global_decl): Do not generate a location attribute for variables that have a non-trivial DECL_VALUE_EXPR and that are not defined in the current unit. --- gcc/dwarf2out.c | 31 +++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 948b3cb..6538596 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -25716,6 +25716,23 @@ dwarf2out_early_global_decl (tree decl) symtab->global_info_ready = save; } +/* Return whether EXPR is an expression with the following pattern: + INDIRECT_REF (NOP_EXPR (INTEGER_CST)). */ + +static bool +is_trivial_indirect_ref (tree expr) +{ + if (expr == NULL_TREE || TREE_CODE (expr) != INDIRECT_REF) +return false; + + tree nop = TREE_OPERAND (expr, 0); + if (nop == NULL_TREE || TREE_CODE (nop) != NOP_EXPR) +return false; + + tree int_cst = TREE_OPERAND (nop, 0); + return int_cst != NULL_TREE && TREE_CODE (int_cst) == INTEGER_CST; +} + /* Output debug information for global decl DECL. Called from toplev.c after compilation proper has finished. */ @@ -25740,11 +25757,17 @@ dwarf2out_late_global_decl (tree decl) if (die) { /* We get called via the symtab code invoking late_global_decl -for symbols that are optimized out. Do not add locations -for those, except if they have a DECL_VALUE_EXPR, in which case -they are relevant for debuggers. */ +for symbols that are optimized out. + +Do not add locations for those, except if they have a +DECL_VALUE_EXPR, in which case they are relevant for debuggers. +Still don't add a location if the DECL_VALUE_EXPR is not a trivial +INDIRECT_REF expression, as this could generate relocations to +text symbols in LTO object files, which is invalid. */ varpool_node *node = varpool_node::get (decl); - if ((! node || ! node->definition) && ! DECL_HAS_VALUE_EXPR_P (decl)) + if ((! node || ! node->definition) + && ! (DECL_HAS_VALUE_EXPR_P (decl) + && is_trivial_indirect_ref (DECL_VALUE_EXPR (decl tree_add_const_value_attribute_for_decl (die, decl); else add_location_or_const_value_attribute (die, decl, false); -- 2.1.4
Re: RFA: Sanitize deprecation messages (PR 84195)
Hi David, Hi Martin, OK, take 3 of the patch is attached. In this version: * The escaping is handled by a new class. * Self-tests have been added (and they helped find a bug - yay!) * The documentation has been extended to mention -fmessage-length's effect on the #-directives, and to add documentation for #pragma GCC error and #pragma GCC warning. (Which appeared to be missing). I did try to add backslash characters to the regexp in the new testcase but I just could not find a way to persuade dejagnu to accept them. OK to apply ? (Apologies for the poor C++ coding - it is not my strong point. I am an assembler level programmer at heart). Cheers Nick gcc/ChangeLog 2018-02-09 Nick Clifton PR 84195 * tree.c (escaped_string): New class. Converts an unescaped string into its escaped equivalent. (warn_deprecated_use): Use the new class to convert the deprecation message, if present. (test_escaped_strings): New self test. (test_c_tests): Add test_escaped_strings. gcc/testsuite/ChangeLog 2018-02-07 Nick Clifton * gcc.c-torture/compile/pr84195.c: New test. pr84195.patch.3 Description: Unix manual page
PR84239, Reimplement CET intrinsics for rdssp/incssp insn
Introduce a couple of new CET intrinsics for reading and updating a shadow stack pointer (_get_ssp and _inc_ssp), which are more user friendly. They replace the existing _rdssp[d|q] and _incssp[d|q] instrinsics. The _get_ssp intrinsic has more deterministic semantic: it returns a value of the shadow stack pointer if HW is CET capable and 0 otherwise. Ok for trunk? Igor 0001-Reimplement-CET-intrinsics-for-rdssp-incssp-insn.patch Description: 0001-Reimplement-CET-intrinsics-for-rdssp-incssp-insn.patch
[PR c/84293] Unexpected strict-alias warning
This bug comes from python sources. It has some type-punning macros to switch between representation, but these can cause unexpected strict-alias warnings, even though the punning's context is a system header file. (I guess that without macro expansion location information we'd've thought the context was user code.) As shown in the bug report, if the pun is spelt '(struct X *)&obj' we do not warn, but if it is '(X_t *)&obj' we do. We also don't warn in either case if we go via -save-temps. The patch changes things to not warn in either spelling. The problem is that strict_aliasing_warning uses input_location to identifier the current code point. But that's not necessarily the location of the above pun -- it can be the point where we use the pun. We cannot just use the incoming EXPR's location, because we may well have peeled some pieces off it, and we end up with the '&obj' fragment, which in this case is located in the user source. It also turns out that strict_aliasing_warning no longer uses the incoming 'otype' argument, immediately doing: STRIP_NOPS (expr); otype = TREE_TYPE (expr); So we may as well drop that arg, substituting a location_t from the caller. That's what this patch does. The callers in the C & C++ FEs pass the EXPR_LOCATION of the pointer being dereferenced, or converted, as appropriate. Joseph, are the C bits ok? nathan -- Nathan Sidwell 2018-02-09 Nathan Sidwell PR c/84293 gcc/c/ * c-typeck.c (build_indirect_ref, build_c_cast): Pass expr location to strict_aliasing_warning. gcc/c-family/ * c-common.h (strict_aliasing_warning): Drop OTYPE arg, insert LOC arg. * c-warn.c (strict_aliasing_warning): Drop OTYPE arg, require LOC arg. Adjust. gcc/cp/ * typeck.c (cp_build_indirect_ref_1, build_reinterpret_cast_1): Pass expr location to strict_aliasing_warning. gcc/testsuite/ * c-c++-common/pr84293.h: New. * c-c++-common/pr84293.c: New. Index: gcc/c/c-typeck.c === --- gcc/c/c-typeck.c (revision 257480) +++ gcc/c/c-typeck.c (working copy) @@ -2524,7 +2524,7 @@ build_indirect_ref (location_t loc, tree the backend. This only needs to be done at warn_strict_aliasing > 2. */ if (warn_strict_aliasing > 2) - if (strict_aliasing_warning (TREE_TYPE (TREE_OPERAND (pointer, 0)), + if (strict_aliasing_warning (EXPR_LOCATION (pointer), type, TREE_OPERAND (pointer, 0))) TREE_NO_WARNING (pointer) = 1; } @@ -5696,7 +5696,7 @@ build_c_cast (location_t loc, tree type, "of different size"); if (warn_strict_aliasing <= 2) -strict_aliasing_warning (otype, type, expr); +strict_aliasing_warning (EXPR_LOCATION (value), type, expr); /* If pedantic, warn for conversions between function and object pointer types, except for converting a null pointer constant Index: gcc/c-family/c-common.h === --- gcc/c-family/c-common.h (revision 257480) +++ gcc/c-family/c-common.h (working copy) @@ -1260,7 +1260,7 @@ extern void warn_tautological_cmp (locat extern void warn_logical_not_parentheses (location_t, enum tree_code, tree, tree); extern bool warn_if_unused_value (const_tree, location_t); -extern bool strict_aliasing_warning (tree, tree, tree); +extern bool strict_aliasing_warning (location_t, tree, tree); extern void sizeof_pointer_memaccess_warning (location_t *, tree, vec *, tree *, bool (*) (tree, tree)); Index: gcc/c-family/c-warn.c === --- gcc/c-family/c-warn.c (revision 257480) +++ gcc/c-family/c-warn.c (working copy) @@ -599,17 +599,21 @@ warn_if_unused_value (const_tree exp, lo } } -/* Print a warning about casts that might indicate violation - of strict aliasing rules if -Wstrict-aliasing is used and - strict aliasing mode is in effect. OTYPE is the original - TREE_TYPE of EXPR, and TYPE the type we're casting to. */ +/* Print a warning about casts that might indicate violation of strict + aliasing rules if -Wstrict-aliasing is used and strict aliasing + mode is in effect. LOC is the location of the expression being + cast, EXPR might be from inside it. TYPE is the type we're casting + to. */ bool -strict_aliasing_warning (tree otype, tree type, tree expr) +strict_aliasing_warning (location_t loc, tree type, tree expr) { + if (loc == UNKNOWN_LOCATION) +loc = input_location; + /* Strip pointer conversion chains and get to the correct original type. */ STRIP_NOPS (expr); - otype = TREE_TYPE (expr); + tree otype = TREE_TYPE (expr); if (!(flag_strict_aliasing && POINTER_TYPE_P (type) @@ -628,8 +632,9 @@ strict_aliasing_warning (tree otype, tre if the cast breaks type based aliasing. */ if (!COMPLETE_TYPE_P (TREE_TYPE (type)) && warn_strict_aliasing == 2) { - warning (OPT_Wstrict_aliasi
[PATCH][C++] Fix PR84281
The following patch optimizes the equal element case of cxx_eval_vec_init_1 to use a RANGE_EXPR in the built CONSTRUCTOR instead of repeating the same element over and over. This cuts down memory use of the invalid testcase in the PR from tens of GB to nothing and makes us rejct it instantanously instead of by going OOM. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Richard. 2018-02-09 Richard Biener PR c++/84281 * constexpr.c (cxx_eval_vec_init_1): Use a RANGE_EXPR to compact uniform constructors and delay allocating them fully. Index: gcc/cp/constexpr.c === --- gcc/cp/constexpr.c (revision 257491) +++ gcc/cp/constexpr.c (working copy) @@ -2885,7 +2885,6 @@ cxx_eval_vec_init_1 (const constexpr_ctx unsigned HOST_WIDE_INT max = tree_to_uhwi (array_type_nelts_top (atype)); verify_ctor_sanity (ctx, atype); vec **p = &CONSTRUCTOR_ELTS (ctx->ctor); - vec_alloc (*p, max + 1); bool pre_init = false; unsigned HOST_WIDE_INT i; @@ -2978,13 +2977,14 @@ cxx_eval_vec_init_1 (const constexpr_ctx { if (new_ctx.ctor != ctx->ctor) eltinit = new_ctx.ctor; - for (i = 1; i < max; ++i) - { - idx = build_int_cst (size_type_node, i); - CONSTRUCTOR_APPEND_ELT (*p, idx, unshare_constructor (eltinit)); - } + tree range = build2 (RANGE_EXPR, size_type_node, + build_int_cst (size_type_node, 1), + build_int_cst (size_type_node, max - 1)); + CONSTRUCTOR_APPEND_ELT (*p, range, unshare_constructor (eltinit)); break; } + else + vec_safe_reserve (*p, max + 1); } if (!*non_constant_p)
[PATCH] POPCOUNT folding optimizations
The following patch implements a number of __builtin_popcount related optimizations. (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0 to x!=0. (ii) popcount(x&1) can be simplified to x&1, and for unsigned x, popcount(x>>31) to x>>31. (iii) popcount (x&6) + popcount(y&16) can be simplified to popcount((x&6)|(y&16)) These may seem obscure transformations, but performing these types of POPCOUNT operations are often the performance critical steps in some cheminformatics applications. To implement the above transformations I've introduced the tree_nonzero_bits function, which is a tree-level version of rtlanal's nonzero_bits used by the RTL optimizers. The following patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make check" with no regressions, and passes for the four new gcc.dg test cases. Many thanks In advance. Best regards, Roger -- Roger Sayle, PhD. NextMove Software Limited Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY 2018-02-09 Roger Sayle * fold-const.c (tree_nonzero_bits): New function. * fold-const.h (tree_nonzero_bits): Likewise. * match.pd (POPCOUNT): New patterns to fold BUILTIN_POPCOUNT and friends. POPCOUNT(x&1) => x&1, POPCOUNT(x)==0 => x==0, etc. 2018-02-09 Roger Sayle * gcc.dg/fold-popcount-1.c: New testcase. * gcc.dg/fold-popcount-2.c: New testcase. * gcc.dg/fold-popcount-3.c: New testcase. * gcc.dg/fold-popcount-4.c: New testcase. /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-original" } */ int test_eqzero(unsigned int a) { return __builtin_popcount(a) == 0; } int test_eqzerol(unsigned long b) { return __builtin_popcountl(b) == 0; } int test_eqzeroll(unsigned long long c) { return __builtin_popcountll(c) == 0; } int test_nezero(unsigned int d) { return __builtin_popcount(d) != 0; } int test_nezerol(unsigned long e) { return __builtin_popcountl(e) != 0; } int test_nezeroll(unsigned long long f) { return __builtin_popcountll(f) != 0; } /* { dg-final { scan-tree-dump-times "popcount" 0 "original" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_andone(unsigned int a) { return __builtin_popcount(a&1); } int test_andonel(unsigned long b) { return __builtin_popcountl(b&1); } int test_andonell(unsigned long long c) { return __builtin_popcountll(c&1); } int test_oneand(unsigned int d) { return __builtin_popcount(1&d); } int test_oneandl(unsigned long e) { return __builtin_popcountl(1&e); } int test_oneandll(unsigned long long f) { return __builtin_popcountll(1&f); } /* { dg-final { scan-tree-dump-times "popcount" 0 "cddce1" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_combine(unsigned int a, unsigned int b) { return __builtin_popcount(a&8) + __builtin_popcount(b&2); } /* { dg-final { scan-tree-dump-times "popcount" 1 "cddce1" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_shiftmax(unsigned int a) { return __builtin_popcount(a>>(8*sizeof(a)-1)); } int test_shiftmaxl(unsigned long b) { return __builtin_popcountl(b>>(8*sizeof(b)-1)); } int test_shiftmaxll(unsigned long long c) { return __builtin_popcountll(c>>(8*sizeof(c)-1)); } int test_shift7(unsigned char d) { return __builtin_popcount(d>>7); } int test_shift7l(unsigned char e) { return __builtin_popcountl(e>>7); } int test_shift7ll(unsigned char f) { return __builtin_popcountll(f>>7); } int test_shift15(unsigned short g) { return __builtin_popcount(g>>15); } int test_shift15l(unsigned short h) { return __builtin_popcountl(h>>15); } int test_shift15ll(unsigned short i) { return __builtin_popcountll(i>>15); } /* { dg-final { scan-tree-dump-times "popcount" 0 "cddce1" } } */ Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 257227) +++ gcc/fold-const.c(working copy) @@ -14580,6 +14580,75 @@ return string + offset; } +/* Given a tree T, compute which bits in T may be nonzero. */ + +wide_int +tree_nonzero_bits (const_tree t) +{ + switch (TREE_CODE (t)) +{ +case INTEGER_CST: + return wi::to_wide (t); +case SSA_NAME: + return get_nonzero_bits (t); +case NON_LVALUE_EXPR: +case SAVE_EXPR: + return tree_nonzero_bits (TREE_OPERAND (t, 0)); +case BIT_AND_EXPR: + return wi::bit_and (tree_nonzero_bits (TREE_OPERAND (t, 0)), + tree_nonzero_bits (TREE_OPERAND (t, 1))); +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: + return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)), +tree_nonzero_bits (TREE_OPERAND (t, 1))); +case COND_EXPR: + return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 1)), +tree_nonzero_bits (TREE_OPERAND (t, 2))); +case NOP_EXPR: +case CONVERT_EXPR: + return wi
Re: [PATCH][GCC][ARM] Silence more re-definition warnings, make test case failure case more explicit.
Hi Tamar, On 07/02/18 18:35, Tamar Christina wrote: Hi All, The previous testcase would fail on a system where the initial mode is thumb and later switches to an arm mode. This would again cause some warnings to be emitted. This patch visits all builtins defined with builtin_define_with_int_value and undefines them if they could possibly change by changing the architecture. The testcase has also been updated to make it fail more easily on such cases. Bootstrapped and regtested on arm-none-Linux-gnueabihf and no issues. Ok for trunk? Ok. Thanks, Kyrill Thanks, Tamar gcc/ 2018-02-07 Tamar Christina PR target/82641 * config/arm/arm-c.c (arm_cpu_builtins): Un-define __ARM_FEATURE_LDREX, __ARM_ARCH_PROFILE, __ARM_ARCH_ISA_THUMB, __ARM_FP and __ARM_NEON_FP. gcc/testsuite 2018-02-07 Tamar Christina PR target/82641 * gcc.target/arm/pragma_arch_switch_2.c: Use armv6 and armv5t. --
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Fri, Feb 09, 2018 at 08:34:08AM -0200, Alexandre Oliva wrote: > * config/rs6000/rs6000.md (blockage): Set length to zero. Thanks! This fixed the ppc64le libdecnumber error for me. -- Alan Modra Australia Development Lab, IBM
[Committed] S/390: Fix PR84295
Bootstrapped and regression tested with --with-arch=z14 and --with-arch=z900. Glibc build is clean with that patch using -mindirect-branch=thunk and -mfunction-return=thunk. gcc/ChangeLog: 2018-02-09 Andreas Krebbel PR target/PR84295 * config/s390/s390.c (s390_set_current_function): Invoke s390_indirect_branch_settings also if fndecl didn't change. gcc/testsuite/ChangeLog: 2018-02-09 Andreas Krebbel PR target/PR84295 * gcc.target/s390/pr84295.c: New test. --- gcc/config/s390/s390.c | 5 - gcc/testsuite/gcc.target/s390/pr84295.c | 14 ++ 2 files changed, 18 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/s390/pr84295.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 62a60e2..298fdd1 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16135,7 +16135,10 @@ s390_set_current_function (tree fndecl) several times in the course of compiling a function, and we don't want to slow things down too much or call target_reinit when it isn't safe. */ if (fndecl == s390_previous_fndecl) -return; +{ + s390_indirect_branch_settings (fndecl); + return; +} tree old_tree; if (s390_previous_fndecl == NULL_TREE) diff --git a/gcc/testsuite/gcc.target/s390/pr84295.c b/gcc/testsuite/gcc.target/s390/pr84295.c new file mode 100644 index 000..4da43c3 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/pr84295.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z900 -fgnu89-inline --save-temps -mfunction-return-reg=thunk -mindirect-branch-table" } */ + +extern void foo (void); +extern __inline void foo (void) {} +void foo (void) {} + +/* { dg-final { scan-assembler-times "jg\t__s390_indirect_jump" 1 } } */ +/* { dg-final { scan-assembler "ex\t" } } */ + +/* { dg-final { scan-assembler-not "section\t.s390_indirect_jump" } } */ +/* { dg-final { scan-assembler-not "section\t.s390_indirect_call" } } */ +/* { dg-final { scan-assembler "section\t.s390_return_reg" } } */ +/* { dg-final { scan-assembler-not "section\t.s390_return_mem" } } */ -- 2.9.1
Re: PATCH to fix bogus warning with -Wstringop-truncation -g (PR tree-optimization/84228)
On Fri, Feb 09, 2018 at 11:40:29AM +0100, Richard Biener wrote: > > I.e., having to track all pointers to d between the call to > > strncpy and the assignment of the nul and make sure none of > > them ends up used in a string function. It didn't seem > > the additional complexity would have been worth the effort > > (or the likely false negatives). > > Well, I'd just walk immediate uses of the VDEF of the > strncpy call, not of the pointer argument. There will be exactly _one_ > possible > store (gimple_vdef () is non-NULL) that you need to verify (with using > the current matching > logic). But it'll skip non-store statements for you. Well, it should also punt on the immediate uses of the VDEF that have NULL gimple_vdef and the alias oracle says that might alias with that, i.e. warn about say strncpy (p, ...); foo (p); p[whatever] = '\0'; where foo is pure, because it might read the unterminated string, but don't warn about strncpy (p, ...); x = *q; p[whatever] = '\0'; if q[0] can't alias with p. Or just warn if there are any immediate uses of the strncpy VDEF that have gimple_vdef NULL, or non-NULL and aren't the zero store you are looking for. Jakub
Re: [patch] Do not generate libcalls to __unord[sd]f2 on PowerPC SPE
On Fri, Feb 9, 2018 at 11:28 AM, Eric Botcazou wrote: > Hi, > > the change > > 2016-05-02 Marc Glisse > > * match.pd (X u< X, X u> X): New transformations. > > has an annoying effect on targets implementing (most) unordered comparisons in > hardware but not the UNORDERED operator itself, e.g. PowerPC SPE, because it > makes the compiler generate unnecessary libcalls to __unord[sd]f2. > > The attached patch contains a trick to undo the effect of the change at the > RTL level for such targets. It was tested on PowerPC SPE with a certified > version of VxWorks (which provides a minimal certified libgcc) where it fixes > the link failure of ACATS c45242b with optimization enabled. > > OK for the mainline? Ok. Richard. > > 2018-02-09 Eric Botcazou > > * optabs.c (prepare_cmp_insn): Try harder to emit a direct comparison > instead of a libcall for UNORDERED. > > -- > Eric Botcazou
Re: [PATCH/RFC] tree-if-conv.c: fix two ICEs seen with -fno-tree-forwprop (PR tree-optimization/84178)
On Thu, Feb 8, 2018 at 11:23 PM, David Malcolm wrote: > PR tree-optimization/84178 reports a couple of source files that ICE inside > ifcvt when compiled with -03 -fno-tree-forwprop (trunk and gcc 7). > > Both cases involve problems with ifcvt's per-BB gimplified predicates. > > Testcase 1 fails this assertion within release_bb_predicate during cleanup: > > 283 if (flag_checking) > 284 for (gimple_stmt_iterator i = gsi_start (stmts); > 285 !gsi_end_p (i); gsi_next (&i)) > 286 gcc_assert (! gimple_use_ops (gsi_stmt (i))); > > The testcase contains a division in the loop, which leads to > if_convertible_loop_p returning false (due to gimple_could_trap_p being true > for the division). This happens *after* the per-BB gimplified predicates > have been created in predicate_bbs (loop). > Hence tree_if_conversion bails out to "cleanup", but the gimplified predicates > exist and make use of SSA names; for example this conjunction for two BB > conditions: > > _4 = h4.1_112 != 0; > _175 = (signed char) _117; > _176 = _175 >= 0; > _174 = _4 & _176; > > is using SSA names. But then this shouldn't cause any stmt operands to be created - who is calling update_stmt () on a stmt using the SSA names? Maybe something calls force_gimple_operand_gsi to add to the sequence? > This assertion was added in r236498 (aka > c3deca2519d97c55876869c57cf11ae1e5c6cf8b): > > 2016-05-20 Richard Biener > > * tree-if-conv.c (add_bb_predicate_gimplified_stmts): Use > gimple_seq_add_seq_without_update. > (release_bb_predicate): Assert we have no operands to free. > (if_convertible_loop_p_1): Calculate post dominators later. > Do not free BB predicates here. > (combine_blocks): Do not recompute BB predicates. > (version_loop_for_if_conversion): Save BB predicates around > loop versioning. > > * gcc.dg/tree-ssa/ifc-cd.c: Adjust. > > The following patch fixes this by removing the assertion, and reinstating the > cleanup of the operands. But that was supposed to be not necessary... I'll note that simply restoring the old behavior is not ideal either - luckily we now have gimple_seq_discard () which should do a better job here (and actually does what the function comment says!). > > Testcase 2 segfaults inside update_ssa when called from > version_loop_for_if_conversion when a loop is versioned. > > During loop_version, some blocks are duplicated, and this can duplicate > SSA names, leading to the duplicated SSA names being added to > tree-into-ssa.c's old_ssa_names. > > For example, _117 is an SSA name defined in one of these to-be-duplicated > blocks, and hence is added to old_ssa_names when duplicated. > > One of the BBs has this gimplified predicate (again, these stmts are *not* > yet in a BB): > _173 = h4.1_112 != 0; > _171 = (signed char) _117; > _172 = _171 >= 0; > _170 = ~_172; > _169 = _170 & _173; > > Note the reference to SSA name _117 in the above. > > When update_ssa runs later on in version_loop_for_if_conversion, > SSA name _117 is in the old_ssa_names bitmap, and thus has > prepare_use_sites_for called on it: > > 2771 /* If an old name is in NAMES_TO_RELEASE, we cannot remove it from > 2772 OLD_SSA_NAMES, but we have to ignore its definition site. */ > 2773 EXECUTE_IF_SET_IN_BITMAP (old_ssa_names, 0, i, sbi) > 2774{ > 2775 if (names_to_release == NULL || !bitmap_bit_p > (names_to_release, i)) > 2776prepare_def_site_for (ssa_name (i), insert_phi_p); > 2777 prepare_use_sites_for (ssa_name (i), insert_phi_p); > 2778} > > prepare_use_sites_for iterates over the immediate users, which includes > the: > _171 = (signed char) _117; > in the gimplified predicate. > > This tried to call "mark_block_for_update" on the BB of this def_stmt, > leading to a read-through-NULL segfault, since this statement isn't in a > BB yet. > > With the caveat that this is at the limit of my understanding of the > innards of gimple, I'm wondering how this ever works: we have gimplified > predicates that aren't in a BB yet, which typically refer to > SSA names in the CFG proper, and we're attempting non-trivial manipulations > of that CFG that can e.g. duplicate those SSA names. > > The following patch fixes the 2nd ICE by inserting the gimplified predicates > earlier: immediately before the possible version_loop_for_if_conversion, > rather than within combine_blocks. That way they're in their BBs before > we attempt any further manipulation of the CFG. Hum, but that will alter both copies of the loops, no? I think the fix is to instead delay the update_ssa call to _after_ combine_blocks () (and remember if it is necessary just in case we didn't version). Richard. > This fixes the ICE, though it introduces a regression of > gcc.target/i386/avx2-vec-mask-bit-not.c > which no longer vectorizes for some reason (I haven't investigated >
Re: PATCH to fix bogus warning with -Wstringop-truncation -g (PR tree-optimization/84228)
On Thu, Feb 8, 2018 at 7:12 PM, Martin Sebor wrote: > On 02/08/2018 07:39 AM, Richard Biener wrote: >> >> On Thu, Feb 8, 2018 at 6:35 AM, Jeff Law wrote: >>> >>> On 02/06/2018 05:57 AM, Jakub Jelinek wrote: On Tue, Feb 06, 2018 at 01:46:21PM +0100, Marek Polacek wrote: > > --- gcc/testsuite/c-c++-common/Wstringop-truncation-3.c > +++ gcc/testsuite/c-c++-common/Wstringop-truncation-3.c > @@ -0,0 +1,20 @@ > +/* PR tree-optimization/84228 */ > +/* { dg-do compile } */ > +/* { dg-options "-Wstringop-truncation -O2 -g" } */ > + > +char *strncpy (char *, const char *, __SIZE_TYPE__); > +struct S > +{ > + char arr[64]; > +}; > + > +int > +foo (struct S *p1, const char *a) > +{ > + if (a) > +goto err; > + strncpy (p1->arr, a, sizeof p1->arr); /* { dg-bogus "specified > bound" } */ Just curious, what debug stmt is in between those? Wouldn't it be better to force them a couple of debug stmts? Say int b = 5, c = 6, d = 7; at the start of the function and b = 8; c = 9; d = 10; in between strncpy and the '\0' store? > + p1->arr[3] = '\0'; > +err: > + return 0; > +} > diff --git gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c > index c3cf432a921..f0f6535017b 100644 > --- gcc/tree-ssa-strlen.c > +++ gcc/tree-ssa-strlen.c > @@ -1849,7 +1849,7 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator > gsi, tree src, tree cnt) > >/* Look for dst[i] = '\0'; after the stxncpy() call and if found > avoid the truncation warning. */ > - gsi_next (&gsi); > + gsi_next_nondebug (&gsi); >gimple *next_stmt = gsi_stmt (gsi); > >if (!gsi_end_p (gsi) && is_gimple_assign (next_stmt)) Ok for trunk, though generally looking at just next stmt is very fragile, might be better to look at the strncpy's vuse immediate uses if they are within the same basic block and either don't alias with it, or are the store it is looking for, or something similar. >>> >>> Martin and I wandered down this approach a bit and ultimately decided >>> against it. While yes it could avoid a false positive by looking at the >>> immediate uses, but I'm not sure avoiding the false positive in those >>> cases is actually good! >>> >>> THe larger the separation between the strcpy and the truncation the more >>> likely it is that the code is wrong or at least poorly written and >>> deserves a developer looksie. >> >> >> But it's the next _GIMPLE_ stmt it looks at. Make it >> >> char d[3]; >> >> void f (const char *s, int x) >> { >> char d[x]; >> __builtin_strncpy (d, s, sizeof d); >> d[x-1] = 0; >> } >> >> and I bet it will again warn since x-1 is a separate GIMPLE stmt. > > > It doesn't but only because it doesn't know how to handle VLAs. > >> The patch is of course ok but still... simply looking at all >> immediate uses of the VDEF of the strncpy call, stopping >> at the single stmt with the "next" VDEF should be better >> (we don't have a next_vdef () helper). > > > IIRC, one of the problems I ran into was how to handle code > like this: > > void f (const char *s) > { > char d[8]; > char *p = strncpy (d, s, sizeof d); > > foo (p); > > d[7] = 0; > } > > I.e., having to track all pointers to d between the call to > strncpy and the assignment of the nul and make sure none of > them ends up used in a string function. It didn't seem > the additional complexity would have been worth the effort > (or the likely false negatives). Well, I'd just walk immediate uses of the VDEF of the strncpy call, not of the pointer argument. There will be exactly _one_ possible store (gimple_vdef () is non-NULL) that you need to verify (with using the current matching logic). But it'll skip non-store statements for you. Richard. > Martin >
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Feb 9, 2018, Jeff Law wrote: > On 02/08/2018 08:53 PM, Alan Modra wrote: >> On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote: >>> Here's what I checked in, right after the LVU patch. >>> >>> [IEPM] Introduce inline entry point markers >> >> One of these two patches breaks ppc64le bootstrap with the assembler >> complaining "Error: view number mismatch" when compiling >> libdecnumber. >> > I've just passed along a similar failure (.i, .s and command line > options) to Alex for ppc64 (be) building glibc. This fixes at least the testcase Jeff provided me with. I'm going ahead and checking it in as obvious. I suppose we might need more of these, on this and other ports, if they have been sloppy about zero-length pseudo insns :-( Would you guys please let me know whether you still see a problem, if you get a chance to respin? I was just about to crash in bed when I saw your email. When I get back up, I'll build the latest binutils release on ppc64, ppc64el and aarch64, and then bootstrap gcc with it. I should have done that when I broadened my testing of the SFN+LVU+IEPM patchset to those platforms, but I didn't realize I was failing to test them with an assembler with view support, doh! Sorry about that. for gcc/ChangeLog * config/rs6000/rs6000.md (blockage): Set length to zero. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 33f0d959f5d0..8aa4e0e7c71e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -11063,7 +11063,8 @@ (define_insn "blockage" [(unspec_volatile [(const_int 0)] UNSPECV_BLOCK)] "" - "") + "" + [(set_attr "length" "0")]) (define_expand "probe_stack_address" [(use (match_operand 0 "address_operand"))] -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
[patch] Do not generate libcalls to __unord[sd]f2 on PowerPC SPE
Hi, the change 2016-05-02 Marc Glisse * match.pd (X u< X, X u> X): New transformations. has an annoying effect on targets implementing (most) unordered comparisons in hardware but not the UNORDERED operator itself, e.g. PowerPC SPE, because it makes the compiler generate unnecessary libcalls to __unord[sd]f2. The attached patch contains a trick to undo the effect of the change at the RTL level for such targets. It was tested on PowerPC SPE with a certified version of VxWorks (which provides a minimal certified libgcc) where it fixes the link failure of ACATS c45242b with optimization enabled. OK for the mainline? 2018-02-09 Eric Botcazou * optabs.c (prepare_cmp_insn): Try harder to emit a direct comparison instead of a libcall for UNORDERED. -- Eric BotcazouIndex: optabs.c === --- optabs.c (revision 257404) +++ optabs.c (working copy) @@ -3935,7 +3935,20 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx if (methods != OPTAB_LIB_WIDEN) goto fail; - if (!SCALAR_FLOAT_MODE_P (mode)) + if (SCALAR_FLOAT_MODE_P (mode)) +{ + /* Small trick if UNORDERED isn't implemented by the hardware. */ + if (comparison == UNORDERED && rtx_equal_p (x, y)) + { + prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN, + ptest, pmode); + if (*ptest) + return; + } + + prepare_float_lib_cmp (x, y, comparison, ptest, pmode); +} + else { rtx result; machine_mode ret_mode; @@ -3982,8 +3995,6 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods, ptest, pmode); } - else -prepare_float_lib_cmp (x, y, comparison, ptest, pmode); return;
Fix ICE with scalar_storage_order attribute at -O
This is a regression present on the mainline in the form of an assertion failure in optimize_bitfield_assignment_op with a scalar_storage_order attribute specifying the reverse endianness wrt the native endianness. I put this assertion because I failed to cover the path at the time but things have changed since then and the attached testcase triggers it now (and also checks that the generated code is correct in this case). Tested on x86_64-suse-linux, applied on the mainline. 2018-02-09 Eric Botcazou * expr.c (optimize_bitfield_assignment_op): Remove obsolete assertion. 2018-02-09 Eric Botcazou * gnat.dg/sso8.adb: New test. * gnat.dg/sso8_pkg.ads: New helper. -- Eric BotcazouIndex: expr.c === --- expr.c (revision 257404) +++ expr.c (working copy) @@ -4726,8 +4726,6 @@ optimize_bitfield_assignment_op (poly_ui } else if (!REG_P (str_rtx) && GET_CODE (str_rtx) != SUBREG) return false; - else -gcc_assert (!reverse); /* If the bit field covers the whole REG/MEM, store_field will likely generate better code. */ with Interfaces; with System; with Unchecked_Conversion; package SSO8_Pkg is Val8 : Interfaces.Unsigned_8; type Two_Bit_Int is range 0 .. 3; for Two_Bit_Int'size use 2; type Arr is array (1 .. 5) of Boolean; for Arr'scalar_storage_order use System.High_Order_First; pragma Pack (Arr); type Rec is record Boolean_Data : Boolean; Array_Data : Arr; Two_Bit_Data : Two_Bit_Int; end record; for Rec use record Boolean_Data at 0 range 0 .. 0; Array_Data at 0 range 1 .. 5; Two_Bit_Data at 0 range 6 .. 7; end record; for Rec'size use 8; for Rec'bit_order use System.High_Order_First; for Rec'scalar_storage_order use System.High_Order_First; function Conv is new Unchecked_Conversion (Rec, Interfaces.Unsigned_8); end SSO8_Pkg; -- { dg-do run } -- { dg-options "-O" } with Interfaces; use Interfaces; with SSO8_Pkg; use SSO8_Pkg; procedure SSO8 is Data : Rec; begin Data.Array_Data (2) := True; Val8 := Conv (Data); if Val8 /= 32 then raise Program_Error; end if; end;
Re: RFA: Fix PR 68028: LTO error when compiling PowerPC binaries with single precision floating point
Hi Segher, > I thought you were going to do a patch like the following, to make the > e500 cores less special (they are not): Sorry - my bad. I defer to your patch then. Whatever it takes to get the BZ fixed and off the books... :-) Cheers Nick
[PATCH] Adjust PR84278 testcase
Got quoting of parens wrong. Tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-02-09 Richard Biener PR tree-optimization/84278 * gcc.target/i386/pr84278.c: Adjust regex. Index: gcc/testsuite/gcc.target/i386/pr84278.c === --- gcc/testsuite/gcc.target/i386/pr84278.c (revision 257491) +++ gcc/testsuite/gcc.target/i386/pr84278.c (working copy) @@ -15,4 +15,4 @@ void foo(void) } } -/* { dg-final { scan-assembler-not "\(%.sp\)" } } */ +/* { dg-final { scan-assembler-not "\\\(%.sp\\\)" } } */
Re: [PATCH] Avoid BSWAP with floating point modes on rs6000 (PR target/84226)
On Fri, 9 Feb 2018, Jakub Jelinek wrote: > Hi! > > BSWAP is documented as: > > @item (bswap:@var{m} @var{x}) > Represents the value @var{x} with the order of bytes reversed, carried out > in mode @var{m}, which must be a fixed-point machine mode. > The mode of @var{x} must be @var{m} or @code{VOIDmode}. > > The fixed-point is something used widely in rtl.texi and is very confusing > now that we have FIXED_POINT_TYPE floats, I assume it talks about integral > modes or, because it is also used with vector modes, about INTEGRAL_MODE_P > modes. My understanding is that bswap on a vector integral mode is meant to > be bswap of each element individually. > > The rs6000 backend uses bswap not just on scalar integral modes and > vector integral modes, but also on V4SF and V2DF, but ICEs in simplify-rtx.c > where we don't expect bswap to be used on SF/DFmode (vector bswap is handled > as bswap of each element). > > The following patch adjusts the rs6000 backend to use well defined bswaps > on corresponding integral modes instead (and also does what we've done in > i386 backend years ago, avoid subreg on the lhs because it breaks combine > attempts to optimize it). > > Or do we want to change documentation and simplify-rtx.c and whatever else > in the middle-end to also make floating point bswaps well defined? No, I think the current restriction is sound. > How > exactly, as bswaps of the underlying bits, i.e. for simplify-rtx.c as > subreg of the constant to a corresponding integral mode, bswap in it and > subreg back? IMHO changing the rs6000 backend is easier and defining what > exactly means a floating point bswap may be hard to understand. Agreed. > Bootstrapped/regtested on powerpc64{,le}-linux, on powerpc64-linux including > -m64/-m32, ok for trunk? > > 2018-02-09 Jakub Jelinek > > PR target/84226 > * config/rs6000/vsx.md (p9_xxbrq_v16qi): Change input operand > constraint from =wa to wa. Avoid a subreg on the output operand, > instead use a pseudo and subreg it in a move. > (p9_xxbrd_): Changed to ... > (p9_xxbrd_v2di): ... this insn, without VSX_D iterator. > (p9_xxbrd_v2df): New expander. > (p9_xxbrw_): Changed to ... > (p9_xxbrw_v4si): ... this insn, without VSX_W iterator. > (p9_xxbrw_v4sf): New expander. > > * gcc.target/powerpc/pr84226.c: New test. > > --- gcc/config/rs6000/vsx.md.jj 2018-01-22 23:57:21.299779544 +0100 > +++ gcc/config/rs6000/vsx.md 2018-02-08 17:21:13.197642776 +0100 > @@ -5311,35 +5311,60 @@ (define_insn "p9_xxbrq_v1ti" > > (define_expand "p9_xxbrq_v16qi" >[(use (match_operand:V16QI 0 "vsx_register_operand" "=wa")) > - (use (match_operand:V16QI 1 "vsx_register_operand" "=wa"))] > + (use (match_operand:V16QI 1 "vsx_register_operand" "wa"))] >"TARGET_P9_VECTOR" > { > - rtx op0 = gen_lowpart (V1TImode, operands[0]); > + rtx op0 = gen_reg_rtx (V1TImode); >rtx op1 = gen_lowpart (V1TImode, operands[1]); >emit_insn (gen_p9_xxbrq_v1ti (op0, op1)); > + emit_move_insn (operands[0], gen_lowpart (V16QImode, op0)); >DONE; > }) > > ;; Swap all bytes in each 64-bit element > -(define_insn "p9_xxbrd_" > - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > - (bswap:VSX_D (match_operand:VSX_D 1 "vsx_register_operand" "wa")))] > +(define_insn "p9_xxbrd_v2di" > + [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa") > + (bswap:V2DI (match_operand:V2DI 1 "vsx_register_operand" "wa")))] >"TARGET_P9_VECTOR" >"xxbrd %x0,%x1" >[(set_attr "type" "vecperm")]) > > +(define_expand "p9_xxbrd_v2df" > + [(use (match_operand:V2DF 0 "vsx_register_operand" "=wa")) > + (use (match_operand:V2DF 1 "vsx_register_operand" "wa"))] > + "TARGET_P9_VECTOR" > +{ > + rtx op0 = gen_reg_rtx (V2DImode); > + rtx op1 = gen_lowpart (V2DImode, operands[1]); > + emit_insn (gen_p9_xxbrd_v2di (op0, op1)); > + emit_move_insn (operands[0], gen_lowpart (V2DFmode, op0)); > + DONE; > +}) > + > ;; Swap all bytes in each 32-bit element > -(define_insn "p9_xxbrw_" > - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > - (bswap:VSX_W (match_operand:VSX_W 1 "vsx_register_operand" "wa")))] > +(define_insn "p9_xxbrw_v4si" > + [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa") > + (bswap:V4SI (match_operand:V4SI 1 "vsx_register_operand" "wa")))] >"TARGET_P9_VECTOR" >"xxbrw %x0,%x1" >[(set_attr "type" "vecperm")]) > > +(define_expand "p9_xxbrw_v4sf" > + [(use (match_operand:V4SF 0 "vsx_register_operand" "=wa")) > + (use (match_operand:V4SF 1 "vsx_register_operand" "wa"))] > + "TARGET_P9_VECTOR" > +{ > + rtx op0 = gen_reg_rtx (V4SImode); > + rtx op1 = gen_lowpart (V4SImode, operands[1]); > + emit_insn (gen_p9_xxbrw_v4si (op0, op1)); > + emit_move_insn (operands[0], gen_lowpart (V4SFmode, op0)); > + DONE; > +}) > + > ;; Swap all bytes in each element of vector > (define_expand "revb_" > - [(set (match_operand:VEC_REVB 0 "vsx_reg
Fix small regression in -fdump-ada-spec
We cannot generate the 'constant' keyword for components in Ada, it's illegal. Tested on x86_64-suse-linux, applied on the mainline. 2018-02-09 Eric Botcazou c-family/ * c-ada-spec.c (dump_ada_declaration): Do not generate the 'constant' keyword for components. -- Eric BotcazouIndex: c-family/c-ada-spec.c === --- c-family/c-ada-spec.c (revision 257404) +++ c-family/c-ada-spec.c (working copy) @@ -3131,7 +3131,7 @@ dump_ada_declaration (pretty_printer *bu { pp_string (buffer, "aliased "); - if (TREE_READONLY (t)) + if (TREE_READONLY (t) && TREE_CODE (t) != FIELD_DECL) pp_string (buffer, "constant "); if (TYPE_NAME (TREE_TYPE (t))) @@ -3147,7 +3147,7 @@ dump_ada_declaration (pretty_printer *bu || TREE_CODE (TREE_TYPE (t)) != INTEGER_TYPE)) pp_string (buffer, "aliased "); - if (TREE_READONLY (t)) + if (TREE_READONLY (t) && TREE_CODE (t) != FIELD_DECL) pp_string (buffer, "constant "); dump_generic_ada_node