Re: Fwd: [PING 2][PATCH] libgcc: Add CFI directives to the soft floating point support code for ARM
On 10/05/15 23:16, Martin Galvan wrote: Hi Ramana! Sorry to bother, but I looked at the repository and didn't see this committed. As I don't have write access could you please commit this for me? Thanks a lot! sorry about the slow response, I was travelling for a bit and missed your emails. Trying your patch out gives me failures possibly because my mail client munged it when it received this inline. Can you please rebase if necessary, test and send it again as an attachment ? patch -p1 --dry-run ~/Downloads/martin-patch.txt checking file libgcc/config/arm/ieee754-df.S Hunk #2 FAILED at 57. Hunk #3 succeeded at 70 with fuzz 2. Hunk #4 FAILED at 86. Hunk #5 FAILED at 153. Hunk #6 FAILED at 418. Hunk #7 FAILED at 425. Hunk #8 FAILED at 440. Hunk #9 FAILED at 462. Hunk #10 FAILED at 485. Hunk #11 FAILED at 555. Hunk #12 FAILED at 566. Hunk #13 FAILED at 601. Hunk #14 FAILED at 653. Hunk #15 FAILED at 720. Hunk #16 FAILED at 868. Hunk #17 FAILED at 1057. Hunk #18 FAILED at 1068. Hunk #19 FAILED at 1082. Hunk #20 FAILED at 1090. Hunk #21 FAILED at 1122. Hunk #22 FAILED at 1133. Hunk #23 succeeded at 1145 with fuzz 2. Hunk #24 FAILED at 1155. Hunk #25 FAILED at 1168. Hunk #26 FAILED at 1228. Hunk #27 succeeded at 1236 with fuzz 2. Hunk #28 FAILED at 1254. Hunk #29 succeeded at 1263 with fuzz 2. Hunk #30 FAILED at 1297. Hunk #31 succeeded at 1306 with fuzz 2. Hunk #32 FAILED at 1336. Hunk #33 succeeded at 1345 with fuzz 2. Hunk #34 FAILED at 1410. 27 out of 34 hunks FAILED checking file libgcc/config/arm/ieee754-sf.S Hunk #1 FAILED at 31. Hunk #4 FAILED at 294. Hunk #9 FAILED at 460. Hunk #10 FAILED at 471. Hunk #16 FAILED at 845. Hunk #20 FAILED at 898. 6 out of 27 hunks FAILED checking file libgcc/config/arm/lib1funcs.S regards Ramana On Tue, Apr 28, 2015 at 2:07 PM, Martin Galvan martin.gal...@tallertechnologies.com wrote: Thanks a lot. I don't have write access to the repository, could you commit this for me? On Tue, Apr 28, 2015 at 1:21 PM, Ramana Radhakrishnan ramana@googlemail.com wrote: On Tue, Apr 28, 2015 at 4:19 PM, Martin Galvan martin.gal...@tallertechnologies.com wrote: This patch adds CFI directives to the soft floating point support code for ARM. Previously, if we tried to do a backtrace from that code in a debug session we'd get something like this: (gdb) bt #0 __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1082 #1 0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1158 #2 0xf5c28f5c in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Now we'll get something like this: (gdb) bt #0 __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1156 #1 0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1263 #2 0x0dc8 in __aeabi_dcmpeq () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1285 #3 0x0504 in main () I have a company-wide copyright assignment. I don't have commit access, though, so it would be great if anyone could commit this for me. Thanks a lot! this is OK , thanks. Sorry about the delay in reviewing this. Ramana
Re: [PATCH 2/6] combine: If recog fails, try again with zero_ext{ract,end} simplified
On Sun, May 10, 2015 at 10:15:34PM -0600, Jeff Law wrote: (recog_for_combine): If recog fails, try again with the pattern modified by change_zero_ext; if that still fails, restore the pattern. I like it. Attacking the extensions are the most obvious candidates, but I wonder if there's others (like the whole ASHIFT vs MULT stuff that we were recently looking at for ARM). Yeah, I thought about that as well. But that case, MULT instead of shift, outside of MEM, is simply non-canonical RTL; and combine created it on purpose. It just shouldn't. There certainly will be other cases though; hopefully not overlapping, I don't want to recog an exponentially expanding number of times ;-) Segher
Re: [PATCH][PR66013] Update address_taken after ifn_va_arg expansion
On Fri, May 8, 2015 at 9:16 PM, Tom de Vries tom_devr...@mentor.com wrote: On 08-05-15 17:31, Richard Biener wrote: OK for trunk? As noted in one of the PRs I think that it is the proper time to re-implement the stdarg optimization on the un-lowered form which should also fix this. AFAIU, the implementation of the stdarg optimization on the un-lowered form should contain an independent fix for this problem, and this fix won't be functional any more then, so indeed, it's a patch with (hopefully) a limited lifespan. But I thought that the patch is simple and low-risk enough to accept for trunk now, hence the submission. Also, I suppose it's not bad to start developing from a situation where more tests are passing. Sure - but this adds a pass over the IL plus an SSA update to the compile. Sth which may also enable va-unrelated optimization and thus may cause regressions if you remove it later... Otherwise I agree of course. Thanks, Richard. Thanks, - Tom
Re: [C frontend] Fix construction of TYPE_STUB_DECL
On Sun, May 10, 2015 at 7:33 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, TREE_PUBLIC of TYPE_DECL is defined to say if the type is public: /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL, nonzero means name is to be accessible from outside this translation unit. In an IDENTIFIER_NODE, nonzero means an external declaration accessible from outside this translation unit was previously seen for this name in an inner scope. */ #define TREE_PUBLIC(NODE) ((NODE)-base.public_flag) This is properly honored by C++ FE but other FEs are bit random, which in turn confuses type_in_anonymous_namespace_p predicate that leads to flase poistives on type mismatch warnings. I used to be able to get around by checking only C++ types at LTO time, but with type checking in lto-symtab I can not, because I do want to compare type compatibility cross translation units and cross languages and we have no reliable way to say what type originated as C++ and what did not. The idea was you can walk up the context chain until you reach a TRANSLATION_UNIT_DECL where the source language is encoded in... of course most FEs are quite lazy here and eventually end up at NULL_TREE instead. Richard. This fixed TYPE_STUB_DECl construction in C frontend. I will check other FEs separately. I can also add way to recognize C++ types, but I think it is good idea to make type representation consistent across FEs. Bootstrapped/regtested x86_64-linux, OK? Honza * c-decl.c (pushtag): Declare type as public. Index: c/c-decl.c === --- c/c-decl.c (revision 222981) +++ c/c-decl.c (working copy) @@ -1563,6 +1563,7 @@ pushtag (location_t loc, tree name, tree TYPE_STUB_DECL (type) = pushdecl (build_decl (loc, TYPE_DECL, NULL_TREE, type)); + TREE_PUBLIC (TYPE_STUB_DECL (type)) = 1; /* An approximation for now, so we can tell this is a function-scope tag. This will be updated in pop_scope. */
Re: [PATCH, alpha]: Remove dead (HOST_BITS_PER_WIDE_INT 64) code
On Fri, May 8, 2015 at 10:23 PM, Richard Henderson r...@redhat.com wrote: On 05/08/2015 09:43 AM, Uros Bizjak wrote: @@ -1509,8 +1509,7 @@ (and:DI (ashift:DI (match_operand:DI 1 register_operand r) (match_operand:DI 2 mul8_operand I)) (match_operand:DI 3 immediate_operand i)))] - HOST_BITS_PER_WIDE_INT == 64 -CONST_INT_P (operands[3]) + CONST_INT_P (operands[3]) Just noticed that we can eliminate this test if we use const_int_operand instead. + operands[2] = force_reg (DImode, GEN_INT (HOST_WIDE_INT_1 63));) I bet we're now supposed to use HOST_WIDE_INT_1U, to avoid any ubsan shift silliness. @@ -1929,7 +1925,7 @@ alpha_emit_set_const_1 (rtx target, machine_mode m temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output); if (!temp) { - new_const = (c bits) | (((HOST_WIDE_INT) 1 bits) - 1); + new_const = (c bits) | ((HOST_WIDE_INT_1 bits) - 1); Likewise. Thanks for the review! I have committed the attached patch. Uros. Index: config/alpha/alpha.c === --- config/alpha/alpha.c(revision 222913) +++ config/alpha/alpha.c(working copy) @@ -1771,11 +1771,9 @@ alpha_emit_set_const_1 (rtx target, machine_mode m rtx temp, insn; /* If this is a sign-extended 32-bit constant, we can do this in at most - three insns, so do it if we have enough insns left. We always have - a sign-extended 32-bit constant when compiling on a narrow machine. */ + three insns, so do it if we have enough insns left. */ - if (HOST_BITS_PER_WIDE_INT != 64 - || c 31 == -1 || c 31 == 0) + if (c 31 == -1 || c 31 == 0) { HOST_WIDE_INT low = ((c 0x) ^ 0x8000) - 0x8000; HOST_WIDE_INT tmp1 = c - low; @@ -1917,11 +1915,9 @@ alpha_emit_set_const_1 (rtx target, machine_mode m /* Now try high-order zero bits. Here we try the shifted-in bits as all zero and all ones. Be careful to avoid shifting outside the mode and to avoid shifting outside the host wide int size. */ - /* On narrow hosts, don't shift a 1 into the high bit, since we'll -confuse the recursive call and set all of the high 32 bits. */ bits = (MIN (HOST_BITS_PER_WIDE_INT, GET_MODE_SIZE (mode) * 8) - - floor_log2 (c) - 1 - (HOST_BITS_PER_WIDE_INT 64)); + - floor_log2 (c) - 1); if (bits 0) for (; bits 0; bits--) { @@ -1929,7 +1925,7 @@ alpha_emit_set_const_1 (rtx target, machine_mode m temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output); if (!temp) { - new_const = (c bits) | (((HOST_WIDE_INT) 1 bits) - 1); + new_const = (c bits) | ((HOST_WIDE_INT_1U bits) - 1); temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output); } @@ -1955,7 +1951,7 @@ alpha_emit_set_const_1 (rtx target, machine_mode m temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output); if (!temp) { - new_const = (c bits) | (((HOST_WIDE_INT) 1 bits) - 1); + new_const = (c bits) | ((HOST_WIDE_INT_1U bits) - 1); temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output); } @@ -1969,7 +1965,6 @@ alpha_emit_set_const_1 (rtx target, machine_mode m } } -#if HOST_BITS_PER_WIDE_INT == 64 /* Finally, see if can load a value into the target that is the same as the constant except that all bytes that are 0 are changed to be 0xff. If we can, then we can do a ZAPNOT to obtain the desired constant. */ @@ -1996,7 +1991,6 @@ alpha_emit_set_const_1 (rtx target, machine_mode m target, 0, OPTAB_WIDEN); } } -#endif return 0; } @@ -2077,7 +2071,7 @@ alpha_emit_set_long_const (rtx target, HOST_WIDE_I HOST_WIDE_INT d1, d2, d3, d4; /* Decompose the entire word */ -#if HOST_BITS_PER_WIDE_INT = 64 + gcc_assert (c2 == -(c1 0)); d1 = ((c1 0x) ^ 0x8000) - 0x8000; c1 -= d1; @@ -2087,17 +2081,6 @@ alpha_emit_set_long_const (rtx target, HOST_WIDE_I c1 -= d3; d4 = ((c1 0x) ^ 0x8000) - 0x8000; gcc_assert (c1 == d4); -#else - d1 = ((c1 0x) ^ 0x8000) - 0x8000; - c1 -= d1; - d2 = ((c1 0x) ^ 0x8000) - 0x8000; - gcc_assert (c1 == d2); - c2 += (d2 0); - d3 = ((c2 0x) ^ 0x8000) - 0x8000; - c2 -= d3; - d4 = ((c2 0x) ^ 0x8000) - 0x8000; - gcc_assert (c2 == d4); -#endif /* Construct the high word */ if (d4) @@ -2138,16 +2121,11 @@ alpha_extract_integer (rtx x, HOST_WIDE_INT *p0, H i0 = INTVAL (x); i1 = -(i0 0); } - else if
RE: [patch, avr] extend part-clobbered check to AVR_TINY architecture
-Original Message- From: Denis Chertykov [mailto:cherty...@gmail.com] Sent: Sunday, May 10, 2015 12:55 PM To: Sivanupandi, Pitchumani Cc: Georg-Johann Lay; GCC Patches Subject: Re: [patch, avr] extend part-clobbered check to AVR_TINY architecture Sorry for delay. (I was at vacation in Kazakhstan without internet.) 2015-05-08 8:32 GMT+03:00 Sivanupandi, Pitchumani pitchumani.sivanupa...@atmel.com: Ping! -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Sivanupandi, Pitchumani Sent: Tuesday, April 21, 2015 8:21 PM To: Georg-Johann Lay; Denis Chertykov Cc: GCC Patches Subject: [patch, avr] extend part-clobbered check to AVR_TINY architecture Hi, When tried backporting AVR_TINY architecture support to 4.9, build failed in libgcc for AVR_TINY. Failure was due to ICE same as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53065 Fix provided for that bug checks for if the mode crosses the callee saved register. Below patch updates that check as the AVR_TINY has different set of callee saved registers (r18 and r19). This patch is against trunk. NOTE: ICE is re-produciable only with 4.9 + tiny patch and --with-dwarf2 enabled. Is this ok for trunk? diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c index 68d5ddc..2f441e5 100644 --- a/gcc/config/avr/avr.c +++ b/gcc/config/avr/avr.c @@ -11333,9 +11333,10 @@ avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode) return 0; /* Return true if any of the following boundaries is crossed: - 17/18, 27/28 and 29/30. */ + 17/18 or 19/20 (if AVR_TINY), 27/28 and 29/30. */ - return ((regno 18 regno + GET_MODE_SIZE (mode) 18) + return ((regno = LAST_CALLEE_SAVED_REG + regno + GET_MODE_SIZE (mode) (LAST_CALLEE_SAVED_REG + + 1)) || (regno REG_Y regno + GET_MODE_SIZE (mode) REG_Y) || (regno REG_Z regno + GET_MODE_SIZE (mode) REG_Z)); } I think it's ok. Could you please commit? I do not have commit access. (--patch--) diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c index 68d5ddc..2f441e5 100644 --- a/gcc/config/avr/avr.c +++ b/gcc/config/avr/avr.c @@ -11333,9 +11333,10 @@ avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode) return 0; /* Return true if any of the following boundaries is crossed: - 17/18, 27/28 and 29/30. */ + 17/18 or 19/20 (if AVR_TINY), 27/28 and 29/30. */ - return ((regno 18 regno + GET_MODE_SIZE (mode) 18) + return ((regno = LAST_CALLEE_SAVED_REG + regno + GET_MODE_SIZE (mode) (LAST_CALLEE_SAVED_REG + 1)) || (regno REG_Y regno + GET_MODE_SIZE (mode) REG_Y) || (regno REG_Z regno + GET_MODE_SIZE (mode) REG_Z)); } (--patch--) Regards, Pitchumani gcc/ChangeLog 2015-05-11 Pitchumani Sivanupandi pitchuman...@atmel.com * config/avr/avr.c (avr_hard_regno_call_part_clobbered): Use LAST_CALLEE_SAVED_REG instead of hard-coded register number. (Last callee saved reg is different for AVR_TINY architecture)
Re: [PATCH][PR66010] Don't take address of ap unless necessary
On Fri, 8 May 2015, Tom de Vries wrote: Hi, this patch fixes PR66010. I. Consider this test-case, with a va_list passed from f2 to f2_1: ... #include stdarg.h inline int __attribute__((always_inline)) f2_1 (va_list ap) { return va_arg (ap, int); } int f2 (int i, ...) { int res; va_list ap; va_start (ap, i); res = f2_1 (ap); va_end (ap); return res; } ... When compiling at -O2, before eline we have: ... f2_1 (struct * apD.1832) { intD.6 _3; # .MEM_2 = VDEF .MEM_1(D) # USE = anything # CLB = anything _3 = VA_ARG (apD.1832, 0B); # VUSE .MEM_2 return _3; } f2 (intD.6 iD.1835) { struct apD.1839[1]; intD.6 resD.1838; intD.6 _6; # .MEM_2 = VDEF .MEM_1(D) # USE = anything # CLB = anything __builtin_va_startD.1030 (apD.1839, 0); # .MEM_3 = VDEF .MEM_2 # USE = anything # CLB = anything res_4 = f2_1D.1833 (apD.1839); # .MEM_5 = VDEF .MEM_3 # USE = anything # CLB = anything __builtin_va_endD.1029 (apD.1839); _6 = res_4; # .MEM_7 = VDEF .MEM_5 apD.1839 ={v} {CLOBBER}; # VUSE .MEM_7 return _6; } ... Because the va_list type is an array type: ... struct apD.1839[1]; ... we're passing the location of the initial element: ... res_4 = f2_1D.1833 (apD.1839); ... And the type of the parameter in f2_1 is accordingly a pointer to array element: ... f2_1 (struct * apD.1832) ... That means the address operator here is superfluous. ... _3 = VA_ARG (apD.1832, 0B); ... Or, differently put, when we take the address of ap in va_start and va_end in f2, the result is a pointer to array element type. When we take the address of ap in f2_2, the result is a pointer to pointer to array element type. This extra indirection doesn't cause wrong code to be generated. The va_arg expansion code handles this correctly, thanks to the combination of: - an unconditional build_fold_indirect_ref in expand_ifn_va_arg_1 which removes an indirection, and - a fixup in gimplify_va_arg_internal that again adds an indirection in some cases. The call gets inlined, and before pass_stdarg we have: ... f2 (intD.6 iD.1835) { struct * apD.1849; struct apD.1839[1]; intD.6 _6; # .MEM_2 = VDEF .MEM_1(D) # USE = nonlocal escaped # CLB = nonlocal escaped { D.1839 } (escaped) __builtin_va_startD.1030 (apD.1839, 0); # .MEM_3 = VDEF .MEM_2 apD.1849 = apD.1839; # .MEM_7 = VDEF .MEM_3 # USE = nonlocal null { D.1839 D.1849 } (escaped) # CLB = nonlocal null { D.1839 D.1849 } (escaped) _6 = VA_ARG (apD.1849, 0B); ... And after expanding ifn_va_arg, we have: ... f2 (intD.6 iD.1835) { struct * ap.3D.1853; struct apD.1839[1]; # .MEM_2 = VDEF .MEM_1(D) # USE = nonlocal escaped # CLB = nonlocal escaped { D.1839 } (escaped) __builtin_va_startD.1030 (apD.1839, 0); ap_3 = apD.1839; ap.3_10 = ap_3; # VUSE .MEM_2 _11 = ap.3_10-gp_offsetD.2; SNIP ... The pass_stdarg optimization fails: ... f2: va_list escapes 1, needs to save all GPR units and all FPR units. ... It fails on the superfluous address operator: ... va_list escapes in ap_3 = apD.1839; ... II. The patch prevents the superfluous address operator from being added. It also removes the need for the fixup in gimplify_va_arg_internal, by deciding in gimplify_va_arg_expr whether the build_fold_indirect_ref in expand_ifn_va_arg_1 is needed before passing ap on to gimplify_va_arg_internal. This decision is encoded as an extra argument to ifn_va_arg. III. Using the patch, before inlining we can see the address operator has been removed in va_arg: ... f2_1 (struct * apD.1832) { intD.6 _4; # .MEM_3 = VDEF .MEM_1(D) # USE = anything # CLB = anything _4 = VA_ARG (ap_2(D), 0B, 0); # VUSE .MEM_3 return _4; } ... And after inlining, we see that va_start and va_arg now use the same ap: ... f2 (intD.6 iD.1835) { struct apD.1839[1]; # .MEM_2 = VDEF .MEM_1(D) # USE = anything # CLB = anything __builtin_va_startD.1030 (apD.1839, 0); # .MEM_3 = VDEF .MEM_2 # USE = anything # CLB = anything _8 = VA_ARG (apD.1839, 0B, 0); ... That allows the pass_stdarg optimization to succeed: ... f2: va_list escapes 0, needs to save 8 GPR units and 0 FPR units. ... Bootstrapped and reg-tested on x86_64, with and without -m32. OK for trunk? [ FWIW, I suspect this patch will make life easier for the reimplementation of the pass_stdarg optimization using ifn_va_arg. ] + if (canon_va_type != NULL) +{ + if (!(TREE_CODE (canon_va_type) == ARRAY_TYPE +TREE_CODE (va_type) != ARRAY_TYPE)) + /* In gimplify_va_arg_expr we take the address of the ap argument, mark + it addressable now. */ + mark_addressable (expr); can we simplify this and ... - } - + gcc_assert (TREE_CODE (TREE_TYPE
Fix pattern validation in genrecog
Thomas pointed out that, while I'd kept the code to validate patterns for things like missing modes, the code wasn't being used. Fixed with the patch below. I ended up reinstating the original code to create a single rtx pattern for a define_peephole2 sequence, so that it could be passed to validate_pattern as before. One of the reasons I got rid of the code originally was because I didn't like the idea of a sequence of define_peephole2 instructions being put into a PARALLEL -- there's nothing parallel about them, and it would be easy to confuse the result with a real parallel in a define_insn or define_split match. I therefore changed it to use SEQUENCE instead. (Not that it matters: nothing actually looks at the code.) I also threw in a check for zero-length define_peephole2s while there. This is enforced syntactically for other define_*s, but it's possible to write a define_peephole2 that just asks for scratch registers and doesn't actually match anything. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * genrecog.c (match_pattern_1): Expect the pattern to be a SEQUENCE for define_peephole2s. (get_peephole2_pattern): New function. (main): Use it. Call validate_pattern. Index: gcc/genrecog.c === --- gcc/genrecog.c 2015-05-10 10:45:41.750134123 +0100 +++ gcc/genrecog.c 2015-05-10 10:45:41.746134173 +0100 @@ -4080,14 +4080,14 @@ match_pattern_2 (state *s, rtx top_patte (2) the rtx matches TOP_PATTERN and (3) C_TEST is true. - For peephole2, TOP_PATTERN is the DEFINE_PEEPHOLE2 itself, otherwise - it is the rtx pattern to match (PARALLEL, SET, etc.). */ + For peephole2, TOP_PATTERN is a SEQUENCE of the instruction patterns + to match, otherwise it is a single instruction pattern. */ static void match_pattern_1 (state *s, rtx top_pattern, const char *c_test, acceptance_type acceptance) { - if (GET_CODE (top_pattern) == DEFINE_PEEPHOLE2) + if (acceptance.type == PEEPHOLE2) { /* Match each individual instruction. */ position **subpos_ptr = peep2_insn_pos_list; @@ -4095,18 +4095,14 @@ match_pattern_1 (state *s, rtx top_patte for (int i = 0; i XVECLEN (top_pattern, 0); ++i) { rtx x = XVECEXP (top_pattern, 0, i); - /* Ignore scratch register requirements. */ - if (GET_CODE (x) != MATCH_SCRATCH GET_CODE (x) != MATCH_DUP) - { - position *subpos = next_position (subpos_ptr, root_pos, - POS_PEEP2_INSN, count); - if (count 0) - s = add_decision (s, rtx_test::peep2_count (count + 1), - true, false); - s = match_pattern_2 (s, top_pattern, subpos, x); - subpos_ptr = subpos-next; - count += 1; - } + position *subpos = next_position (subpos_ptr, root_pos, + POS_PEEP2_INSN, count); + if (count 0) + s = add_decision (s, rtx_test::peep2_count (count + 1), + true, false); + s = match_pattern_2 (s, top_pattern, subpos, x); + subpos_ptr = subpos-next; + count += 1; } acceptance.u.full.u.match_len = count - 1; } @@ -5165,6 +5161,30 @@ add_implicit_parallel (rtvec vec) } } +/* Return the rtx pattern for the list of rtxes in a define_peephole2. */ + +static rtx +get_peephole2_pattern (rtvec vec) +{ + int i, j; + rtx pattern = rtx_alloc (SEQUENCE); + XVEC (pattern, 0) = rtvec_alloc (GET_NUM_ELEM (vec)); + for (i = j = 0; i GET_NUM_ELEM (vec); i++) +{ + rtx x = RTVEC_ELT (vec, i); + /* Ignore scratch register requirements. */ + if (GET_CODE (x) != MATCH_SCRATCH GET_CODE (x) != MATCH_DUP) + { + XVECEXP (pattern, 0, j) = x; + j++; + } +} + XVECLEN (pattern, 0) = j; + if (j == 0) +error_with_line (pattern_lineno, empty define_peephole2); + return pattern; +} + /* Return true if *PATTERN_PTR is a PARALLEL in which at least one trailing rtx can be added automatically by add_clobbers. If so, update *ACCEPTANCE_PTR so that its num_clobbers field contains the number @@ -5231,20 +5251,20 @@ main (int argc, char **argv) if (desc == NULL) break; - rtx pattern; - acceptance_type acceptance; acceptance.partial_p = false; acceptance.u.full.code = next_insn_code; + rtx pattern; switch (GET_CODE (desc)) { case DEFINE_INSN: { /* Match the instruction in the original .md form. */ - pattern = add_implicit_parallel (XVEC (desc, 1)); acceptance.type = RECOG; acceptance.u.full.u.num_clobbers = 0; + pattern = add_implicit_parallel (XVEC (desc, 1)); + validate_pattern (pattern,
PR 66076: invalid vec_grow in rtx iterators
The rtx iterators start out using a stack worklist but fall back to a heap worklist if the stack one turns out to be too small (which is rare). PR 66076 showed up a bug in the code that makes the switch from the stack worklist to the heap worklist: the heap one might already have been allocated if the worklist structure was shared with a previous FOR_EACH_SUBRTX and if that FOR_EACH_SUBRTX also blew the stack worklist. Fixed by making the vec_safe_grow conditional on a vec_safe_length check. (Note that vec_safe_grow starts with a call to vec_safe_length, so this is no less efficient than a check for a null pointer.) If one FOR_EACH_SUBRTX needs to use the heap worklist, it might be slightly more efficient for future FOR_EACH_SUBRTXes on the same worklist structure to start out using the heap worklist, in order to avoid any future stack-to-heap memcpy. The problem is that that would require extra set-up code for all FOR_EACH_SUBRTXes, and the idea is to optimise for the common case where the stack worklist is enough. The situation handled by the patch should be very rare -- backed up by the fact that the ICE would always trigger in that case, yet it didn't show up in pre-release testing. Maybe it would be worth bumping up the size of the stack worklist to cope with those long AVX512 constants though. Tested on x86_64-linux-gnu. Also tested that the testcase passes on aarch64-elf. OK to install? Thanks, Richard gcc/ PR rtl-optimization/66076 * rtlanal.c (generic_subrtx_iterator T::add_single_to_queue): Don't grow the heap array if it is already big enough from a previous iteration. gcc/testsuite/ PR rtl-optimization/66076 * gcc.dg/torture/pr66076.c: New test. Index: gcc/rtlanal.c === --- gcc/rtlanal.c 2015-05-10 10:44:24.091059530 +0100 +++ gcc/rtlanal.c 2015-05-10 20:49:56.618230132 +0100 @@ -104,7 +104,10 @@ generic_subrtx_iterator T::add_single_ return base; } gcc_checking_assert (i == LOCAL_ELEMS); - vec_safe_grow (array.heap, i + 1); + /* A previous iteration might also have moved from the stack to the +heap, in which case the heap array will already be big enough. */ + if (vec_safe_length (array.heap) = i) + vec_safe_grow (array.heap, i + 1); base = array.heap-address (); memcpy (base, array.stack, sizeof (array.stack)); base[LOCAL_ELEMS] = x; Index: gcc/testsuite/gcc.dg/torture/pr66076.c === --- /dev/null 2015-04-20 08:05:53.260830006 +0100 +++ gcc/testsuite/gcc.dg/torture/pr66076.c 2015-05-10 20:49:56.626230040 +0100 @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options } */ +/* { dg-options -mno-prefer-avx128 -march=bdver4 { target i?86-*-* x86_64-*-* } } */ + +void +f0a (char *result, char *arg1, char *arg4, char temp_6) +{ + int idx = 0; + for (idx = 0; idx 416; idx += 1) +result[idx] = (arg1[idx] + arg4[idx]) * temp_6; +}
Re: PR 66076: invalid vec_grow in rtx iterators
gcc/ PR rtl-optimization/66076 * rtlanal.c (generic_subrtx_iterator T::add_single_to_queue): Don't grow the heap array if it is already big enough from a previous iteration. gcc/testsuite/ PR rtl-optimization/66076 * gcc.dg/torture/pr66076.c: New test. OK, thanks. -- Eric Botcazou
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On Mon, May 11, 2015 at 11:20:15AM +0100, Szabolcs Nagy wrote: On 09/05/15 19:57, Szabolcs Nagy wrote: * H.J. Lu hjl.to...@gmail.com [2015-05-09 10:41:41 -0700]: There are 4: 2b70 806 FUNCGLOBAL DEFAULT 12 __cpu_indicator_init@GCC_4.8.0 38: 002153d016 OBJECT GLOBAL DEFAULT 25 __cpu_model@GCC_4.8.0 and 0215000 00040001 R_X86_64_64 2b70 __cpu_indicator_init@GCC_4.8.0 + 0 00215220 00260006 R_X86_64_GLOB_DAT 002153d0 __cpu_model@GCC_4.8.0 + 0 in libgcc_s.so.1. Musl ld.so must be fixed to handle it. Rich is looking at how to do this non-intrusively, but it seems non-trivial (some users of musl prefer not to resolve such versioned symbols). i think it might be enough to add __cpu_indicator_init_local as an alias to __cpu_indicator_init in libgcc.a and then use the *_local symbol from the ifunc resolver, that way no new dependency is added to libgcc_s.so handling. i tried this approach and it seems to work: passes all multiversioning tests on x86_64. i think it's no worse than the symver approach. is it ok to change the current fix to this? No. Instead of piling hacks like this just fix it in musl. libgcc certainly isn't the only library that uses @ symbol versions, e.g. libstdc++ does as well, as well as many other shared libraries. Jakub
Re: [patch 0/10] debug-early merge
Perfect. Thanks for doing this. Aldy On May 11, 2015 4:23 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi Aldy, On 08/05/15 01:30, Aldy Hernandez wrote: Hi folks. I have divided the patches into 10 pieces. The patches are interdependent and cannot be applied independently. I am merely dividing them up to aid the relevant reviewers. As I've mentioned elsewhere, the patchset as posted has been bootstrapped and GCC tested on: x86_64-unknown-linux-gnu powerpc-ibm-aix7.1.2.0 powerpc64-unknown-linux-gnu aarch64-unknown-linux-gnu I bootstrapped the branch (aldyh/debug-early from the git mirror, I hope ;)) on arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a15 --with-mode=thumb --with-fpu=neon-vfpv4 --enable-languages=c,c++,fortran which is my usual arm bootstrap config. It completed ok. Cheers, Kyrill I have also GDB tested the patchset on x86_64-linux. Thanks for your help in this ordeal. Bring it on! Aldy
Re: [PATCH, ARM] Fix testcase for PR64616
On Mon, May 11, 2015 at 10:43 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi, Testcase made for PR64616 was only passing when using a litteral pool. Rather than having an alternative for systems where this is not true, this patch changes the test to check that a global copy propagation occurs in cprop2. This should work accross all ARM targets (it works when targetting Cortex-M0, Cortex-M3 and whatever default core for ARMv7-a with vfpv3-d16 FPU). ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-05-04 Thomas Preud'homme thomas.preudho...@arm.com * gcc.target/arm/pr64616.c: Test dump rather than assembly to work accross ARM targets. diff --git a/gcc/testsuite/gcc.target/arm/pr64616.c b/gcc/testsuite/gcc.target/arm/pr64616.c index c686ffa..2280f21 100644 --- a/gcc/testsuite/gcc.target/arm/pr64616.c +++ b/gcc/testsuite/gcc.target/arm/pr64616.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 } */ +/* { dg-options -O2 -fdump-rtl-cprop2 } */ int f (int); unsigned int glob; @@ -11,4 +11,5 @@ g () glob = 5; } -/* { dg-final { scan-assembler-times ldr 2 } } */ +/* { dg-final { scan-rtl-dump GLOBAL COPY-PROP cprop2 } } */ +/* { dg-final { cleanup-rtl-dump cprop2 } } */ Patch was tested by verifying that the pattern appears when targeting Cortex-M0, Cortex-M3 and the default core for ARMv7-a with vfpv3-d16 FPU. Best regards, Thomas OK - thanks. Ramana
Re: [patch 0/10] debug-early merge
Hi Aldy, On 08/05/15 01:30, Aldy Hernandez wrote: Hi folks. I have divided the patches into 10 pieces. The patches are interdependent and cannot be applied independently. I am merely dividing them up to aid the relevant reviewers. As I've mentioned elsewhere, the patchset as posted has been bootstrapped and GCC tested on: x86_64-unknown-linux-gnu powerpc-ibm-aix7.1.2.0 powerpc64-unknown-linux-gnu aarch64-unknown-linux-gnu I bootstrapped the branch (aldyh/debug-early from the git mirror, I hope ;)) on arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a15 --with-mode=thumb --with-fpu=neon-vfpv4 --enable-languages=c,c++,fortran which is my usual arm bootstrap config. It completed ok. Cheers, Kyrill I have also GDB tested the patchset on x86_64-linux. Thanks for your help in this ordeal. Bring it on! Aldy
Re: [PATCH, ARM] attribute target (thumb,arm) [3/6] respin (4th)
binunvnEzkM1k.bin Description: PGP/MIME version identification encrypted.asc Description: OpenPGP encrypted message
Re: [PATCH, ARM] attribute target (thumb,arm) [1/6] respin (4th)
On Mon, May 11, 2015 at 10:13 AM, Christian Bruel christian.br...@st.com wrote: OK with those changes. Ramana thanks, done following up the thumb_code cleanup, here is a missing chunk for the vxworks config. arm-vxworks build checked. ok for trunk ? thanks, Christian OK thanks - please post the version of p1 that you committed for archival purposes. Ramana
Re: [rs6000] Fix compare debug failure on AIX
I'm sorry that XCOFF debugging changes the generated code (only in the sense of allocating a frame), but that is a system dependency. It's been this way for over 20 years. I see no reason to produce worse code at -O0 when not debugging simply to make testcases happier. You apparently read my previous message too quickly, this was reported to AdaCore on real code, i.e. someone opened a ticket and complained that the compiler was generating different code with and without -g for XCOFF. Note that someone a little perverse could easily use this obvious defect to prevent the compiler from bootstrapping on AIX. ;-) -- Eric Botcazou
Re: [PATCH, ARM] attribute target (thumb,arm) [1/6] respin (4th)
OK with those changes. Ramana thanks, done following up the thumb_code cleanup, here is a missing chunk for the vxworks config. arm-vxworks build checked. ok for trunk ? thanks, Christian 2015-05-11 Christian Bruel christian.br...@st.com * config/arm/arm-protos.h (thumb_code, thumb1_code): Remove. * config/arm/vxworks.h (thumb_code): Replace with TARGET_THUMB. Index: gcc/config/arm/arm-protos.h === --- gcc/config/arm/arm-protos.h (revision 222997) +++ gcc/config/arm/arm-protos.h (working copy) @@ -462,12 +462,6 @@ /* Nonzero if tuning for Cortex-A9. */ extern int arm_tune_cortex_a9; -/* Nonzero if generating Thumb instructions. */ -extern int thumb_code; - -/* Nonzero if generating Thumb-1 instructions. */ -extern int thumb1_code; - /* Nonzero if we should define __THUMB_INTERWORK__ in the preprocessor. XXX This is a bit of a hack, it's intended to help work around Index: gcc/config/arm/vxworks.h === --- gcc/config/arm/vxworks.h (revision 222997) +++ gcc/config/arm/vxworks.h (working copy) @@ -40,7 +40,7 @@ builtin_define (CPU=ARMARCH5); \ else if (arm_arch4)\ { \ - if (thumb_code)\ + if (TARGET_THUMB) \ builtin_define (CPU=ARMARCH4_T); \ else \ builtin_define (CPU=ARMARCH4); \
[PATCH, ARM] Fix testcase for PR64616
Hi, Testcase made for PR64616 was only passing when using a litteral pool. Rather than having an alternative for systems where this is not true, this patch changes the test to check that a global copy propagation occurs in cprop2. This should work accross all ARM targets (it works when targetting Cortex-M0, Cortex-M3 and whatever default core for ARMv7-a with vfpv3-d16 FPU). ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-05-04 Thomas Preud'homme thomas.preudho...@arm.com * gcc.target/arm/pr64616.c: Test dump rather than assembly to work accross ARM targets. diff --git a/gcc/testsuite/gcc.target/arm/pr64616.c b/gcc/testsuite/gcc.target/arm/pr64616.c index c686ffa..2280f21 100644 --- a/gcc/testsuite/gcc.target/arm/pr64616.c +++ b/gcc/testsuite/gcc.target/arm/pr64616.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 } */ +/* { dg-options -O2 -fdump-rtl-cprop2 } */ int f (int); unsigned int glob; @@ -11,4 +11,5 @@ g () glob = 5; } -/* { dg-final { scan-assembler-times ldr 2 } } */ +/* { dg-final { scan-rtl-dump GLOBAL COPY-PROP cprop2 } } */ +/* { dg-final { cleanup-rtl-dump cprop2 } } */ Patch was tested by verifying that the pattern appears when targeting Cortex-M0, Cortex-M3 and the default core for ARMv7-a with vfpv3-d16 FPU. Best regards, Thomas
Re: [Patch, Fortran, 66035, v1] [5/6 Regression] gfortran ICE segfault
Hi Mikael, diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index cf607d0..402d9b9 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6881,6 +6881,30 @@ alloc_scalar_allocatable_for_subcomponent_assignment (stmtblock_t *block, TREE_TYPE (tmp), tmp, fold_convert (TREE_TYPE (tmp), size)); } + else if (cm-ts.type == BT_CLASS) +{ + gcc_assert (expr2-ts.type == BT_CLASS || expr2-ts.type == BT_DERIVED); + if (expr2-ts.type == BT_DERIVED) + { + tmp = gfc_get_symbol_decl (gfc_find_vtab (expr2-ts)); + tmp = gfc_build_addr_expr (NULL_TREE, tmp); + size = fold_convert (size_type_node, gfc_vptr_size_get (tmp)); + } Use TYPE_SIZE_UNIT of the rhs in this case, in the same way as in the else branch further below. Er, but when I get TYPE_SIZE_UNIT () correctly, then it will grab the size needed to store the %_vptr%size component. Do you really intent that? I need to alloc the size of the polymorphic type of expr2 here. snipp @@ -7008,7 +7032,9 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm, gfc_expr * expr, gfc_add_expr_to_block (block, tmp); } else if (init (cm-attr.allocatable - || (cm-ts.type == BT_CLASS CLASS_DATA (cm)-attr.allocatable))) + || (cm-ts.type == BT_CLASS CLASS_DATA (cm)-attr.allocatable + (expr-ts.type != BT_CLASS + || CLASS_DATA (expr)-attr.allocatable maybe: || !CLASS_DATA (expr)-attr.allocatable (with a '!')? No, I only want to copy the rhs to new memory, when it is allocatable. For all other cases, one should not do this, to prevent a memory leak. Furthermore, is the data copied to freshly allocated memory anyway. Have a look at the pseudo code generated for the testcase in the patch. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [C frontend] Fix construction of TYPE_STUB_DECL
On Sun, May 10, 2015 at 7:33 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, TREE_PUBLIC of TYPE_DECL is defined to say if the type is public: /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL, nonzero means name is to be accessible from outside this translation unit. In an IDENTIFIER_NODE, nonzero means an external declaration accessible from outside this translation unit was previously seen for this name in an inner scope. */ #define TREE_PUBLIC(NODE) ((NODE)-base.public_flag) This is properly honored by C++ FE but other FEs are bit random, which in turn confuses type_in_anonymous_namespace_p predicate that leads to flase poistives on type mismatch warnings. I used to be able to get around by checking only C++ types at LTO time, but with type checking in lto-symtab I can not, because I do want to compare type compatibility cross translation units and cross languages and we have no reliable way to say what type originated as C++ and what did not. The idea was you can walk up the context chain until you reach a TRANSLATION_UNIT_DECL where the source language is encoded in... of course most FEs are quite lazy here and eventually end up at NULL_TREE instead. Yep, I can walk up and look for NAMESPACE_DECL without TREE_PUBLIC flag, too, or cleanup the flag in free_lang_data when I know if we do C++ (probably). The predicate for anonymous_namespace is called on few quite hot paths in devirt code, but that is not my main concern either. In general I think we ought to fix frontends to agree with each other (and docs) on how to represent semantics instead of adding special cases and workarounds to middle-end/LTO. I am trying to keep the anonymous namespace thing indepent of C++. While I am not sure we support other language with similar notion, it seems to make sense independently of C++. After all, it gives the type escape info for free :) Honza Richard.
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 01/05/15 19:51, Jeff Law wrote: On 04/28/2015 03:54 AM, Kyrill Tkachov wrote: On 27/04/15 21:13, Jeff Law wrote: On 04/21/2015 11:33 AM, Kyrill Tkachov wrote: On 21/04/15 15:09, Jeff Law wrote: On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it. No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures. So what should be the action plan here? I can't add an assert on positive result as a negative result is valid. We want to catch the case where this would cause trouble on pa, or change the patch until we're confident that it's fine for pa. That being said, reading the documentation of STACK_GROWS_UPWARD and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case where this would cause trouble on pa. Is the problem that in the function: +/* Add SIZE to X and check whether it's greater than Y. + If it is, return the constant amount by which it's greater or smaller. + If the two are not statically comparable (for example, X and Y contain + different registers) return -1. This is used in expand_push_insn to + figure out if reading SIZE bytes from location X will end up reading from + location Y. */ +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, size); + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) +return -1; + + return INTVAL (sub); +} for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, so the function should something like the following? So I had to go back and compile some simple examples. References to outgoing arguments will be SP relative. References to the incoming arguments will be ARGP relative. And that brings me to the another issue. Isn't X in this context the incoming argument slot and the destination an outgoing argument slot? If so, the approach of memory_load_overlap simply won't work on a target with calling conventions like the PA. And you might really want to consider punting for these kind of calling conventions Ok, thanks for the guidance. How about this? This patch disables sibcall optimisation when encountering a partial argument when ARGS_GROW_DOWNWARD !STACK_GROWS_DOWNWARD. Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have partial arguments anyway on PA due to the large number of argument regs. I tested this on arm and bootstrapped on x86_64. I am now going through the process of getting access to a Debian PA machine to give it a test there (thanks Dave!) Ok if testing comes clean? Thanks, Kyrill 2015-04-28 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/65358 * calls.c (expand_call): Cancel sibcall optimisation when encountering partial argument on targets with ARGS_GROW_DOWNWARD and !STACK_GROWS_DOWNWARD. * expr.c (memory_load_overlap): New function. (emit_push_insn): When pushing partial args to the stack would clobber the register part load the overlapping part into a pseudo and put it into the hard reg after pushing. 2015-04-28 Honggyu Kim hong.gyu@lge.com PR target/65358 * gcc.dg/pr65358.c: New test. The more I think about this, the more I think it's an ugly can of worms and maybe we should just disable sibcalls for partial arguments. I doubt it's a big performance issue in general. We already have quite a bit of code in calls.c to detect cases with partial argument overlap for the explicit purpose of allowing sibcalls when partial arguments occur in the general case. However, that code only detects when a partial argument overlaps with other arguments in a call. In this PR the partial argument overlaps with itself. It would be a shame to disable sibcalls for all partial arguments when there is already infrastructure in place to handle them. In addition to the argument/stack direction stuff, I've been pondering the stack/frame/arg pointer issues. Your approach assumes that the incoming and outgoing areas are always referenced off the same base register. If they aren't, then the routine returns no overlap. But we'd need to consider the case where we have a reference to the arg or frame pointer which later gets rewritten into a stack pointer relative address. Is it too late at the point were you do the checks to reject the sibling call? If not, then maybe the overlap routine should return a tri-state. No overlap, overlap, don't
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On 09/05/15 19:57, Szabolcs Nagy wrote: * H.J. Lu hjl.to...@gmail.com [2015-05-09 10:41:41 -0700]: There are 4: 2b70 806 FUNCGLOBAL DEFAULT 12 __cpu_indicator_init@GCC_4.8.0 38: 002153d016 OBJECT GLOBAL DEFAULT 25 __cpu_model@GCC_4.8.0 and 0215000 00040001 R_X86_64_64 2b70 __cpu_indicator_init@GCC_4.8.0 + 0 00215220 00260006 R_X86_64_GLOB_DAT 002153d0 __cpu_model@GCC_4.8.0 + 0 in libgcc_s.so.1. Musl ld.so must be fixed to handle it. Rich is looking at how to do this non-intrusively, but it seems non-trivial (some users of musl prefer not to resolve such versioned symbols). i think it might be enough to add __cpu_indicator_init_local as an alias to __cpu_indicator_init in libgcc.a and then use the *_local symbol from the ifunc resolver, that way no new dependency is added to libgcc_s.so handling. i tried this approach and it seems to work: passes all multiversioning tests on x86_64. i think it's no worse than the symver approach. is it ok to change the current fix to this? libgcc/Changelog: 2015-05-11 Szabolcs Nagy szabolcs.n...@arm.com * config/i386/cpuinfo.c (__cpu_indicator_init_local): Add. (__cpu_indicator_init@GCC_4.8.0, __cpu_model@GCC_4.8.0): Remove. * config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Remove -DUSE_ELF_SYMVER. gcc/Changelog: 2015-05-11 Szabolcs Nagy szabolcs.n...@arm.com * config/i386/i386.c (ix86_expand_builtin): Make __builtin_cpu_init call __cpu_indicator_init_local instead of __cpu_indicator_init. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7bd9ff3..7327cf3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -38398,10 +38398,10 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, { case IX86_BUILTIN_CPU_INIT: { - /* Make it call __cpu_indicator_init in libgcc. */ + /* Make it call __cpu_indicator_init_local in libgcc.a. */ tree call_expr, fndecl, type; type = build_function_type_list (integer_type_node, NULL_TREE); - fndecl = build_fn_decl (__cpu_indicator_init, type); + fndecl = build_fn_decl (__cpu_indicator_init_local, type); call_expr = build_call_expr (fndecl, 0); return expand_expr (call_expr, target, mode, EXPAND_NORMAL); } diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c index f6f91dd..86b6d21 100644 --- a/libgcc/config/i386/cpuinfo.c +++ b/libgcc/config/i386/cpuinfo.c @@ -425,7 +425,7 @@ __cpu_indicator_init (void) return 0; } -#if defined SHARED defined USE_ELF_SYMVER -__asm__ (.symver __cpu_indicator_init, __cpu_indicator_init@GCC_4.8.0); -__asm__ (.symver __cpu_model, __cpu_model@GCC_4.8.0); +#ifndef SHARED +int __cpu_indicator_init_local (void) + __attribute__ ((weak, alias (__cpu_indicator_init))); #endif diff --git a/libgcc/config/i386/t-linux b/libgcc/config/i386/t-linux index 11bb46e..4f47f7b 100644 --- a/libgcc/config/i386/t-linux +++ b/libgcc/config/i386/t-linux @@ -3,4 +3,4 @@ # t-slibgcc-elf-ver and t-linux SHLIB_MAPFILES = libgcc-std.ver $(srcdir)/config/i386/libgcc-glibc.ver -HOST_LIBGCC2_CFLAGS += -mlong-double-80 -DUSE_ELF_SYMVER +HOST_LIBGCC2_CFLAGS += -mlong-double-80
Re: [PATCH, ARM] committed: attribute target (thumb,arm) [1/6] respin (4th)
here the p1 patch committed at rev 222995 On 05/11/2015 11:56 AM, Ramana Radhakrishnan wrote: On Mon, May 11, 2015 at 10:13 AM, Christian Bruel christian.br...@st.com wrote: OK with those changes. Ramana thanks, done following up the thumb_code cleanup, here is a missing chunk for the vxworks config. arm-vxworks build checked. ok for trunk ? thanks, Christian OK thanks - please post the version of p1 that you committed for archival purposes. Ramana Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 222994) +++ gcc/ChangeLog (revision 222995) @@ -1,3 +1,15 @@ +2014-09-23 Christian Bruel christian.br...@st.com + + * config/arm/arm.c (arm_option_override): Reoganized and split into : + (arm_option_params_internal); New function. + (arm_option_check_internal): New function. + (arm_option_override_internal): New function. + (thumb_code, thumb1_code): Remove. + * config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros. + (TREE_TARGET_THUM2, TREE_TARGET_ARM): Likewise. + (thumb_code, thumb1_code): Remove. + * config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag. + 2015-05-11 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.c (alpha_emit_set_const_1) Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c (revision 222994) +++ gcc/config/arm/arm.c (revision 222995) @@ -846,12 +846,6 @@ /* Nonzero if tuning for Cortex-A9. */ int arm_tune_cortex_a9 = 0; -/* Nonzero if generating Thumb instructions. */ -int thumb_code = 0; - -/* Nonzero if generating Thumb-1 instructions. */ -int thumb1_code = 0; - /* Nonzero if we should define __THUMB_INTERWORK__ in the preprocessor. XXX This is a bit of a hack, it's intended to help work around @@ -2669,6 +2663,150 @@ return std_gimplify_va_arg_expr (valist, type, pre_p, post_p); } +/* Check any incompatible options that the user has specified. */ +static void +arm_option_check_internal (struct gcc_options *opts) +{ + /* Make sure that the processor choice does not conflict with any of the + other command line choices. */ + if (TREE_TARGET_ARM (opts) !(insn_flags FL_NOTM)) +error (target CPU does not support ARM mode); + + /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done + from here where no function is being compiled currently. */ + if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) TREE_TARGET_ARM (opts)) +warning (0, enabling backtrace support is only meaningful when compiling for the Thumb); + + if (TREE_TARGET_ARM (opts) TARGET_CALLEE_INTERWORKING) +warning (0, enabling callee interworking support is only meaningful when compiling for the Thumb); + + /* If this target is normally configured to use APCS frames, warn if they + are turned off and debugging is turned on. */ + if (TREE_TARGET_ARM (opts) + write_symbols != NO_DEBUG + !TARGET_APCS_FRAME + (TARGET_DEFAULT MASK_APCS_FRAME)) +warning (0, -g with -mno-apcs-frame may not give sensible debugging); + + /* iWMMXt unsupported under Thumb mode. */ + if (TREE_TARGET_THUMB (opts) TARGET_IWMMXT) +error (iWMMXt unsupported under Thumb mode); + + if (TARGET_HARD_TP TREE_TARGET_THUMB1 (opts)) +error (can not use -mtp=cp15 with 16-bit Thumb); + + if (TREE_TARGET_THUMB (opts) TARGET_VXWORKS_RTP flag_pic) +{ + error (RTP PIC is incompatible with Thumb); + flag_pic = 0; +} + + /* We only support -mslow-flash-data on armv7-m targets. */ + if (target_slow_flash_data + ((!(arm_arch7 !arm_arch_notm) !arm_arch7em) + || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON))) +error (-mslow-flash-data only supports non-pic code on armv7-m targets); +} + +/* Set params depending on attributes and optimization options. */ +static void +arm_option_params_internal (struct gcc_options *opts) +{ + /* If we are not using the default (ARM mode) section anchor offset + ranges, then set the correct ranges now. */ + if (TREE_TARGET_THUMB1 (opts)) +{ + /* Thumb-1 LDR instructions cannot have negative offsets. + Permissible positive offset ranges are 5-bit (for byte loads), + 6-bit (for halfword loads), or 7-bit (for word loads). + Empirical results suggest a 7-bit anchor range gives the best + overall code size. */ + targetm.min_anchor_offset = 0; + targetm.max_anchor_offset = 127; +} + else if (TREE_TARGET_THUMB2 (opts)) +{ + /* The minimum is set such that the total size of the block + for a particular anchor is 248 + 1 + 4095 bytes, which is + divisible by eight, ensuring natural spacing of anchors. */ + targetm.min_anchor_offset = -248; + targetm.max_anchor_offset = 4095; +} + else +{ + targetm.min_anchor_offset = TARGET_MIN_ANCHOR_OFFSET; + targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET; +
Re: [PATCH][MIPS] Enable load-load/store-store bonding
On Tuesday 21 April 2015 12:39 AM, Matthew Fortune wrote: Sameera Deshpande sameera.deshpa...@imgtec.com writes: Gentle reminder! Thanks Sameera. Just a couple of comments inline below and a question for Catherine at the end. - Thanks and regards, Sameera D. On Monday 30 March 2015 04:58 PM, sameera wrote: Hi! Sorry for delay in sending this patch for review. Please find attached updated patch. In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions. This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby guaranteeing h/w level load/store bonding. The patch is tested with dejagnu for correctness, and tested on hardware for performance. Ok for trunk? Changelog: gcc/ * config/mips/mips.md (JOIN_MODE): New mode iterator. (join2_load_StoreJOIN_MODE:mode): New pattern. (join2_loadhi): Likewise. (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF- mode load-load and store-stores. * config/mips/mips.opt (mload-store-pairs): New option. (TARGET_LOAD_STORE_PAIRS): New macro. *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise. *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. *config/mips/mips.c(mips_load_store_bonding_p): New function. I don't know if this has been corrupted by mail clients but a single space after '*' and a space before '('. diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h index b48e04f..244eb8d 100644 --- a/gcc/config/mips/mips-protos.h +++ b/gcc/config/mips/mips-protos.h @@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int); extern void mips_final_prescan_insn (rtx_insn *, rtx *, int); extern int mips_trampoline_code_size (void); extern void mips_function_profiler (FILE *); +extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool); typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx); #ifdef RTX_CODE diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 1733457..85f0591 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p, return true; } +bool +mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p) Remove enum from machine_mode. +{ + rtx reg1, reg2, mem1, mem2, base1, base2; + enum reg_class rc1, rc2; + HOST_WIDE_INT offset1, offset2; + + if (load_p) +{ + reg1 = operands[0]; + reg2 = operands[2]; + mem1 = operands[1]; + mem2 = operands[3]; +} + else +{ + reg1 = operands[1]; + reg2 = operands[3]; + mem1 = operands[0]; + mem2 = operands[2]; +} + + if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0 + || mips_address_insns (XEXP (mem2, 0), mode, false) == 0) +return false; + + mips_split_plus (XEXP (mem1, 0), base1, offset1); + mips_split_plus (XEXP (mem2, 0), base2, offset2); + + /* Base regs do not match. */ + if (!REG_P (base1) || !rtx_equal_p (base1, base2)) +return false; + + /* Either of the loads is clobbering base register. */ + if (load_p + (REGNO (reg1) == REGNO (base1) + || (REGNO (reg2) == REGNO (base1 +return false; Can you add a comment saying that this case does not get bonded by any known hardware even though it could be valid to bond them if it is the second load that clobbers the base. + /* Loading in same registers. */ + if (load_p + REGNO (reg1) == REGNO (reg2)) +return false; + + /* The loads/stores are not of same type. */ + rc1 = REGNO_REG_CLASS (REGNO (reg1)); + rc2 = REGNO_REG_CLASS (REGNO (reg2)); + if (rc1 != rc2 + !reg_class_subset_p (rc1, rc2) + !reg_class_subset_p (rc2, rc1)) +return false; + + if (abs (offset1 - offset2) != GET_MODE_SIZE (mode)) +return false; + + return true; +} + /* OPERANDS describes the operands to a pair of SETs, in the order dest1, src1, dest2, src2. Return true if the operands can be used in an LWP or SWP instruction; LOAD_P says which. */ diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index ec69ed5..1bd0dae 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals; #define STANDARD_STARTFILE_PREFIX_1 /lib64/ #define STANDARD_STARTFILE_PREFIX_2 /usr/lib64/ #endif + +#define ENABLE_LD_ST_PAIRS \ + (TARGET_LOAD_STORE_PAIRS TUNE_P5600 \ +!TARGET_MICROMIPS !TARGET_FIX_24K) I've already forgotten why these extra micromips/fix24k conditions were required. Can you add a comment? diff --git
Re: PATCHes to help with C++11 bootstrap
On 2015.05.08 at 23:30 -0500, Jason Merrill wrote: One C++11 compatibility issue that turns up a lot in the GCC sources is that in C++98, #define BAR bar const char *p = fooBAR; There was a missing fix for gcc/config/rs6000/option-defaults.h. This broke bootstrap on ppc64. Fix committed as obvious. 2015-05-11 Markus Trippelsdorf mar...@trippelsdorf.de PR bootstrap/66105 * config/rs6000/option-defaults.h: Add space between string literal and macro name. diff --git a/gcc/config/rs6000/option-defaults.h b/gcc/config/rs6000/option-defaults.h index 7da2c7c77b86..95a147206186 100644 --- a/gcc/config/rs6000/option-defaults.h +++ b/gcc/config/rs6000/option-defaults.h @@ -39,11 +39,11 @@ #endif #if TARGET_DEFAULT OPTION_MASK_64BIT -#define OPT_ARCH64 !OPT_32 +#define OPT_ARCH64 ! OPT_32 #define OPT_ARCH32 OPT_32 #else #define OPT_ARCH64 OPT_64 -#define OPT_ARCH32 !OPT_64 +#define OPT_ARCH32 ! OPT_64 #endif /* Support for a compile-time default CPU, et cetera. The rules are: -- Markus
[Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64
Hi, This patch fixes a couple of issues I saw during the compilation of shell_lam.f for 410.bwaves test in spec2006: * create_new_invariant: We shouldn't bother attempting to calculate the address cost for something that clearly isn't an address. Use a SCALAR_INT_MODE_P check to eliminate the most obvious cases, such as vector modes and so on. * get_inv_cost: Change the code so that we don't treat something as an address if it is never actually used as an address, i.e. we didn't use it at all. This occurs due to the way we process the innermost loop first - invariants get pulled out of inner most loop and then get processed during next innermost loop. Here is more detail from shell_lam.f.210r.loop2_invariant dump file that shows what currently happens: 4427 *ending processing of loop 27 ** 4442 (const_vector:V2DF [ 4443 (const_double:DF 4.0e+0 [0x0.8p+3]) (const_double:DF 4.0e+0 [0x0.8p+3]) 4445 ]) 4446 4447 Hot cost: 8 (final) 4448 (set (reg:V2DF 3179) 4449 (const_vector:V2DF [ 4450 (const_double:DF 4.0e+0 [0x0.8p+3]) 4451 (const_double:DF 4.0e+0 [0x0.8p+3]) 4452 ])) 4453 4454 Hot cost: 8 (final) 4455 Set in insn 2764 is invariant (1), cost 8, depends on 4471 Decided to move invariant 1 -- gain 8 move_invariant_reg moves invariant (the const_vector above) into 3490 and replaces use of 3179 with 3490, but conservatively keeps the definition of 3179, which gets processed in outer loop, but is never used. Debug for outer loop now follows: 4497 *starting processing of loop 26 ** 6379 (insn 2764 2748 2766 110 (set (reg:V2DF 3490) 6380 (const_vector:V2DF [ 6381 (const_double:DF 4.0e+0 [0x0.8p+3]) 6382 (const_double:DF 4.0e+0 [0x0.8p+3]) 6383 ])) shell_lam.f:287 819 {*aarch64_simd_movv2df} 6384 (nil)) 6385 ;; UD chains for insn luid 88 uid 2766 6604 ;; UD chains for insn luid 21 uid 3836 6605 ;; reg 3490 { d419(bb 110 insn 2764) } 6606 (insn 3836 2763 2765 111 (set (reg:V2DF 3179) 6607 (reg:V2DF 3490)) shell_lam.f:287 -1 6608 (nil)) ^^^ additional insn generated by move_invariant_reg from loop 27 6836 ;; UD chains for insn luid 40 uid 2790 6837 ;; reg 3161 { d272(bb 111 insn 2746) } 6838 ;; reg 3201 { d305(bb 111 insn 2788) } 6839 ;; reg 3490 { d419(bb 110 insn 2764) } 6840 ;; eq_note reg 3161 { d272(bb 111 insn 2746) } 6841 ;; eq_note reg 3201 { d305(bb 111 insn 2788) } 6842 (insn 2790 2788 2791 111 (set (reg:V2DF 3203 [ vect__930.427 ]) 6843 (fma:V2DF (reg:V2DF 3161 [ MEM[base: vectp_q.371_2137, index: ivtmp.679_3214, offset: 0B] ]) 6844 (neg:V2DF (reg:V2DF 3490)) 6845 (reg:V2DF 3201 [ vect__928.424 ]))) shell_lam.f:287 1219 {fnmav2df4} 6846 (expr_list:REG_DEAD (reg:V2DF 3201 [ vect__928.424 ]) 6847 (expr_list:REG_DEAD (reg:V2DF 3179) ... ^^^ 3179 marked as DEAD 8366 *ending processing of loop 26 ** 8472 (const_vector:V2DF [ 8473 (const_double:DF 4.0e+0 [0x0.8p+3]) 8474 (const_double:DF 4.0e+0 [0x0.8p+3]) 8475 ]) 8476 8477 Hot cost: 8 (final) 8478 (set (reg:V2DF 3490) 8479 (const_vector:V2DF [ 8480 (const_double:DF 4.0e+0 [0x0.8p+3]) 8481 (const_double:DF 4.0e+0 [0x0.8p+3]) 8482 ])) 8483 8484 Hot cost: 8 (final) 8485 Set in insn 2764 is invariant (12), cost 8, depends on 8505 (set (reg:V2DF 3179) 8506 (reg:V2DF 3490)) 8507 8508 Hot cost: 8 (final) 8509 Set in insn 3836 is invariant (15), cost 8, depends on 12 ^^^ Never moves 3179 invariant out, even though it's never used. After my patch we now get this: 8586 Decided to move invariant 15 -- gain 16 8587 Decided to move dependent invariant 12 since get_inv_cost sees the number of uses is zero and the dependent invariant (12) gets moved too. Of course, without my patch the definition of 3179 would ultimately be eliminated as dead code in a later pass anyway. Is this ok to go in? Regards, David Sherwood. ChangeLog entry follows ... 2015-05-08 David Sherwood david.sherw...@arm.com * loop-invariant.c (create_new_invariant): Don't calculate address cost if mode is not scalar integers. (get_inv_cost): Increase computational cost for unused invariants. loop-invariant.patch Description: Binary data
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On 11/05/15 11:31, Jakub Jelinek wrote: On Mon, May 11, 2015 at 11:20:15AM +0100, Szabolcs Nagy wrote: i think it might be enough to add __cpu_indicator_init_local as an alias to __cpu_indicator_init in libgcc.a and then use the *_local symbol from the ifunc resolver, that way no new dependency is added to libgcc_s.so handling. i tried this approach and it seems to work: passes all multiversioning tests on x86_64. i think it's no worse than the symver approach. is it ok to change the current fix to this? No. Instead of piling hacks like this just fix it in musl. libgcc certainly isn't the only library that uses @ symbol versions, e.g. libstdc++ does as well, as well as many other shared libraries. can you explain how using a standard elf feature is a hack, but the current symver asm directive is not? fyi, musl loader loads libstdc++ just fine because it has no relocations for symbols which only has sym@version definition (libgcc_s.so.1 has because __cpu_indicator_init is a ctor). musl may end up supporting @version but that's an independent quest. (the one big pile of hacks here is multiversioning: each function with mv has its own ifunc resolver repeating the same logic, then the 1.5K cpuinfo.o is static linked into every single dso that uses mv and the separate __cpu_model structs all have to be initialized.. including the unused one in libgcc_s.so.1 because of the ctor.. adding more startup overhead. i'd gladly propose a patch to remove this feature if getting rid of piling hacks has priority.. the current design seems problematic to me for other archs that may need to call libc functions to do the dispatch anyway.)
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On Mon, May 11, 2015 at 01:39:17PM +0100, Szabolcs Nagy wrote: fyi, musl loader loads libstdc++ just fine because it has no But will it load any shared library that uses any of the 26 (if I count well on x86_64) @ symbols from libstdc++.so.6? readelf -Ws /lib64/libstdc++.so.6 | grep '@' | grep -v 'UND\|@@' For libstdc++ that is primarily C++ apps and shared libraries compiled/linked with GCC 4.0.0. musl may end up supporting @version but that's an independent quest. It is not independent. If musl claims to support symbol versioning, it should support it properly, if not, then supposedly gcc configured for musl can't be compatible with gcc configured for other linux C libraries, and should force symbol versioning off. Jakub
[PATCH] combine: Don't create (set (reg:CC) (compare (reg:CC) (const0)))
There already was code to just use the original reg:CC, but it was positioned incorrectly: if the original code (that this RTL is simplified from) did not already start with a COMPARE (or not in the right mode), it didn't trigger. But it is valid in that case as well. This then allows merging the other two arms of this conditional; do so. Bootstrapped and regression tested on powerpc64-linux and x86_64-linux; no regressions. Also built toolchains and Linux kernels for every arch where that works; no new failures on that. Any objections? Segher 2015-05-11 Segher Boessenkool seg...@kernel.crashing.org * combine.c (simplify_set): When generating a CC set, if the source already is in the correct mode, do not wrap it in a compare. Simplify the rest of that code. --- gcc/combine.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 896d9d2..51f78a5 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -6675,20 +6675,16 @@ simplify_set (rtx x) if (other_changed) undobuf.other_insn = other_insn; - /* Otherwise, if we didn't previously have a COMPARE in the -correct mode, we need one. */ - if (GET_CODE (src) != COMPARE || GET_MODE (src) != compare_mode) - { - SUBST (SET_SRC (x), gen_rtx_COMPARE (compare_mode, op0, op1)); - src = SET_SRC (x); - } - else if (GET_MODE (op0) == compare_mode op1 == const0_rtx) + /* Don't generate a compare of a CC with 0, just use that CC. */ + if (GET_MODE (op0) == compare_mode op1 == const0_rtx) { SUBST (SET_SRC (x), op0); src = SET_SRC (x); } - /* Otherwise, update the COMPARE if needed. */ - else if (XEXP (src, 0) != op0 || XEXP (src, 1) != op1) + /* Otherwise, if we didn't previously have the same COMPARE we +want, create it from scratch. */ + else if (GET_CODE (src) != COMPARE || GET_MODE (src) != compare_mode + || XEXP (src, 0) != op0 || XEXP (src, 1) != op1) { SUBST (SET_SRC (x), gen_rtx_COMPARE (compare_mode, op0, op1)); src = SET_SRC (x); -- 1.8.1.4
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
Hi, On Wed, 6 May 2015, Rich Felker wrote: I don't see how this case is improved unless GCC is failing to consider strong definitions in the same TU as locally-binding. Interposition of non-static non-inline non-weak symbols is supported independend of if they are defined in the same TU or not (if you're producing a shared lib, that is). I.e. no, they are not considered locally-binding (for instance, they aren't automatically inlined). If this is the case, is there a reason for that behavior? Because IMHO interposition is orthogonal to TU placement, and hence shouldn't be influenced by it. There's visibility, inline hints or static-ness to achieve different effects. (perhaps the real reason is: because it always worked like that :) ) IMO it's wrong. Why? I think it's right. Ciao, Michael.
Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling
Ping. On 2015/4/21 08:21 PM, Chung-Lin Tang wrote: Hi, while investigating some issues in the variable mapping code, I observed that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET case. This patch abstracts and unifies the handling code, basically just a cleanup patch. Ran libgomp tests to ensure no regressions, ok for trunk? Thanks, Chung-Lin 2015-04-21 Chung-Lin Tang clt...@codesourcery.com libgomp/ * target.c (gomp_map_pointer): New function abstracting out GOMP_MAP_POINTER handling. (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use gomp_map_pointer().
Fix verify_type ICE on TYPE_METHOS
Jason, this patch is trying to fix the following verify_type_ice that check that TYPE_METHODS is same for main variant and all its complete variants. - /* FIXME: this check triggers during libstdc++ build that is a bug. - It affects non-LTO debug output only, because free_lang_data clears - this anyway. */ - if (RECORD_OR_UNION_TYPE_P (t) COMPLETE_TYPE_P (t) 0 - TYPE_METHODS (t) != TYPE_METHODS (tv)) I tried to go in a way of making TYPE_METHODS to be defined only on main vairants as it seems impractical to update variants each time we change TYPE_METHODS. While doing so, I noticed that lazily_declare_fn sometimes adds new method to variant only (that is probably source of ICE as fixup_type_variants copies the TYPE_METHODS). The attach patch however removes the copying and verifies that TYPE_MEHTODS is NULL at all variants. It also updates middle-end uses to always check for main variant. Bootstrapped/regtested ppc64-linux, does this seem to make sense? I also noticed that TYPE_METHODS != NULL matters in some cases: it makes function.c to drop register keyword and it makes us to speak of class instead of struct in diagnostics, so I changed free_lang_data to zap the list to error_mark_node instead of NULL (in past I tried to stream it for ODR warnings and I remember it added measurable extra overhead that I considered too large for mainline). Honza * class.c (fixup_type_variants): Do not copy TYPE_METHODS (one_inheriting_sig): Assert tat we always set TYPE_METHODS of main variant. * semantics.c (finish_member_declaration): Likewise. * method.c (lazily_declare_fn): Allways add method to main variant list. * dwarf2out.c (gen_member_die): Sanity check that we access TYPE_MAIN_VARIANT for TYPE_METHODS. * function.c (use_register_for_decl): Look for TYPE_MAIN_VARIANT when checking TYPE_METHODS. * tree.c (free_lang_data_in_type): See TYPE_METHODS to error_mark_node if non-null. (build_distinct_type_copy): Clear TYPE_METHODS. (verify_type_variant): Verify that TYPE_METHODS is NULL for variants. (verify_type): Allow TYPE_METHODS to be error_mark_node. * tree.def: Update docs of YTPE_STUB_DECL and TYPE_METHODS. Index: cp/class.c === --- cp/class.c (revision 222991) +++ cp/class.c (working copy) @@ -1972,7 +1972,6 @@ fixup_type_variants (tree t) /* Copy whatever these are holding today. */ TYPE_VFIELD (variants) = TYPE_VFIELD (t); - TYPE_METHODS (variants) = TYPE_METHODS (t); TYPE_FIELDS (variants) = TYPE_FIELDS (t); } } @@ -3238,6 +3237,7 @@ one_inheriting_sig (tree t, tree ctor, t parmlist = tree_cons (NULL_TREE, parms[i], parmlist); tree fn = implicitly_declare_fn (sfk_inheriting_constructor, t, false, ctor, parmlist); + gcc_assert (TYPE_MAIN_VARIANT (t) == t); if (add_method (t, fn, NULL_TREE)) { DECL_CHAIN (fn) = TYPE_METHODS (t); Index: cp/method.c === --- cp/method.c (revision 222991) +++ cp/method.c (working copy) @@ -2189,11 +2189,11 @@ lazily_declare_fn (special_function_kind DECL_VIRTUAL_P (fn)) /* The ABI requires that a virtual destructor go at the end of the vtable. */ -TYPE_METHODS (type) = chainon (TYPE_METHODS (type), fn); +TYPE_METHODS (type) = chainon (TYPE_METHODS (TYPE_MAIN_VARIANT (type)), fn); else { - DECL_CHAIN (fn) = TYPE_METHODS (type); - TYPE_METHODS (type) = fn; + DECL_CHAIN (fn) = TYPE_METHODS (TYPE_MAIN_VARIANT (type)); + TYPE_METHODS (TYPE_MAIN_VARIANT (type)) = fn; } maybe_add_class_template_decl_list (type, fn, /*friend_p=*/0); if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn) Index: cp/semantics.c === --- cp/semantics.c (revision 222991) +++ cp/semantics.c (working copy) @@ -2913,6 +2913,7 @@ finish_member_declaration (tree decl) CLASSTYPE_METHOD_VEC. */ if (add_method (current_class_type, decl, NULL_TREE)) { + gcc_assert (TYPE_MAIN_VARIANT (current_class_type) == current_class_type); DECL_CHAIN (decl) = TYPE_METHODS (current_class_type); TYPE_METHODS (current_class_type) = decl; Index: dwarf2out.c === --- dwarf2out.c (revision 222991) +++ dwarf2out.c (working copy) @@ -19945,6 +19945,8 @@ gen_member_die (tree type, dw_die_ref co gen_decl_die (member, NULL, context_die); } + /* We do not keep type methods in type variants. */ + gcc_assert (TYPE_MAIN_VARIANT (type) == type); /* Now output info about the function members (if any). */ for (member = TYPE_METHODS (type); member; member = DECL_CHAIN (member)) { Index: function.c
RE: [PATCH][MIPS] Enable load-load/store-store bonding
Hi Sameera, Sameera Deshpande sameera.deshpa...@imgtec.com writes: Changelog: gcc/ * config/mips/mips.md (JOIN_MODE): New mode iterator. (join2_load_StoreJOIN_MODE:mode): New pattern. (join2_loadhi): Likewise. (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode load-load and store-stores. * config/mips/mips.opt (mload-store-pairs): New option. (TARGET_LOAD_STORE_PAIRS): New macro. * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise. * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. * config/mips/mips.c (mips_load_store_bonding_p): New function. gcc/testsuite/ * gcc.target/mips/p5600-bonding.c : New testcase to test bonding. Just 'New file.' is fine for the changelog. diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c new file mode 100644 index 000..122b9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -dp -mtune=p5600 -mno-micromips -mno-mips16 } */ +/* { dg-skip-if Bonding needs peephole optimization. { *-*-* } { -O0 -O1 } { } } */ +typedef int VINT32 __attribute__ ((vector_size((16; + +void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, int num) Code style applies for testcases too, return type on line above, space after function name, line length. +{ +VINT32 *vsrc = (VINT32 *)src; Indentation. +VINT32 *vdest = (VINT32 *)dest; +int i; + +for (i = 0; i num - 1; i+=2) +{ Indentation + vdest[i] = (vdest[i] + vsrc[i]); Unnecessary brackets. + vdest[i + 1] = vdest[i + 1] + vsrc[i + 1]; +} +} +/* { dg-final { scan-assembler join2_ } } */ + OK with those changes. Thanks, Matthew
[patch, fortran] pr66100 bound simplification refactoring fallout
Hello, This fixes a regression introduced by my bound simplification refactoring at https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00843.html The assert was introduced in the change above and checks that the refactoring didn't lose anything regarding the condition used in the existing code. The condition was assuming full object arrays, and fails with subobject arrays. The fix (attached) is obvious enough. The testcase is Thomas' reduced one, which is independent on matmul simplification. I plan to commit this patch tonight. Mikael 2015-05-11 Mikael Morin mik...@gcc.gnu.org * simplify.c (simplify_bound): Fix assert to accept subobject arrays. Index: simplify.c === --- simplify.c (révision 222979) +++ simplify.c (copie de travail) @@ -3463,8 +3463,8 @@ simplify_bound (gfc_expr *array, gfc_expr *dim, gf gcc_assert (!as || (as-type != AS_DEFERRED array-expr_type == EXPR_VARIABLE - !array-symtree-n.sym-attr.allocatable - !array-symtree-n.sym-attr.pointer)); + !gfc_expr_attr (array).allocatable + !gfc_expr_attr (array).pointer)); if (dim == NULL) { ! { dg-do compile } ! { dg-additional-options -fdump-tree-original } ! ! PR fortran/66100 ! ICE on lbound simplification ! ! Original test case by Joost VandeVondele joost.vandevond...@mat.ethz.ch ! Reduced by Thomas Koenig tkoe...@gcc.gnu.org ! MODULE qs_integrate_potential_low INTEGER, PARAMETER :: dp = 8 TYPE cell_type REAL(KIND=8) :: h_inv(3,3) END TYPE TYPE(cell_type), POINTER :: cell REAL(KIND=dp), DIMENSION(3) :: rp CONTAINS SUBROUTINE integrate_general_opt() REAL(KIND=dp) :: gp(3) INTEGER :: ng if (any(lbound(cell%h_inv) /= 1)) call abort if (any(ubound(cell%h_inv) /= 3)) call abort END SUBROUTINE integrate_general_opt END MODULE qs_integrate_potential_low ! { dg-final { scan-tree-dump-not bound original } } ! { dg-final { scan-tree-dump-not abort original } } ! { dg-final { cleanup-tree-dump original } }
[PATCH] libiberty: cleanup Makefile.in
* maint-tool: Refactor pic/ and noasan/ handling. * Makefile.in: Likewise. Regenerate dependencies. --- It's hard to see which parts of libiberty get built with which flags due to the current way the checks for pic and noasan are performed. So push this decision to make(1). For the jit it would be nice to build pic variants of libbackend etc so let's tweak the way libiberty does this before the current way gets copied over and over. I'd have used an order-only prerequisite for the output-dirs pic/ and noasan/ but that's left for a possible follow-up. Bootstrapped with configure -v --enable-languages=c,lto,fortran,c++,go,objc,obj-c++ \ CFLAGS=-g -O2 CXXFLAGS=-g -O2 BOOT_CFLAGS=-g -O2 \ BOOT_CXXFLAGS=-g -O2 CFLAGS_FOR_TARGET=-g -O2 \ CXXFLAGS_FOR_TARGET=-g -O2 \ --prefix=/opt/x86_64/gcc-6.0.mine/ --enable-shared \ --without-system-zlib --enable-nls --without-included-gettext \ --enable-threads=posix --enable-__cxa_atexit \ --enable-libstdcxx-allocator=mt --enable-clocale=gnu \ --enable-libstdcxx-debug --enable-mpfr --disable-werror \ --enable-checking=yes --enable-debug -C \ --disable-intermodule --enable-multilib --disable-libstdcxx-pch \ --enable-bootstrap --enable-checking=yes \ --with-cpu=native --with-tune=native --enable-plugin and regtested on x86_64-linux with no new regeressions. In addition, manual inspection of the flags used to compile the libiberty/{,*/}*.o indicate no behavioural change. Ok for trunk? (should wipe intermodule from configure and libgfortran too, just because i see that i still have it in this build-script..) --- libiberty/Makefile.in | 1425 + libiberty/maint-tool | 15 +- 2 files changed, 483 insertions(+), 957 deletions(-) diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in index f06cc69..de0c8c9 100644 --- a/libiberty/Makefile.in +++ b/libiberty/Makefile.in @@ -69,6 +69,22 @@ MAKEOVERRIDES = TARGETLIB = ./libiberty.a TESTLIB = ./testlib.a +TARGETLIB_PIC = $(dir $(TARGETLIB))pic/$(notdir $(TARGETLIB)) +TARGETLIB_NOASAN = $(dir $(TARGETLIB))noasan/$(notdir $(TARGETLIB)) + +TARGETLIBS := $(TARGETLIB) + +DEFAULT_OFILES = $(REQUIRED_OFILES) $(EXTRA_OFILES) $(LIBOBJS) +ifneq ($(strip $(PICFLAG)),) +TARGETLIBS += $(TARGETLIB_PIC) +PIC_OFILES = $(patsubst ./%,$(dir $(TARGETLIB_PIC))%,$(DEFAULT_OFILES)) +endif + +ifneq ($(strip $(NOASANFLAG)),) +TARGETLIBS += $(TARGETLIB_NOASAN) +NOASAN_OFILES = $(patsubst ./%,$(dir $(TARGETLIB_NOASAN))%,$(DEFAULT_OFILES)) +endif + LIBOBJS = @LIBOBJS@ # A configuration can specify extra .o files that should be included, @@ -102,7 +118,7 @@ FLAGS_TO_PASS = \ SUBDIRS = testsuite # FIXME: add @BUILD_INFO@ once we're sure it works for everyone. -all: stamp-picdir stamp-noasandir $(TARGETLIB) required-list all-subdir +all: $(TARGETLIBS) required-list all-subdir @: $(MAKE) ; $(MULTIDO) $(FLAGS_TO_PASS) multi-do DO=all .PHONY: check installcheck @@ -113,7 +129,7 @@ installcheck: installcheck-subdir INCDIR=$(srcdir)/$(MULTISRCTOP)../include -COMPILE.c = $(CC) -c @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) @ac_libiberty_warn_cflags@ +COMPILE.c = $(CC) -c @DEFS@ $(CFLAGS) $(SUB_CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) @ac_libiberty_warn_cflags@ # Just to make sure we don't use a built-in rule with VPATH .c.$(objext): @@ -243,25 +259,18 @@ INSTALLED_HEADERS = \ $(INCDIR)/splay-tree.h \ $(INCDIR)/timeval-utils.h -$(TARGETLIB): $(REQUIRED_OFILES) $(EXTRA_OFILES) $(LIBOBJS) - -rm -f $(TARGETLIB) pic/$(TARGETLIB) noasan/$(TARGETLIB) - $(AR) $(AR_FLAGS) $(TARGETLIB) \ - $(REQUIRED_OFILES) $(EXTRA_OFILES) $(LIBOBJS) - $(RANLIB) $(TARGETLIB) - if [ x$(PICFLAG) != x ]; then \ - cd pic; \ - $(AR) $(AR_FLAGS) $(TARGETLIB) \ - $(REQUIRED_OFILES) $(EXTRA_OFILES) $(LIBOBJS); \ - $(RANLIB) $(TARGETLIB); \ - cd ..; \ - else true; fi; \ - if [ x$(NOASANFLAG) != x ]; then \ - cd noasan; \ - $(AR) $(AR_FLAGS) $(TARGETLIB) \ - $(REQUIRED_OFILES) $(EXTRA_OFILES) $(LIBOBJS); \ - $(RANLIB) $(TARGETLIB); \ - cd ..; \ - else true; fi +$(TARGETLIB): $(DEFAULT_OFILES) + +$(TARGETLIB_PIC): $(PIC_OFILES) +$(PIC_OFILES): SUB_CFLAGS = $(PICFLAG) + +$(TARGETLIB_NOASAN): $(NOASAN_OFILES) +$(NOASAN_OFILES): SUB_CFLAGS = $(PICFLAG) $(NOASANFLAG) + +$(TARGETLIB) $(TARGETLIB_PIC) $(TARGETLIB_NOASAN): + -rm -f $@ + $(AR) $(AR_FLAGS) $@ $^ + $(RANLIB) $@ $(TESTLIB): $(REQUIRED_OFILES) $(CONFIGURED_OFILES) -rm -f $(TESTLIB) @@ -444,7 +453,7 @@ maint-deps : mostlyclean: mostlyclean-subdir -rm -rf *.$(objext) pic noasan core errs \#* *.E a.out -rm -f errors dummy config.h stamp-* - -rm -f $(CONFIG_H) stamp-picdir stamp-noasandir + -rm -f $(CONFIG_H) -rm -f libiberty.aux libiberty.cp libiberty.cps
Re: [PATCH][MIPS] Enable load-load/store-store bonding
On Monday 11 May 2015 05:43 PM, Matthew Fortune wrote: Hi Sameera, Sameera Deshpande sameera.deshpa...@imgtec.com writes: Changelog: gcc/ * config/mips/mips.md (JOIN_MODE): New mode iterator. (join2_load_StoreJOIN_MODE:mode): New pattern. (join2_loadhi): Likewise. (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode load-load and store-stores. * config/mips/mips.opt (mload-store-pairs): New option. (TARGET_LOAD_STORE_PAIRS): New macro. * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise. * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. * config/mips/mips.c (mips_load_store_bonding_p): New function. gcc/testsuite/ * gcc.target/mips/p5600-bonding.c : New testcase to test bonding. Just 'New file.' is fine for the changelog. diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c new file mode 100644 index 000..122b9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -dp -mtune=p5600 -mno-micromips -mno-mips16 } */ +/* { dg-skip-if Bonding needs peephole optimization. { *-*-* } { -O0 -O1 } { } } */ +typedef int VINT32 __attribute__ ((vector_size((16; + +void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, int num) Code style applies for testcases too, return type on line above, space after function name, line length. +{ +VINT32 *vsrc = (VINT32 *)src; Indentation. +VINT32 *vdest = (VINT32 *)dest; +int i; + +for (i = 0; i num - 1; i+=2) +{ Indentation + vdest[i] = (vdest[i] + vsrc[i]); Unnecessary brackets. + vdest[i + 1] = vdest[i + 1] + vsrc[i + 1]; +} +} +/* { dg-final { scan-assembler join2_ } } */ + OK with those changes. Thanks, Matthew Hi Matthew, Thanks for the comments. Please find attached updated patch. I do not have permissions to apply the patch in GCC. Can you please submit the patch for me? Changelog: gcc/ * config/mips/mips.md (JOIN_MODE): New mode iterator. (join2_load_StoreJOIN_MODE:mode): New pattern. (join2_loadhi): Likewise. (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode load-load and store-stores. * config/mips/mips.opt (mload-store-pairs): New option. (TARGET_LOAD_STORE_PAIRS): New macro. * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise. * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. * config/mips/mips.c (mips_load_store_bonding_p): New function. gcc/testsuite/ * gcc.target/mips/p5600-bonding.c : New file. diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h index b48e04f..244eb8d 100644 --- a/gcc/config/mips/mips-protos.h +++ b/gcc/config/mips/mips-protos.h @@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int); extern void mips_final_prescan_insn (rtx_insn *, rtx *, int); extern int mips_trampoline_code_size (void); extern void mips_function_profiler (FILE *); +extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool); typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx); #ifdef RTX_CODE diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index bf69850..4fc15c4 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -18241,6 +18241,66 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p, return true; } +bool +mips_load_store_bonding_p (rtx *operands, machine_mode mode, bool load_p) +{ + rtx reg1, reg2, mem1, mem2, base1, base2; + enum reg_class rc1, rc2; + HOST_WIDE_INT offset1, offset2; + + if (load_p) +{ + reg1 = operands[0]; + reg2 = operands[2]; + mem1 = operands[1]; + mem2 = operands[3]; +} + else +{ + reg1 = operands[1]; + reg2 = operands[3]; + mem1 = operands[0]; + mem2 = operands[2]; +} + + if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0 + || mips_address_insns (XEXP (mem2, 0), mode, false) == 0) +return false; + + mips_split_plus (XEXP (mem1, 0), base1, offset1); + mips_split_plus (XEXP (mem2, 0), base2, offset2); + + /* Base regs do not match. */ + if (!REG_P (base1) || !rtx_equal_p (base1, base2)) +return false; + + /* Either of the loads is clobbering base register. It is legitimate to bond + loads if second load clobbers base register. However, hardware does not + support such bonding. */ + if (load_p + (REGNO (reg1) == REGNO (base1) + || (REGNO (reg2) == REGNO (base1 +return false; + + /* Loading in same registers. */ + if (load_p + REGNO (reg1) == REGNO (reg2)) +return false; + + /* The loads/stores are not of same type. */ + rc1 = REGNO_REG_CLASS (REGNO (reg1)); + rc2 = REGNO_REG_CLASS (REGNO (reg2)); + if (rc1 != rc2 + !reg_class_subset_p (rc1, rc2)
Re: [PATCH] Fix memory leak in C++ pretty printer
On 11/05/15 03:34, Patrick Palka wrote: In gcc/cp/error.c we initialize the C++ pretty printer object twice: first during statics initialization and later in a placement-new in init_error(). This double-initialization causes a memory leak of about 7kb according to valgrind. I don't see a reason to initialize the object a second time so I elected to remove init_error(). I seem to remember there is some issue with the constructors when using static initialization that requires the placement-new. We also do the placement-new dance in the other FEs and the reason should be the same. My preference would be to replace the static with a pointer and placement-new with proper new and delete, but see: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00910.html If you change it here, please change it everywhere else where we use placement-new, such that all FEs are consistent on this. Cheers, Manuel.
Re: [RFC 0/6] Flags outputs for asms
On Fri, May 08, 2015 at 01:15:25PM -0700, Richard Henderson wrote: But it *does* try to match an intermediate pattern, (set (reg:CCGC 17 flags) (compare:CCGC (reg:CCGC 17 flags) (const_int 0 [0]))) Patch posted at http://gcc.gnu.org/ml/gcc-patches/2015-05/msg00918.html. Segher
[PATCH 00/13] S/390 Implement support for IBM z13
The attached patchset adds support for the IBM z13 machine to the GCC S/390 backend. The machine has been announced recently: http://www-03.ibm.com/press/us/en/pressrelease/45808.wss IBM z/Architecture Principles of Operation http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr010.pdf The required Binutils support is upstream since January: https://sourceware.org/ml/binutils/2015-01/msg00197.html Highlights from a toolchain perspective are: - 32 128 bit vector registers (overlapping with the existing 16 64 bit floating point registers) - vector double instructions - vector integer instructions - scalar vector instructions (allowing to have more floating point registers for scalar operations) - vector string instructions I would like to commit this patchset also to GCC 5 branch in order to enable distros to pick it up more easily. Andreas Krebbel (13): recog: Increased max number of alternatives. optabs: Fix vec_perm - V16QI middle end lowering. S/390 Fix secondary reload issue with store/load relative operands. S/390 Add -march/-mtune=z13 option. S/390 Vector base support. Vector base support - testcases S/390 Add vector scalar instruction support. S/390 zvector builtin support. S/390 Add zvector testcases. Testsuite These testcases require disabling hardware vector support on S/390. Testsuite S/390 vector types are only 8 byte aligned. S/390 Vector ABI GNU Attribute. S/390 Invalid vector binary ops gcc/common/config/s390/s390-common.c |3 + gcc/config.gcc | 26 +- gcc/config/s390/constraints.md | 28 + gcc/config/s390/predicates.md | 12 +- gcc/config/s390/s390-builtin-types.def | 747 ++ gcc/config/s390/s390-builtins.def | 2486 gcc/config/s390/s390-builtins.h| 160 ++ gcc/config/s390/s390-c.c | 907 +++ gcc/config/s390/s390-modes.def | 61 + gcc/config/s390/s390-opts.h|1 + gcc/config/s390/s390-protos.h | 17 + gcc/config/s390/s390.c | 2314 +++--- gcc/config/s390/s390.h | 220 +- gcc/config/s390/s390.md| 800 +-- gcc/config/s390/s390.opt | 11 + gcc/config/s390/s390intrin.h |3 + gcc/config/s390/t-s390 | 27 + gcc/config/s390/vecintrin.h| 311 +++ gcc/config/s390/vector.md | 1228 ++ gcc/config/s390/vx-builtins.md | 2081 gcc/configure | 36 + gcc/configure.ac |7 + gcc/optabs.c | 18 +- gcc/recog.h|2 +- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c |1 + gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c |1 + gcc/testsuite/gcc.target/s390/s390.exp | 18 + gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c | 18 + gcc/testsuite/gcc.target/s390/vector/vec-abi-2.c | 15 + gcc/testsuite/gcc.target/s390/vector/vec-abi-3.c | 101 + gcc/testsuite/gcc.target/s390/vector/vec-abi-4.c | 19 + .../gcc.target/s390/vector/vec-abi-align-1.c | 48 + .../gcc.target/s390/vector/vec-abi-attr-1.c| 18 + .../gcc.target/s390/vector/vec-abi-attr-2.c| 53 + .../gcc.target/s390/vector/vec-abi-attr-3.c| 18 + .../gcc.target/s390/vector/vec-abi-attr-4.c| 17 + .../gcc.target/s390/vector/vec-abi-attr-5.c| 19 + .../gcc.target/s390/vector/vec-abi-attr-6.c| 24 + .../gcc.target/s390/vector/vec-abi-single-1.c | 24 + .../gcc.target/s390/vector/vec-abi-single-2.c | 12 + .../gcc.target/s390/vector/vec-abi-struct-1.c | 37 + .../gcc.target/s390/vector/vec-abi-vararg-1.c | 60 + .../gcc.target/s390/vector/vec-abi-vararg-2.c | 18 + .../gcc.target/s390/vector/vec-clobber-1.c | 38 + gcc/testsuite/gcc.target/s390/vector/vec-cmp-1.c | 45 + gcc/testsuite/gcc.target/s390/vector/vec-cmp-2.c | 38 + .../s390/vector/vec-dbl-math-compile-1.c | 48 + .../gcc.target/s390/vector/vec-genbytemask-1.c | 70 + .../gcc.target/s390/vector/vec-genbytemask-2.c | 46 + .../gcc.target/s390/vector/vec-genmask-1.c | 70 + .../gcc.target/s390/vector/vec-genmask-2.c | 46 + gcc/testsuite/gcc.target/s390/vector/vec-init-1.c | 68 + .../s390/vector/vec-int-math-compile-1.c | 40 + .../gcc.target/s390/vector/vec-scalar-cmp-1.c | 49 + gcc/testsuite/gcc.target/s390/vector/vec-shift-1.c | 108 + gcc/testsuite/gcc.target/s390/vector/vec-sub-1.c | 51 +
[PATCH 03/13] S/390 Fix secondary reload issue with store/load relative operands.
We need a scratch register for loading from or storing to a symbolic memory reference where we cannot use the load/store relative instructions for. However, the check currently fails to handle floating point modes in GPRs correctly. gcc/ * config/s390/s390.c (s390_secondary_reload): Fix check for load/store relative. --- gcc/config/s390/s390.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 7d16048..cc37618 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -3141,17 +3141,15 @@ s390_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i, sri-icode = ((mode == DImode) ? CODE_FOR_reloaddi_larl_odd_addend_z10 : CODE_FOR_reloadsi_larl_odd_addend_z10); - /* On z10 we need a scratch register when moving QI, TI or floating -point mode values from or to a memory location with a SYMBOL_REF -or if the symref addend of a SI or DI move is not aligned to the -width of the access. */ + /* Handle all the (mem (symref)) accesses we cannot use the z10 +instructions for. */ if (MEM_P (x) s390_loadrelative_operand_p (XEXP (x, 0), NULL, NULL) - (mode == QImode || mode == TImode || FLOAT_MODE_P (mode) - || (!TARGET_ZARCH mode == DImode) - || ((mode == HImode || mode == SImode || mode == DImode) - (!s390_check_symref_alignment (XEXP (x, 0), - GET_MODE_SIZE (mode)) + (mode == QImode + || !reg_classes_intersect_p (GENERAL_REGS, rclass) + || GET_MODE_SIZE (mode) UNITS_PER_WORD + || !s390_check_symref_alignment (XEXP (x, 0), + GET_MODE_SIZE (mode { #define __SECONDARY_RELOAD_CASE(M,m) \ case M##mode: \ -- 1.7.9.5
[PATCH 01/13] recog: Increased max number of alternatives.
With the vector facility support z13 mov patterns have more than 30 alternatives. gcc/ * recog.h: Increase MAX_RECOG_ALTERNATIVES. --- gcc/recog.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/recog.h b/gcc/recog.h index 8a38b26..4d8ca0c 100644 --- a/gcc/recog.h +++ b/gcc/recog.h @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3. If not see /* Random number that should be large enough for all purposes. Also define a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra bit giving an invalid value that can be used to mean uninitialized. */ -#define MAX_RECOG_ALTERNATIVES 30 +#define MAX_RECOG_ALTERNATIVES 35 typedef unsigned int alternative_mask; /* A mask of all alternatives. */ -- 1.7.9.5
[PATCH 05/13] S/390 Vector base support.
gcc/ * config/s390/constraints.md (j00, jm1, jxx, jyy, v): New constraints. * config/s390/predicates.md (const0_operand, constm1_operand) (constable_operand): Accept vector operands. * config/s390/s390-modes.def: Add supported vector modes. * config/s390/s390-protos.h (s390_cannot_change_mode_class) (s390_function_arg_vector, s390_contiguous_bitmask_vector_p) (s390_bytemask_vector_p, s390_expand_vec_strlen) (s390_expand_vec_compare, s390_expand_vcond) (s390_expand_vec_init): Add prototypes. * config/s390/s390.c (VEC_ARG_NUM_REG): New macro. (s390_vector_mode_supported_p): New function. (s390_contiguous_bitmask_p): Mask out the irrelevant bits. (s390_contiguous_bitmask_vector_p): New function. (s390_bytemask_vector_p): New function. (s390_split_ok_p): Vector regs don't work either. (regclass_map): Add VEC_REGS. (s390_legitimate_constant_p): Handle vector constants. (s390_cannot_force_const_mem): Handle CONST_VECTOR. (legitimate_reload_vector_constant_p): New function. (s390_preferred_reload_class): Handle CONST_VECTOR. (s390_reload_symref_address): Likewise. (s390_secondary_reload): Vector memory instructions only support short displacements. Rename reload*_nonoffmem* to reload*_la*. (s390_emit_ccraw_jump): New function. (s390_expand_vec_strlen): New function. (s390_expand_vec_compare): New function. (s390_expand_vcond): New function. (s390_expand_vec_init): New function. (s390_dwarf_frame_reg_mode): New function. (print_operand): Handle addresses with 'O' and 'R' constraints. (NR_C_MODES, constant_modes): Add vector modes. (s390_output_pool_entry): Handle vector constants. (s390_hard_regno_mode_ok): Handle vector registers. (s390_class_max_nregs): Likewise. (s390_cannot_change_mode_class): New function. (s390_invalid_arg_for_unprototyped_fn): New function. (s390_function_arg_vector): New function. (s390_function_arg_float): Remove size variable. (s390_pass_by_reference): Handle vector arguments. (s390_function_arg_advance): Likewise. (s390_function_arg): Likewise. (s390_return_in_memory): Vector values are returned in a VR if possible. (s390_function_and_libcall_value): Handle vector arguments. (s390_gimplify_va_arg): Likewise. (s390_call_saved_register_used): Consider the arguments named. (s390_conditional_register_usage): Disable v16-v31 for non-vec targets. (s390_preferred_simd_mode): New function. (s390_support_vector_misalignment): New function. (s390_vector_alignment): New function. (TARGET_STRICT_ARGUMENT_NAMING, TARGET_DWARF_FRAME_REG_MODE) (TARGET_VECTOR_MODE_SUPPORTED_P) (TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN) (TARGET_VECTORIZE_PREFERRED_SIMD_MODE) (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT) (TARGET_VECTOR_ALIGNMENT): Define target macro. * config/s390/s390.h (FUNCTION_ARG_PADDING): Define macro. (FIRST_PSEUDO_REGISTER): Increase value. (VECTOR_NOFP_REGNO_P, VECTOR_REGNO_P, VECTOR_NOFP_REG_P) (VECTOR_REG_P): Define macros. (FIXED_REGISTERS, CALL_USED_REGISTERS) (CALL_REALLY_USED_REGISTERS, REG_ALLOC_ORDER) (HARD_REGNO_CALL_PART_CLOBBERED, REG_CLASS_NAMES) (FUNCTION_ARG_REGNO_P, FUNCTION_VALUE_REGNO_P, REGISTER_NAMES): Add vector registers. (CANNOT_CHANGE_MODE_CLASS): Call C function. (enum reg_class): Add VEC_REGS, ADDR_VEC_REGS, GENERAL_VEC_REGS. (SECONDARY_MEMORY_NEEDED): Allow SF-SI mode moves without memory. (DBX_REGISTER_NUMBER, FIRST_VEC_ARG_REGNO, LAST_VEC_ARG_REGNO) (SHORT_DISP_IN_RANGE, VECTOR_STORE_FLAG_VALUE): Define macro. * config/s390/s390.md (UNSPEC_VEC_*): New constants. (VR*_REGNUM): New constants. (ALL): New mode iterator. (INTALL): Remove mode iterator. Include vector.md. (movti): Implement TImode moves for VRs. Disable TImode splitter for VR targets. Implement splitting TImode GPR-VR moves. (reload*_tomem_z10, reload*_toreg_z10): Replace INTALL with ALL. (reloadmode_nonoffmem_in, reloadmode_nonoffmem_out): Rename to reloadmode_la_in, reloadmode_la_out. (*movdi_64, *movsi_zarch, *movhi, *movqi, *movmode_64dfp) (*movmode_64, *movmode_31): Add vector instructions. (TD/TF mode splitter): Enable for GPRs only (formerly !FP). (movmode SF SD): Prefer lder, lde for loading. Add lrl and strl instructions. Add vector instructions. (strlenmode): Rename old strlenmode to strlen_srstmode. Call s390_expand_vec_strlen on z13. (*cc_to_int): Change
[PATCH 06/13] Vector base support - testcases
gcc/testsuite/ * gcc.target/s390/s390.exp (check_effective_target_vector): New check. * gcc.target/s390/vector/vec-abi-1.c: New test. * gcc.target/s390/vector/vec-abi-2.c: New test. * gcc.target/s390/vector/vec-abi-3.c: New test. * gcc.target/s390/vector/vec-abi-4.c: New test. * gcc.target/s390/vector/vec-abi-align-1.c: New test. * gcc.target/s390/vector/vec-abi-single-1.c: New test. * gcc.target/s390/vector/vec-abi-single-2.c: New test. * gcc.target/s390/vector/vec-abi-struct-1.c: New test. * gcc.target/s390/vector/vec-abi-vararg-1.c: New test. * gcc.target/s390/vector/vec-abi-vararg-2.c: New test. * gcc.target/s390/vector/vec-clobber-1.c: New test. * gcc.target/s390/vector/vec-cmp-1.c: New test. * gcc.target/s390/vector/vec-cmp-2.c: New test. * gcc.target/s390/vector/vec-dbl-math-compile-1.c: New test. * gcc.target/s390/vector/vec-genbytemask-1.c: New test. * gcc.target/s390/vector/vec-genbytemask-2.c: New test. * gcc.target/s390/vector/vec-genmask-1.c: New test. * gcc.target/s390/vector/vec-genmask-2.c: New test. * gcc.target/s390/vector/vec-init-1.c: New test. * gcc.target/s390/vector/vec-int-math-compile-1.c: New test. * gcc.target/s390/vector/vec-shift-1.c: New test. * gcc.target/s390/vector/vec-sub-1.c: New test. --- gcc/testsuite/gcc.target/s390/s390.exp | 18 gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c | 17 +++ gcc/testsuite/gcc.target/s390/vector/vec-abi-2.c | 15 +++ gcc/testsuite/gcc.target/s390/vector/vec-abi-3.c | 101 ++ gcc/testsuite/gcc.target/s390/vector/vec-abi-4.c | 19 .../gcc.target/s390/vector/vec-abi-align-1.c | 48 + .../gcc.target/s390/vector/vec-abi-single-1.c | 24 + .../gcc.target/s390/vector/vec-abi-single-2.c | 12 +++ .../gcc.target/s390/vector/vec-abi-struct-1.c | 37 +++ .../gcc.target/s390/vector/vec-abi-vararg-1.c | 60 +++ .../gcc.target/s390/vector/vec-abi-vararg-2.c | 18 .../gcc.target/s390/vector/vec-clobber-1.c | 38 +++ gcc/testsuite/gcc.target/s390/vector/vec-cmp-1.c | 45 gcc/testsuite/gcc.target/s390/vector/vec-cmp-2.c | 38 +++ .../s390/vector/vec-dbl-math-compile-1.c | 48 + .../gcc.target/s390/vector/vec-genbytemask-1.c | 70 + .../gcc.target/s390/vector/vec-genbytemask-2.c | 46 + .../gcc.target/s390/vector/vec-genmask-1.c | 70 + .../gcc.target/s390/vector/vec-genmask-2.c | 46 + gcc/testsuite/gcc.target/s390/vector/vec-init-1.c | 68 .../s390/vector/vec-int-math-compile-1.c | 40 gcc/testsuite/gcc.target/s390/vector/vec-shift-1.c | 108 gcc/testsuite/gcc.target/s390/vector/vec-sub-1.c | 51 + 23 files changed, 1037 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-3.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-4.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-align-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-single-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-single-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-struct-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-vararg-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-vararg-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-clobber-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmp-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmp-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-dbl-math-compile-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-genmask-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-init-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-int-math-compile-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-shift-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-sub-1.c diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp index 431e2c0..eb1d73b 100644 --- a/gcc/testsuite/gcc.target/s390/s390.exp +++ b/gcc/testsuite/gcc.target/s390/s390.exp @@ -37,6 +37,21 @@ proc check_effective_target_htm { } { }] -march=zEC12 -mzarch ] } { return 0 } else { return
[PATCH 07/13] S/390 Add vector scalar instruction support.
With this patch GCC makes use of the vector instruction which are available in single element mode. By using these instructions scalar double operations can use 32 registers. gcc/ * config/s390/s390-modes.def: Add new modes CCVEQ, CCVFH, and CCVFHE. * config/s390/s390.c (s390_match_ccmode_set): Handle new modes. (s390_select_ccmode): Likewise. (s390_canonicalize_comparison): Swap operands if necessary. (s390_expand_vec_compare_scalar): Expand DFmode compare using single element vector instructions. (s390_emit_compare): Call s390_expand_vec_compare_scalar. (s390_branch_condition_mask): Generate CC masks for the new modes. * config/s390/s390.md (v0, vf, vd): New mode attributes. (VFCMP, asm_fcmp, insn_cmp): New mode iterator and attributes. (*vec_cmpinsn_cmpdf_cconly, *fixuns_truncdfdi2_z13) (*fix_truncBFP:modeGPR:mode2_bfp, *floatunsdidf2_z13) (*floatunsGPR:modeFP:mode2, *extendsfdf2_z13) (*extendDSF:modeBFP:mode2): New insn definition. (fix_truncBFP:modeGPR:mode2_bfp, loatunsGPR:modeFP:mode2) (extendDSF:modeBFP:mode2): Turn into expander. (floatdimode2, truncdfsf2, addmode3, submode3, mulmode3) (divmode3, *negmode2, *absmode2, *negabsmode2) (sqrtmode2): Add vector instruction. gcc/testsuite/ * gcc.target/s390/vector/vec-scalar-cmp-1.c: New test. --- gcc/config/s390/s390-modes.def | 10 + gcc/config/s390/s390.c | 131 - gcc/config/s390/s390.md| 304 ++-- .../gcc.target/s390/vector/vec-scalar-cmp-1.c | 49 4 files changed, 403 insertions(+), 91 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-scalar-cmp-1.c diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def index a40559e..26c0a81 100644 --- a/gcc/config/s390/s390-modes.def +++ b/gcc/config/s390/s390-modes.def @@ -84,7 +84,12 @@ Requested mode- Destination CC register mode CCS, CCU, CCT, CCSR, CCUR - CCZ CCA - CCAP, CCAN +Vector comparison modes +CCVEQEQ -- NE (VCEQ) + +CCVFHGT -- UNLE (VFCH) +CCVFHE GE -- UNLT (VFCHE) *** Comments *** CCAP, CCAN @@ -182,6 +187,11 @@ CC_MODE (CCT2); CC_MODE (CCT3); CC_MODE (CCRAW); +CC_MODE (CCVEQ); +CC_MODE (CCVFH); +CC_MODE (CCVFHE); + + /* Vector modes. */ VECTOR_MODES (INT, 2);/* V2QI */ diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 11fed14..848cc0c 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -681,6 +681,9 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode) case CCT1mode: case CCT2mode: case CCT3mode: +case CCVEQmode: +case CCVFHmode: +case CCVFHEmode: if (req_mode != set_mode) return 0; break; @@ -781,6 +784,29 @@ s390_tm_ccmode (rtx op1, rtx op2, bool mixed) machine_mode s390_select_ccmode (enum rtx_code code, rtx op0, rtx op1) { + if (TARGET_VX + register_operand (op0, DFmode) + register_operand (op1, DFmode)) +{ + /* LT, LE, UNGT, UNGE require swapping OP0 and OP1. Either +s390_emit_compare or s390_canonicalize_comparison will take +care of it. */ + switch (code) + { + case EQ: + case NE: + return CCVEQmode; + case GT: + case UNLE: + return CCVFHmode; + case GE: + case UNLT: + return CCVFHEmode; + default: + ; + } +} + switch (code) { case EQ: @@ -1058,8 +1084,74 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx *op1, rtx tem = *op0; *op0 = *op1; *op1 = tem; *code = (int)swap_condition ((enum rtx_code)*code); } + + /* Using the scalar variants of vector instructions for 64 bit FP + comparisons might require swapping the operands. */ + if (TARGET_VX + register_operand (*op0, DFmode) + register_operand (*op1, DFmode) + (*code == LT || *code == LE || *code == UNGT || *code == UNGE)) +{ + rtx tmp; + + switch (*code) + { + case LT: *code = GT; break; + case LE: *code = GE; break; + case UNGT: *code = UNLE; break; + case UNGE: *code = UNLT; break; + default: ; + } + tmp = *op0; *op0 = *op1; *op1 = tmp; +} } +/* Helper function for s390_emit_compare. If possible emit a 64 bit + FP compare using the single element variant of vector instructions. + Replace CODE with the comparison code to be used in the CC reg + compare and return the condition code register RTX in CC. */ + +static bool +s390_expand_vec_compare_scalar (enum rtx_code *code, rtx cmp1, rtx cmp2, + rtx *cc) +{ + machine_mode
[PATCH 02/13] optabs: Fix vec_perm - V16QI middle end lowering.
The current implementation re-uses the location of the selection pattern to generate a new one. This fails if the pattern resides in a read-only location. With the patch a new temporary register is allocated for that purpose. gcc/ * optabs.c (expand_vec_perm): Allocate a temp reg for the new select pattern. --- gcc/optabs.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/optabs.c b/gcc/optabs.c index 983c8d9..8926efa 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6784,14 +6784,18 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) { /* Multiply each element by its byte size. */ machine_mode selmode = GET_MODE (sel); + /* We cannot re-use SEL as a temp operand since it might by in +read-only storage. */ + rtx sel_reg = gen_reg_rtx (selmode); + if (u == 2) - sel = expand_simple_binop (selmode, PLUS, sel, sel, - sel, 0, OPTAB_DIRECT); + sel_reg = expand_simple_binop (selmode, PLUS, sel, sel, + sel_reg, 0, OPTAB_DIRECT); else - sel = expand_simple_binop (selmode, ASHIFT, sel, - GEN_INT (exact_log2 (u)), - sel, 0, OPTAB_DIRECT); - gcc_assert (sel != NULL); + sel_reg = expand_simple_binop (selmode, ASHIFT, sel, + GEN_INT (exact_log2 (u)), + sel_reg, 0, OPTAB_DIRECT); + gcc_assert (sel_reg != NULL); /* Broadcast the low byte each element into each of its bytes. */ vec = rtvec_alloc (w); @@ -6803,7 +6807,7 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) RTVEC_ELT (vec, i) = GEN_INT (this_e); } tmp = gen_rtx_CONST_VECTOR (qimode, vec); - sel = gen_lowpart (qimode, sel); + sel = gen_lowpart (qimode, sel_reg); sel = expand_vec_perm (qimode, sel, sel, tmp, NULL); gcc_assert (sel != NULL); -- 1.7.9.5
[RFC 12/13] S/390 Vector ABI GNU Attribute.
With this patch .gnu_attribute is used to mark binaries with a vector ABI tag. This is required since the z13 vector support breaks the ABI of existing vector_size attribute generated vector types: 1. vector_size(16) and bigger vectors are aligned to 8 byte boundaries (formerly vectors were always naturally aligned) 2. vector_size(16) or smaller vectors are passed via VR if available or by value on the stack (formerly vector were passed on the stack by reference). The .gnu_attribute will be used by ld to emit a warning if binaries with incompatible ABIs are being linked together: https://sourceware.org/ml/binutils/2015-04/msg00316.html And it will be used by GDB to perform inferior function calls using a vector ABI which fits to the binary being debugged: https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html The current implementation tries to only set the attribute if the vector types are really used in ABI relevant contexts in order to avoid false positives during linking. However, this unfortunately has some limitations like in the following case where an ABI relevant context cannot be detected properly: typedef int __attribute__((vector_size(16))) v4si; struct A { char x; v4si y; }; char a[sizeof(struct A)]; The number of elements in a depends on the ABI (24 with -mvx and 32 with -mno-vx). However, the implementation is not able to detect this since the struct type is not used anywhere else and consequently does not survive until the checking code is able to see it. Ideas about how to improve the implementation without creating too many false postives are welcome. In particular we do not want to set the attribute for local uses of vector types as they would be natural for ifunc optimizations. gcc/ * config/s390/s390.c (s390_vector_abi): New variable definition. (s390_check_type_for_vector_abi): New function. (TARGET_ASM_FILE_END): New macro definition. (s390_asm_file_end): New function. (s390_function_arg): Call s390_check_type_for_vector_abi. (s390_gimplify_va_arg): Likewise. * configure: Regenerate. * configure.ac: Check for .gnu_attribute Binutils feature. gcc/testsuite/ * gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check. * gcc.target/s390/vector/vec-abi-attr-1.c: New test. * gcc.target/s390/vector/vec-abi-attr-2.c: New test. * gcc.target/s390/vector/vec-abi-attr-3.c: New test. * gcc.target/s390/vector/vec-abi-attr-4.c: New test. * gcc.target/s390/vector/vec-abi-attr-5.c: New test. * gcc.target/s390/vector/vec-abi-attr-6.c: New test. --- gcc/config/s390/s390.c | 120 gcc/configure | 36 ++ gcc/configure.ac |7 ++ gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c |1 + .../gcc.target/s390/vector/vec-abi-attr-1.c| 18 +++ .../gcc.target/s390/vector/vec-abi-attr-2.c| 53 + .../gcc.target/s390/vector/vec-abi-attr-3.c| 18 +++ .../gcc.target/s390/vector/vec-abi-attr-4.c| 17 +++ .../gcc.target/s390/vector/vec-abi-attr-5.c| 19 .../gcc.target/s390/vector/vec-abi-attr-6.c| 24 10 files changed, 313 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index e682516..e1ae1ed 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -466,6 +466,97 @@ struct GTY(()) machine_function #define PREDICT_DISTANCE (TARGET_Z10 ? 384 : 2048) +/* Indicate which ABI has been used for passing vector args. + 0 - no vector type arguments have been passed where the ABI is relevant + 1 - the old ABI has been used + 2 - a vector type argument has been passed either in a vector register + or on the stack by value */ +static int s390_vector_abi = 0; + +/* Set the vector ABI marker if TYPE is subject to the vector ABI + switch. The vector ABI affects only vector data types. There are + two aspects of the vector ABI relevant here: + + 1. vectors = 16 bytes have an alignment of 8 bytes with the new + ABI and natural alignment with the old. + + 2. vector = 16 bytes are passed in VRs or by value on the stack + with the new ABI but by reference on the stack with the old. + + If ARG_P is true TYPE is used for a function argument or return + value. The ABI marker then is set for all vector data types. If + ARG_P is false only type 1 vectors are being checked. */ +
[PATCH 10/13] Testsuite These testcases require disabling hardware vector support on S/390.
gcc/testsuite/ * gcc.dg/tree-ssa/gen-vect-11b.c: Disable vector instructions on s390*. * gcc.dg/tree-ssa/gen-vect-11c.c: Likewise. --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c |1 + gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c |1 + 2 files changed, 2 insertions(+) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c index 50dea9c..41d0e0c 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11b.c @@ -1,5 +1,6 @@ /* { dg-do run { target vect_cmdline_needed } } */ /* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */ +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details -mno-vx { target { s390*-*-* } } } */ /* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details -mno-sse { target { i?86-*-* x86_64-*-* } } } */ #include stdlib.h diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c index f3ada99..cf0aef7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c @@ -1,5 +1,6 @@ /* { dg-do run { target vect_cmdline_needed } } */ /* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */ +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details -mno-vx { target { s390*-*-* } } } */ /* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details -mno-sse { target { i?86-*-* x86_64-*-* } } } */ #include stdlib.h -- 1.7.9.5
[PATCH 11/13] Testsuite S/390 vector types are only 8 byte aligned.
gcc/testsuite/ * lib/target-supports.exp: Vector do not always have natural alignment on s390*. --- gcc/testsuite/lib/target-supports.exp |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index c5d0ffe..155cefa 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4347,7 +4347,8 @@ proc check_effective_target_vect_natural_alignment { } { } else { set et_vect_natural_alignment_saved 1 if { [check_effective_target_arm_eabi] -|| [istarget nvptx-*-*] } { +|| [istarget nvptx-*-*] +|| [istarget s390*-*-*] } { set et_vect_natural_alignment_saved 0 } } -- 1.7.9.5
[PATCH 13/13] S/390 Invalid vector binary ops
This is a first try to implement at least some of the requirements regarding the vector bool type documented for IBM XLC. With this patch error messages will be issued for invalid uses of vector bool types in binary operators. vector bool types are being marked opaque in order to prevent the front-end from complaining about vector bool long vs vector bool long long combinations on 64 bit. The opaque flag basically suppresses any type checking. However, we still want vector bool to be accepted only in contexts specified in the documentation (to be published soon). Implementing the invalid binary op hook does this for binary operators at least. But this is far from being complete :( gcc/ * config/s390/s390.c (s390_vector_bool_type_p): New function. (s390_invalid_binary_op): New function. (TARGET_INVALID_BINARY_OP): Define macro. --- gcc/config/s390/s390.c | 61 1 file changed, 61 insertions(+) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index e1ae1ed..a64836e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13764,6 +13764,64 @@ s390_asm_file_end (void) #endif } +/* Return true if TYPE is a vector bool type. */ +static inline bool +s390_vector_bool_type_p (const_tree type) +{ + return TYPE_VECTOR_OPAQUE (type); +} + +/* Return the diagnostic message string if the binary operation OP is + not permitted on TYPE1 and TYPE2, NULL otherwise. */ +static const char* +s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree type2) +{ + bool bool1_p, bool2_p; + bool plusminus_p; + bool muldiv_p; + bool compare_p; + machine_mode mode1, mode2; + + if (!TARGET_ZVECTOR) +return NULL; + + if (!VECTOR_TYPE_P (type1) || !VECTOR_TYPE_P (type2)) +return NULL; + + bool1_p = s390_vector_bool_type_p (type1); + bool2_p = s390_vector_bool_type_p (type2); + + /* Mixing signed and unsigned types is forbidden for all + operators. */ + if (!bool1_p !bool2_p + TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) +return N_(types differ in signess); + + plusminus_p = (op == PLUS_EXPR || op == MINUS_EXPR); + muldiv_p = (op == MULT_EXPR || op == RDIV_EXPR || op == TRUNC_DIV_EXPR + || op == CEIL_DIV_EXPR || op == FLOOR_DIV_EXPR + || op == ROUND_DIV_EXPR); + compare_p = (op == LT_EXPR || op == LE_EXPR || op == GT_EXPR || op == GE_EXPR + || op == EQ_EXPR || op == NE_EXPR); + + if (bool1_p bool2_p (plusminus_p || muldiv_p)) +return N_(binary operator does not support two vector bool operands); + + if (bool1_p != bool2_p (muldiv_p || compare_p)) +return N_(binary operator does not support vector bool operand); + + mode1 = TYPE_MODE (type1); + mode2 = TYPE_MODE (type2); + + if (bool1_p != bool2_p plusminus_p + (GET_MODE_CLASS (mode1) == MODE_VECTOR_FLOAT + || GET_MODE_CLASS (mode2) == MODE_VECTOR_FLOAT)) +return N_(binary operator does not support mixing vector + bool with floating point vector operands); + + return NULL; +} + /* Initialize GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP @@ -13981,6 +14039,9 @@ s390_asm_file_end (void) #undef TARGET_ASM_FILE_END #define TARGET_ASM_FILE_END s390_asm_file_end +#undef TARGET_INVALID_BINARY_OP +#define TARGET_INVALID_BINARY_OP s390_invalid_binary_op + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-s390.h -- 1.7.9.5
[PATCH 09/13] S/390 Add zvector testcases.
gcc/testsuite/ * gcc.target/s390/zvector/vec-dbl-math-compile-1.c: New test. * gcc.target/s390/zvector/vec-genbytemask-1.c: New test. * gcc.target/s390/zvector/vec-genmask-1.c: New test. * gcc.target/s390/zvector/vec-lcbb-1.c: New test. * gcc.target/s390/zvector/vec-overloading-1.c: New test. * gcc.target/s390/zvector/vec-overloading-2.c: New test. * gcc.target/s390/zvector/vec-overloading-3.c: New test. * gcc.target/s390/zvector/vec-overloading-4.c: New test. * gcc.target/s390/zvector/vec-test-mask-1.c: New test. * gcc.target/s390/zvector/vec-elem-1.c: New test. --- .../s390/zvector/vec-dbl-math-compile-1.c | 67 + gcc/testsuite/gcc.target/s390/zvector/vec-elem-1.c | 11 +++ .../gcc.target/s390/zvector/vec-genbytemask-1.c| 21 ++ .../gcc.target/s390/zvector/vec-genmask-1.c| 24 ++ gcc/testsuite/gcc.target/s390/zvector/vec-lcbb-1.c | 31 .../gcc.target/s390/zvector/vec-overloading-1.c| 77 .../gcc.target/s390/zvector/vec-overloading-2.c| 54 ++ .../gcc.target/s390/zvector/vec-overloading-3.c| 19 + .../gcc.target/s390/zvector/vec-overloading-4.c| 18 + .../gcc.target/s390/zvector/vec-test-mask-1.c | 25 +++ 10 files changed, 347 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-dbl-math-compile-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-elem-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-genbytemask-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-genmask-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-lcbb-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-overloading-1.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-overloading-2.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-overloading-3.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-overloading-4.c create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-test-mask-1.c diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-dbl-math-compile-1.c b/gcc/testsuite/gcc.target/s390/zvector/vec-dbl-math-compile-1.c new file mode 100644 index 000..31b277b --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-dbl-math-compile-1.c @@ -0,0 +1,67 @@ +/* { dg-do compile { target { s390*-*-* } } } */ +/* { dg-options -O3 -mzarch -march=z13 -mzvector --save-temps } */ + +/* { dg-final { scan-assembler-times vfcedb\t 1 } } */ +/* { dg-final { scan-assembler-times vfchdb\t 2 } } */ +/* { dg-final { scan-assembler-times vfchedb\t 2 } } */ + +/* { dg-final { scan-assembler-times vfcedbs\t 2 } } */ +/* { dg-final { scan-assembler-times vfchdbs\t 2 } } */ + +/* { dg-final { cleanup-saved-temps } } */ + +#include vecintrin.h + +vector bool long long +cmpeq (vector double a, vector double b) +{ + return vec_cmpeq (a, b); /* vfcedb */ +} + +vector bool long long +cmpgt (vector double a, vector double b) +{ + return vec_cmpgt (a, b); /* vfchdb */ +} + +vector bool long long +cmpge (vector double a, vector double b) +{ + return vec_cmpge (a, b); /* vfchedb */ +} + +vector bool long long +cmplt (vector double a, vector double b) +{ + return vec_cmplt (a, b); /* vfchdb */ +} + +vector bool long long +cmple (vector double a, vector double b) +{ + return vec_cmple (a, b); /* vfchedb */ +} + +int +all_eq (vector double a, vector double b) +{ + return vec_all_eq (a, b); +} + +int +any_eq (vector double a, vector double b) +{ + return vec_any_eq (a, b); +} + +int +all_lt (vector double a, vector double b) +{ + return vec_all_lt (a, b); +} + +int +any_lt (vector double a, vector double b) +{ + return vec_any_lt (a, b); +} diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-elem-1.c b/gcc/testsuite/gcc.target/s390/zvector/vec-elem-1.c new file mode 100644 index 000..c8578bf8 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-elem-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { s390*-*-* } } } */ +/* { dg-options -O3 -mzarch -march=z13 -mzvector } */ + +/* { dg-final { scan-assembler nilf\t%r2,15 } } */ +/* { dg-final { scan-assembler vlgvb } } */ + +signed char +foo(unsigned char uc) +{ + return __builtin_s390_vec_extract((__vector signed char){ 0 }, uc); +} diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-genbytemask-1.c b/gcc/testsuite/gcc.target/s390/zvector/vec-genbytemask-1.c new file mode 100644 index 000..09471f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-genbytemask-1.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -mzarch -march=z13 -mzvector } */ + +#include vecintrin.h + + +vector unsigned char a, b, c, d; + +int +foo () +{ + a = vec_genmask (0); + b = vec_genmask (65535); + c = vec_genmask (43605); + d = vec_genmask (37830); +} + +/* { dg-final { scan-assembler-times vzero 1 } } */ +/* {
[PATCH 04/13] S/390 Add -march/-mtune=z13 option.
gcc/ * common/config/s390/s390-common.c (processor_flags_table): Add z13. * config.gcc: Add z13. * config/s390/s390-opts.h (enum processor_type): Add PROCESSOR_2964_Z13. * config/s390/s390.c (s390_adjust_priority): Check for PROCESSOR_2964_Z13. (s390_reorg): Likewise. (s390_sched_reorder): Likewise. (s390_sched_variable_issue): Likewise. (s390_loop_unroll_adjust): Likewise. (s390_option_override): Likewise. Default to -mvx when available. * config/s390/s390.h (enum processor_flags): Add PF_Z13 and PF_VX. (TARGET_CPU_Z13, TARGET_CPU_VX, TARGET_Z13, TARGET_VX) (TARGET_VX_ABI): Define macros. macros. (TARGET_DEFAULT): Add MASK_OPT_VX. * config/s390/s390.md (cpu attribute): Add z13. (cpu_facility attribute): Add vec. * config/s390/s390.opt (processor_type): Add z13. (mvx): New options. --- gcc/common/config/s390/s390-common.c |3 +++ gcc/config.gcc |2 +- gcc/config/s390/s390-opts.h |1 + gcc/config/s390/s390.c | 35 -- gcc/config/s390/s390.h | 19 -- gcc/config/s390/s390.md |8 ++-- gcc/config/s390/s390.opt |7 +++ 7 files changed, 64 insertions(+), 11 deletions(-) diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 7181beb..43459c8 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -42,7 +42,10 @@ EXPORTED_CONST int processor_flags_table[] = /* z196 */ PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196, /* zEC12 */ PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT + | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX, +/* z13 */PF_IEEE_FLOAT | PF_ZARCH | PF_LONG_DISPLACEMENT | PF_EXTIMM | PF_DFP | PF_Z10 | PF_Z196 | PF_ZEC12 | PF_TX + | PF_Z13 | PF_VX }; /* Change optimizations to be performed, depending on the diff --git a/gcc/config.gcc b/gcc/config.gcc index a1df043..a2af2a0 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4071,7 +4071,7 @@ case ${target} in for which in arch tune; do eval val=\$with_$which case ${val} in -| g5 | g6 | z900 | z990 | z9-109 | z9-ec | z10 | z196 | zEC12) +| g5 | g6 | z900 | z990 | z9-109 | z9-ec | z10 | z196 | zEC12 | z13) # OK ;; *) diff --git a/gcc/config/s390/s390-opts.h b/gcc/config/s390/s390-opts.h index cb9ebc7..5bde333 100644 --- a/gcc/config/s390/s390-opts.h +++ b/gcc/config/s390/s390-opts.h @@ -35,6 +35,7 @@ enum processor_type PROCESSOR_2097_Z10, PROCESSOR_2817_Z196, PROCESSOR_2827_ZEC12, + PROCESSOR_2964_Z13, PROCESSOR_max }; diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index cc37618..843a860 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -5851,7 +5851,8 @@ s390_adjust_priority (rtx_insn *insn, int priority) s390_tune != PROCESSOR_2094_Z9_109 s390_tune != PROCESSOR_2097_Z10 s390_tune != PROCESSOR_2817_Z196 - s390_tune != PROCESSOR_2827_ZEC12) + s390_tune != PROCESSOR_2827_ZEC12 + s390_tune != PROCESSOR_2964_Z13) return priority; switch (s390_safe_attr_type (insn)) @@ -11451,7 +11452,8 @@ s390_reorg (void) /* Walk over the insns and do some =z10 specific changes. */ if (s390_tune == PROCESSOR_2097_Z10 || s390_tune == PROCESSOR_2817_Z196 - || s390_tune == PROCESSOR_2827_ZEC12) + || s390_tune == PROCESSOR_2827_ZEC12 + || s390_tune == PROCESSOR_2964_Z13) { rtx_insn *insn; bool insn_added_p = false; @@ -11700,7 +11702,8 @@ s390_sched_reorder (FILE *file, int verbose, if (reload_completed *nreadyp 1) s390_z10_prevent_earlyload_conflicts (ready, nreadyp); - if (s390_tune == PROCESSOR_2827_ZEC12 + if ((s390_tune == PROCESSOR_2827_ZEC12 + || s390_tune == PROCESSOR_2964_Z13) reload_completed *nreadyp 1) { @@ -11783,7 +11786,8 @@ s390_sched_variable_issue (FILE *file, int verbose, rtx_insn *insn, int more) { last_scheduled_insn = insn; - if (s390_tune == PROCESSOR_2827_ZEC12 + if ((s390_tune == PROCESSOR_2827_ZEC12 + || s390_tune == PROCESSOR_2964_Z13) reload_completed recog_memoized (insn) = 0) { @@ -11863,7 +11867,8 @@ s390_loop_unroll_adjust (unsigned nunroll, struct loop *loop) if (s390_tune != PROCESSOR_2097_Z10 s390_tune != PROCESSOR_2817_Z196 - s390_tune != PROCESSOR_2827_ZEC12) + s390_tune != PROCESSOR_2827_ZEC12 + s390_tune != PROCESSOR_2964_Z13)
Re: [patch] Perform anonymous constant propagation during inlining
Would you be OK with a slight variation of your earlier idea, i.e. calling fold_stmt with a specific valueizer from fold_marked_statements instead of the implicit no_follow_ssa_edges in the inliner? Something like: tree follow_anonymous_single_use_edges (tree val) { if (TREE_CODE (val) == SSA_NAME (!SSA_NAME_VAR (val) || DECL_IGNORED_P (SSA_NAME_VAR (var))) has_single_use (val)) return val return NULL_TREE; } Yes, that works for me as well. But not for GCC. :-) The propagation per se works but, since the statement is not folded in the end, no replacement is made at all... So we're back to square one and anonymous constant propagation seems to be the only way out at -O0. The attached patch implements it in a less hackish way (and enables it unconditionally when optimizing as suggested by Jan) by doing it just before invoking fold_stmt on the marked statements so this should make folding more effective in the process. Tested (compiler only) on x86_64-suse-linux, OK for the mainline? 2015-05-11 Eric Botcazou ebotca...@adacore.com * tree-inline.c: Include tree-ssa-propagate.h. (replace_constant_uses_in): New function. (fold_marked_statements): Call it before folding the statement. * gimple-expr.h (is_gimple_constant): Reorder. * tree-ssa-propagate.c (before_dom_children): Use inline accessor. -- Eric BotcazouIndex: tree-ssa-propagate.c === --- tree-ssa-propagate.c (revision 222673) +++ tree-ssa-propagate.c (working copy) @@ -1246,9 +1246,7 @@ substitute_and_fold_dom_walker::before_d gimple_call_noreturn_p (stmt)) stmts_to_fixup.safe_push (stmt); - if (is_gimple_assign (stmt) - (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) - == GIMPLE_SINGLE_RHS)) + if (gimple_assign_single_p (stmt)) { tree rhs = gimple_assign_rhs1 (stmt); Index: tree-inline.c === --- tree-inline.c (revision 222673) +++ tree-inline.c (working copy) @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3. #include expr.h #include tree-dfa.h #include tree-ssa.h +#include tree-ssa-propagate.h #include tree-pretty-print.h #include except.h #include debug.h @@ -4801,6 +4802,54 @@ gimple_expand_calls_inline (basic_block return inlined; } +/* Replace USE references in statement STMT with constant values. Return + true if at least one reference was replaced. + + We do it at -O0 for anonymous SSA names or SSA names of anonymous decls, + this makes it possible to generate reasonable code for operators that are + implemented as functions and inlined on constants. */ + +static bool +replace_constant_uses_in (gimple stmt) +{ + bool replaced = false; + use_operand_p use; + ssa_op_iter iter; + + FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE) +{ + tree tuse = USE_FROM_PTR (use); + + if (TREE_CODE (tuse) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (tuse)) + continue; + + if (!optimize + SSA_NAME_VAR (tuse) + !DECL_IGNORED_P (SSA_NAME_VAR (tuse))) + continue; + + if (!gimple_assign_single_p (SSA_NAME_DEF_STMT (tuse))) + continue; + + tree rhs = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (tuse)); + if (!is_gimple_constant (rhs)) + continue; + + propagate_value (use, rhs); + + replaced = true; +} + + if (replaced gimple_assign_single_p (stmt)) +{ + tree rhs = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (rhs) == ADDR_EXPR) + recompute_tree_invariant_for_addr_expr (rhs); +} + + return replaced; +} /* Walk all basic blocks created after FIRST and try to fold every statement in the STATEMENTS pointer set. */ @@ -4819,7 +4868,10 @@ fold_marked_statements (int first, hash_ if (statements-contains (gsi_stmt (gsi))) { gimple old_stmt = gsi_stmt (gsi); - tree old_decl = is_gimple_call (old_stmt) ? gimple_call_fndecl (old_stmt) : 0; + tree old_decl + = is_gimple_call (old_stmt) + ? gimple_call_fndecl (old_stmt) : NULL_TREE; + bool replaced = replace_constant_uses_in (old_stmt); if (old_decl DECL_BUILT_IN (old_decl)) { @@ -4827,7 +4879,7 @@ fold_marked_statements (int first, hash_ we need to look at all of them. */ gimple_stmt_iterator i2 = gsi; gsi_prev (i2); - if (fold_stmt (gsi)) + if (fold_stmt (gsi) || replaced) { gimple new_stmt; /* If a builtin at the end of a bb folded into nothing, @@ -4871,7 +4923,7 @@ fold_marked_statements (int first, hash_ } } } - else if (fold_stmt (gsi)) + else if (fold_stmt (gsi) || replaced) { /* Re-read the statement from GSI as fold_stmt() may have changed it. */ Index: gimple-expr.h === --- gimple-expr.h (revision 222673)
Re: [PATCH] Fix memory leak in C++ pretty printer
On 11/05/15 03:34, Patrick Palka wrote: In gcc/cp/error.c we initialize the C++ pretty printer object twice: first during statics initialization and later in a placement-new in init_error(). This double-initialization causes a memory leak of about 7kb according to valgrind. I don't see a reason to initialize the object a second time so I elected to remove init_error(). I seem to remember there is some issue with the constructors when using static initialization that requires the placement-new. We also do the placement-new dance in the other FEs and the reason should be the same. My preference would be to replace the static with a pointer and placement-new with proper new and delete, but see: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00910.html If you change it here, please change it everywhere else where we use placement-new, such that all FEs are consistent on this. Cheers, Manuel.
Re: [PATCH 01/13] recog: Increased max number of alternatives.
On Mon, May 11, 2015 at 03:23:29PM +0200, Andreas Krebbel wrote: With the vector facility support z13 mov patterns have more than 30 alternatives. Wow, that is a lot! --- a/gcc/recog.h +++ b/gcc/recog.h @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3. If not see /* Random number that should be large enough for all purposes. Also define a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra bit giving an invalid value that can be used to mean uninitialized. */ -#define MAX_RECOG_ALTERNATIVES 30 +#define MAX_RECOG_ALTERNATIVES 35 typedef unsigned int alternative_mask; int isn't at least 36 bits. Segher
Re: [Patch, Fortran] Simplify lbound
Le 11/05/2015 00:08, Thomas Koenig a écrit : Am 10.05.2015 um 22:43 schrieb H.J. Lu: Here is what I have committed. It caused: /export/gnu/import/git/sources/gcc/gcc/testsuite/gfortran.dg/inline_matmul_3.f90:38:39: Error: Variable 'c1' cannot appear in the expression at (1)^M I know that error message, I got it when developing the inline matmul patches with the same test cases. I had a fix for this error message in one of my matmul patches, but it was removed in the review process because it could no longer be reproduced. So, here is the fix again. I think it is close to obvious (since it fixes the problem and can obviously do no harm), but anyway: OK for trunk? For what it's worth, I have looked at it further, and it seems to be gfc_current_ns not being set to the internal namespace. A patch like this also removes the error. Index: frontend-passes.c === --- frontend-passes.c (révision 223002) +++ frontend-passes.c (copie de travail) @@ -581,6 +581,9 @@ insert_block () else ns = inserted_block-ext.block.ns; + /* From now on, everything will happen in the inserted block. */ + gfc_current_ns = ns; + return ns; } To be honest, both patches look fragile to me. Yours because it leaves gfc_current_ns to its value, leaving the door open to other problems. Mine, well, because it's playing with a global variable, with the possible side-effects this could have. However, without a better idea, I'm OK with either patch (or both). Mikael
[PATCH 08/13] S/390 zvector builtin support.
With this patch GCC implements an Altivec style set of builtins to make use of vector instructions in C/C++ code. This is provided for compatibility with the IBM XL compiler. gcc/ * config.gcc: Add vecintrin.h to extra_headers. Add s390-c.o to c_target_objs and cxx_target_objs. Add t-s390 to tmake_file. * config/s390/s390-builtin-types.def: New file. * config/s390/s390-builtins.def: New file. * config/s390/s390-builtins.h: New file. * config/s390/s390-c.c: New file. * config/s390/s390-modes.def: Add modes CCVEQANY, CCVH, CCVHANY, CCVHU, CCVHUANY, CCVFHANY, CCVFHEANY. * config/s390/s390-protos.h (s390_expand_vec_compare_cc) (s390_cpu_cpp_builtins, s390_register_target_pragmas): Add prototypes. * config/s390/s390.c (s390-builtins.h, s390-builtins.def): Include. (flags_builtin, flags_overloaded_builtin_var, s390_builtin_types) (s390_builtin_fn_types, s390_builtin_decls, code_for_builtin): New variable definitions. (s390_const_operand_ok): New function. (s390_expand_builtin): Rewrite. (s390_init_builtins): New function. (s390_handle_vectorbool_attribute): New function. (s390_attribute_table): Add s390_vector_bool attribute. (s390_match_ccmode_set): Handle new cc modes CCVH, CCVHU. (s390_branch_condition_mask): Generate masks for new modes. (s390_expand_vec_compare_cc): New function. (s390_mangle_type): Add mangling for vector bool types. (enum s390_builtin): Remove. (s390_atomic_assign_expand_fenv): Rename constants for sfpc and efpc builtins. * config/s390/s390.h (TARGET_CPU_CPP_BUILTINS): Call s390_cpu_cpp_builtins. (REGISTER_TARGET_PRAGMAS): New macro. * config/s390/s390.md: Define more UNSPEC_VEC_* constants. (insn_cmp mode attribute): Add new CC modes. (s390_sfpc, s390_efpc): Rename patterns to sfpc and efpc. (lcbb): New pattern definition. * config/s390/s390intrin.h: Include vecintrin.h. * config/s390/t-s390: New file. * config/s390/vecintrin.h: New file. * config/s390/vector.md: Include vx-builtins.md. * config/s390/vx-builtins.md: New file.S/390 zvector builtin support. --- gcc/config.gcc | 24 +- gcc/config/s390/s390-builtin-types.def | 747 ++ gcc/config/s390/s390-builtins.def | 2486 gcc/config/s390/s390-builtins.h| 160 ++ gcc/config/s390/s390-c.c | 907 gcc/config/s390/s390-modes.def | 30 + gcc/config/s390/s390-protos.h |8 + gcc/config/s390/s390.c | 833 --- gcc/config/s390/s390.h | 27 +- gcc/config/s390/s390.md| 118 +- gcc/config/s390/s390.opt |4 + gcc/config/s390/s390intrin.h |3 + gcc/config/s390/t-s390 | 27 + gcc/config/s390/vecintrin.h| 311 gcc/config/s390/vector.md |2 + gcc/config/s390/vx-builtins.md | 2081 ++ 16 files changed, 7494 insertions(+), 274 deletions(-) create mode 100644 gcc/config/s390/s390-builtin-types.def create mode 100644 gcc/config/s390/s390-builtins.def create mode 100644 gcc/config/s390/s390-builtins.h create mode 100644 gcc/config/s390/s390-c.c create mode 100644 gcc/config/s390/t-s390 create mode 100644 gcc/config/s390/vecintrin.h create mode 100644 gcc/config/s390/vx-builtins.md 0008-S-390-zvector-builtin-support.patch.gz Description: Binary data
Re: C PATCH for -Wshift-negative-value (PR c/66066)
On 11/05/15 16:21, Marek Polacek wrote: The -Wshift-negative-value patch caused grief since it breaks building some programs. The following patch should alleviate the pain a bit: mark a left shift of a negative value as non-const only if pedantic. Either this is not correct according to the guidelines (the flag pedantic should not cause generated code differences or errors, https://gcc.gnu.org/wiki/DiagnosticsGuidelines) or the guidelines need updating. My take is that this should work exactly like OPT_Woverflow (see constant_expression_warning): !flag_isoc99 || !constant_required should produce a warning() with OPT_Wshift_negative_value flag_isoc99 constant_required should produce a pedwarn() using (pedantic ? OPT_Wpedantic : OPT_Wshift_negative_value). The above means that we only produce errors with -std=c99|gnu99 -pedantic-errors in the places that are mandated by the standard. Otherwise, there are no code or error differences and at most there are warnings that are controlled by -Wshift-negative-value for those that want to disable them. Of course, the problem here is that we are folding this even before we know if a constant is actually required by the standard, no? Is there no way to move the warning to the point where we actually know whether a constant is required or not? Cheers, Manuel.
Re: [PATCH][PR66010] Don't take address of ap unless necessary
On 11-05-15 09:47, Richard Biener wrote: Bootstrapped and reg-tested on x86_64, with and without -m32. OK for trunk? [ FWIW, I suspect this patch will make life easier for the reimplementation of the pass_stdarg optimization using ifn_va_arg. ] + if (canon_va_type != NULL) +{ + if (!(TREE_CODE (canon_va_type) == ARRAY_TYPE +TREE_CODE (va_type) != ARRAY_TYPE)) + /* In gimplify_va_arg_expr we take the address of the ap argument, mark + it addressable now. */ + mark_addressable (expr); can we simplify this and ... - } - + gcc_assert (TREE_CODE (TREE_TYPE (valist)) != ARRAY_TYPE); gimplify_expr (valist, pre_p, post_p, is_gimple_val, fb_rvalue); this to use [!]POINTER_TYPE_P ()? I'm not sure. Why do we arrive with non-array va_type but array canon_va_type in build_va_arg? grokdeclarator in c-decl.c: ... /* A parameter declared as an array of T is really a pointer to T. One declared as a function is really a pointer to a function. */ if (TREE_CODE (type) == ARRAY_TYPE) { /* Transfer const-ness of array into that of type pointed to. */ type = TREE_TYPE (type); if (type_quals) type = c_build_qualified_type (type, type_quals); type = c_build_pointer_type (type); ... I suppose the va_list argument already decayed to a pointer then The above means that the va_list function parameter decayed to a pointer. AFAIU, the va_list argument to va_arg just uses the same type (for parsing, grep for RID_VA_ARG in c-parser.c). (in which case the argument should already be addressable?)? The argument is of pointer type. That pointer-typed-argument will only be addressable if we take the address of it, which is precisely the thing we're trying to avoid in this patch. I think the overall idea of the patch is good - I'm just worried about special-casing of ARRAY_TYPE vs. non-pointer-type (because it's the latter that we ultimately want...). AFAIU, the special casing of ARRAY_TYPE in the patch is a consequence of the special-casing of ARRAY_TYPE as a parameter. I don't see how [!]POINTER_TYPE_P () can help here. I've rewritten and attached the build_va_arg bit using POINTER_TYPE_P and expanded comments a bit to demonstrate. Thanks, - Tom diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 378f237..95cf69b 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -5918,9 +5918,45 @@ set_compound_literal_name (tree decl) tree build_va_arg (location_t loc, tree expr, tree type) { - /* In gimplify_va_arg_expr we take the address of the ap argument, mark it - addressable now. */ - mark_addressable (expr); + tree va_type = TREE_TYPE (expr); + tree canon_va_type = (va_type == error_mark_node + ? NULL_TREE + : targetm.canonical_va_list_type (va_type)); + + if (va_type == error_mark_node + || canon_va_type == NULL_TREE) +/* Let's not bother about addressable or not, if expr: + - has parse errors, or + - is not an va_list type. */ +; + else +{ + if (POINTER_TYPE_P (va_type)) + { + /* If it's a pointer type, there are two possibilities: + - expr is a va_list param decl that is declared as an array type + but was turned into a pointer type by grok_declarator. + - expr is a va_list decl declared as pointer type. + Detect the former case by looking at the canonical type. */ + if (TREE_CODE (canon_va_type) == ARRAY_TYPE) + /* We know that the pointer is constant, so there's no need to modify + it, so there's no need to pass it around using an address + operator, so there's no need to mark it addressable. */ + ; + else + /* The type is an actual va_list. It might be modified by va_arg, so + we need to pass it around using an address operator, so we need + to mark it addressable. */ + mark_addressable (expr); + } + else + { + /* The type is an actual va_list. It might be modified by va_arg, so + we need to pass it around using an address operator, so we need to + mark it addressable. */ + mark_addressable (expr); + } +} expr = build1 (VA_ARG_EXPR, type, expr); SET_EXPR_LOCATION (expr, loc);
C PATCH for -Wshift-negative-value (PR c/66066)
The -Wshift-negative-value patch caused grief since it breaks building some programs. The following patch should alleviate the pain a bit: mark a left shift of a negative value as non-const only if pedantic. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-05-11 Marek Polacek pola...@redhat.com PR c/66066 * c-typeck.c (build_binary_op): Mark left shift of a negative value as non-const only if pedantic. * gcc.dg/c99-left-shift-2.c: New test. diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 3fcb7c2..05b2709 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -10702,7 +10702,7 @@ build_binary_op (location_t location, enum tree_code code, { /* Don't reject a left shift of a negative value in a context where a constant expression is needed in C90. */ - if (flag_isoc99) + if (pedantic flag_isoc99) int_const = false; if (c_inhibit_evaluation_warnings == 0) warning_at (location, OPT_Wshift_negative_value, diff --git gcc/testsuite/gcc.dg/c99-left-shift-2.c gcc/testsuite/gcc.dg/c99-left-shift-2.c index e69de29..a4cd9d0 100644 --- gcc/testsuite/gcc.dg/c99-left-shift-2.c +++ gcc/testsuite/gcc.dg/c99-left-shift-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -std=iso9899:1999 } */ + +enum E { A = -2 1 }; +int i = -1 0; + +int +f (int i) +{ + switch (i) + case -1 0: break; +} Marek
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On Mon, May 11, 2015 at 01:48:03PM +0200, Michael Matz wrote: Hi, On Wed, 6 May 2015, Rich Felker wrote: I don't see how this case is improved unless GCC is failing to consider strong definitions in the same TU as locally-binding. Interposition of non-static non-inline non-weak symbols is supported independend of if they are defined in the same TU or not (if you're producing a shared lib, that is). I.e. no, they are not considered locally-binding (for instance, they aren't automatically inlined). If this is the case, is there a reason for that behavior? Because IMHO interposition is orthogonal to TU placement, and hence shouldn't be influenced by it. There's visibility, inline hints or static-ness to achieve different effects. (perhaps the real reason is: because it always worked like that :) ) IMO it's wrong. Why? I think it's right. I see it as an unnecessary pessimization. The ELF shared library semantics for allowing interposition were designed to avoid behavioral regressions versus static linking, and this is not such a case. With an archive-type library, it's possible to cause whole TUs to be omitted when linking as long as whatever symbol(s) may have been needed from them are already defined elsewhere; interposition makes the same possible with dynamic linking. But if symbols A and B were both in the same TU, having A defined prior to searching an archive but B undefined will cause the TU that defines both to be pulled in, and is such a linking error (multiple definitions). So I'm not sure why it's desirable to support this. The it always worked like that argument may suffice if people are depending on this behavior now (OTOH I'd rather see it break so they fix their breakage of static linking) but I suspect the historical reason it worked like that is that compilers were not smart enough to process whole TUs at a time but just worked with one function at a time and did not know that a referenced symbol was in the same TU. BTW visibility can't really address the issue except with hacks (hidden aliases) or protected visibility (which is hard to use because it's broken on lots of toolchain versions). Rich
Re: [PATCH 01/13] recog: Increased max number of alternatives.
On 05/11/2015 04:01 PM, Segher Boessenkool wrote: On Mon, May 11, 2015 at 03:23:29PM +0200, Andreas Krebbel wrote: With the vector facility support z13 mov patterns have more than 30 alternatives. Wow, that is a lot! --- a/gcc/recog.h +++ b/gcc/recog.h @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3. If not see /* Random number that should be large enough for all purposes. Also define a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra bit giving an invalid value that can be used to mean uninitialized. */ -#define MAX_RECOG_ALTERNATIVES 30 +#define MAX_RECOG_ALTERNATIVES 35 typedef unsigned int alternative_mask; int isn't at least 36 bits. Right. That should be unsigned HOST_WIDE_INT instead. Thanks! -Andreas-
[wwwdocs] Update changes.html
Ok to commit? Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.3 diff -u -r1.3 changes.html --- changes.html6 May 2015 09:55:29 - 1.3 +++ changes.html11 May 2015 14:44:38 - @@ -16,15 +16,36 @@ !-- .. -- -!-- h2 id=generalGeneral Optimizer Improvements/h2 -- - +h2 id=generalGeneral Optimizer Improvements/h2 + ul +liUndefinedBehaviorSanitizer gained a new sanitization option: +ul + licode-fsanitize=bounds-strict/code: enables strict checking + of array bounds. In particular, it enables + code-fsanitize=bounds/code as well as instrumentation of + flexible array member-like arrays./li +/ul + /ul !-- .. -- h2 id=languagesNew Languages and Language specific improvements/h2 !-- h3 id=adaAda/h3 -- -!-- h3 id=c-familyC family/h3 -- +h3 id=c-familyC family/h3 + ul +liA new command-line option code-Wshift-negative-value/code has been + added for the C and C++ compilers, which warns about left shifting + a negative value./li + /ul + +h3 id=cC/h3 + ul +liIt is possible to disable warnings when an initialized field of + a structure or an union with side effects is being overridden when + using designated initializers via a new warning option + code-Woverride-init-side-effects/code./li + /ul h3 id=cxxC++/h3 Marek
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On Mon, May 11, 2015 at 12:31:51PM +0200, Jakub Jelinek wrote: On Mon, May 11, 2015 at 11:20:15AM +0100, Szabolcs Nagy wrote: On 09/05/15 19:57, Szabolcs Nagy wrote: * H.J. Lu hjl.to...@gmail.com [2015-05-09 10:41:41 -0700]: There are 4: 2b70 806 FUNCGLOBAL DEFAULT 12 __cpu_indicator_init@GCC_4.8.0 38: 002153d016 OBJECT GLOBAL DEFAULT 25 __cpu_model@GCC_4.8.0 and 0215000 00040001 R_X86_64_64 2b70 __cpu_indicator_init@GCC_4.8.0 + 0 00215220 00260006 R_X86_64_GLOB_DAT 002153d0 __cpu_model@GCC_4.8.0 + 0 in libgcc_s.so.1. Musl ld.so must be fixed to handle it. Rich is looking at how to do this non-intrusively, but it seems non-trivial (some users of musl prefer not to resolve such versioned symbols). i think it might be enough to add __cpu_indicator_init_local as an alias to __cpu_indicator_init in libgcc.a and then use the *_local symbol from the ifunc resolver, that way no new dependency is added to libgcc_s.so handling. i tried this approach and it seems to work: passes all multiversioning tests on x86_64. i think it's no worse than the symver approach. is it ok to change the current fix to this? No. Instead of piling hacks like this just fix it in musl. I wouldn't call it piling hacks; it's an improvement as far as I can tell since it remove symbolic relocations and replaces them with relative ones. libgcc certainly isn't the only library that uses @ symbol versions, e.g. libstdc++ does as well, as well as many other shared libraries. We haven't encountered such issues there. Rich
False ODR violation positives on anonymous namespace types
Jason, I got my firefox tree building again and noticed that my patch to enable ODR merging on non-class types caused false positives: /aux/hubicka/trunk-install/include/c++/6.0.0/ext/new_allocator.h:66:26: warning: type ‘(anonymous namespace)::ObservationWithStack const’ violates one definition rule [-Wodr] typedef const _Tp const_reference; ^ /aux/hubicka/trunk-install/include/c++/6.0.0/ext/alloc_traits.h:110:53: note: it is defined as a pointer to different type in another translation unit typedef const value_type const_reference; ^ /aux/hubicka/firefox8/xpcom/build/MainThreadIOLogger.cpp:28:8: note: type ‘const struct ObservationWithStack’ should match type ‘const struct value_type’ struct ObservationWithStack ^ /aux/hubicka/trunk-install/include/c++/6.0.0/ext/alloc_traits.h:103:53: note: the incompatible type is defined here typedef typename _Base_type::value_type value_type; ^ Here the obvious problem is that we try to merge type in anonymous namespace. This is because at LTO time odr_type_p returns true and type_in_anonymous_namespace returns false. odr_type_p basically checks that we computed mangled type name for it which is done in free_lang_data as follows: { /* We use DECL_ASSEMBLER_NAME to hold mangled type names for One Definition Rule merging. */ if (flag_lto_odr_type_mering TREE_CODE (decl) == TYPE_DECL DECL_NAME (decl) decl == TYPE_NAME (TREE_TYPE (decl)) !is_lang_specific (TREE_TYPE (decl)) /* Save some work. Names of builtin types are always derived from properties of its main variant. A special case are integer types where mangling do make differences between char/signed char/unsigned char etc. Storing name for these makes e.g. -fno-signed-char/-fsigned-char mismatches to be handled well. See cp/mangle.c:write_builtin_type for details. */ (TREE_CODE (TREE_TYPE (decl)) != VOID_TYPE TREE_CODE (TREE_TYPE (decl)) != BOOLEAN_TYPE TREE_CODE (TREE_TYPE (decl)) != REAL_TYPE TREE_CODE (TREE_TYPE (decl)) != FIXED_POINT_TYPE) !TYPE_ARTIFICIAL (TREE_TYPE (decl)) !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE) !type_in_anonymous_namespace_p (TREE_TYPE (decl))) return !DECL_ASSEMBLER_NAME_SET_P (decl); and obviously we are not intended to mangle tyhis type and it should be caught by type_in_anonymous_namespace_p check: bool type_in_anonymous_namespace_p (const_tree t) { /* TREE_PUBLIC of TYPE_STUB_DECL may not be properly set for bulitin types; those have CONTEXT NULL. */ if (!TYPE_CONTEXT (t)) return false; return (TYPE_STUB_DECL (t) !TREE_PUBLIC (TYPE_STUB_DECL (t))); } We already discussed earlier that type_in_anonymous_namespace_p is not working on compund types, because these do not have TYPE_STUB_DECL. I tought those are !TYPE_NAME types. What is reason for !TYPE_NAME type with no TYPE_STUB_DECL? Is it always a compound type with typedef name? One possible solution is to avoid producing mangled names so these won't be odr_type_p at LTO time: Index: tree.c === --- tree.c (revision 222991) +++ tree.c (working copy) @@ -5173,6 +5173,7 @@ need_assembler_name_p (tree decl) TREE_CODE (TREE_TYPE (decl)) != REAL_TYPE TREE_CODE (TREE_TYPE (decl)) != FIXED_POINT_TYPE) !TYPE_ARTIFICIAL (TREE_TYPE (decl)) + TYPE_STUB_DECL (TREE_TYPE (decl)) !variably_modified_type_p (TREE_TYPE (decl), NULL_TREE) !type_in_anonymous_namespace_p (TREE_TYPE (decl))) return !DECL_ASSEMBLER_NAME_SET_P (decl); which silence the warnings, but I think it also may cause false negatives. Ohter option is to make type_in_anonymous_namespace_p work. The types in question are anonymous because they are defined within anonymous template, would the following work in type_in_anonymous_namespace_p? + /* Types (such as pointer_type) defined within class types may not + have their own TYPE_STUB_DECL. Look for the outer type. */ + while (!TYPE_STUB_DECL (t) TYPE_NAME (t) + TREE_CODE (TYPE_NAME (t)) == TYPE_DECL + DECL_CONTEXT (TYPE_NAME (t)) + TYPE_P (DECL_CONTEXT (TYPE_NAME (t +t = DECL_CONTEXT (TYPE_NAME (t)); Last option would be make anonymous_namespace_p to walk to compound type bases. I made variant of this in https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00880.html (odr_or_derived_type_p, modifying type_in_anonymous_namespace_p to work same way is definitly easy) What would be a preffered fix for this? Honza
Re: [PATCH 4/6] rs6000: Delete some now-superfluous zero_ext{end,ract} patterns
On Sun, May 10, 2015 at 12:13 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: After the change to combine, we no longer need all the special-case patterns. Tested on powerpc64-linux, as usual. As mentioned with the combine patch, there are no differences to generated code in cc1. This does not delete DImode lshiftrt patterns, because those do not exist (yet). Is this okay for trunk? Segher [ Save the planet! Delete 11% of your machine descriptions! ] 2015-05-10 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/rs6000.md (extzv): FAIL for SImode. (extzvsi_internal, *extzvsi_internal1, *extzvsi_internal2, *rotlsi3_internal7le, *rotlsi3_internal7be, *rotlsi3_internal8le, *rotlsi3_internal8be, *rotlsi3_internal9le, *rotlsi3_internal9be, *rotlsi3_internal10le, *rotlsi3_internal10be, *rotlsi3_internal11le, *rotlsi3_internal11be, *rotlsi3_internal12le, *rotlsi3_internal12be, *lshiftrt_internal1le, *lshiftrt_internal1be, *lshiftrt_internal2le, *lshiftrt_internal2be, *lshiftrt_internal3le, *lshiftrt_internal3be, *lshiftrt_internal4le, *lshiftrt_internal4be, *lshiftrt_internal5le, *lshiftrt_internal5be, *lshiftrt_internal5le, *lshiftrt_internal5be, *rotldi3_internal7le, *rotldi3_internal7be, *rotldi3_internal8le, *rotldi3_internal8be, *rotldi3_internal9le, *rotldi3_internal9be, *rotldi3_internal10le, *rotldi3_internal10be, *rotldi3_internal11le, *rotldi3_internal11be, *rotldi3_internal12le, *rotldi3_internal12be, *rotldi3_internal13le, *rotldi3_internal13be, *rotldi3_internal14le, *rotldi3_internal14be, *rotldi3_internal15le, *rotldi3_internal15be, and 30 corresponding splitters): Delete. Okay. Thanks, David
Re: [PATCH 5/6] rs6000: Don't use gen_rlwinm
On Sun, May 10, 2015 at 12:13 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: The next patch will rename the rlwinm pattern (as well as the other patterns that implement rlwinm, now unnamed). The only place that uses gen_rlwinm (an expander) is better off expanding the separate operations separately. Do so. Okay for trunk? Segher 2015-05-10 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/rs6000.md (rs6000_adjust_atomic_subword): Use gen_ashlsi3 and gen_andsi3 instead of gen_rlwinm. Okay. Thanks, David
Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE
On 11/05/15 14:05, Jakub Jelinek wrote: On Mon, May 11, 2015 at 01:39:17PM +0100, Szabolcs Nagy wrote: fyi, musl loader loads libstdc++ just fine because it has no But will it load any shared library that uses any of the 26 (if I count well on x86_64) @ symbols from libstdc++.so.6? i looked, but all of those symbols have @@ variant too so at least the libraries would load with musl. musl may end up supporting @version but that's an independent quest. It is not independent. If musl claims to support symbol versioning, it should support it properly, if not, then supposedly gcc configured for musl can't be compatible with gcc configured for other linux C libraries, and should force symbol versioning off. ok, but the current solution does not make that easy: configuring gcc with --disable-gnu-indirect-function --disable-symvers has no effect on libgcc_s.so.1 on linux. (i can try to create a patch that removes the new -DUSE_ELF_SYMVER from the libgcc cflags for musl, but that seems a worse solution than the weak alias one). (note that previously a simple spec file was enough to use an existing gcc on linux to build things against musl... this didnt work for c++ code that used libstdc++ headers, but now it also fails for any build using -lgcc_s).
Re: [PATCH 6/6] rs6000: Clean up the various rlwinm patterns
On Sun, May 10, 2015 at 12:13 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: Some cleanups: * Give every define_insn a name; * Add missing conditions for some of the dot forms; * Use define_insn_and_split to reduce duplication; * Renumber operands so 0,1,2,3 are the actual operands of the machine instruction, in order; * Reformat some patterns. Is this okay for trunk? Segher 2015-05-10 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/rs6000.md (*rotlsi3_internal4, *rotlsi3_internal5, *rotlsi3_internal6, rlwinm, 5 unnamed define_insns, and 6 define_splits): Delete. (*rotlsi3_mask, *rotlsi3_mask_dot, *rotlsi3_mask_dot2, *ashlsi3_imm_mask, *ashlsi3_imm_mask_dot, *ashlsi3_imm_mask_dot2, *lshrsi3_imm_mask, *lshrsi3_imm_mask_dot, *lshrsi3_imm_mask_dot2): New. Okay. Thanks, David
Re: [PATCH 3/6] rs6000: Don't use zero_extract in the bswap:HI splitter
On Sun, May 10, 2015 at 12:13 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: The next patch removes the zero_extract insn this splits to. Write it as (and (lshiftrt ... instead. Okay for trunk? Segher 2015-05-10 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/rs6000.md (define_split for bswaphi): Don't use zero_extract. Okay. Thanks, David
[patch, MIPS, testsuite] Fix gcc.target/mips/branch-1.c
The test gcc.target/mips/branch-1.c has started failing because it is trying to verify that each of 4 functions generates and 'andi' instruction and only finding 2 of them. With a recent change (fixing PR 65150) GCC determined that the f1 and f2 functions generate identical code and that the f3 and f4 functions generate identical code and so it replaced f1 with a tail call to f2 and f3 with a tail call to f4. Thus only 2 'andi' instructions show up and the test fails. There is an existing bug report (PR 65534) about whether or not this is a good optimization but I would like to fix this test so that that optimization can not happen since that is not what this test is intended to check for. My solution is to pass different arguments to bar() in each function so the code in each function is unique. Tested with mips-mti-linux-gnu, OK to checkin? Steve Ellcey sell...@imgtec.com 2015-05-11 Steve Ellcey sell...@mips.com * gcc.target/mips/branch-1.c: Pass argument to bar(). diff --git a/gcc/testsuite/gcc.target/mips/branch-1.c b/gcc/testsuite/gcc.target/mips/branch-1.c index 6ef50e8..83d7180 100644 --- a/gcc/testsuite/gcc.target/mips/branch-1.c +++ b/gcc/testsuite/gcc.target/mips/branch-1.c @@ -4,11 +4,11 @@ on zero. */ /* { dg-options forbid_cpu=octeon.* } */ -void bar (void); -NOMIPS16 void f1 (int x) { if (x 4) bar (); } -NOMIPS16 void f2 (int x) { if ((x 2) 1) bar (); } -NOMIPS16 void f3 (unsigned int x) { if (x 0x10) bar (); } -NOMIPS16 void f4 (unsigned int x) { if ((x 4) 1) bar (); } +void bar (int); +NOMIPS16 void f1 (int x) { if (x 4) bar (1); } +NOMIPS16 void f2 (int x) { if ((x 2) 1) bar (2); } +NOMIPS16 void f3 (unsigned int x) { if (x 0x10) bar (3); } +NOMIPS16 void f4 (unsigned int x) { if ((x 4) 1) bar (4); } /* { dg-final { scan-assembler \tandi\t.*\tandi\t.*\tandi\t.*\tandi\t } } */ /* { dg-final { scan-assembler-not \tsrl\t } } */ /* { dg-final { scan-assembler-not \tsra\t } } */
Re: [wwwdocs] Update changes.html
On Mon, May 11, 2015 at 06:04:48PM +0200, Gerald Pfeifer wrote: Hi Marek, On Mon, 11 May 2015, Marek Polacek wrote: Ok to commit? as maintainer I am happy for you to commit documentation/web changes without approval, though I am also happy to review. I was hoping you could glance over it before I commit the patch; and given the below, I'm glad you did ;). Here I'm wondering whether the nested list could just be merged into the primary item (and I'd omit the colon before enables). Agreed. If we have more new options, we can make a list out of it. a union. English can be tricky for us non-native speakers at times, since usually you would use an before u, except when that u is pronounced as a consonant or a syllable starting with one. And the u in union actually is prounced as you-nion. Yeah, I'm familiar with this rule, and I usually got that right. Not sure how I goofed it up this time. Is the following any better? Thanks, Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.3 diff -u -r1.3 changes.html --- changes.html6 May 2015 09:55:29 - 1.3 +++ changes.html11 May 2015 16:16:04 - @@ -16,15 +16,34 @@ !-- .. -- -!-- h2 id=generalGeneral Optimizer Improvements/h2 -- - +h2 id=generalGeneral Optimizer Improvements/h2 + ul +liUndefinedBehaviorSanitizer gained a new sanitization option, + code-fsanitize=bounds-strict/code, which enables strict checking + of array bounds. In particular, it enables + code-fsanitize=bounds/code as well as instrumentation of + flexible array member-like arrays. + /ul !-- .. -- h2 id=languagesNew Languages and Language specific improvements/h2 !-- h3 id=adaAda/h3 -- -!-- h3 id=c-familyC family/h3 -- +h3 id=c-familyC family/h3 + ul +liA new command-line option code-Wshift-negative-value/code has been + added for the C and C++ compilers, which warns about left shifting + a negative value./li + /ul + +h3 id=cC/h3 + ul +liIt is possible to disable warnings when an initialized field of + a structure or a union with side effects is being overridden when + using designated initializers via a new warning option + code-Woverride-init-side-effects/code./li + /ul h3 id=cxxC++/h3 Marek
[PATCH, i386]: Use insn PATTERN when looking for memory_references in ix86_loop_unroll_adjust
Hello! 2015-05-11 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (ix86_loop_unroll_adjust): Use PATTERN (insn) when looking for memory references. Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 222999) +++ config/i386/i386.c (working copy) @@ -51446,7 +51446,7 @@ ix86_loop_unroll_adjust (unsigned nunroll, struct for (i = 0; i loop-num_nodes; i++) FOR_BB_INSNS (bbs[i], insn) if (NONDEBUG_INSN_P (insn)) - FOR_EACH_SUBRTX (iter, array, insn, NONCONST) + FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST) if (const_rtx x = *iter) if (MEM_P (x)) {
Re: C PATCH for -Wshift-negative-value (PR c/66066)
On 05/11/2015 12:14 PM, Manuel López-Ibáñez wrote: On 11 May 2015 at 17:54, Marek Polacek pola...@redhat.com wrote: I'm sorry to sound so pedantic but we have faced this same issue in the past (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19976#c7) and the solution was to delay folding (https://gcc.gnu.org/ml/gcc-patches/2008-10/msg01061.html). And it's still the answer to many issues. There's an effort to delay folding of various things in the C++ front-end and once done, that effort may be extended to other things. Jeff
Re: [PATCH] Fix typo
On Sun, 10 May 2015 22:07:53 -0600, Jeff Law wrote: On 05/10/2015 03:00 PM, Paulo Matos wrote: Yes. This would fall under the obvious rule and can be committed without waiting for approvals. jeff Thanks. Committed. -- Paulo Matos
Re: C PATCH for -Wshift-negative-value (PR c/66066)
On 11 May 2015 at 17:54, Marek Polacek pola...@redhat.com wrote: The problem here isn't in the -Wshift-negative-value warning itself; the problem is with marking -1 0 as a non-constant: later on, we warn in a context where a constant expression is needed (initializer element is not a constant expression), and for e.g. int foo = -1 0 | 9; there's an error (initializer element is not constant). Yes, I understand. What I'm saying is that if folding was done at the moment that the constant is requested, this would be a case of pedwarn-if-pedantic, which is exactly what Joseph is discussing here for case (b): https://gcc.gnu.org/ml/gcc/2013-11/msg00253.html It doesn't matter that the error initializer element is not a constant expression is not given as long as an error (e.g., left shift of negative value is undefined according to ISO C99) is given with -pedantic-errors. I'm sorry to sound so pedantic but we have faced this same issue in the past (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19976#c7) and the solution was to delay folding (https://gcc.gnu.org/ml/gcc-patches/2008-10/msg01061.html). My change means that we wouldn't complain unless -pedantic (to not upset too many users). I'm not particularly fond of it, but it seems like the simplest solution. -Wpedantic (which is the same as -pedantic) should not give errors. Otherwise, it would mean that someone who uses -Wpedantic- -Wno-pedantic will silence the error, but someone that uses -Werror=pedantic -Wno-error=pedantic will still see an error (!). There is also no way to tell that error is related to -Wpedantic. But if this is the accepted solution, please at least test global_dc-pedantic_errors instead and put a big /* This is a hack to reject non-conforming programs with -pedantic-errors and accept them otherwise. See https://gcc.gnu.org/ml/gcc-patches/2008-10/msg01061.html and https://gcc.gnu.org/ml/gcc/2013-11/msg00253.html about to fix this properly. */ Cheers, Manuel.
Re: [PATCH] Add myself to MAINTAINERS
On Sun, 10 May 2015 21:08:04 +, Paulo Matos wrote: Somehow I never added myself to the MAINTAINERS file. Apologies for that. OK to commit? 2015-05-10 Paulo Matos pa...@matos-sorge.com * MAINTAINERS: Add myself as commit after approval. diff --git a/MAINTAINERS b/MAINTAINERS index 7dc4c8f..c5d6c99 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -483,6 +483,7 @@ David Malcolm dmalc...@redhat.com Mikhail Maltsev malts...@gmail.com Simon Martin simar...@users.sourceforge.net Ranjit Mathew rmat...@hotmail.com +Paulo Matospa...@matos-sorge.com Michael Matz m...@suse.de Greg McGaryg...@gnu.org Roland McGrath rol...@hack.frob.com Committed as obvious. -- Paulo Matos
Re: Fix pattern validation in genrecog
On 05/11/2015 03:05 AM, Richard Sandiford wrote: Thomas pointed out that, while I'd kept the code to validate patterns for things like missing modes, the code wasn't being used. Fixed with the patch below. I ended up reinstating the original code to create a single rtx pattern for a define_peephole2 sequence, so that it could be passed to validate_pattern as before. One of the reasons I got rid of the code originally was because I didn't like the idea of a sequence of define_peephole2 instructions being put into a PARALLEL -- there's nothing parallel about them, and it would be easy to confuse the result with a real parallel in a define_insn or define_split match. I therefore changed it to use SEQUENCE instead. (Not that it matters: nothing actually looks at the code.) I also threw in a check for zero-length define_peephole2s while there. This is enforced syntactically for other define_*s, but it's possible to write a define_peephole2 that just asks for scratch registers and doesn't actually match anything. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * genrecog.c (match_pattern_1): Expect the pattern to be a SEQUENCE for define_peephole2s. (get_peephole2_pattern): New function. (main): Use it. Call validate_pattern. OK. jeff
Re: [PATCH i386] Extend sibcall peepholes to allow source in %eax
On Sun, 10 May 2015, Jan Hubicka wrote: On i386, peepholes that transform memory load and register-indirect jump into memory-indirect jump are overly restrictive in that they don't allow combining when the jump target is loaded into %eax, and the called function returns a value (also in %eax, so it's not dead after the call). Fix this by checking for same source and output register operands separately. OK? * config/i386/i386.md (sibcall_value_memory): Extend peepholes to allow memory address in %eax. (sibcall_value_pop_memory): Likewise. Why do we need the check for liveness after all? There is SIBLING_CALL_P (peep2_next_insn (1)) so we know that the function terminates by the call and there are no other uses of the value. Indeed. Uros, the peep2_reg_dead_p check was added by your patch as svn revision 211776, git commit e51f8b8fed. Would you agree that the check is not necessary for sibcalls as Honza explains? Would you approve a patch that removes it in the sibcall peepholes I modify in the patch under discussion? Don't we however need to check that operands[0] is not used by the call_insn as parameter of the call? I.e. something like void test(void (*callback ())) { callback(callback); } You need a pointer-to-pointer-to-function to trigger the peephole. Something like this: void foo() { void (**bar)(void*); asm(:=r(bar)); (*bar)(*bar); } I think instead of peep2_reg_dead_p we want to check that the parameter is not in CALL_INSN_FUNCTION_USAGE of the sibcall.. Playing with the above testcase I can't induce failure. It seems today GCC won't allocate the same register as callee address and one of the arguments. Do you want me to implement such a check anyway? Hmm, only way I can trigger same register is: void foo() { void (**bar)(void*); asm(:=r(bar)); register void (*var)(void *) asm(%eax); var=*bar; asm(:+r(var)); var(var); } removing the second asm makes CSE to forward propagae the memory operand to call that makes the call different from the register variable. Still I would check for that, but this is more Uros' area. Honza
Re: [PATCH] Fix memory leak in C++ pretty printer
On 05/11/2015 12:57 PM, Jason Merrill wrote: On 05/11/2015 08:03 AM, Manuel López-Ibáñez wrote: My preference would be to replace the static with a pointer and placement-new with proper new and delete Actually, on second thought, there really doesn't seem to be a need for that. The patch should be OK; if it doesn't work I'd like to know why. I think the existing pattern is just a holdover from the C days. Jason
Re: [PATCH 6/6] rs6000: Clean up the various rlwinm patterns
On Sun, 10 May 2015, Segher Boessenkool wrote: This clearly renames rather than removing the `rlwinm' pattern, please correctly reflect that in ChangeLog. Some other, unnamed patterns are given names rather than deleted as well, just as you've noted at the top. And none of the other changes are mentioned in your ChangeLog entry. The changelog says it deletes the patterns with the old names. Which it does. And it adds the ones with the new names. Which it does. No one said changelogs are useful ;-) No one? Well, I'm saying it now, then. :) Would you be able to split this change up further by any chance? I could, but it would be a lot of extra work. First, I haven't done those steps in order (they aren't steps at all): I take one pattern, and fix it up, then the next, etc. Yeah, I know the pain, it's usually how it happens. You start cleaning up things and then you find yourself having done a number conceptually independent changes that overlap one another in the source modified. I've been through it many times. The thing is this extra work of untangling is worth doing as it'll help the maintainer and other members of the community, including those investigating change history months if not years from now for one reason or another, understand what the consequences of conceptually individual changes are. Specifically which changes are obviously harmless (e.g. formatting changes or the addition of debug `*' pattern names) and which are potential trouble makers (e.g. the reordering of operands or folding `define_insn' and `define_split' into `define_insn_and_split'), that e.g. need to be taken into account when porting changes that may rely on them. When all the bits are lumped together, it's very difficult to decipher them and easy to miss something. So I've done such patch splitting and rearrangement many times, using various techniques. For example it's often faster and easier if you hand-edit the patch being split into two rather than trying to reproduce the intermediate stage in the source being patched. Other people may have other hints and tricks. Of course you'll want to keep the original final version of the file changed around and then you can compare it to the result of applying a patch series, to make sure both results are identical. But much worse: if you do tiny pattern changes like you suggest, I can guarantee you the patches _will_ mis-apply. All of the time :-( Well, if you get the line numbers recorded in patches right, which is how `diff' generates them, then it won't happen. And as a committer you can verify you applied your own patches correctly, by taking a diff against your original patched version before pushing the changes into the repo. Especially when other upstream source changes have since caused line numbers to shift. change addresses a single issue only. That would avoid problems with ChangeLog and make the review easier. This isn't the first patch like this, I've cleaned up most other PowerPC integer patterns already. It's enough work already. Often the changes cannot be separated, and even if they can you then need them in the fixed order you made for them, etc. Of course changes can be separated (if they cannot, then they weren't really conceptually separate in the first place), and you do need to get the order right. E.g. patches that are potentially of interest to older release branches need to go first, followed by ones that are meant to stay on trunk only. I'll leave it up to David to decide anyway as he's the port maintainer, and I'm only a member of the community who happened to poke at this port once or twice in the past. Maciej
Re: [C frontend] Fix construction of TYPE_STUB_DECL
Bootstrapped/regtested x86_64-linux, OK? Honza * c-decl.c (pushtag): Declare type as public. What I'm struggling with here is how do you know the stub decl is public? I realize these things are a bit special, but I don't see the C++ front-end doing anything similar. What am I missing? It is used by type_in_anonymous_namespace_p. C++ FE definitely sets them accordingly, but I have no idea how it is done. Jason, can you help? Honza jeff
Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
On Mon, 11 May 2015, Mikhail Maltsev wrote: In general, is there a recommended set of targets that cover most conditionally compiled code? Also, the GCC Wiki mentions some automated See contrib/config-list.mk (note that some of those targets may have pre-existing build failures, and note that you need to start with a current trunk native compiler so that --enable-werror-always works; don't try to build all those cross compilers using an older GCC). -- Joseph S. Myers jos...@codesourcery.com
Re: C PATCH for -Wshift-negative-value (PR c/66066)
On Mon, 11 May 2015, Marek Polacek wrote: The problem here isn't in the -Wshift-negative-value warning itself; the problem is with marking -1 0 as a non-constant: later on, we warn in a context where a constant expression is needed (initializer element is not a constant expression), and for e.g. int foo = -1 0 | 9; there's an error (initializer element is not constant). For cases that are not integer constant expressions but can be folded to constants (e.g. where the unevaluated half of ?: contains a function call, which is not allowed in constant expressions), the general approach is a pedwarn-if-pedantic, e.g.: if (TREE_CODE (value) != INTEGER_CST) { value = c_fully_fold (value, false, NULL); if (TREE_CODE (value) == INTEGER_CST) pedwarn (loc, OPT_Wpedantic, enumerator value for %qE is not an integer constant expression, name); } if (TREE_CODE (value) != INTEGER_CST) { error (enumerator value for %qE is not an integer constant, name); value = 0; } (and similar for various cases other than enums - it turns out the Linux kernel has or had lots of cases that aren't strictly integer constant expressions but are used in contexts requiring such). So I think the issue here is that the above expression isn't folded to a constant by c_fully_fold. If the expression involves division by zero, say, clearly it's appropriate not to fold; no meaningful result value can be assigned. But for the above expression, a meaningful result can be assigned, so it would seem to make sense to fold. That is, it would make sense for c_fully_fold_internal to fold inside a C_MAYBE_CONST_EXPR with C_MAYBE_CONST_EXPR_INT_OPERANDS set, in the cases where meaningful results can be assigned. The rationale for not folding the contents of a C_MAYBE_CONST_EXPR in c_fully_fold is that folding already took place when the C_MAYBE_CONST_EXPR was created, and it's inefficient to potentially fold the same expression recursively many times. But that's only the case for normal C_MAYBE_CONST_EXPRs, not those with C_MAYBE_CONST_EXPR_INT_OPERANDS set that indicate something that could occur an unevaluated part of an integer constant expression. And in any case, if folding had already occurred, there would already be the potential for duplicate folding through the existing uses of remove_c_maybe_const_expr. Thus I think this issue should be addressed through folding in c_fully_fold_internal in this case, while watching carefully to make sure it doesn't reduce any other cases (division by zero, negative or out of range shift, overflows) to pedwarns-if-pedantic if they aren't already. There would then remain the matter of initializer element is not a constant expression being an unconditional pedwarn_init rather than a pedwarn-if-pedantic, so I think you'd still get warnings that couldn't be disabled for the above code (which would turn into errors with -Werror). The answer for that is probably to make this pedwarn_init use OPT_Wpedantic, consistent with the various other cases that handle things that aren't constant expressions but folded to constants. -- Joseph S. Myers jos...@codesourcery.com
Re: False ODR violation positives on anonymous namespace types
On 05/11/2015 01:05 PM, Jan Hubicka wrote: On 05/11/2015 12:46 PM, Jan Hubicka wrote: Well, my main motivatoin to extend from RECORD_OR_UNION_TYPE_P was to handle enums. But other case I would like to deal with are integer types - i.e. preserve difference between char/signed char/unsigned char/short/int/long/wchar in cases where they structurally coincide. In what context? Won't you get that from comparing e.g. the field types of two definitions of the same class? If one class define int foo; and other long foo; we currently do not complain about ODR on 32bit targets while I think we could. We certainly should. But that's a problem because foo is subject to the ODR. I don't see why you need to treat int as an ODR type to get checking for foo. What happens in LTO is that lto-symtab decide to merge the two declarations of foo. At this time it has chance to output the warning. For that it needs to be able to work out that these declarations are having different types, but because LTO merge canonical types on structural basis, types_compatible_p (long, int) will return true. The idea behind LTO canonical type computation is that it needs to safely work cross-language and if the two declarations came from C and fortran, the mismatch in type name is useless (see gimple_canonical_types_compatible_p) So what I want is to have odr_or_derived_type_p to return true on those types (because it sees they do have mangled name attached) and then call odr_types_equivalent_p which is the same comparer I use to output ODR waring about types and it will complain about mangled type names being different. This is already implemented in https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00870.html Honza Jason
Fix RTL checking failure in emit_pattern_{after|before}_setloc
This fixes an RTL checking failure I ran into while working on a change and it is latent in the pristine compiler. emit_pattern_{after|before}_setloc have: if (active_insn_p (after) !INSN_LOCATION (after)) if (active_insn_p (first) !INSN_LOCATION (first)) Now active_insn_p still has the FIXME: int active_insn_p (const_rtx insn) { return (CALL_P (insn) || JUMP_P (insn) || JUMP_TABLE_DATA_P (insn) /* FIXME */ || (NONJUMP_INSN_P (insn) (! reload_completed || (GET_CODE (PATTERN (insn)) != USE GET_CODE (PATTERN (insn)) != CLOBBER; } so if AFTER or FIRST are JUMP_TABLE_DATA_P, INSN_LOCATION is invoked on them and this triggers the RTL checking failure. Fixed thusly, tested on x86_64-suse-linux, applied on the mainline. 2015-05-11 Eric Botcazou ebotca...@adacore.com * emit-rtl.c (emit_pattern_after_setloc): Add missing guard. (emit_pattern_before_setloc): Likewise. -- Eric BotcazouIndex: emit-rtl.c === --- emit-rtl.c (revision 222673) +++ emit-rtl.c (working copy) @@ -4680,7 +4680,9 @@ emit_pattern_after_setloc (rtx pattern, after = NEXT_INSN (after); while (1) { - if (active_insn_p (after) !INSN_LOCATION (after)) + if (active_insn_p (after) + !JUMP_TABLE_DATA_P (after) /* FIXME */ + !INSN_LOCATION (after)) INSN_LOCATION (after) = loc; if (after == last) break; @@ -4791,7 +4793,9 @@ emit_pattern_before_setloc (rtx pattern, first = NEXT_INSN (first); while (1) { - if (active_insn_p (first) !INSN_LOCATION (first)) + if (active_insn_p (first) + !JUMP_TABLE_DATA_P (first) /* FIXME */ + !INSN_LOCATION (first)) INSN_LOCATION (first) = loc; if (first == last) break;
Re: [PATCH] combine: Don't create (set (reg:CC) (compare (reg:CC) (const0)))
On Mon, May 11, 2015 at 11:23:47AM -0600, Jeff Law wrote: * combine.c (simplify_set): When generating a CC set, if the source already is in the correct mode, do not wrap it in a compare. Simplify the rest of that code. Seems reasonable. Might not hurt to do a little testing on a cc0 target though. I tested on mn10300, cris, and m68k. On m68k it triggers while building libgcc (for 040). I verified the transform is correct. I don't have a setup to actually bootstrap or regression check any cc0 target. A recurring theme :-( Segher
Re: [PATCH] combine: Don't create (set (reg:CC) (compare (reg:CC) (const0)))
On 05/11/2015 03:11 PM, Segher Boessenkool wrote: On Mon, May 11, 2015 at 11:23:47AM -0600, Jeff Law wrote: * combine.c (simplify_set): When generating a CC set, if the source already is in the correct mode, do not wrap it in a compare. Simplify the rest of that code. Seems reasonable. Might not hurt to do a little testing on a cc0 target though. I tested on mn10300, cris, and m68k. On m68k it triggers while building libgcc (for 040). I verified the transform is correct. Seems reasonable to me. I don't have a setup to actually bootstrap or regression check any cc0 target. A recurring theme :-( Understood. As I mentioned in IRC and in another thread, it may be useful to set up an aranym instance for m68k bootstrap testing in the compile farm. For other targets, building foo-elf in a single tree build with binutils, newlib and the simulator, then running the gcc testsuite with the simulator is often a good alternative. Probably wise to run through this once or twice just to get familiar with it, then pull it out when really needed ;) jeff
Re: False ODR violation positives on anonymous namespace types
On 05/11/2015 01:05 PM, Jan Hubicka wrote: On 05/11/2015 12:46 PM, Jan Hubicka wrote: Well, my main motivatoin to extend from RECORD_OR_UNION_TYPE_P was to handle enums. But other case I would like to deal with are integer types - i.e. preserve difference between char/signed char/unsigned char/short/int/long/wchar in cases where they structurally coincide. In what context? Won't you get that from comparing e.g. the field types of two definitions of the same class? If one class define int foo; and other long foo; we currently do not complain about ODR on 32bit targets while I think we could. We certainly should. But that's a problem because foo is subject to the ODR. I don't see why you need to treat int as an ODR type to get checking for foo. Jason
Re: [C frontend] Fix construction of TYPE_STUB_DECL
On 05/10/2015 12:33 PM, Jan Hubicka wrote: This is properly honored by C++ FE but other FEs are bit random, which in turn confuses type_in_anonymous_namespace_p predicate that leads to flase poistives on type mismatch warnings. I used to be able to get around by checking only C++ types at LTO time, but with type checking in lto-symtab I can not, because I do want to compare type compatibility cross translation units and cross languages and we have no reliable way to say what type originated as C++ and what did not. I think we should, as only C++ declarations are subject to the ODR. C has different (structural) compatibility rules, and I think they should apply when comparing C and C++ types. Since C struct names have no linkage, I don't think it's right to set TREE_PUBLIC on them. Jason
Re: [C frontend] Fix construction of TYPE_STUB_DECL
On 05/10/2015 12:33 PM, Jan Hubicka wrote: This is properly honored by C++ FE but other FEs are bit random, which in turn confuses type_in_anonymous_namespace_p predicate that leads to flase poistives on type mismatch warnings. I used to be able to get around by checking only C++ types at LTO time, but with type checking in lto-symtab I can not, because I do want to compare type compatibility cross translation units and cross languages and we have no reliable way to say what type originated as C++ and what did not. I think we should, as only C++ declarations are subject to the ODR. C has different (structural) compatibility rules, and I think they should apply when comparing C and C++ types. Since C struct names have no linkage, I don't think it's right to set TREE_PUBLIC on them. Yes, I need safe way to tell what type is subject to ODR and what is not. I am not quite sure what is the best approach here. This is what the code is doing currently: To detect ODR types at LTO time I use combination of presence of mangled name and type_in_anonymous_namespace_p check. (The idea is that we do not really need to mangle anonymous type as we do not need to deal with cross-module merging.): odr_type_p (const_tree t) { if (type_in_anonymous_namespace_p (t)) return true; /* We do not have this information when not in LTO, but we do not need to care, since it is used only for type merging. */ gcc_assert (in_lto_p || flag_lto); return (TYPE_NAME (t) (DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (t; } bool type_in_anonymous_namespace_p (const_tree t) { /* TREE_PUBLIC of TYPE_STUB_DECL may not be properly set for bulitin types; those have CONTEXT NULL. */ if (!TYPE_CONTEXT (t)) return false; return (TYPE_STUB_DECL (t) !TREE_PUBLIC (TYPE_STUB_DECL (t))); } the catch is that type_in_anonymous_namespace_p returns true for some C types. The TYPE_CONTEXT test is already hack as I run into cases of pre-streamed LTO types to get injected into C++ classes (i.e. if you refere to int, LTO streaming will replace C++ int with TREE_PUBLIC (TYPE_STUB_DECL (t)) by its own int that is !TREE_PUBLIC (TYPE_STUB_DECL (t)). The hack to avoid builtin types made type_in_anonymous_namespace_p to work well in cases I needed for class types. (I believe it is because C builds record with TREE_PUBLIC (TYPE_STUB_DECL (t))=1 but it is a while I checked this) We could certainly just add a flag TYPE_ODR_P that says this type is controlled by odr rule. I considered that but it is generally not so nice to introduce new flags and it seemed to me that because C standard allows to match all types cross-module on structural basis, it makes sense to consider them all as having public linkage because they are accessible from outside this translation unit. as tree.h defines the flag. I would be happy with TYPE_ODR_P or any other solution that looks better. Honza Jason
Re: False ODR violation positives on anonymous namespace types
On 05/11/2015 12:46 PM, Jan Hubicka wrote: Well, my main motivatoin to extend from RECORD_OR_UNION_TYPE_P was to handle enums. But other case I would like to deal with are integer types - i.e. preserve difference between char/signed char/unsigned char/short/int/long/wchar in cases where they structurally coincide. In what context? Won't you get that from comparing e.g. the field types of two definitions of the same class? If one class define int foo; and other long foo; we currently do not complain about ODR on 32bit targets while I think we could. Other case was the ODR violations dragged by the signed/unsigned: char switch: $ cat t.C char a; $ ./xgcc -B ./ -O2 t.C -o t1.o -fno-signed-char -c -flto $ ./xgcc -B ./ -O2 t.C -o t2.o -fsigned-char -c -flto $ ./xgcc -B ./ -O2 t1.o t2.o -flto -fno-signed-char -flto built-in: warning: type ïcharï violates one definition rule [-Wodr] built-in: note: a type with different signedness is defined in another translation unit t.C:1:6: warning: type of ïaï does not match original declaration char a; ^ t.C:1:6: note: previously declared here char a; ^ I also want to use ODR for more fine grained TBAA on LTO and there the differences between integer types matter more. Honza Jason
Re: [RFC]: Remove Mem/address type assumption in combiner
Hi Steve, On Mon, May 11, 2015 at 10:50:02AM -0700, Steve Ellcey wrote: This patch broke a number of MIPS tests, specifically mips32r6 tests that look for the lsa instruction (load scaled address) which shifts one register and then adds it to a second register. I am not sure if this needs to be addressed in combine.c or if we need to add a peephole optimization to mips.md to handle the new instruction sequence. What do you think? Is the change here what you would expect to see from your patch? Yes, this is as expected. AFAICS the only change you need in the MIPS backend is to change the GPR:dlsa pattern to match a shift instead of a mult (and change the const_immlsa_operand predicate to just match 1..4 instead of the powers). Segher
Re: [Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64
On 05/11/2015 06:02 AM, David Sherwood wrote: Hi, This patch fixes a couple of issues I saw during the compilation of shell_lam.f for 410.bwaves test in spec2006: * create_new_invariant: We shouldn't bother attempting to calculate the address cost for something that clearly isn't an address. Use a SCALAR_INT_MODE_P check to eliminate the most obvious cases, such as vector modes and so on. * get_inv_cost: Change the code so that we don't treat something as an address if it is never actually used as an address, i.e. we didn't use it at all. This occurs due to the way we process the innermost loop first - invariants get pulled out of inner most loop and then get processed during next innermost loop. Here is more detail from shell_lam.f.210r.loop2_invariant dump file that shows what currently happens: 4427 *ending processing of loop 27 ** 4442 (const_vector:V2DF [ 4443 (const_double:DF 4.0e+0 [0x0.8p+3]) (const_double:DF 4.0e+0 [0x0.8p+3]) 4445 ]) 4446 4447 Hot cost: 8 (final) 4448 (set (reg:V2DF 3179) 4449 (const_vector:V2DF [ 4450 (const_double:DF 4.0e+0 [0x0.8p+3]) 4451 (const_double:DF 4.0e+0 [0x0.8p+3]) 4452 ])) 4453 4454 Hot cost: 8 (final) 4455 Set in insn 2764 is invariant (1), cost 8, depends on 4471 Decided to move invariant 1 -- gain 8 move_invariant_reg moves invariant (the const_vector above) into 3490 and replaces use of 3179 with 3490, but conservatively keeps the definition of 3179, which gets processed in outer loop, but is never used. Debug for outer loop now follows: 4497 *starting processing of loop 26 ** 6379 (insn 2764 2748 2766 110 (set (reg:V2DF 3490) 6380 (const_vector:V2DF [ 6381 (const_double:DF 4.0e+0 [0x0.8p+3]) 6382 (const_double:DF 4.0e+0 [0x0.8p+3]) 6383 ])) shell_lam.f:287 819 {*aarch64_simd_movv2df} 6384 (nil)) 6385 ;; UD chains for insn luid 88 uid 2766 6604 ;; UD chains for insn luid 21 uid 3836 6605 ;; reg 3490 { d419(bb 110 insn 2764) } 6606 (insn 3836 2763 2765 111 (set (reg:V2DF 3179) 6607 (reg:V2DF 3490)) shell_lam.f:287 -1 6608 (nil)) ^^^ additional insn generated by move_invariant_reg from loop 27 6836 ;; UD chains for insn luid 40 uid 2790 6837 ;; reg 3161 { d272(bb 111 insn 2746) } 6838 ;; reg 3201 { d305(bb 111 insn 2788) } 6839 ;; reg 3490 { d419(bb 110 insn 2764) } 6840 ;; eq_note reg 3161 { d272(bb 111 insn 2746) } 6841 ;; eq_note reg 3201 { d305(bb 111 insn 2788) } 6842 (insn 2790 2788 2791 111 (set (reg:V2DF 3203 [ vect__930.427 ]) 6843 (fma:V2DF (reg:V2DF 3161 [ MEM[base: vectp_q.371_2137, index: ivtmp.679_3214, offset: 0B] ]) 6844 (neg:V2DF (reg:V2DF 3490)) 6845 (reg:V2DF 3201 [ vect__928.424 ]))) shell_lam.f:287 1219 {fnmav2df4} 6846 (expr_list:REG_DEAD (reg:V2DF 3201 [ vect__928.424 ]) 6847 (expr_list:REG_DEAD (reg:V2DF 3179) ... ^^^ 3179 marked as DEAD 8366 *ending processing of loop 26 ** 8472 (const_vector:V2DF [ 8473 (const_double:DF 4.0e+0 [0x0.8p+3]) 8474 (const_double:DF 4.0e+0 [0x0.8p+3]) 8475 ]) 8476 8477 Hot cost: 8 (final) 8478 (set (reg:V2DF 3490) 8479 (const_vector:V2DF [ 8480 (const_double:DF 4.0e+0 [0x0.8p+3]) 8481 (const_double:DF 4.0e+0 [0x0.8p+3]) 8482 ])) 8483 8484 Hot cost: 8 (final) 8485 Set in insn 2764 is invariant (12), cost 8, depends on 8505 (set (reg:V2DF 3179) 8506 (reg:V2DF 3490)) 8507 8508 Hot cost: 8 (final) 8509 Set in insn 3836 is invariant (15), cost 8, depends on 12 ^^^ Never moves 3179 invariant out, even though it's never used. After my patch we now get this: 8586 Decided to move invariant 15 -- gain 16 8587 Decided to move dependent invariant 12 since get_inv_cost sees the number of uses is zero and the dependent invariant (12) gets moved too. Of course, without my patch the definition of 3179 would ultimately be eliminated as dead code in a later pass anyway. Is this ok to go in? Regards, David Sherwood. ChangeLog entry follows ... 2015-05-08 David Sherwood david.sherw...@arm.com * loop-invariant.c (create_new_invariant): Don't calculate address cost if mode is not scalar integers. (get_inv_cost): Increase computational cost for unused invariants. So you'd need to bootstrap and regression test this change for it to be fully ready for review. It would also be good to include a test for the testsuite where you can show the improvements in behaviour due to this change. I'm not offhand sure of the license on bwaves, so you may or may not be able to reduce it and use the reduced code in the GCC testsuite. With some
Re: [PATCH i386] Extend sibcall peepholes to allow source in %eax
On 05/11/2015 01:46 PM, Uros Bizjak wrote: On Mon, May 11, 2015 at 8:00 PM, Jan Hubicka hubi...@ucw.cz wrote: On Sun, 10 May 2015, Jan Hubicka wrote: On i386, peepholes that transform memory load and register-indirect jump into memory-indirect jump are overly restrictive in that they don't allow combining when the jump target is loaded into %eax, and the called function returns a value (also in %eax, so it's not dead after the call). Fix this by checking for same source and output register operands separately. OK? * config/i386/i386.md (sibcall_value_memory): Extend peepholes to allow memory address in %eax. (sibcall_value_pop_memory): Likewise. Why do we need the check for liveness after all? There is SIBLING_CALL_P (peep2_next_insn (1)) so we know that the function terminates by the call and there are no other uses of the value. Indeed. Uros, the peep2_reg_dead_p check was added by your patch as svn revision 211776, git commit e51f8b8fed. Would you agree that the check is not necessary for sibcalls as Honza explains? Would you approve a patch that removes it in the sibcall peepholes I modify in the patch under discussion? Don't we however need to check that operands[0] is not used by the call_insn as parameter of the call? I.e. something like void test(void (*callback ())) { callback(callback); } You need a pointer-to-pointer-to-function to trigger the peephole. Something like this: void foo() { void (**bar)(void*); asm(:=r(bar)); (*bar)(*bar); } I think instead of peep2_reg_dead_p we want to check that the parameter is not in CALL_INSN_FUNCTION_USAGE of the sibcall.. Playing with the above testcase I can't induce failure. It seems today GCC won't allocate the same register as callee address and one of the arguments. Do you want me to implement such a check anyway? Hmm, only way I can trigger same register is: void foo() { void (**bar)(void*); asm(:=r(bar)); register void (*var)(void *) asm(%eax); var=*bar; asm(:+r(var)); var(var); } removing the second asm makes CSE to forward propagae the memory operand to call that makes the call different from the register variable. Still I would check for that, but this is more Uros' area. This is from [1], and reading this reference, it looks to me that the check was introduced due to: - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) Going down that memory lane, it looks like a safety check for something that *might* happen. Looking at the comment, I'd say we can remove the check, but we should look for possible fallout. I'd tend to agree. Yes, to make my original email clear, I think we are safe to remove peep2_reg_dead_p. I would however introduce a check that the call target is not also among parameters of the function. In this case the peephole would remove the load and make the parameter unefined. While current mainline don't seem to be able to translate the testcase above that way, perhaps future improvements to LRA/postreload gcse may make it happen and generally RTL patterns are better to be safe by definition not only for the actual RTL we are able to generate. I suppose reg_mentioned_p on call usage is enough. Honza jeff
RE: [RFC]: Remove Mem/address type assumption in combiner
Jeff Law l...@redhat.com writes: On 05/11/2015 01:46 PM, Jeff Law wrote: On 05/11/2015 01:44 PM, Steve Ellcey wrote: On Mon, 2015-05-11 at 13:22 -0500, Segher Boessenkool wrote: Hi Steve, On Mon, May 11, 2015 at 10:50:02AM -0700, Steve Ellcey wrote: This patch broke a number of MIPS tests, specifically mips32r6 tests that look for the lsa instruction (load scaled address) which shifts one register and then adds it to a second register. I am not sure if this needs to be addressed in combine.c or if we need to add a peephole optimization to mips.md to handle the new instruction sequence. What do you think? Is the change here what you would expect to see from your patch? Yes, this is as expected. AFAICS the only change you need in the MIPS backend is to change the GPR:dlsa pattern to match a shift instead of a mult (and change the const_immlsa_operand predicate to just match 1..4 instead of the powers). Segher Hm, I thought it was going to be more complicated than that, but it seems to be working. I will do a complete test run and then submit a patch. Yea, it really should be that easy. I'm pretty sure the sh[123]add insns in the PA need to be updated in a similar manner. Oh, and just to be clear, I'm not expecting you to do this Steve. It'd be great if you did, but not required. Does this patch effectively change the canonicalization rules? The following Still exists in md.texi: @item Within address computations (i.e., inside @code{mem}), a left shift is converted into the appropriate multiplication by a power of two. Thanks, Matthew
[PATCH] FreeBSD add functionality to build PIE executables.
All, this patch adds the ability to build PIE executables for FreeBSD. The core is since a longer time in the code base of FreeBSD itself and is working fine. This patch makes it available for all FreeBSD targets. Tested on x86_64-*-freebsd11.0 and armv6/hf-*-freebsd11.0, i386-*-freebsd11.0 is progress. In the same turn I removed the STARTFILE/ENDFILE_SPEC from config/i386/freebsd.h and use the ones from config/freebsd-spec.h. Here the results before the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01267.html and with the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01324.html Is this ok for trunk and for 5.1X? Thanks, Andreas 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * lib/target-supports.exp (check_effective_target_pie): Add *-*-freebsd* to the familiy of pie capable targets. Index: gcc/config/freebsd-spec.h === --- gcc/config/freebsd-spec.h (revision 223017) +++ gcc/config/freebsd-spec.h (working copy) @@ -66,8 +66,9 @@ %{!shared: \ %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s} +%{!profile: \ +%{pie: Scrt1.o%s;:crt1.o%s} \ + crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -76,7 +77,7 @@ `crtn.o'. */ #define FBSD_ENDFILE_SPEC \ - %{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s + %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as required by the user-land thread model. Before __FreeBSD_version Index: gcc/config/i386/freebsd.h === --- gcc/config/i386/freebsd.h (revision 223017) +++ gcc/config/i386/freebsd.h (working copy) @@ -59,29 +59,16 @@ #define SUBTARGET_EXTRA_SPECS \ { fbsd_dynamic_linker, FBSD_DYNAMIC_LINKER } -/* Provide a STARTFILE_SPEC appropriate for FreeBSD. Here we add - the magical crtbegin.o file (see crtstuff.c) which provides part - of the support for getting C++ file-scope static object constructed - before entering `main'. */ - -#undef STARTFILE_SPEC -#define STARTFILE_SPEC \ - %{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ - %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s} +/* Use the STARTFILE_SPEC from config/freebsd-spec.h. */ -/* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on - the magical crtend.o file (see crtstuff.c) which provides part of - the support for getting C++ file-scope static object constructed - before entering `main', followed by a normal finalizer file, - `crtn.o'. */ +#undef STARTFILE_SPEC +#define STARTFILE_SPEC FBSD_STARTFILE_SPEC -#undef ENDFILE_SPEC -#define ENDFILE_SPEC \ - %{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s +/* Use the ENDFILE_SPEC from config/freebsd-spec.h. */ +#undef ENDFILE_SPEC +#define ENDFILE_SPEC FBSD_ENDFILE_SPEC + /* Provide a LINK_SPEC appropriate for FreeBSD. Here we provide support for the special GCC options -static and -shared, which allow us to link things in one of these three modes by applying the appropriate Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 223017) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -954,6 +954,7 @@ proc check_effective_target_pie { } { if { [istarget *-*-darwin\[912\]*] +|| [istarget *-*-freebsd*] || [istarget *-*-linux*] || [istarget *-*-gnu*] } { return 1;
Re: [PATCH] libiberty: cleanup Makefile.in
On Mon, 11 May 2015, Bernhard Reutner-Fischer wrote: +TARGETLIB_PIC = $(dir $(TARGETLIB))pic/$(notdir $(TARGETLIB)) +TARGETLIB_NOASAN = $(dir $(TARGETLIB))noasan/$(notdir $(TARGETLIB)) Is it OK to use GNU make features in libiberty now? That is, do binutils and GDB now require GNU make? -- Joseph S. Myers jos...@codesourcery.com
Re: Re: Fwd: [PING 2][PATCH] libgcc: Add CFI directives to the soft floating point support code for ARM
On Mon, May 11, 2015 at 4:45 AM, Ramana Radhakrishnan ramana.radhakrish...@arm.com wrote: sorry about the slow response, I was travelling for a bit and missed your emails. Trying your patch out gives me failures possibly because my mail client munged it when it received this inline. Can you please rebase if necessary, test and send it again as an attachment? I checked it again and it seems to be a whitespace issue. I re-generated the patch and this one seems to apply just fine; try it out again. --- libgcc/config/arm/ieee754-df.S | 177 +--- libgcc/config/arm/ieee754-sf.S | 101 ++-- libgcc/config/arm/lib1funcs.S | 10 ++ 3 files changed, 261 insertions(+), 27 deletions(-) diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S index f75630b..d1f9066 100644 --- a/libgcc/config/arm/ieee754-df.S +++ b/libgcc/config/arm/ieee754-df.S @@ -33,8 +33,12 @@ * Only the default rounding mode is intended for best performances. * Exceptions aren't supported yet, but that can be added quite easily * if necessary without impacting performances. + * + * In the CFI related comments, 'previousOffset' refers to the previous offset + * from sp used to compute the CFA. */ +.cfi_sections .debug_frame #ifndef __ARMEB__ #define xl r0 @@ -53,11 +57,13 @@ ARM_FUNC_START negdf2 ARM_FUNC_ALIAS aeabi_dneg negdf2 +CFI_START_FUNCTION @ flip sign bit eor xh, xh, #0x8000 RET +CFI_END_FUNCTION FUNC_END aeabi_dneg FUNC_END negdf2 @@ -66,6 +72,7 @@ ARM_FUNC_ALIAS aeabi_dneg negdf2 #ifdef L_arm_addsubdf3 ARM_FUNC_START aeabi_drsub +CFI_START_FUNCTION eor xh, xh, #0x8000 @ flip sign bit of first arg b 1f @@ -81,7 +88,11 @@ ARM_FUNC_ALIAS aeabi_dsub subdf3 ARM_FUNC_START adddf3 ARM_FUNC_ALIAS aeabi_dadd adddf3 -1: do_push {r4, r5, lr} +1: do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp to sp + 8 +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 @ Look for zeroes, equal values, INF, or NAN. shift1 lsl, r4, xh, #1 @@ -148,6 +159,11 @@ ARM_FUNC_ALIAS aeabi_dadd adddf3 @ Since this is not common case, rescale them off line. teq r4, r5 beq LSYM(Lad_d) + +@ CFI note: we're lucky that the branches to Lad_* that appear after this function +@ have a CFI state that's exactly the same as the one we're in at this +@ point. Otherwise the CFI would change to a different state after the branch, +@ which would be disastrous for backtracing. LSYM(Lad_x): @ Compensate for the exponent overlapping the mantissa MSB added later @@ -413,6 +429,7 @@ LSYM(Lad_i): orrne xh, xh, #0x0008 @ quiet NAN RETLDM r4, r5 +CFI_END_FUNCTION FUNC_END aeabi_dsub FUNC_END subdf3 FUNC_END aeabi_dadd @@ -420,12 +437,19 @@ LSYM(Lad_i): ARM_FUNC_START floatunsidf ARM_FUNC_ALIAS aeabi_ui2d floatunsidf +CFI_START_FUNCTION teq r0, #0 do_it eq, t moveq r1, #0 RETc(eq) -do_push {r4, r5, lr} + +do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp + 0 to sp + 8. +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 + mov r4, #0x400 @ initial exponent add r4, r4, #(52-1 - 1) mov r5, #0 @ sign bit is 0 @@ -435,17 +459,25 @@ ARM_FUNC_ALIAS aeabi_ui2d floatunsidf mov xh, #0 b LSYM(Lad_l) +CFI_END_FUNCTION FUNC_END aeabi_ui2d FUNC_END floatunsidf ARM_FUNC_START floatsidf ARM_FUNC_ALIAS aeabi_i2d floatsidf +CFI_START_FUNCTION teq r0, #0 do_it eq, t moveq r1, #0 RETc(eq) -do_push {r4, r5, lr} + +do_push {r4, r5, lr}@ sp -= 12 +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp + 0 to sp + 8. +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 + mov r4, #0x400 @ initial exponent add r4, r4, #(52-1 - 1) andsr5, r0, #0x8000 @ sign bit in r5 @@ -457,11 +489,13 @@ ARM_FUNC_ALIAS aeabi_i2d floatsidf mov xh, #0 b LSYM(Lad_l) +CFI_END_FUNCTION FUNC_END aeabi_i2d FUNC_END floatsidf ARM_FUNC_START extendsfdf2 ARM_FUNC_ALIAS aeabi_f2d extendsfdf2 +CFI_START_FUNCTION movsr2, r0, lsl #1 @ toss sign bit mov xh, r2, asr #3 @ stretch exponent @@ -480,34 +514,54 @@ ARM_FUNC_ALIAS aeabi_f2d extendsfdf2 @ value was denormalized. We can normalize it now. do_push {r4, r5, lr} +.cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12 +.cfi_rel_offset r4, 0 @ Registers are saved from sp + 0 to sp + 8. +.cfi_rel_offset r5, 4 +.cfi_rel_offset lr, 8 + mov r4, #0x380 @ setup corresponding exponent
Re: [PATCH i386] Extend sibcall peepholes to allow source in %eax
On 05/11/2015 01:46 PM, Uros Bizjak wrote: On Mon, May 11, 2015 at 8:00 PM, Jan Hubicka hubi...@ucw.cz wrote: On Sun, 10 May 2015, Jan Hubicka wrote: On i386, peepholes that transform memory load and register-indirect jump into memory-indirect jump are overly restrictive in that they don't allow combining when the jump target is loaded into %eax, and the called function returns a value (also in %eax, so it's not dead after the call). Fix this by checking for same source and output register operands separately. OK? * config/i386/i386.md (sibcall_value_memory): Extend peepholes to allow memory address in %eax. (sibcall_value_pop_memory): Likewise. Why do we need the check for liveness after all? There is SIBLING_CALL_P (peep2_next_insn (1)) so we know that the function terminates by the call and there are no other uses of the value. Indeed. Uros, the peep2_reg_dead_p check was added by your patch as svn revision 211776, git commit e51f8b8fed. Would you agree that the check is not necessary for sibcalls as Honza explains? Would you approve a patch that removes it in the sibcall peepholes I modify in the patch under discussion? Don't we however need to check that operands[0] is not used by the call_insn as parameter of the call? I.e. something like void test(void (*callback ())) { callback(callback); } You need a pointer-to-pointer-to-function to trigger the peephole. Something like this: void foo() { void (**bar)(void*); asm(:=r(bar)); (*bar)(*bar); } I think instead of peep2_reg_dead_p we want to check that the parameter is not in CALL_INSN_FUNCTION_USAGE of the sibcall.. Playing with the above testcase I can't induce failure. It seems today GCC won't allocate the same register as callee address and one of the arguments. Do you want me to implement such a check anyway? Hmm, only way I can trigger same register is: void foo() { void (**bar)(void*); asm(:=r(bar)); register void (*var)(void *) asm(%eax); var=*bar; asm(:+r(var)); var(var); } removing the second asm makes CSE to forward propagae the memory operand to call that makes the call different from the register variable. Still I would check for that, but this is more Uros' area. This is from [1], and reading this reference, it looks to me that the check was introduced due to: - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) Going down that memory lane, it looks like a safety check for something that *might* happen. Looking at the comment, I'd say we can remove the check, but we should look for possible fallout. I'd tend to agree. jeff